55:035 Computer Architecture and Organization
![Page 1: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/1.jpg)
55:035 Computer Architecture and Organization
Lecture 9
![Page 2: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/2.jpg)
Outline
- Building a CPU
  - Basic Components
  - MIPS Instructions
  - Basic 5 Steps for CPU
  - Single-Cycle Design
  - Multi-cycle Design
  - Comparison of Single and Multi-cycle Designs
![Page 3: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/3.jpg)
Overview Brief look
Digital logic
CPU Datapath MIPS Example
![Page 4: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/4.jpg)
Digital Logic
[Figure: three building blocks. D-type flip-flop (D, Q, edge-triggered clock); multiplexer (inputs A on 0 and B on 1, select input S, output F); D-type flip-flop with enable (D, Q, EN, edge-triggered clock)]
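The behaviour of these blocks can be sketched in software. The following is a minimal model, not a hardware description: the names `mux2` and `DFlipFlopEN` are hypothetical, and it assumes that S=0 selects A and that the enabled flip-flop is built as a multiplexer feeding a plain flip-flop.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: output F is A when S=0 and B when S=1 (assumed)."""
    return b if s else a


class DFlipFlopEN:
    """Edge-triggered D flip-flop with enable, modeled as mux + flip-flop."""

    def __init__(self, q=0):
        self.q = q

    def clock_edge(self, d, en):
        # On the clock edge the mux recirculates Q when EN=0 and loads D when EN=1.
        self.q = mux2(self.q, d, en)
        return self.q


ff = DFlipFlopEN()
ff.clock_edge(d=1, en=0)  # EN=0: Q holds its old value
held = ff.q
ff.clock_edge(d=1, en=1)  # EN=1: Q takes the value of D
loaded = ff.q
```

Between calls to `clock_edge` the stored value cannot change, which is the property the later single-cycle timing discussion relies on.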
![Page 5: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/5.jpg)
Digital Logic
Registers
[Figure: registers built from D flip-flops that share one edge-triggered clock and one EN input: 1 bit (D, Q), 4 bits (D3..D0, Q3..Q0), and N bits (D, Q)]
![Page 6: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/6.jpg)
Digital Logic
Tri-state Driver (Buffer)
[Figure: tri-state buffer with input "in", output "out", and control "drive"]

| In | Drive | Out |
|---|---|---|
| 0 | 0 | Z |
| 1 | 0 | Z |
| 0 | 1 | 0 |
| 1 | 1 | 1 |

What is Z?
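Z denotes the high-impedance state: the driver is electrically disconnected from its output wire, so another driver may place a value on it. A minimal sketch of the truth table follows; the function names `tristate` and `bus` are hypothetical, and the bus rule (at most one active driver) is a simplifying assumption.

```python
def tristate(inp, drive):
    """Tri-state driver: pass `inp` through when drive=1, else high impedance."""
    return inp if drive else "Z"


def bus(*outputs):
    """Resolve a shared wire, assuming at most one driver may be active."""
    active = [v for v in outputs if v != "Z"]
    if len(active) > 1:
        raise ValueError("bus contention: more than one active driver")
    return active[0] if active else "Z"


# Rebuild the truth table rows as (In, Drive, Out).
rows = [(i, d, tristate(i, d)) for d in (0, 1) for i in (0, 1)]

# Two drivers share a wire; only the first one is enabled.
shared = bus(tristate(1, 1), tristate(0, 0))
```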
![Page 7: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/7.jpg)
Digital Logic
Adder/Subtractor or ALU
[Figure: ALU with data inputs A and B, Carry-in, and control input Add/sub (or ALUop); outputs F and Carry-out]
![Page 8: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/8.jpg)
Overview Brief look
Digital logic
How to Design a CPU Datapath MIPS Example
![Page 9: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/9.jpg)
Designing a CPU: 5 Steps
1. Analyze the instruction set → datapath requirements
   - MIPS: ADD, SUB, ORI, LW, SW, BR
   - Meaning of each instruction given by RTL (register transfers)
   - 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   - ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   - Datapath must support planned register transfers
   - Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic
![Page 10: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/10.jpg)
Step 1a: Analyze ISA
- All MIPS instructions are 32 bits long.
- Three instruction formats: R-type, I-type, J-type
  - R: registers, I: immediate, J: jumps
- These formats intentionally chosen to simplify design

| Format | Fields (bit range, width) |
|---|---|
| R-type | op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), rd [15:11] (5 bits), shamt [10:6] (5 bits), funct [5:0] (6 bits) |
| I-type | op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), immediate [15:0] (16 bits) |
| J-type | op [31:26] (6 bits), target address [25:0] (26 bits) |
![Page 11: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/11.jpg)
Step 1b: Analyze ISA
- Meaning of the fields:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
    - Destination is either rd (R-type), or rt (I-type)
  - shamt: shift amount
  - funct: selects the variant of the operation in the "op" field
  - immediate: address offset or immediate value
  - target address: target address of the jump instruction

[Figure: R-type, I-type, and J-type field layouts. R-type: op (6 bits), rs, rt, rd, shamt (5 bits each), funct (6 bits). I-type: op (6 bits), rs, rt (5 bits each), immediate (16 bits). J-type: op (6 bits), target address (26 bits)]
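Because the field boundaries are fixed, extracting the fields is a matter of shifts and masks. The sketch below shows one way to do it; the function name `decode` and the dictionary layout are hypothetical. It returns every field for every word and leaves it to the caller to use only the fields that are meaningful for the instruction's format.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its named fields."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31:26
        "rs":     (instr >> 21) & 0x1F,   # bits 25:21
        "rt":     (instr >> 16) & 0x1F,   # bits 20:16
        "rd":     (instr >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  instr & 0x3F,           # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15:0  (I-type)
        "target": instr & 0x3FFFFFF,      # bits 25:0  (J-type)
    }


# Build an R-type word with rs=1, rt=2, rd=3 and funct=0x21 (addu).
r_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
r_fields = decode(r_word)

# 0x8C220004 is an I-type word: op=0x23 (lw), rs=1, rt=2, imm16=4.
i_fields = decode(0x8C220004)
```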
![Page 12: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/12.jpg)
MIPS ISA: subset for today
- ADD and SUB
  - addU rd, rs, rt
  - subU rd, rs, rt
- OR Immediate:
  - ori rt, rs, imm16
- LOAD and STORE Word
  - lw rt, rs, imm16
  - sw rt, rs, imm16
- BRANCH:
  - beq rs, rt, imm16

[Figure: field layout beside each group. ADD and SUB use op, rs, rt, rd, shamt, funct; ORI, LW/SW, and BEQ use op, rs, rt, immediate]
![Page 13: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/13.jpg)
Step 2: Datapath Requirements
- REGISTER FILE: MIPS ISA requires 32 registers, 32b each
  - Called a register file
  - Contains 32 entries
  - Each entry is 32b
- AddU rd,rs,rt or SubU rd,rs,rt
  - Read two sources rs, rt
  - Operation rs + rt or rs – rt
  - Write destination rd ← rs+/-rt
- Requirements
  - Read two registers (rs, rt)
  - Perform ALU operation
  - Write a third register (rd)

[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data input WrData, control RegWrite, and data outputs RdData1, RdData2 ("How to implement?"); ALU with control ALUop and outputs Result and Zero?]
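The register file requirements can be illustrated with a small behavioural model. This is a sketch under assumptions: the class name `RegFile` and its method names are hypothetical, reads are treated as combinational, and the write happens on the clock edge only when RegWrite is 1. It also ignores the MIPS convention that register 0 always reads as zero.

```python
class RegFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Both read ports simply follow the 5-bit register numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # One entry is updated on the clock edge, and only if RegWrite=1.
        if reg_write:
            self.regs[wr_reg] = wr_data & 0xFFFFFFFF


rf = RegFile()
rf.clock_edge(1, 5, 1)
rf.clock_edge(2, 7, 1)
a, b = rf.read(1, 2)          # read two sources (rs, rt)
rf.clock_edge(3, a + b, 1)    # write a third register (rd)
rf.clock_edge(4, 99, 0)       # RegWrite=0: nothing is written
```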
![Page 14: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/14.jpg)
Step 3: Datapath Assembly
- ADDU rd, rs, rt
- SUBU rd, rs, rt
- Need an ALU
  - Hook it up to REGISTER FILE
  - REGFILE has 2 read ports (rs,rt), 1 write port (rd)
- Parameters come from instruction fields (rs, rt, rd)
- Control signals depend upon instruction fields
  - Eg: ALUop = f(Instruction) = f(op, funct)

[Figure: rs, rt, and rd from the instruction drive RdReg1, RdReg2, and WrReg of REGFILE; RdData1 and RdData2 feed the ALU (control ALUop, outputs Result and Zero?); WrData and RegWrite complete the write port]
![Page 15: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/15.jpg)
Steps 2 and 3: ORI Instruction
- ORI rt, rs, Imm16
  - Need new ALUop for 'OR' function, hook up to REGFILE
  - 1 read port (rs), 1 write port (rt), 1 const value (Imm16)
- Control signals depend upon instruction fields
  - E.g.: ALUsrc = f(Instruction) = f(op, funct)

[Figure: ORI datapath. rs from the instruction drives RdReg1; rt replaces rd (crossed out) as the write register; the 16-bit Imm16 passes through ZERO-EXTEND into a multiplexer (inputs 0 and 1, select ALUsrc) in front of the ALU; ALU has control ALUop and outputs Result and Zero?; REGFILE has RegWrite]
![Page 16: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/16.jpg)
Steps 2 and 3: Destination Register
- Must select proper destination, rd or rt
  - Depends on Instruction Type
  - R-type may write rd
  - I-type may write rt

[Figure: datapath with rs, rt, rd from the instruction; a new multiplexer (inputs 1 and 0, select RegDst) chooses between rd and rt for WrReg; ZERO-EXTEND of the 16-bit Imm16 and the ALUsrc multiplexer feed the ALU (ALUop, Result, Zero?); REGFILE has RegWrite]
![Page 17: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/17.jpg)
Steps 2 and 3: Load Word
- LW rt, rs, Imm16
  - Need Data Memory: data ← Mem[Addr]
  - Addr is rs+Imm16, Imm16 is signed, use ALU for +
  - Store in rt: rt ← Mem[rs+Imm16]

[Figure: datapath extended with DATAMEM (Addr, RdData); a multiplexer (select MemtoReg) chooses the value for WrData of REGFILE; the extender becomes SIGN/ZERO-EXTEND with control ExtOp; RegDst, ALUsrc, ALUop, and RegWrite as before]
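The difference between the two extension modes, and the address calculation that LW needs, can be sketched as follows. The function names are hypothetical, and the polarity of ExtOp (1 selects sign extension) is an assumption, since the slide does not state it.

```python
def zero_ext(imm16):
    """Zero-extend a 16-bit immediate to 32 bits."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to 32 bits (two's complement)."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def extend(imm16, ext_op):
    # Assumption: ExtOp=1 selects sign extension, ExtOp=0 zero extension.
    return sign_ext(imm16) if ext_op else zero_ext(imm16)


def effective_address(rs_value, imm16):
    """Addr = rs + sign_ext(Imm16), wrapped to 32 bits as the ALU would."""
    return (rs_value + sign_ext(imm16)) & 0xFFFFFFFF


neg_offset = effective_address(0x1000, 0xFFFC)  # offset -4
pos_offset = effective_address(0x1000, 0x0008)  # offset +8
```

A negative offset such as 0xFFFC only works because the immediate is sign-extended before the addition; zero extension would add 65532 instead of subtracting 4.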
![Page 18: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/18.jpg)
Steps 2 and 3: Store Word
- SW rt, rs, Imm16
  - Need Data Memory: Mem[Addr] ← data
  - Addr is rs+Imm16, Imm16 is signed, use ALU for +
  - Store in Mem: Mem[rs+Imm16] ← rt

[Figure: datapath as for Load Word, with DATAMEM now also having a WrData input and a MemWrite control; SIGN/ZERO-EXTEND (ExtOp), RegDst, ALUsrc, ALUop, MemtoReg, and RegWrite as before]
![Page 19: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/19.jpg)
Writes: Need to Control Timing
- Problem: write to data memory
  - Data can come anytime
  - Addr must come first
  - MemWrite must come after Addr
  - Else? writes to wrong Addr!
- Solution: use ideal data memory
  - Assume everything works ok
  - How to fix this for real?
    - One solution: synchronous memory
    - Another solution: delay MemWr to come late
- Problems?: write to register file
  - Does RegWrite signal come after WrReg number?
  - When does the write to a register happen?
  - Read from same register as being written?
![Page 20: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/20.jpg)
Missing Pieces: Instruction Fetching
- Where does the Instruction come from?
  - From instruction memory, of course!
  - Recall: stored-program concept
  - Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
- How to branch?
  - BEQ rs, rt, Imm16
![Page 21: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/21.jpg)
Instruction Processing
- Fetch instruction, execute instruction
- Fetch next instruction, execute next instruction
- Fetch next instruction, execute next instruction
- Etc…
- How to maintain sequence? Use a counter!
- Branches (out of sequence)? Load the counter!
![Page 22: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/22.jpg)
Instruction Processing
- Program Counter
  - Points to current instruction
  - Address to instruction memory: Instr ← InstrMem[PC]
- Next instruction: counts up by 4
  - Remember: memory is byte-addressable, instructions are 4 bytes
  - PC ← PC + 4
- Branch instruction: replace PC contents
![Page 23: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/23.jpg)
Step 1: Analyze Instructions Register Transfer Language…
op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
op | rs | rt | Imm16 = InstrMem[ PC ]
Instr Register Transfers
ADDU R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4
STORE MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' } else PC ← PC + 4
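The register transfers above can be read almost directly as an interpreter. The following is a minimal sketch, not the lecture's implementation: the function `step`, the list `R`, and the dictionary `MEM` are hypothetical names, instructions are passed as already-decoded fields, and memory is modeled as a word-valued dictionary that returns 0 for unwritten addresses.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to 32 bits."""
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def step(name, rs, rt, rd, imm16, R, MEM, pc):
    """Apply the register transfers of one instruction; return the next PC."""
    next_pc = (pc + 4) & MASK
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK
    elif name == "ORI":
        R[rt] = R[rs] | (imm16 & 0xFFFF)  # zero_ext leaves the upper bits 0
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # {sign_ext(Imm16) || b'00'} appends two zero bits: shift left by 2.
            next_pc = (pc + 4 + (sign_ext(imm16) << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return next_pc


R, MEM, pc = [0] * 32, {}, 0
pc = step("ORI", 0, 1, 0, 5, R, MEM, pc)       # R[1] = 5
pc = step("ORI", 0, 2, 0, 7, R, MEM, pc)       # R[2] = 7
pc = step("ADDU", 1, 2, 3, 0, R, MEM, pc)      # R[3] = R[1] + R[2]
pc = step("STORE", 0, 3, 0, 0x10, R, MEM, pc)  # MEM[0x10] = R[3]
pc = step("LOAD", 0, 4, 0, 0x10, R, MEM, pc)   # R[4] = MEM[0x10]
pc = step("BEQ", 3, 4, 0, 0xFFFF, R, MEM, pc)  # taken, offset -1 word
```

The final BEQ has an offset of -1 word, so the taken branch lands back on the BEQ itself (PC + 4 - 4).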
![Page 24: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/24.jpg)
Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Provides address to Instruction Memory

[Figure: PC drives the Read address of Instruction Memory, which outputs Instruction[31:0]; an Add unit adds the constant 4 to the PC]
![Page 25: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/25.jpg)
Steps 2 and 3: Datapath & Assembly
- PC: a register
  - Counter, counts by +4
  - Sometimes, must add SignExtend{Imm16||b'00'} for branch instructions
- Note: the sign-extender for Imm16 is already in the datapath (everything else is new)

[Figure: PC, Instruction Memory (Read address, Instruction[31:0]), Add unit for PC + 4, a second Add unit (Add result) fed by Sign/Zero Extend (16 to 32 bits, control ExtOp) of Instruction[15:0] (Imm16) through Shift Left 2, and a multiplexer (inputs 0 and 1, select PCSrc) choosing the next PC; instruction fields Instruction[25:21], Instruction[20:16], Instruction[15:11] are also shown]
![Page 26: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/26.jpg)
Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. PC, Instruction Memory, Add units for PC + 4 and the branch target (Shift Left 2), Register File (Read reg. 1, Read reg. 2, Write reg., Write data, Read data 1, Read data 2), Sign/Zero Extend (16 to 32 bits), ALU (ALU result, Zero) with ALU Control driven by Instruction[5:0] (funct), and Data Memory (Address, Read data, Write data). Instruction fields: Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0] (Imm16). Control signals: RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]
![Page 27: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/27.jpg)
What have we done?
- Created a simple CPU datapath
  - Control still missing (next slide)
- Single-cycle CPU
  - Every instruction takes 1 clock cycle
  - Clocking?
![Page 28: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/28.jpg)
One Clock Cycle
- Clock Locations
  - PC, REGFILE have clocks
- Operation
  - On rising edge, PC will get new value
    - Maybe REGFILE will have one value updated as well
  - After rising edge
    - PC and REGFILE can't change
    - New value out of PC
    - Instruction out of INSTRMEM
    - Instruction selects registers to read from REGFILE
    - Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
    - ALU does its work
    - DataMem may be read (depending on instruction)
    - Result value goes back to REGFILE
    - New PC value goes back to PC
    - Await next clock edge
- Lots to do in only 1 clock cycle!!
![Page 29: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/29.jpg)
Missing Steps?
- Control is missing (Steps 4 and 5 we mentioned earlier)
  - Generate the green signals
    - ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
  - These are all f(Instruction), where f() is a logic expression
  - Will look at control strategies in upcoming lecture
- Implementation Details
  - How to implement REGFILE?
    - Read port: tristate buffers? Multiplexer? Memory?
    - Two read ports: two of above?
    - Write port: how to write only 1 register?
  - How to control writes to memory? To register file?
- More instructions
  - Shift instructions
  - Jump instruction
  - Etc
![Page 30: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/30.jpg)
1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath again: PC, Instruction Memory, two Add units with Shift Left 2, Register File, Sign/Zero Extend, ALU with ALU Control, Data Memory, and control signals RegWrite, RegDst, ALUSrc, MemWrite, PCSrc, MemtoReg, ALUOp, ExtOp]
![Page 31: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/31.jpg)
1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control block added. Instruction[31:26] feeds Control, which generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction[5:0] feeds ALU control; PCSrc selects the next PC. Other blocks as before: PC, Instruction Memory, Add units, Shift Left 2, Register File, Sign/Zero Extend, ALU (ALU result, Zero), Data Memory]
![Page 32: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/32.jpg)
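1-cycle CPU Control – Lookup Table

| Input or Output | Signal Name | R-format | Lw | Sw | Beq |
|---|---|---|---|---|---|
| Inputs | Op5 | 0 | 1 | 1 | 0 |
| | Op4 | 0 | 0 | 0 | 0 |
| | Op3 | 0 | 0 | 1 | 0 |
| | Op2 | 0 | 0 | 0 | 1 |
| | Op1 | 0 | 1 | 1 | 0 |
| | Op0 | 0 | 1 | 1 | 0 |
| Outputs | RegDst | 1 | 0 | X | X |
| | ALUSrc | 0 | 1 | 1 | 0 |
| | MemtoReg | 0 | 1 | X | X |
| | RegWrite | 1 | 1 | 0 | 0 |
| | MemRead | 0 | 1 | 0 | 0 |
| | MemWrite | 0 | 0 | 1 | 0 |
| | Branch | 0 | 0 | 0 | 1 |
| | ALUOp1 | 1 | 0 | 0 | 0 |
| | ALUOp0 | 0 | 0 | 0 | 1 |

Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.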
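Since each output is a pure function of the 6-bit opcode, the table can be written down as a literal lookup. The sketch below does this; the names `SIGNALS`, `CONTROL`, and `control` are hypothetical, `None` stands for a don't-care (X), and ALUOp1/ALUOp0 are packed into one 2-bit value.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")
X = None  # don't care

# Keys are Op5..Op0 read as a binary number; values follow SIGNALS order.
CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def control(opcode):
    """Return the control signal values for one opcode as a dict."""
    return dict(zip(SIGNALS, CONTROL[opcode]))


lw_signals = control(0b100011)
sw_signals = control(0b101011)
```

In hardware the same table would be reduced to logic expressions over Op5..Op0; the don't-care entries are what give the logic minimizer freedom.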
![Page 33: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/33.jpg)
1-cycle CPU + Jump Instruction
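[Figure: single-cycle datapath and control extended for the jump instruction. Instruction[25:0] is combined with PC + 4 [31..28] to form Jump address [31..0]; Instruction[31:26] feeds the control unit; instruction fields Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0] as before]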
![Page 34: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/34.jpg)
1-cycle CPU Problems?
- Every instruction 1 cycle
- Some instructions "do more work"
  - Eg, lw must read from DATAMEM
- All instructions must have same clock period…
  - Many instructions run slower than necessary
- Tricky timing on MemWrite, RegWrite(?) signals
  - Write signal must come *after* address is stable
- Need extra resources…
  - PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM
![Page 35: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/35.jpg)
Performance!
- Single-Cycle CPU Performance
  - Execute one instruction per clock cycle (CPI=1)
  - Clock cycle time? Note dataflow includes:
    - INSTRMEM read
    - REGFILE access
    - Sign extension
    - ALU operation
    - DATAMEM read
    - REGFILE/PC write
- Not every instruction uses all resources (eg, DATAMEM read)
- Can we change clock period for each instruction?
  - No! (Why not?)
  - One clock period: the worst case!
  - This is why a single-cycle CPU is not good for performance
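The worst-case argument can be made concrete with a small calculation. Every number below is a hypothetical delay chosen only for illustration (the slides give no figures), and the per-instruction paths are a simplification that ignores sign extension and multiplexer delays.

```python
# Hypothetical component delays in picoseconds (illustrative assumptions only).
DELAY_PS = {
    "INSTRMEM": 200,
    "REGFILE_READ": 100,
    "ALU": 200,
    "DATAMEM": 200,
    "REGFILE_WRITE": 100,
}

# Components each instruction class passes through, in order.
PATHS = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "lw":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "sw":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "beq":    ["INSTRMEM", "REGFILE_READ", "ALU"],
}

latency = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}

# A single fixed clock must accommodate the slowest instruction.
clock_period = max(latency.values())
slowest = max(latency, key=latency.get)
```

Under these assumed delays, lw sets the clock period, and every beq spends part of each cycle idle.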
![Page 36: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/36.jpg)
1-cycle CPU Datapath + Controller
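[Figure: single-cycle datapath with controller, including the jump path. Instruction[31:26] feeds the controller; Instruction[25:0] and PC + 4 [31..28] form Jump address [31..0]; instruction fields Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]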
![Page 37: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/37.jpg)
1-cycle CPU Summary
- Operation
  - 1 cycle per instruction
  - Control signals held fixed during entire cycle (except BRANCH)
  - Only 2 registers
    - PC, updated every clock cycle
    - REGFILE, updated when required
  - During clock cycle, data flows from register-outputs to register-inputs
  - Fixed clock frequency / period
- Performance
  - 1 instruction per cycle
  - Slowest instruction determines clock frequency
- Outstanding issue: MemWrite timing
  - Assume this signal writes to memory at end of clock cycle
![Page 38: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/38.jpg)
Multi-cycle CPU Goals
- Improve performance
  - Break each instruction into smaller steps / multiple cycles
    - LW instruction: 5 cycles
    - SW instruction: 4 cycles
    - R-type instruction: 4 cycles
    - Branch, Jump: 3 cycles
  - Aim for 5x clock frequency
    - Complex instructions (eg, LW): 5 cycles, same performance as before
    - Simple instructions (eg, ADD): fewer cycles, faster
- Save resources (gates/transistors)
  - Re-use ALU over multiple cycles
  - Put INSTR + DATA in same memory
- MemWrite timing solved?
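The expected gain depends on the instruction mix. The sketch below uses the cycle counts listed above, but the instruction mix is a hypothetical assumption, as is the idealization that the multi-cycle clock really runs exactly 5 times faster than the single-cycle clock.

```python
# Cycle counts taken from the goals listed above.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix in percent (an assumption for illustration).
MIX_PERCENT = {"lw": 25, "sw": 10, "R-type": 45, "branch": 15, "jump": 5}

# Average cycles per instruction, weighted by how often each class occurs.
cpi = sum(MIX_PERCENT[k] * CYCLES[k] for k in CYCLES) / 100

# Single-cycle: 1 long cycle per instruction.
# Multi-cycle: `cpi` short cycles, each assumed to be 1/5 as long.
speedup = 5 / cpi
```

With this mix the average is a little over 4 cycles per instruction, so the ideal speedup is modest; a mix dominated by loads would see almost none.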
![Page 39: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/39.jpg)
Multi-cycle CPU Datapath
- Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
- Move signal paths (+4, Shift Left 2)

[Figure: multi-cycle datapath. PC; a single Memory (Address, Write data, MemData); Instruction Register with fields Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]; Memory Data Register; Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2); A and B; Sign Extend and Shift Left 2; constant 4; ALU (ALU result, Zero); ALUOut; five multiplexers]
![Page 40: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/40.jpg)
Multi-cycle CPU Datapath
- Add registers + control signals (IR, MDR, A, B, ALUOut)
- Registers with no control signal load value every clock cycle (eg, PC)

[Figure: the multi-cycle datapath showing the added Instruction Register, Memory Data Register, A, B, and ALUOut registers]
![Page 41: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/41.jpg)
Instruction Execution Example
- Execute a "Load Word" instruction
  - LW rt, 0(rs)
- 5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers
![Page 42: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/42.jpg)
Load Word Instruction Sequence
1. Fetch Instruction
InstructionRegister ← Mem[PC]

[Figure: multi-cycle datapath for step 1 (PC, Memory Address and MemData, Instruction Register)]
![Page 43: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/43.jpg)
Load Word Instruction Sequence
2. Read Registers
A ← Registers[Rs]

[Figure: multi-cycle datapath for step 2 (Instruction[25:21], RdReg1, RdData1, A)]
![Page 44: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/44.jpg)
Load Word Instruction Sequence
3. Compute Address
ALUOut ← A + SignExt(Imm16)

[Figure: multi-cycle datapath for step 3 (A, sign-extended Instruction[15:0], ALU, ALUOut)]
![Page 45: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/45.jpg)
Load Word Instruction Sequence
4. Read Data
MDR ← Memory[ALUOut]

[Figure: multi-cycle datapath for step 4 (ALUOut, Memory Address and MemData, Memory Data Register)]
![Page 46: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/46.jpg)
Load Word Instruction Sequence
5. Write Registers
Registers[Rt] ← MDR

[Figure: multi-cycle datapath for step 5 (Memory Data Register, Write reg, Write data)]
![Page 47: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/47.jpg)
Load Word Instruction Sequence
All 5 Steps Shown

[Figure: multi-cycle datapath showing all five Load Word steps together]
![Page 48: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/48.jpg)
Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR

Missing Steps?
![Page 49: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/49.jpg)
Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + {SignExt(Imm16)}
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR

Missing Steps?
- Must increment the PC
  - Do it as part of the instruction fetch (in step 1)
  - Need PCWrite control signal
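The five steps can be traced in software, one statement group per clock cycle. This is a minimal sketch under assumptions: the function name `load_word_steps` and the `state` dictionary are hypothetical, and memory is modeled as one dictionary of 32-bit words holding both instructions and data, in line with the multi-cycle goal of a shared memory.

```python
def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to 32 bits."""
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def load_word_steps(state, mem):
    """Run the five LW cycles, updating PC, IR, A, ALUOut, MDR, and R."""
    # 1. Fetch instruction (and increment the PC in the same cycle).
    state["IR"] = mem[state["PC"]]
    state["PC"] += 4
    rs = (state["IR"] >> 21) & 0x1F
    rt = (state["IR"] >> 16) & 0x1F
    imm16 = state["IR"] & 0xFFFF
    # 2. Read registers.
    state["A"] = state["R"][rs]
    # 3. Compute address.
    state["ALUOut"] = (state["A"] + sign_ext(imm16)) & 0xFFFFFFFF
    # 4. Read data.
    state["MDR"] = mem[state["ALUOut"]]
    # 5. Write registers.
    state["R"][rt] = state["MDR"]
    return state


# 0x8C220004 encodes lw with rs=1, rt=2, imm16=4.
mem = {0x0: 0x8C220004, 0x104: 0xDEADBEEF}
state = {"PC": 0, "R": [0] * 32}
state["R"][1] = 0x100
load_word_steps(state, mem)
```

Each intermediate register (IR, A, ALUOut, MDR) is written in one step and read in a later one, which is why the multi-cycle datapath needs them.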
![Page 50: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/50.jpg)
Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut

- RTL describes data flow action in each clock cycle
  - Control signals determine precise data flow
  - Each step implies unique control values
![Page 51: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/51.jpg)
Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
   - MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
   - ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value: ALUOut ← A op B
   - ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers: Registers[Rd] ← ALUOut
   - RegDst=1, RegWrite, MemtoReg=0

- Each step implies unique control values
  - Fixed for entire cycle
  - "Default value" implied if unspecified
![Page 52: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/52.jpg)
Check Your Work – Is RTL Valid?
1. Datapath check
   - Within one cycle…
     - Each cycle has valid data flow path (path exists)
     - Each register gets only one new value
   - Across multiple cycles…
     - A register value must be defined in a previous (earlier in time) clock cycle before it is used
       - Eg, "A ← 3" must occur before "B ← A"
     - Make sure register value doesn't disappear if set >1 cycle earlier
2. Control signal check
   - Each cycle, RTL describing the datapath flow implies a value for each control signal
     - 0 or 1 or default or don't care
   - Each control signal gets only one fixed value the entire cycle
3. Overall check
   - Does the sequence of steps work?
![Page 53: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/53.jpg)
Multi-cycle BEQ Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target: A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b'00'}
3. Compare Registers, Conditional Branch: if ( (A – B) == 0 ) PC ← ALUOut

Green shows PC calculation flow (in parallel with other operations)
![Page 54: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/54.jpg)
Multi-cycle Datapath with Control Signals
[Figure: multi-cycle datapath annotated with control signals PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst, plus an ALU Control block driven by Instruction[5:0]. Instruction fields: Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0]. Jump address [31..0] uses Instr[25:0] and PC[31..28]]
![Page 55: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/55.jpg)
Multi-cycle Datapath with Controller
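[Figure: the multi-cycle datapath with a controller added; Instr[31:26] feeds the controller, and the remaining fields (Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0], Instruction[5:0], Instr[25:0] with PC[31..28] for Jump address [31..0]) are as before]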
![Page 59: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/59.jpg)
Multi-cycle CPU Control: Overview
- General approach: Finite State Machine (FSM)
  - Need details in each branch of control…
    - Precise outputs for each state (Mealy depends on inputs, Moore does not)
    - Precise "next state" for each state (can depend on inputs)

[Figure: FSM overview diagram with branches labeled "Control Signal Outputs"]
![Page 60: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/60.jpg)
How to Implement FSM?
- Manually with logic gates + FFs
  - Bubble diagram, next-state table, state assignment
  - Karnaugh map for each state bit, each output bit (painful!)
- High-level language description (e.g., Verilog, VHDL)
  - Describe FSM bubble diagram (next-states, output values)
  - Automatically synthesized into gates + FFs
- Microcode (µ-code) description
  - Sequence through many µ-ops for each CPU instruction
    - One µ-op (µ-instruction) sends correct control signal for 1 cycle
    - µ-op similar to one bubble in FSM
  - Acts like a mini-CPU within a CPU
    - µPC: microcode program counter
    - Microcode storage memory contains µ-ops
  - Can look similar to RTL or some new "assembly language"
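The microcode option can be illustrated with a toy sequencer. This is a sketch under stated assumptions: the µ-op labels, the signal sets, the three sequencing codes (`"seq"`, `"fetch"`, `"dispatch1"`/`"dispatch2"`) and the dispatch tables are all hypothetical, chosen to resemble a textbook microprogram rather than any real CPU's microcode.

```python
# Microcode storage: one µ-op per entry, as (label, asserted signals, sequencing).
MICROCODE = [
    ("fetch",      ("MemRead", "IRWrite", "PCWrite"), "seq"),        # 0
    ("decode",     (),                                "dispatch1"),  # 1
    ("mem_addr",   (),                                "dispatch2"),  # 2
    ("mem_read",   ("MemRead",),                      "seq"),        # 3
    ("write_back", ("RegWrite",),                     "fetch"),      # 4
    ("mem_write",  ("MemWrite",),                     "fetch"),      # 5
    ("execute",    (),                                "seq"),        # 6
    ("r_complete", ("RegWrite",),                     "fetch"),      # 7
    ("branch",     (),                                "fetch"),      # 8
    ("jump",       ("PCWrite",),                      "fetch"),      # 9
]

# Address select logic: maps the opcode class to the next µPC on a dispatch.
DISPATCH = {
    "dispatch1": {"LW": 2, "SW": 2, "R": 6, "BEQ": 8, "J": 9},
    "dispatch2": {"LW": 3, "SW": 5},
}


def run_instruction(op: str) -> list:
    """Step the µPC through one CPU instruction; return the µ-op labels."""
    upc = 0                      # µPC starts at the fetch µ-op
    executed = []
    while True:
        label, _signals, sequencing = MICROCODE[upc]
        executed.append(label)   # one µ-op drives the control lines for 1 cycle
        if sequencing == "seq":
            upc += 1             # adder: µPC + 1
        elif sequencing == "fetch":
            return executed      # next cycle restarts at µ-op 0
        else:
            upc = DISPATCH[sequencing][op]


print(run_instruction("LW"))
```

Each table row plays the role of one bubble in the FSM diagram; changing the instruction set then means editing the table rather than redesigning gates.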
![Page 61: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/61.jpg)
FSM Specification: Bubble Diagram
Can build this by examining RTL
It is possible to automatically convert RTL into this form!
![Page 62: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/62.jpg)
FSM: Gates + FFs Implementation
[Figure: FSM high-level organization.]
![Page 63: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/63.jpg)
FSM: Microcode Implementation
[Figure: microcode implementation. Blocks: microcode storage (memory) with inputs and outputs, microprogram counter, adder (+1), address select logic. Outputs: datapath control outputs and sequencing control. Inputs from instruction register opcode field.]
![Page 64: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/64.jpg)
Multi-cycle CPU with Control FSM
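[Figure: multi-cycle datapath driven by the control FSM. The FSM takes Instr[31:26] and produces the FSM control outputs, including the conditional branch signal; instruction fields: Instr[25:21], Instr[20:16], Instr[15:11], Instr[15:0], Instr[5:0], Instr[25:0]; PC[31..28]; jump address [31..0].]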
![Page 65: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/65.jpg)
Control FSM: Overview
- General approach: Finite State Machine (FSM)
- Need details in each branch of control…
![Page 66: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/66.jpg)
Detailed FSM
![Page 67: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/67.jpg)
Detailed FSM
[Figure: detailed FSM divided into regions: Instruction Fetch, Memory Reference, R-Type, Branch, Jump.]
![Page 68: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/68.jpg)
Detailed FSM: Instruction Fetch
![Page 69: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/69.jpg)
Detailed FSM: Memory Reference
LW SW
![Page 70: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/70.jpg)
Detailed FSM: R-Type Instruction
![Page 71: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/71.jpg)
Detailed FSM: Branch Instruction
![Page 72: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/72.jpg)
Detailed FSM: Jump Instruction
![Page 73: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/73.jpg)
Performance Comparison
Single-cycle CPU
vs
Multi-cycle CPU
![Page 74: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/74.jpg)
Simple Comparison
| CPU | Instructions | Clock cycles |
|---|---|---|
| Single-cycle CPU | All | 1 |
| Multi-cycle CPU | LW | 5 |
| Multi-cycle CPU | SW, R-type | 4 |
| Multi-cycle CPU | BEQ, J | 3 |
![Page 75: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/75.jpg)
What’s really happening?
(Load Word Instruction)
Ideally:
[Figure: timing of one LW on the single-cycle CPU and on the multi-cycle CPU, split into the steps Fetch, Decode, CalcAddr, Memory, Write.]
![Page 76: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/76.jpg)
In practice, steps differ in speeds…
Load Word Instruction
[Figure: timing of Fetch, Decode, CalcAddr, Memory, Write on the single-cycle CPU and on the multi-cycle CPU when the steps take unequal time. Annotations: "Violation!", "Wasted time!"]
![Page 77: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/77.jpg)
Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: LW timing (Fetch, Decode, CalcAddr, Memory, Write) on the single-cycle CPU and on the multi-cycle CPU. Annotations: "Violation fixed!", "Now wasted time is larger!"]
![Page 78: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/78.jpg)
Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: SW timing (Fetch, Decode, CalcAddr, Memory) on the single-cycle CPU and on the multi-cycle CPU. Annotations: "Wasted time!", "Speed diff".]
![Page 79: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/79.jpg)
Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: BEQ/J timing (Fetch, Decode, CalcAddr) on the single-cycle CPU and on the multi-cycle CPU. Annotations: "Wasted time!", "Speed diff".]
![Page 80: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/80.jpg)
Performance Summary
Which CPU implementation is faster?
- LW: single-cycle is faster
- SW, R-type: about the same
- BEQ, J: multi-cycle is faster

Real programs use a mix of these instructions.
Overall performance depends on instruction frequency!
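The dependence on instruction frequency can be made concrete with a small calculation. The per-instruction cycle counts (LW 5, SW and R-type 4, BEQ and J 3) are the ones given in the lecture. The instruction mix and the two clock periods below are made-up numbers for illustration, and the function names are hypothetical.

```python
# Multi-cycle clock cycles per instruction class (from the lecture).
CYCLES = {"LW": 5, "SW": 4, "R": 4, "BEQ": 3, "J": 3}


def average_cpi(mix: dict) -> float:
    """Weighted average cycles per instruction for a mix of fractions."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * CYCLES[op] for op, fraction in mix.items())


def multi_cycle_speedup(mix: dict, single_period_ns: float,
                        multi_period_ns: float) -> float:
    """Average single-cycle instruction time divided by multi-cycle time.

    A result above 1 means the multi-cycle CPU is faster for this mix.
    """
    single_time = 1 * single_period_ns           # every instruction: 1 cycle
    multi_time = average_cpi(mix) * multi_period_ns
    return single_time / multi_time


# Hypothetical instruction mix, not measured data.
mix = {"LW": 0.25, "SW": 0.10, "R": 0.45, "BEQ": 0.15, "J": 0.05}
print(average_cpi(mix))
print(multi_cycle_speedup(mix, single_period_ns=10.0, multi_period_ns=2.0))
print(multi_cycle_speedup(mix, single_period_ns=10.0, multi_period_ns=2.5))
```

With these assumed periods the winner flips: at 2.0 ns per multi-cycle step the multi-cycle CPU is ahead, while at 2.5 ns the single-cycle CPU is ahead, which shows how sensitive the comparison is to both the mix and the achievable clock.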
![Page 81: 55:035 Computer Architecture and Organization](https://reader031.fdocuments.in/reader031/viewer/2022032804/56812a75550346895d8dfa84/html5/thumbnails/81.jpg)
Implementation Summary
Single-cycle CPU
- 1 instruction per cycle (e.g., 1 MHz → 1 MIPS)
- No "wasted time" on most complex instruction
- Large wasted time on simpler instructions
- Simple controller (just a lookup table or memory)
- Simple instructions

Multi-cycle CPU
- << 1 instruction per cycle (e.g., 1 MHz → 0.2 MIPS)
- Small time wasted on most complex instruction
  - Hence, this instruction is always slower than on the single-cycle CPU
- Small time wasted on simple instructions
  - Eliminates "large wasted time" by using fewer clock cycles
- Complex controller (FSM)
- Potential to create complex instructions
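The two MIPS figures follow from the usual relation between clock frequency and cycles per instruction (CPI). As a worked check, assuming the 0.2 MIPS example corresponds to the worst case in which every instruction takes 5 cycles:

```latex
\text{MIPS} = \frac{f_{\text{clk}}}{\text{CPI} \times 10^{6}}
\qquad
\text{Single-cycle: } \frac{10^{6}\ \text{Hz}}{1 \times 10^{6}} = 1\ \text{MIPS}
\qquad
\text{Multi-cycle: } \frac{10^{6}\ \text{Hz}}{5 \times 10^{6}} = 0.2\ \text{MIPS}
```

This comparison holds the clock at 1 MHz for both designs; in practice the multi-cycle CPU can run a shorter clock period, so the rates are not directly comparable at equal frequency.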