EECC550 - Shaaban #1 Lec # 5 Winter 2005 1-10-2006
Major CPU Design Steps
1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   – This provides the required datapath components and how they are connected to meet ISA requirements.
2. Select the required datapath components and connections, and establish the clock methodology (e.g. clock edge-triggered).
3. Assemble a datapath meeting the requirements.
4. Identify and define the function of all control points or signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers.
5. Design and assemble the control logic.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed.
(Chapter 5.5)

Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle
[Figure: single-cycle MIPS datapath. PC with PC+4 and branch-target adders and the PCSrc mux; instruction memory supplying Instruction<31:0> (fields Rs, Rt, Rd, Imm16); register file of 32 32-bit registers (Rw, Ra, Rb, busA = R[rs], busB = R[rt], busW); extender (imm16 to 32 bits); main ALU with Zero output, driven by ALU Control from the function field; data memory. Control signals: RegDst, RegWr, ExtOp, ALUSrc, ALUop (2 bits), MemWr, MemtoReg, Branch, PCSrc. Includes ORI, which is not in the book version. Jump not included.]

Drawbacks of Single-Cycle Processor
1. Long cycle time:
   – All instructions must take as much time as the slowest:
     • Cycle time for load is longer than needed for all other instructions.
   – Real memory is not as well-behaved as idealized memory:
     • Cannot always complete data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle.
   • e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
   – No hardware functional unit can be used more than once in a single cycle (e.g. ALUs).
4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions.
   – (Instruction pipelining, Chapter 6.)

Abstract View of Single Cycle CPU
[Figure: abstract view of the single-cycle CPU. Blocks: PC, Next PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Reg. Wrt, Result Store. Main Control (input: op) and ALU control (input: fun) drive ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch, Jump; Equal feeds the Next PC logic.]
One CPU Clock Cycle Duration C = 8 ns
One instruction per cycle: CPI = 1
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns; ALU and adders: 2 ns; Register File: 1 ns; Control Unit < 1 ns

Single Cycle Instruction Timing
Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
Store:                PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:               PC -> Inst Memory -> Reg File -> cmp -> mux
Critical Path: Load (determines CPU clock cycle, C)
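The timing paths above can be checked numerically. The following is a minimal sketch, assuming the component delays quoted on these slides (memory 2 ns, ALU 2 ns, register file 1 ns) and ignoring mux, setup and control delays. The names `DELAY_NS`, `PATHS`, `path_delay_ns` and `critical_path` are illustrative, and treating the branch comparison as one ALU-class delay is an assumption.

```python
# Component delays in ns, taken from the slides; mux, setup and control
# delays are ignored here (assumption).
DELAY_NS = {
    "inst_mem": 2,   # instruction memory read
    "reg_read": 1,   # register file read
    "alu": 2,        # ALU or comparator (assumed equal delay)
    "data_mem": 2,   # data memory access
    "reg_write": 1,  # register file write
}

# Major components on each instruction class's path.
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],
}


def path_delay_ns(instr_class):
    """Sum the component delays along one instruction class's path."""
    return sum(DELAY_NS[component] for component in PATHS[instr_class])


def critical_path():
    """Return (instruction class, delay) for the slowest path."""
    slowest = max(PATHS, key=path_delay_ns)
    return slowest, path_delay_ns(slowest)


if __name__ == "__main__":
    for name in PATHS:
        print(f"{name}: {path_delay_ns(name)} ns")
    print("critical path:", critical_path())
```

Under these assumptions the load path is the longest, which matches the 8 ns single-cycle clock used in this lecture.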

Clock Cycle Time & Critical Path
• Critical path: the slowest path between any two storage devices
• Clock cycle time is a function of the critical path, and must be greater than:
  – Clock-to-Q + Longest Delay Path through the Combinational Logic + Setup + Clock Skew
[Figure: clocked storage elements separated by combinational logic, with the critical path highlighted.]
One CPU Clock Cycle Duration C = 8 ns here
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns; ALU and adders: 2 ns; Register File: 1 ns; Control Unit < 1 ns
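The cycle-time bound above can be written as a small helper. This is a sketch only: the function names are hypothetical, and the zero values used for clock-to-Q, setup and skew in the usage note are assumptions made to reproduce the slide's idealized 8 ns figure, not measured values.

```python
def min_clock_cycle_ns(clk_to_q, logic_delay, setup, skew):
    """Lower bound on the clock period: every term on the path adds up."""
    return clk_to_q + logic_delay + setup + skew


def max_frequency_mhz(cycle_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    # Idealized case: only the 8 ns combinational critical path counts.
    cycle = min_clock_cycle_ns(0.0, 8.0, 0.0, 0.0)
    print(cycle, "ns ->", max_frequency_mhz(cycle), "MHz")
```

With all non-logic terms set to zero, the bound is the 8 ns critical path, i.e. a 125 MHz clock; any non-zero clock-to-Q, setup or skew lengthens the cycle.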

Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers / latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.
[Figure: one long cycle (storage element -> acyclic combinational logic -> storage element, e.g. CPI = 1) => two shorter cycles (storage element -> acyclic combinational logic (A) -> storage element -> acyclic combinational logic (B) -> storage element; Cycle 1, Cycle 2).]
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Basic MIPS Instruction Processing Steps
• Instruction Fetch: Obtain instruction from program storage (Instruction Memory).
• Instruction Decode: Determine instruction type (done by Control Unit); obtain operands from registers.
• Execute: Compute result value or status.
• Result Store: Store result in register/memory if needed (usually called Write Back).
• Next Instruction: Update program counter to address of next instruction.
Common steps for all instructions:
  Instruction ← Mem[PC]
  PC ← PC + 4

Partitioning The Single Cycle Datapath
Add registers between steps to break into cycles:
1. Instruction Fetch Cycle (IF)
2. Instruction Decode Cycle (ID)
3. Execution Cycle (EX)
4. Data Memory Access Cycle (MEM)
5. Write back Cycle (WB)
[Figure: the single-cycle datapath (PC, Next PC, Instruction Fetch, Operand Fetch, Exec, Mem Access / Data Mem, Reg. File, Result Store) cut into the five cycles above; control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, RegWr, Branch, Jump.]
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Example Multi-cycle Datapath
[Figure: multi-cycle datapath. PC, Next PC, Instruction Fetch, Reg File, Ext, ALU, Mem Access / Data Mem, with added registers IR, A, B, R and M between the steps; instruction fields go to the control unit; Equal feeds the Next PC logic; control signals ALUctr, RegDst, ALUSrc, ExtOp, MemWr, MemRd, MemToReg, RegWr, Branch, Jump.]
Registers added (all clock-edge triggered; register write enable control lines not shown):
• IR: Instruction register
• A, B: Two registers to hold operands read from register file.
• R (or ALUOut): holds the output of the main ALU
• M (or Memory data register, MDR): holds data read from data memory
CPU Clock Cycle Time: Worst cycle delay = C = 2 ns (ignoring MUX, CLK-Q delays)
  Instruction Fetch (IF)   2 ns
  Instruction Decode (ID)  1 ns
  Execution (EX)           2 ns
  Memory (MEM)             2 ns
  Write Back (WB)          1 ns
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns; ALU and adders: 2 ns; Register File: 1 ns; Control Unit < 1 ns
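The choice of C = 2 ns follows from taking the slowest cycle. A minimal sketch, assuming the per-cycle delays listed above and ignoring mux and clock-to-Q delays as the slide does; `STAGE_DELAY_NS`, `multicycle_clock_ns` and `idle_time_ns` are illustrative names.

```python
# Per-cycle delays in ns, as listed for the example multi-cycle datapath.
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def multicycle_clock_ns(stage_delays):
    """The clock must accommodate the slowest cycle."""
    return max(stage_delays.values())


def idle_time_ns(stage_delays):
    """Time each cycle leaves unused when every cycle lasts C."""
    c = multicycle_clock_ns(stage_delays)
    return {stage: c - delay for stage, delay in stage_delays.items()}


if __name__ == "__main__":
    print("C =", multicycle_clock_ns(STAGE_DELAY_NS), "ns")
    print("idle per cycle:", idle_time_ns(STAGE_DELAY_NS))
```

The idle time in ID and WB shows why balanced cycle lengths matter: any cycle shorter than C still occupies a full clock period.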

Operations (Dependent RTN) for Each Cycle
R-Type
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]
  EX:  R ← A funct B
  WB:  R[rd] ← R; PC ← PC + 4
Logic Immediate
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]
  EX:  R ← A OR ZeroExt[imm16]
  WB:  R[rt] ← R; PC ← PC + 4
Load
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]
  EX:  R ← A + SignEx(Im16)
  MEM: M ← Mem[R]
  WB:  R[rt] ← M; PC ← PC + 4
Store
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]
  EX:  R ← A + SignEx(Im16)
  MEM: Mem[R] ← B; PC ← PC + 4
Branch
  IF:  IR ← Mem[PC]
  ID:  A ← R[rs]; B ← R[rt]
  EX:  Zero ← A - B
       If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x 4)
       else (i.e. Zero = 0): PC ← PC + 4
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.

MIPS Multi-Cycle Datapath: Five Cycles of Load
Load:  Cycle 1 = IF, Cycle 2 = ID, Cycle 3 = EX, Cycle 4 = MEM, Cycle 5 = WB
1- Instruction Fetch (IF): Fetch the instruction from instruction Memory.
2- Instruction Decode (ID): Operand Register Fetch and Instruction Decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the Data Memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.
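The five load cycles can be traced in software. The sketch below is an illustration, not the lecture's implementation: it assumes a word-addressed Python dict for data memory, takes the instruction fields already decoded instead of modelling IR bit extraction, and uses the hypothetical names `sign_extend16` and `run_load`. The temporary registers A, B, R and M follow the example multi-cycle datapath.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_load(pc, regs, data_mem, rs, rt, imm16):
    """Step one lw through IF, ID, EX, MEM, WB; return (new PC, trace)."""
    trace = []

    # Cycle 1 (IF): IR <- Mem[PC]. Fields arrive pre-decoded (assumption).
    trace.append("IF")

    # Cycle 2 (ID): A <- R[rs], B <- R[rt]. B is read but unused by a load.
    a, b = regs[rs], regs[rt]
    trace.append("ID")

    # Cycle 3 (EX): R <- A + SignExt(imm16), the effective address.
    r = (a + sign_extend16(imm16)) & 0xFFFFFFFF
    trace.append("EX")

    # Cycle 4 (MEM): M <- Mem[R].
    m = data_mem[r]
    trace.append("MEM")

    # Cycle 5 (WB): R[rt] <- M, PC <- PC + 4.
    regs[rt] = m
    pc = (pc + 4) & 0xFFFFFFFF
    trace.append("WB")
    return pc, trace


if __name__ == "__main__":
    regs = [0] * 32
    regs[8] = 0x1000
    memory = {0x0FFC: 42}
    new_pc, cycles = run_load(0x00400000, regs, memory, rs=8, rt=9, imm16=0xFFFC)
    print(hex(new_pc), cycles, regs[9])
```

In the usage example the immediate 0xFFFC is sign-extended to -4, so the load reads address 0x0FFC.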

Multi-cycle Datapath Instruction CPI
• R-Type/Immediate: Require four cycles, CPI = 4
  – IF, ID, EX, WB
• Loads: Require five cycles, CPI = 5
  – IF, ID, EX, MEM, WB
• Stores: Require four cycles, CPI = 4
  – IF, ID, EX, MEM
• Branches/Jumps: Require three cycles, CPI = 3
  – IF, ID, EX
• Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on program profile (instruction mix).

Single Cycle Vs. Multi-Cycle CPU
[Figure: clock timing diagrams. Single cycle implementation: Load in Cycle 1, Store in Cycle 2 with part of the cycle wasted; cycle time 8 ns (125 MHz). Multiple cycle implementation: Load (IF ID EX MEM WB) in Cycles 1-5, Store (IF ID EX MEM) in Cycles 6-9, R-type begins (IF) in Cycle 10; cycle time 2 ns (500 MHz).]
Single-Cycle CPU: CPI = 1, C = 8 ns
  One million instructions take I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec
Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns
  One million instructions take from 10^6 x 3 x 2x10^-9 = 6 msec to 10^6 x 5 x 2x10^-9 = 10 msec, depending on instruction mix used.
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns; ALU and adders: 2 ns; Register File: 1 ns; Control Unit < 1 ns
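The execution-time comparison above is the product I x CPI x C. A minimal sketch of that arithmetic, with the hypothetical helper `exec_time_ms` returning milliseconds:

```python
def exec_time_ms(instr_count, cpi, cycle_ns):
    """Execution time = I x CPI x C, converted from ns to ms."""
    return instr_count * cpi * cycle_ns / 1e6


if __name__ == "__main__":
    one_million = 1_000_000
    print("single-cycle:", exec_time_ms(one_million, 1, 8), "ms")
    print("multi-cycle best case:", exec_time_ms(one_million, 3, 2), "ms")
    print("multi-cycle worst case:", exec_time_ms(one_million, 5, 2), "ms")
```

The multi-cycle design is faster than the single-cycle design only when the average CPI of the instruction mix is below 4, since 4 x 2 ns equals the 8 ns single cycle.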

Finite State Machine (FSM) Control Model
• State specifies control points (outputs) for Register Transfer.
• Control points (outputs) are assumed to depend only on the current state and not inputs (i.e. Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting the state on the falling edge of the clock.
[Figure: State X holds the register transfer control points; the state transition out of State X depends on inputs. Block diagram: inputs (opcode, conditions) and the current state feed the Next State Logic, which produces the next state for the Control State register; the current state feeds the Output Logic, which produces the outputs (control points) to the datapath.]

Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram
[Figure: state transition diagram, listed below by instruction path.]
Instruction fetch (start state): IR ← MEM[PC]
Decode / operand fetch: A ← R[rs]; B ← R[rt]
Then, depending on the opcode (and Zero for BEQ):
• R-type: Execute: R ← A fun B. Write-back: R[rd] ← R; PC ← PC + 4. To instruction fetch.
• ORi: Execute: R ← A or ZX. Write-back: R[rt] ← R; PC ← PC + 4. To instruction fetch.
• LW: Execute: R ← A + SX. Memory: M ← MEM[R]. Write-back: R[rt] ← M; PC ← PC + 4. To instruction fetch.
• SW: Execute: R ← A + SX. Memory: MEM[R] ← B; PC ← PC + 4. To instruction fetch.
• BEQ & Zero: Execute: PC ← PC + 4 + SX || 00. To instruction fetch.
• BEQ & ~Zero: Execute: PC ← PC + 4. To instruction fetch.
13 states: 4 State Flip-Flops needed

Traditional FSM Controller
[Figure: a 4-bit State register (4 flip-flops) holds the current state. The Next State Logic takes the current state, the 6-bit opcode (op) and the Equal condition from the datapath and produces nextState. The Output Logic produces the 11 control points sent to the datapath. Both are specified by a truth or transition table with columns: state, op, cond, next state, control points.]

Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.

Mapping RTNs To Control Points Examples & State Assignments
State 0  (0000) Instruction fetch: IR ← MEM[PC]. Control points: imem_rd, IRen
State 1  (0001) Decode / operand fetch: A ← R[rs]; B ← R[rt]. Control points: Aen, Ben
State 2  (0010) BEQ & Zero, Execute: PC ← PC + 4 + SX || 00. To instruction fetch state 0000
State 3  (0011) BEQ & ~Zero, Execute: PC ← PC + 4. To instruction fetch state 0000
State 4  (0100) R-type, Execute: R ← A fun B. Control points: ALUfun, Sen
State 5  (0101) R-type, Write-back: R[rd] ← R; PC ← PC + 4. Control points: RegDst, RegWr, PCen. To instruction fetch state 0000
State 6  (0110) ORi, Execute: R ← A or ZX
State 7  (0111) ORi, Write-back: R[rt] ← R; PC ← PC + 4. To instruction fetch state 0000
State 8  (1000) LW, Execute: R ← A + SX
State 9  (1001) LW, Memory: M ← MEM[R]
State 10 (1010) LW, Write-back: R[rt] ← M; PC ← PC + 4. To instruction fetch state 0000
State 11 (1011) SW, Execute: R ← A + SX
State 12 (1100) SW, Memory: MEM[R] ← B; PC ← PC + 4. To instruction fetch state 0000

Detailed Control Specification - State Transition Table
Columns: IRen = IR enable; PCen, PCsel = PC enable and select; A, B = operand register enables (Ops); Ex, Sr, ALU, S = Exec; R, W, M = Mem; M-R, Wr, Dst = Write-Back.
Group State  Op field Z  Next  IRen  PCen  PCsel  A  B  Ex  Sr  ALU  S  R  W  M  M-R  Wr  Dst
IF    0000   ??????   ?  0001  1
ID    0001   BEQ      0  0011                     1  1
ID    0001   BEQ      1  0010                     1  1
ID    0001   R-type   x  0100                     1  1
ID    0001   orI      x  0110                     1  1
ID    0001   LW       x  1000                     1  1
ID    0001   SW       x  1011                     1  1
BEQ   0010   xxxxxx   x  0000        1     1
BEQ   0011   xxxxxx   x  0000        1     0
R     0100   xxxxxx   x  0101                           0   1   fun  1
R     0101   xxxxxx   x  0000        1     0                                     0    1   1
ORI   0110   xxxxxx   x  0111                           0   0   or   1
ORI   0111   xxxxxx   x  0000        1     0                                     0    1   0
LW    1000   xxxxxx   x  1001                           1   0   add  1
LW    1001   xxxxxx   x  1010                                           1  0  1
LW    1010   xxxxxx   x  0000        1     0                                     1    1   0
SW    1011   xxxxxx   x  1100                           1   0   add  1
SW    1100   xxxxxx   x  0000        1     0                            0  1
Can be combined in one state
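The next-state half of this transition table can be expressed as a lookup. The following is a sketch under stated assumptions: state codes and transitions are taken from the table and the state assignments in this lecture, opcodes are represented as strings rather than 6-bit fields, and the names `DISPATCH`, `FIXED_NEXT`, `next_state`, `cycles_for` and `state_bits` are illustrative. The output (control point) logic is not modelled.

```python
import math

# Execute-state entry chosen by opcode when leaving the decode state 0001.
DISPATCH = {"R-type": 0b0100, "ORi": 0b0110, "LW": 0b1000, "SW": 0b1011}

# States whose successor does not depend on the inputs.
FIXED_NEXT = {
    0b0000: 0b0001,  # fetch -> decode
    0b0010: 0b0000,  # BEQ taken -> fetch
    0b0011: 0b0000,  # BEQ not taken -> fetch
    0b0100: 0b0101,  # R-type execute -> write-back
    0b0101: 0b0000,
    0b0110: 0b0111,  # ORi execute -> write-back
    0b0111: 0b0000,
    0b1000: 0b1001,  # LW execute -> memory
    0b1001: 0b1010,  # LW memory -> write-back
    0b1010: 0b0000,
    0b1011: 0b1100,  # SW execute -> memory
    0b1100: 0b0000,
}


def next_state(state, op, zero):
    """Next-state logic: only the decode state looks at op and Zero."""
    if state == 0b0001:
        if op == "BEQ":
            return 0b0010 if zero else 0b0011
        return DISPATCH[op]
    return FIXED_NEXT[state]


def cycles_for(op, zero=False):
    """Count clock cycles from fetch until control returns to fetch."""
    state, cycles = 0b0000, 0
    while True:
        state = next_state(state, op, zero)
        cycles += 1
        if state == 0b0000:
            return cycles


def state_bits(num_states):
    """Flip-flops needed for a binary-encoded state register."""
    return math.ceil(math.log2(num_states))


if __name__ == "__main__":
    for op in ("R-type", "ORi", "LW", "SW", "BEQ"):
        print(op, cycles_for(op))
    print("flip-flops for 13 states:", state_bits(13))
```

Walking the table this way reproduces the per-class cycle counts (3 for branches, 4 for R-type, ORi and stores, 5 for loads) and the four state flip-flops needed for 13 states.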

Alternative Multiple Cycle Datapath (In Textbook)
• Minimizes hardware: 1 memory, 1 ALU
[Figure: textbook multi-cycle datapath. PC; a single ideal memory (Address, Din, Dout) addressed through the IorD mux; Instruction Reg (IRWr) and Mem Data Reg; register file (Ra, Rb, Rw, busA, busB, busW, RegWr) with RegDst and MemtoReg muxes; registers A and B; Extend and << 2 on the 16-bit immediate; ALUSrcA mux (PC or A) and ALUSrcB mux (B, 4, extended immediate, shifted immediate) feeding the single ALU with ALU Control (ALUOp) and Zero output; ALU Out register; PCSrc mux. Control signals: MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB, ALUOp, IorD, PCWr, PCWrCond, PCSrc.]

Alternative Multiple Cycle Datapath (In Textbook)
• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
  • Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
(Figure 5.27 page 322)

Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)
(ORI not supported, Jump supported)
[Figure: the textbook multi-cycle datapath with control lines; instruction fields rs, rt, rd and imm16; PC + 4 and branch target paths into the PC source mux; 2-bit control signals.]
(Figure 5.28 page 323)

The Effect of The 1-bit Control Signals
RegDst
  Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  Asserted (=1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite
  Deasserted (=0): None
  Asserted (=1): The register on the write register input is written with the value on the Write data input.
ALUSrcA
  Deasserted (=0): The first ALU operand is the PC.
  Asserted (=1): The first ALU operand is register A (i.e. R[rs]).
MemRead
  Deasserted (=0): None
  Asserted (=1): The content of memory specified by the address input is put on the memory data output.
MemWrite
  Deasserted (=0): None
  Asserted (=1): The memory content specified by the address input is replaced by the value on the Write data input.
MemtoReg
  Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
  Asserted (=1): The value fed to the register write data input comes from the memory data register (MDR).
IorD
  Deasserted (=0): The PC is used to supply the address to the memory unit.
  Asserted (=1): The ALUOut register is used to supply the address to the memory unit.
IRWrite
  Deasserted (=0): None
  Asserted (=1): The output of the memory is written into the Instruction Register (IR).
PCWrite
  Deasserted (=0): None
  Asserted (=1): The PC is written; the source is controlled by PCSource.
PCWriteCond
  Deasserted (=0): None
  Asserted (=1): The PC is written if the Zero output of the ALU is also active.
(Figure 5.29 page 324)

The Effect of The 2-bit Control Signals
Signal Name  Value (Binary)  Effect
ALUOp        00              The ALU performs an add operation.
ALUOp        01              The ALU performs a subtract operation.
ALUOp        10              The funct field of the instruction determines the ALU operation (R-Type).
ALUSrcB      00              The second input of the ALU comes from register B.
ALUSrcB      01              The second input of the ALU is the constant 4.
ALUSrcB      10              The second input of the ALU is the sign-extended 16-bit immediate field of the instruction in IR.
ALUSrcB      11              The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits.
PCSource     00              Output of the ALU (PC+4) is sent to the PC for writing.
PCSource     01              The content of ALUOut (the branch target address) is sent to the PC for writing.
PCSource     10              The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]) is sent to the PC for writing.
(Figure 5.29 page 324)
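The PCSource selection described above can be modelled as a three-way multiplexer. A minimal sketch, assuming 32-bit values held in Python ints; `jump_target` and `next_pc` are illustrative names, and raising an error for the value 11 is a choice made here because the table does not define it.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc_plus_4, ir):
    """PC+4[31:28] concatenated with IR[25:0] shifted left 2 bits."""
    return (pc_plus_4 & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


def next_pc(pc_source, alu_result, alu_out, pc_plus_4, ir):
    """Select the value written to the PC according to PCSource."""
    if pc_source == 0b00:
        return alu_result & MASK32  # ALU output (PC + 4)
    if pc_source == 0b01:
        return alu_out & MASK32     # ALUOut register (branch target)
    if pc_source == 0b10:
        return jump_target(pc_plus_4, ir)
    raise ValueError("PCSource value 11 is not defined in the table")


if __name__ == "__main__":
    # 0x08100000 encodes a jump whose 26-bit target field is 0x0100000.
    print(hex(next_pc(0b10, 0, 0, 0x00400004, 0x08100000)))
```

In the usage example the 26-bit field 0x0100000 shifted left by 2 gives 0x00400000, and the upper four bits come from PC + 4.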

Operations (Dependent RTN) for Each Cycle
R-Type
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
  EX:  ALUout ← A funct B
  WB:  R[rd] ← ALUout
Load
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
  EX:  ALUout ← A + SignEx(Im16)
  MEM: M ← Mem[ALUout]
  WB:  R[rt] ← M
Store
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
  EX:  ALUout ← A + SignEx(Im16)
  MEM: Mem[ALUout] ← B
Branch
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
  EX:  Zero ← A - B; if Zero: PC ← ALUout
Jump
  IF:  IR ← Mem[PC]; PC ← PC + 4
  ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
  EX:  PC ← Jump Address
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.

High-Level View of Finite State Machine Control
[Figure: instruction fetch and decode (Figure 5.32), followed by the memory-access (Figure 5.33), R-type (Figure 5.34), branch (Figure 5.35) and jump (Figure 5.36) sequences.]
• The first steps are independent of the instruction class.
• Then follows a series of sequences that depend on the instruction opcode.
• Then control returns to fetch a new instruction.
• Each box in the figure represents one or several states.
(Figure 5.31 page 332)

Instruction Fetch (IF) and Decode (ID) FSM States
IF: IR ← Mem[PC]; PC ← PC + 4
ID: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
From ID, control continues to the states in (Figure 5.33) (Figure 5.34) (Figure 5.35) (Figure 5.36).
(Figure 5.32 page 333)

Load/Store Instructions FSM States
(From Instruction Decode)
EX:  ALUout ← A + SignEx(Im16)
MEM: Load: M ← Mem[ALUout]; Store: Mem[ALUout] ← B
WB:  Load only: R[rt] ← M
To Instruction Fetch (Figure 5.32)
(Figure 5.33 page 334)

R-Type Instructions FSM States
(From Instruction Decode)
EX: ALUout ← A funct B
WB: R[rd] ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)
(Figure 5.34 page 335)

Branch Instruction Single EX State
(From Instruction Decode)
EX: Zero ← A - B; if Zero: PC ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)
Jump Instruction Single EX State
(From Instruction Decode)
EX: PC ← Jump Address
To State 0 (Instruction Fetch) (Figure 5.32)
(Figures 5.35, 5.36 page 337)

FSM State Transition Diagram (From Book)
[Figure: complete state transition diagram, with states grouped by cycle: IF, ID, EX, MEM, WB.]
(Figure 5.38 page 339)

MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
  – State diagram gives CPI for each instruction type.
  – Workload (program) below gives frequency of each type.
Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI: 4.1
Better than CPI = 5 if all instructions took the same number of clock cycles (5).
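The average CPI above is a frequency-weighted sum. A minimal sketch of that calculation using the workload from the table; `WORKLOAD` and `average_cpi` are illustrative names, and frequencies are kept as integer percentages to avoid floating-point drift.

```python
# (CPI, frequency in percent) for each instruction type, from the table.
WORKLOAD = {
    "arith_logic": (4, 40),
    "load": (5, 30),
    "store": (4, 10),
    "branch": (3, 20),
}


def average_cpi(workload):
    """Weighted average: sum of CPI_i x frequency_i over all types."""
    total_pct = sum(pct for _, pct in workload.values())
    if total_pct != 100:
        raise ValueError("instruction frequencies must sum to 100%")
    return sum(cpi * pct for cpi, pct in workload.values()) / 100


if __name__ == "__main__":
    print("average CPI:", average_cpi(WORKLOAD))
```

Changing the mix shifts the result between the bounds of 3 and 5; a load-heavy program moves the average toward 5.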