Gary Marsden Slide 1University of Cape Town
Stages
[Figure: single-cycle datapath with control unit, repeated four times on this slide. Components: PC, instruction memory, registers, sign extend, shift left 2, ALU, ALU control, data memory, adders and muxes. Control signals: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Gary Marsden Slide 2University of Cape Town
Load instruction - lw $1, offset($2)
[Figure: single-cycle datapath with control unit, shown for the load instruction.]
Gary Marsden Slide 3University of Cape Town
beq $1, $2, offset
[Figure: single-cycle datapath with control unit, shown for the branch-on-equal instruction.]
Gary Marsden Slide 4University of Cape Town
Finalising control
Actual Op code
Gary Marsden Slide 5University of Cape Town
Final truth table
Gary Marsden Slide 6University of Cape Town
PLA implementation
Gary Marsden Slide 7University of Cape Town
Limitations of single cycle
Clock cycle identical for every instruction
– CPI = 1
Bound by longest instruction (load word)
– Inst., register, ALU, data memory, register
Not all instructions will take this long
– Memory access: 8 ns
– Register access: 2 ns
– ALU: 4 ns
Gary Marsden Slide 8University of Cape Town
Instruction timing
Inst. class   Inst. mem   Reg read   ALU op   Data mem   Reg write   Total
R-type        200         50         100      0          50          400 ps
Load word     200         50         100      200        50          600 ps
Store word    200         50         100      200        0           550 ps
Branch        200         50         100      0          0           350 ps
Jump          200         0          0        0          0           200 ps
(all times in ps)
Gary Marsden Slide 9University of Cape Town
Variable timing
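If we looked at a typical instruction profile, we could estimate how inefficient this scheme is. With a variable-length clock and a mix of 25% loads, 10% stores, 45% R-type, 15% branches and 5% jumps:
Average CPU clock cycle = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5%
Average CPU clock cycle = 447.5 ps
The single-cycle clock is fixed at 600 ps, so a variable clock would be about 1.34 times faster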
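The weighted average on this slide can be checked with a short calculation. This is a minimal sketch, assuming the instruction mix implied by the percentages above (each percentage is matched to an instruction class by its latency in the timing table); the dictionary and function names are illustrative only.

```python
# Latency of each instruction class in picoseconds (from the timing table).
LATENCY_PS = {"lw": 600, "sw": 550, "r_type": 400, "beq": 350, "j": 200}

# Instruction mix assumed from the percentages used on this slide.
MIX = {"lw": 0.25, "sw": 0.10, "r_type": 0.45, "beq": 0.15, "j": 0.05}


def average_cycle_ps(latency, mix):
    """Average clock cycle if each instruction took only as long as it needs."""
    return sum(latency[name] * mix[name] for name in mix)


variable_clock = average_cycle_ps(LATENCY_PS, MIX)
# The single-cycle clock is set by the slowest instruction (load word).
fixed_clock = max(LATENCY_PS.values())

print(f"variable clock: {variable_clock:.1f} ps")
print(f"fixed clock:    {fixed_clock} ps")
print(f"ratio:          {fixed_clock / variable_clock:.2f}")
```

The ratio of the two clocks is the potential gain that the single-cycle design leaves on the table, which is the motivation for the multicycle design that follows.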
Gary Marsden Slide 10University of Cape Town
Multicycle implementation
Previously, each instruction was broken into a series of steps corresponding to the functional unit operations needed
Can use these steps to create a multi-cycle implementation where each step in the execution takes one clock cycle
– Unit can be used more than once (on different cycles)
– Can help reduce the total amount of hardware required
• Trade-off with complex control
Gary Marsden Slide 11University of Cape Town
Differences
Single instruction / data memory
Single ALU
Some extra registers for buffers (more later)
[Figure: high-level multicycle datapath. PC, a single memory (instruction or data), instruction register, memory data register, registers, A and B registers, ALU, ALUOut.]
Gary Marsden Slide 12University of Cape Town
Implications
Need to add more muxes and registers (cheap)
New control signals
– Write signal for each state element (PC, memory, register file, instruction register)
– Read signal for memory
– ALU control unit (as before)
But we can ditch two adders and a memory unit
Gary Marsden Slide 13University of Cape Town
New Instruction Path
[Figure: multicycle datapath with control lines IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB and ALUOp. The four-input ALUSrcB mux selects between B, the constant 4, the sign-extended Instruction[15-0], and that value shifted left 2.]
Gary Marsden Slide 14University of Cape Town
With Control Unit
[Figure: complete multicycle datapath with control unit. Control takes Op[5-0] from Instruction[31-26] and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite and RegDst. The PCSource mux selects between the ALU result, ALUOut, and the jump address formed from PC[31-28] and Instruction[25-0] shifted left 2.]
Gary Marsden Slide 15University of Cape Town
Breaking into Clock Cycles
Examine what happens in each clock cycle of each instruction to make sure we have enough elements (e.g. registers, control lines)
Registers introduced when
– Value computed in one cycle and used in another
– Inputs to a block change before output can be written to a state element
• Mem -> ALU -> Mem
Gary Marsden Slide 16University of Cape Town
Goal of execution cycles
Balance the amount of work done each cycle to minimize the cycle time
In our case, we use 5 steps
Each step limited to
– At most one ALU op
– One register access
– One memory access
Clock cycle will be the same as the longest of these
Gary Marsden Slide 17University of Cape Town
Instruction steps
1. Instruction fetch
2. Instruction decode and register fetch
3. Execution, mem address completion or branch completion
4. Memory access or R-type write back
5. Write back
Using this information we can determine what control must do in each clock cycle
Gary Marsden Slide 18University of Cape Town
Control line effects
Gary Marsden Slide 19University of Cape Town
[Figure: multicycle datapath with control lines (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp), repeated for reference.]
Gary Marsden Slide 20University of Cape Town
Instruction fetch
Load instruction from memory: IR = Memory[PC]
– Set Read address mux (IorD) = 0  select instruction
– Set MemRead = 1
– Set IRWrite = 1  write instruction into IR
Increment PC: PC = PC + 4
– Set ALUSrcA = 0  get operand from PC
– Set ALUSrcB = 01  get operand '4'
– Set ALUOp = 00  add
– Allow storing new PC in PC register (PCWrite, PCSource = 00)
Gary Marsden Slide 21University of Cape Town
Instruction decode and fetch
Read registers from the output of the register block
– A = register[IR[25-21]]  rs
– B = register[IR[20-16]]  rt
– No signal setting required
Calculate the branch target address: ALUOut = PC + (sign-ext. (IR[15-0]) << 2)
– Stored in the ALUOut register
– Set ALUSrcA = 0  get operand from PC
– Set ALUSrcB = 11  get sign-extended, shifted offset
– Set ALUOp = 00  add
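The decode step can be sketched in a few lines. This is a minimal illustration, assuming 32-bit instruction words and that the PC passed in has already been incremented by the fetch step; the function names are hypothetical and not part of the slides.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_registers(instruction):
    """Return (rs, rt), i.e. IR[25-21] and IR[20-16]."""
    rs = (instruction >> 21) & 0x1F
    rt = (instruction >> 16) & 0x1F
    return rs, rt


def branch_target(pc, instruction):
    """ALUOut = PC + (sign-ext(IR[15-0]) << 2), wrapped to 32 bits.

    pc is assumed to be the already-incremented PC from the fetch step.
    """
    offset = sign_extend_16(instruction)
    return (pc + (offset << 2)) & 0xFFFFFFFF


# beq $1, $2, -1 encoded as opcode 4, rs = 1, rt = 2, offset = 0xFFFF.
beq_back = (4 << 26) | (1 << 21) | (2 << 16) | 0xFFFF
print(decode_registers(beq_back))
print(hex(branch_target(0x00400008, beq_back)))
```

The target is computed for every instruction, whether or not it turns out to be a branch, because the ALU is otherwise idle in this step.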
Gary Marsden Slide 22University of Cape Town
[Figure: multicycle datapath with control lines (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp), repeated for reference.]
Gary Marsden Slide 23University of Cape Town
Execution: memory access
Step depends on the instruction
Selection performed by interpretation of the op + function field of the instruction
Calculate memory reference address: ALUOut = A + sign-ext. (IR[15-0])
– Set ALUSrcA = 1  get operand from A
– Set ALUSrcB = 10  get operand from sign extension unit
– Set ALUOp = 00  add
Gary Marsden Slide 24University of Cape Town
[Figure: multicycle datapath with control lines (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp), repeated for reference.]
Gary Marsden Slide 25University of Cape Town
Execution II
Arithmetic-logical instruction (R-type)
– ALUOut = A op B
– Set ALUSrcA = 1  get operand from A
– Set ALUSrcB = 00  get operand from B
– Set ALUOp = 10  operation taken from function field of IR
Branch: if (A == B) PC = ALUOut
– Set ALUSrcA = 1  get operand from A
– Set ALUSrcB = 00  get operand from B
– Set ALUOp = 01  subtraction
– Write ALUOut to PC register if Zero is set (PCWriteCond, PCSource = 01)
Gary Marsden Slide 26University of Cape Town
Mem access complete
Memory access
– ALU controls must remain stable
– Set IorD = 1  address from ALU
Load from memory: memory-data = memory[ALUOut]
– Set MemRead = 1
Store to memory: memory[ALUOut] = B
– Set MemWrite = 1
Gary Marsden Slide 27University of Cape Town
[Figure: multicycle datapath with control lines (IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp), repeated for reference.]
Gary Marsden Slide 28University of Cape Town
R-type complete
Arithmetic-logical instruction complete: Register[IR[15-11]] = ALUOut
– Set RegDst = 1  select write register
– Set RegWrite = 1  allow write operation
– Set MemtoReg = 0  select ALU data
– ALUOp, ALUSrcA, ALUSrcB = constant
Gary Marsden Slide 29University of Cape Town
Write-back
Write data from memory to the register
– Reg[IR[20-16]] = memory-data
– Set RegDst = 0  select rt as target register
– Set RegWrite = 1  allow write operation
– Set MemtoReg = 1  select memory data
– ALUOp, ALUSrcA, ALUSrcB = constant
Gary Marsden Slide 30University of Cape Town
Summary
Gary Marsden Slide 31University of Cape Town
Defining Control
Single cycle path
– Construct a truth table and map it to logic gates
Multi-cycle
– Tricky because of temporal aspect
– Control must specify
• Signal settings
• Next step in execution
– Two techniques
• Finite state machines (usually graphically represented)
• Microprogramming (code representation)
Gary Marsden Slide 32University of Cape Town
Finite State Machines
Consists of
– Set of states
– Rules for moving between states
Details
– Each state has a set of asserted outputs
• Those not explicitly asserted are de-asserted
– States correspond to the 5 stages of execution
– Each step takes one clock cycle
– Initial two states are common
Gary Marsden Slide 33University of Cape Town
Overview
Gary Marsden Slide 34University of Cape Town
FSM for fetch
Gary Marsden Slide 35University of Cape Town
Complete diagram
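[Figure: complete finite state machine, listed here state by state.]
State 0 - Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1
State 1 - Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 if (Op = 'LW') or (Op = 'SW'), state 6 if (Op = R-type), state 8 if (Op = 'BEQ'), state 9 if (Op = 'J')
State 2 - Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 if (Op = 'LW'), state 5 if (Op = 'SW')
State 3 - Memory access (load): MemRead, IorD = 1. Next: state 4
State 4 - Write-back step: RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0
State 5 - Memory access (store): MemWrite, IorD = 1. Next: state 0
State 6 - Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7
State 7 - R-type completion: RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0
State 8 - Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0
State 9 - Jump completion: PCWrite, PCSource = 10. Next: state 0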
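The state machine in this diagram can be written down as a table of outputs plus a next-state function. The following is a minimal sketch, assuming the usual textbook numbering of the ten states (0 = fetch, 1 = decode, 2 to 5 = memory reference, 6 and 7 = R-type, 8 = branch, 9 = jump); the Python names are illustrative, and only the control outputs are modelled, not the datapath.

```python
# Control outputs asserted in each state; signals not listed are de-asserted.
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Where the decode state (1) goes for each opcode class.
DECODE_DISPATCH = {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9}


def next_state(state, op):
    """Combinational next-state logic: current state and opcode in, state out."""
    if state == 0:
        return 1
    if state == 1:
        return DECODE_DISPATCH[op]
    if state == 2:
        return 3 if op == "LW" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8 and 9 all return to instruction fetch


def trace(op):
    """List the states visited by one instruction, starting from fetch."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)


for op in DECODE_DISPATCH:
    path = trace(op)
    print(f"{op:7s} states {path} -> {len(path)} cycles")
```

Since each state takes one clock cycle, the length of each trace is the number of cycles that instruction class needs in the multicycle design.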
Gary Marsden Slide 36University of Cape Town
FSM Implementation
A register to hold current state
A block of combinational logic to determine:
– Datapath signals to be asserted
– The next state
[Figure: combinational control logic. Inputs: instruction register opcode field and the state register. Outputs: datapath control outputs and the next state, which is fed back to the state register.]
Gary Marsden Slide 37University of Cape Town
Microprogramming
Design the control as a program that implements the machine instructions in terms of simpler microinstructions
– For our subset, FSMs are fine
– For a full instruction set (>100 instructions), which vary from 1 to 20 cycles, more complexity is required (diagrams insufficient)
– Use ideas from programming to create a simpler way to define control
– Control instructions are referred to as microinstructions (as opposed to MIPS inst.)
Gary Marsden Slide 38University of Cape Town
More Microprogramming
Each microinstruction defines ‘the set of datapath control signals that must be asserted in a given state’
‘Executing’ a microinstruction has the effect of asserting the specified control lines
Format
– Symbolic representation of the control that is translated into control logic
– Can choose number of microinstruction fields and what control signals are affected by each field
Gary Marsden Slide 39University of Cape Town
Fields
Gary Marsden Slide 40University of Cape Town
Choices
Format is chosen to simplify representation
– Improving programmer comprehension
– A lot better than pure binary to specify how a Mux is set
Besides the format of the instruction, we need to figure out the order of execution
Gary Marsden Slide 41University of Cape Town
Choosing next MicroInstruction
Increment address of current microinstruction to get next microinstruction (Seq) - default
Branch to the microinstruction that begins execution of the next MIPS instruction (Fetch)
Choose next microinstruction based on control unit input (Dispatch)
– Implemented via a lookup (dispatch) table containing addresses of target microinstructions
– Often multiple tables
– Kind of like a switch statement
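The three sequencing choices (Seq, Fetch, Dispatch) can be sketched as a small address-selection function. This is only an illustration: the microprogram layout, the labels (Mem1, LW2, SW2, Rformat1, BEQ1, JUMP1) and the contents of the two dispatch tables are assumptions made for the example, and only the sequencing field of each microinstruction is modelled.

```python
# Hypothetical microprogram: one (label, sequencing) pair per microinstruction.
# Sequencing is "Seq", "Fetch", or ("Dispatch", table_number).
MICROPROGRAM = [
    ("Fetch",    "Seq"),            # address 0
    (None,       ("Dispatch", 1)),  # address 1
    ("Mem1",     ("Dispatch", 2)),  # address 2
    ("LW2",      "Seq"),            # address 3
    (None,       "Fetch"),          # address 4
    ("SW2",      "Fetch"),          # address 5
    ("Rformat1", "Seq"),            # address 6
    (None,       "Fetch"),          # address 7
    ("BEQ1",     "Fetch"),          # address 8
    ("JUMP1",    "Fetch"),          # address 9
]

# Map each label to the address of its microinstruction.
LABELS = {label: addr for addr, (label, _) in enumerate(MICROPROGRAM) if label}

# Dispatch tables: opcode class -> label of the target microinstruction.
DISPATCH_TABLES = {
    1: {"R-type": "Rformat1", "J": "JUMP1", "BEQ": "BEQ1",
        "LW": "Mem1", "SW": "Mem1"},
    2: {"LW": "LW2", "SW": "SW2"},
}


def next_address(addr, op):
    """Pick the address of the next microinstruction."""
    sequencing = MICROPROGRAM[addr][1]
    if sequencing == "Seq":        # default: fall through to the next one
        return addr + 1
    if sequencing == "Fetch":      # start the next MIPS instruction
        return LABELS["Fetch"]
    _, table_number = sequencing   # dispatch: look the opcode up in a table
    return LABELS[DISPATCH_TABLES[table_number][op]]


def run(op):
    """Addresses visited while executing one MIPS instruction."""
    addresses = [LABELS["Fetch"]]
    while True:
        addr = next_address(addresses[-1], op)
        if addr == LABELS["Fetch"]:
            return addresses
        addresses.append(addr)


for op in ("LW", "SW", "R-type", "BEQ", "J"):
    print(op, run(op))
```

The dispatch tables play the role of the switch statement mentioned above: the opcode selects which label the sequencer jumps to.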
Gary Marsden Slide 42University of Cape Town
Sample microinstruction
Gary Marsden Slide 43University of Cape Town
Full program
Gary Marsden Slide 44University of Cape Town
Finally - exceptions
Hardest part of control: implementing exceptions and interrupts (events other than branches that change flow of execution)
Interrupt
– Unexpected change in flow of control generated by event outside processor (usually I/O device)
Exception
– Any unexpected change of flow control regardless of source
Often, interrupt and exception are not distinguished
Gary Marsden Slide 45University of Cape Town
Exception Handling
Samples include
– Invocation of operating system from user
– Arithmetic overflow
– Undefined instruction
– Hardware malfunction
In our subset
– Undefined instruction
– Arithmetic overflow
Gary Marsden Slide 46University of Cape Town
Responding to an exception
Save address of offending instruction in EPC (exception program counter)
Transfer control to operating system with error handling code
Return to original code (using EPC) and continue. Could be:
– Providing service to the user program
– Coping with overflow
– Stopping execution to report an error
Gary Marsden Slide 47University of Cape Town
Extra info
Operating system must know why the exception happened, not just where. Therefore could have either:
– Cause register: a status register which holds a field indicating the reason for the exception
– Vectored interrupts: pair of cause and address to which control is transferred
Gary Marsden Slide 48University of Cape Town
Implication
Can perform exception handling by adding some control lines and some registers to the processor
– EPC - 32 bit obviously (with EPCWrite control line)
– Cause - 32 bit (with CauseWrite and IntCause control lines)
• IntCause is 0 for undefined instruction and 1 for overflow
– Also need to write PC - 4 to EPC, because the PC has already been incremented
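The register updates described on this slide can be sketched as follows. This is a minimal model under stated assumptions: the handler address 0xC0000000 is taken from the constant "C0 00 00 00" that appears in the datapath figure on the next slide, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

UNDEFINED_INSTRUCTION = 0  # IntCause = 0
ARITHMETIC_OVERFLOW = 1    # IntCause = 1

# Assumed operating system entry point (read off the datapath figure).
HANDLER_ADDRESS = 0xC0000000


@dataclass
class ExceptionState:
    pc: int          # already incremented to PC + 4 by the fetch step
    epc: int = 0     # exception program counter
    cause: int = 0   # reason for the exception


def raise_exception(state, int_cause):
    """Model the EPCWrite, CauseWrite and PC update for one exception."""
    # EPCWrite: save the address of the offending instruction (PC - 4).
    state.epc = (state.pc - 4) & 0xFFFFFFFF
    # CauseWrite with IntCause selecting the reason.
    state.cause = int_cause
    # Transfer control to the operating system handler.
    state.pc = HANDLER_ADDRESS
    return state


state = ExceptionState(pc=0x00400010)
raise_exception(state, ARITHMETIC_OVERFLOW)
print(hex(state.epc), state.cause, hex(state.pc))
```

The subtraction is needed because the fetch step has already written PC + 4 back to the PC by the time the exception is detected.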
Gary Marsden Slide 49University of Cape Town
Gratuitous scary picture
[Figure: multicycle datapath and control unit extended for exceptions. Adds the EPC and Cause registers, the EPCWrite, IntCause and CauseWrite control lines, and the exception address C0 00 00 00 as an extra input to the PCSource mux.]
Gary Marsden Slide 50University of Cape Town
Into Practice - Pentium Datapath
Pentium based on complex (CISC) IA-32 instruction set
– Some instructions take over 100 clock cycles!
– Some only take 3 or 4 clock cycles
Trick is to support the long instructions without impacting the common core of instructions
Control works by
– Using microcode for the control of long instructions
– Hard-wired control for short instructions
Gary Marsden Slide 51University of Cape Town
Summary
Single cycle path has low control overhead but needs a lot of resources and is slow
Multi-cycle much more efficient (speed and resources) but has more complex control
Can use FSM or microcode to specify control
– FSM not good for large instruction sets
Also need mechanism to handle interrupts and exceptions