Post on 03-Apr-2018
7/29/2019 8_3501_07_2
1/19
CSC3501 S07  Louisiana State University  8 - Single Cycle Datapath - 1
Designing MIPS Processor (Single-Cycle)
Dr. Arjan Durresi
Louisiana State University
Baton Rouge, LA 70810
Durresi@Csc.LSU.Edu
These slides are available at: http://www.csc.lsu.edu/~durresi/CSC3501_07/
Overview
Datapath
Control Unit
Problems with single cycle Datapath
Performance
Instruction count
Clock cycle time
Clock cycles per instruction
Implementation of Processor
The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS.
Simplified to contain only:
memory-reference instructions: lw, sw
arithmetic-logical instructions: add, sub, and, or, slt
control flow instructions: beq, j
Generic implementation:
use the program counter (PC) to supply the instruction address
get the instruction from memory
read registers
use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers. Why? memory-reference? arithmetic? control flow?
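One way to approach the "Why?" question is to write down what the ALU computes for each instruction class. The following is a minimal sketch, not the slides' own material; the function name `alu_use`, the class labels, and the 32-bit masking are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit values with Python ints


def alu_use(kind, a, b):
    """Return what the ALU computes for one instruction class.

    a is the first register operand; b is the second register operand
    or the sign-extended offset, depending on the class.
    """
    if kind == "memory":        # lw, sw: effective address = base + offset
        return (a + b) & MASK32
    if kind == "arithmetic":    # e.g. add: result = rs + rt
        return (a + b) & MASK32
    if kind == "branch":        # beq: subtract, then test the result for zero
        return ((a - b) & MASK32) == 0
    raise ValueError(kind)
```

In each case the ALU sits right after the register read: it forms an address, a result, or a comparison.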
More Implementation Details
Abstract / Simplified View:
Two types of functional units:
elements that operate on data values (combinational)
elements that contain state (sequential)
[Figure: abstract datapath with PC, instruction memory, registers, ALU, data memory, and two adders]
Building the Datapath
Our Implementation
An edge-triggered methodology
Typical execution:
read contents of some state elements at the beginning of the clock cycle,
send values through some combinational logic,
write results to one or more state elements at the end of the clock cycle.
An edge-triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could lead to indeterminate data.
[Figure: state element 1 feeds combinational logic, which feeds state element 2, within one clock cycle]
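The read-then-write behavior of an edge-triggered state element can be sketched in software. This is only an illustration under simplifying assumptions; the `Register` class and its method names are hypothetical and stand in for a hardware register.

```python
class Register:
    """Edge-triggered state element: reads see the old value until the clock edge."""

    def __init__(self, value=0):
        self.value = value      # output visible during the current cycle
        self._next = value      # input that will be latched at the clock edge

    def read(self):
        return self.value

    def write(self, value):
        self._next = value      # takes effect only at the clock edge

    def clock_edge(self):
        self.value = self._next


reg = Register(10)
for _ in range(3):
    old = reg.read()            # read at the beginning of the cycle
    reg.write(old + 1)          # combinational logic: increment
    assert reg.read() == old    # still the old value within the cycle
    reg.clock_edge()            # state is updated at the end of the cycle
```

Because the new value is held back until `clock_edge`, the same element can be read and written in one cycle without the read observing a half-updated value.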
Single Cycle Design
We shall first design a simpler processor that executes each instruction in only one clock cycle.
This is not efficient from a performance point of view, since:
the clock cycle time (i.e. clock rate) must be chosen such that the longest instruction can be executed in one clock cycle,
which makes shorter instructions execute in one unnecessarily long cycle.
Additionally, no resource in the design may be used more than once per instruction, thus some resources will be duplicated.
Because of that, the single cycle design will require:
two memories (instruction and data),
two additional adders.
Elements for Datapath Design
Include the functional units we need for each instruction
Why do we need this stuff?
[Figure: a. Instruction memory (instruction address in, instruction out)  b. Program counter  c. Adder]
[Figure: a. Registers (three 5-bit register numbers, write data, RegWrite, read data 1 and 2)  b. ALU (4-bit ALU operation, Zero, ALU result)]
[Figure: a. Data memory unit (address, write data, read data, MemRead, MemWrite)  b. Sign-extension unit (16 bits to 32 bits)]
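To make the register file's ports concrete, here is a small behavioral sketch. It is an assumption-laden model, not a description of the hardware in the slides: the class name `RegisterFile` and its method signatures are hypothetical, and clocking is left out.

```python
class RegisterFile:
    """32 registers (5-bit register numbers), two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Reads are combinational: both outputs are always available.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        # Writes happen only when the RegWrite control signal is asserted.
        # Register 0 is hardwired to zero in MIPS.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(9, 7, reg_write=True)    # stored
rf.write(10, 5, reg_write=False)  # ignored: RegWrite is not asserted
```

The two read ports are what let an R-format instruction fetch both source operands at once.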
Abstract / Simplified View (1st look)
Generic implementation:
use the program counter (PC) to supply the instruction address,
get the instruction from memory,
read registers,
use the instruction to decide exactly what to do.
Abstract / Simplified View (2nd look)
The PC is incremented by 4 by most instructions, and by 4 + 4 x offset by branch instructions.
Jump instructions change the PC differently (not shown).
[Figure: abstract datapath with PC, instruction memory, registers, ALU, data memory, and two adders]
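The PC update rule stated on this slide can be written as a short function. This is a sketch only; the name `next_pc` and its parameters are assumptions, and jumps are left out because the slide does not show them.

```python
def next_pc(pc, branch_taken=False, offset=0):
    """Next PC: pc + 4, plus 4 * offset when a branch is taken.

    offset is the signed word offset carried by the branch instruction.
    """
    target = pc + 4
    if branch_taken:
        target += 4 * offset
    return target & 0xFFFFFFFF
```

For example, `next_pc(0x400000)` gives `0x400004`, and a taken branch with offset 3 from the same address gives `0x400010`.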
Incrementing PC & Fetching Instruction
Two elements needed to implement R-format ALU operations
add $t1,$t2,$t3
Complete Datapath for R-type Instructions
Two elements needed to implement loads and stores
lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
Compute a memory address by adding the base register ($t2)
to the 16-bit signed offset field contained in the instruction
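The address computation for lw and sw can be sketched as follows. The helper names `sign_extend16` and `effective_address` are hypothetical, and Python integers stand in for the 32-bit datapath.

```python
def sign_extend16(field):
    """Sign-extend a 16-bit field (models the 16 -> 32 bit sign-extension unit)."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def effective_address(base, offset_field):
    """Address used by lw/sw: base register + sign-extended 16-bit offset."""
    return (base + sign_extend16(offset_field)) & 0xFFFFFFFF
```

With a base of `0x10000000`, an offset field of `0x0008` addresses `0x10000008`, while `0xFFFC` (that is, -4) addresses `0x0FFFFFFC`.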
Datapath for LW and SW Instructions
Datapath for R-type, LW & SW Instructions
The datapath for a branch
beq $t1, $t2, offset
Compute the branch target address by adding the sign-extended offset of the instruction, shifted left by 2 bits, to PC + 4
Datapath for R-type, LW, SW & BEQ
ALU
Generate the 4-bit ALU control input using a small control unit that has as inputs the function field of the instruction and a 2-bit control field, ALUOp
Control
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs)
Information comes from the 32 bits of the instruction
Example:
add $8, $17, $18
Instruction format:
000000  10001  10010  01000  00000  100000
op      rs     rt     rd     shamt  funct
31:26   25:21  20:16  15:11  10:6   5:0
ALU's operation is based on instruction type and function code
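The field boundaries listed above map directly onto shifts and masks. The sketch below decodes the example instruction word; the function name `decode_r_type` is an assumption made for illustration.

```python
def decode_r_type(word):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31:26
        "rs":    (word >> 21) & 0x1F,   # bits 25:21
        "rt":    (word >> 16) & 0x1F,   # bits 20:16
        "rd":    (word >> 11) & 0x1F,   # bits 15:11
        "shamt": (word >> 6) & 0x1F,    # bits 10:6
        "funct": word & 0x3F,           # bits 5:0
    }


# add $8, $17, $18, using the bit pattern from the slide
word = 0b000000_10001_10010_01000_00000_100000
fields = decode_r_type(word)
```

Here `fields` holds op = 0, rs = 17, rt = 18, rd = 8, shamt = 0, and funct = 32, matching the operands of the add instruction.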
Load, store and branch instructions
Load or store instruction:
35 or 43  rs     rt     address
31:26     25:21  20:16  15:0
Branch instruction:
4         rs     rt     address
31:26     25:21  20:16  15:0
Control
Must describe hardware to compute the 4-bit ALU control input given:
instruction type (ALUOp, computed from instruction type): 00 = lw, sw; 01 = beq; 10 = arithmetic
function code for arithmetic
Describe it using a truth table (can turn into gates)
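The ALU control truth table can be sketched as a lookup. The ALUOp meanings (00, 01, 10) come from the slide; the specific 4-bit ALU control values and funct codes below are the ones commonly used in textbook presentations of this design and should be treated as an assumption, since the slide's own table is not reproduced in the text. The function name `alu_control` is hypothetical.

```python
# 4-bit ALU control values (assumed encoding, see the note above)
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# MIPS function codes for the arithmetic-logical instructions in this subset
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct=0):
    """Map the 2-bit ALUOp and the 6-bit function code to a 4-bit ALU control."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return ALU_ADD
    if alu_op == 0b01:      # beq: subtract to compare
        return ALU_SUB
    if alu_op == 0b10:      # arithmetic: decided by the function code
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")
```

Splitting the decision this way keeps the main control unit small: it only produces ALUOp, and the function code is consulted just for arithmetic instructions.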
Truth Table for (Main) Control Unit
Truth Table of ALU Control Unit
Design of (Main) Control Unit
Design of Control Unit (J included)
Design of 7-Function ALU Control Unit
Cycle Time Calculation
Single Cycle Implementation
Calculate cycle time assuming negligible delays except:
memory (200ps), ALU and adders (100ps), register file access (50ps)
[Figure: single-cycle datapath with PC, instruction memory, registers, ALU, data memory, sign extend (16 to 32), shift left 2, two adders, and three muxes; control signals RegWrite, ALUSrc, ALU operation (4 bits), MemRead, MemWrite, MemtoReg, PCSrc]
Performance of Single cycle machines
Memory (200ps), ALU and adders (100ps), register file access (50ps)
Instruction mix: 25% loads, 10% stores, 45% ALU, 15% branches, 5% jumps.
Which of the following implementations would be faster?
1. Every instruction operates in 1 clock cycle of a fixed length
2. Every instruction operates in 1 clock cycle of a variable length
CPU execution time = Instruction count x CPI x Clock cycle time
Since CPI = 1, CPU execution time = Instruction count x Clock cycle time
Using the critical paths we can compute the required length for each class: R-type 400ps, Load word 600ps, Store word 550ps, Branch 350ps, Jump 200ps
In case 1 the clock has to be 600ps, as determined by the longest instruction.
A machine with a variable clock will have a clock cycle that varies between 200ps and 600ps.
The average CPU clock cycle = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5% = 447.5ps
So the variable clock machine is 600 / 447.5 = 1.34 times faster.
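The comparison can be checked with a few lines of arithmetic. The numbers are taken from the slide; the variable names are only illustrative, and the mix is kept in whole percent so the sum stays exact.

```python
# Required cycle length per instruction class, in ps (from the slide)
LENGTH_PS = {"load": 600, "store": 550, "alu": 400, "branch": 350, "jump": 200}
# Instruction mix in percent (from the slide)
MIX_PCT = {"load": 25, "store": 10, "alu": 45, "branch": 15, "jump": 5}

# Fixed clock: every instruction pays for the slowest class.
fixed_clock = max(LENGTH_PS.values())
# Variable clock: weighted average over the instruction mix.
variable_clock = sum(LENGTH_PS[c] * MIX_PCT[c] for c in LENGTH_PS) / 100
speedup = fixed_clock / variable_clock

print(f"fixed: {fixed_clock} ps, variable: {variable_clock} ps, speedup: {speedup:.2f}")
```

Since the instruction count and CPI are the same for both machines, the ratio of the cycle times is the ratio of the execution times.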
Example
Instruction class  Functional units used by the instruction class
R-type             Instruction fetch  Register access  ALU  Register access
Load word          Instruction fetch  Register access  ALU  Memory access  Register access
Store word         Instruction fetch  Register access  ALU  Memory access
Branch             Instruction fetch  Register access  ALU
Jump               Instruction fetch
Example
Instruction class  Instruction memory  Register read  ALU  Data memory  Register write  Total
R-type             200                 50             100  0            50              400ps
Load word          200                 50             100  200          50              600ps
Store word         200                 50             100  200          0               550ps
Branch             200                 50             100  0            0               350ps
Jump               200                 0              0    0            0               200ps
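The totals in the table follow from summing the delays of the units each class uses. A small sketch, with the unit names (`imem`, `reg_read`, and so on) chosen here only as labels:

```python
# Unit delays in ps (from the slides)
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Units on the critical path of each instruction class
UNITS_USED = {
    "R-type":     ["imem", "reg_read", "alu", "reg_write"],
    "Load word":  ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store word": ["imem", "reg_read", "alu", "dmem"],
    "Branch":     ["imem", "reg_read", "alu"],
    "Jump":       ["imem"],
}

# Total delay per class is the sum of the delays of the units it uses.
totals = {cls: sum(DELAY_PS[u] for u in units) for cls, units in UNITS_USED.items()}
```

The largest entry in `totals` is the load word, which is why it sets the clock period of the fixed-cycle design.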
Abstract View of our single cycle processor
looks like a FSM with PC as state
[Figure: PC and Next PC, Instruction Fetch, Register Fetch, ALU, Mem Access, Reg. Wrt, Data Mem, Ext and Result Store blocks; Main Control (inputs op, fun) and ALU control; signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemRd, Equal]
What's wrong with our CPI=1 processor?
Long Cycle Time
All instructions take as much time as the slowest
Real memory is not as nice as our idealized memory
cannot always get the job done in one (short) cycle
Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load: PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup (Critical Path)
Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch: PC, Inst Memory, Reg File, cmp, mux
Where we are headed
Single Cycle Problems:
what if we had a more complicated instruction like floating point?
wasteful of area
One Solution:
use a smaller cycle time
have different instructions take different numbers of cycles
a multicycle datapath:
[Figure: multicycle datapath with PC, a single memory for instructions and data, instruction register, memory data register, registers, A and B registers, ALU, and ALUOut]
Single Cycle Processor: Conclusion
Single Cycle Problems:
what if we had a more complicated instruction like floating point?
a clock cycle would be much longer,
thus wasteful of time for shorter and more often used instructions, such as add & lw.
One Solution:
use a smaller cycle time, and
have different instructions take different numbers of cycles.
And that is a multi-cycle processor.