Chapter 4: The Processor
Chapter 4 — The Processor — 2
§4.1 Introduction
Introduction
- CPU performance factors
  - Instruction count
    - Determined by ISA and compiler
  - CPI and cycle time
    - Determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j
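The three performance factors listed above combine in the standard relation CPU time = instruction count × CPI × clock cycle time. The sketch below only illustrates that product; the function name and the numbers in the example are made up for illustration and do not come from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: one million instructions on a single-cycle design
# (CPI = 1) with an assumed 800 ps clock period.
t = cpu_time(1_000_000, 1.0, 800e-12)
print(f"{t * 1e6:.1f} microseconds")
```

The ISA and compiler influence only the first argument; the hardware design sets the other two.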
Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC ← target address or PC + 4
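The execution steps above can be sketched as a single function that performs one instruction per call. This is a simplified model, not the hardware: instructions are represented as hypothetical Python tuples rather than 32-bit encodings, memories are dictionaries keyed by byte address, and 32-bit overflow is ignored.

```python
def step(pc, regs, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    instr = imem[pc]                      # PC -> instruction memory: fetch
    op = instr[0]
    next_pc = pc + 4                      # default: sequential execution
    if op in ("add", "sub"):
        _, rd, rs, rt = instr             # register numbers -> register file
        a, b = regs[rs], regs[rt]
        regs[rd] = a + b if op == "add" else a - b   # ALU: arithmetic result
    elif op == "lw":
        _, rt, offset, rs = instr
        addr = regs[rs] + offset          # ALU: memory address
        regs[rt] = dmem[addr]             # data memory read, register write
    elif op == "sw":
        _, rt, offset, rs = instr
        addr = regs[rs] + offset          # ALU: memory address
        dmem[addr] = regs[rt]             # data memory write
    elif op == "beq":
        _, rs, rt, offset_words = instr
        if regs[rs] - regs[rt] == 0:      # ALU subtract, check for zero
            next_pc = pc + 4 + (offset_words << 2)   # branch target address
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return next_pc


# Tiny made-up program: load a word, double it, store it back.
imem = {
    0: ("lw", 1, 0, 0),     # r1 = mem[r0 + 0]
    4: ("add", 2, 1, 1),    # r2 = r1 + r1
    8: ("sw", 2, 4, 0),     # mem[r0 + 4] = r2
    12: ("beq", 1, 1, -4),  # always taken: back to address 0
}
regs = [0] * 32             # r0 stays 0 because the program never writes it
dmem = {0: 21, 4: 0}
pc = 0
for _ in range(3):
    pc = step(pc, regs, imem, dmem)
```

After the three steps, `dmem[4]` holds twice the value loaded from `dmem[0]`, and `pc` points at the branch.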
CPU Overview
Multiplexers
- Can't just join wires together
  - Use multiplexers
Control
§4.2 Logic Design Conventions
Logic Design Basics
- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational elements
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information
Combinational Elements
- AND-gate: Y = A & B
- Multiplexer: Y = S ? I1 : I0
- Adder: Y = A + B
- Arithmetic/Logic Unit: Y = F(A, B)
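The four combinational elements can be modelled as pure functions, since each output depends only on the current inputs. The sketch below assumes 32-bit buses and uses hypothetical string names for the ALU function F; the slides do not prescribe either choice at this point.

```python
MASK32 = 0xFFFFFFFF  # assumed bus width of 32 bits


def and_gate(a: int, b: int) -> int:
    """Y = A & B"""
    return a & b


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapping around at 32 bits."""
    return (a + b) & MASK32


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B) for a few illustrative functions F."""
    results = {
        "and": a & b,
        "or": a | b,
        "add": (a + b) & MASK32,
        "sub": (a - b) & MASK32,
    }
    return results[f]
```

For example, `mux(1, 5, 9)` selects input I1 and returns 9, while `alu("sub", 3, 5)` returns the 32-bit two's-complement pattern for -2.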
Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with inputs D and Clk and output Q; timing diagram of Clk, D, Q]
Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Clk, and Write and output Q; timing diagram of Clk, Write, D, Q]
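The behaviour of an edge-triggered register with write control can be sketched as a small class that remembers the previous clock level. The class and method names are hypothetical; the model ignores setup and hold times and simply samples D at the rising edge.

```python
class Register:
    """Edge-triggered register with a write control input (behavioural sketch)."""

    def __init__(self, value: int = 0):
        self.q = value        # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        """Present Clk, D, and Write; Q changes only on a 0 -> 1 edge with Write = 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._prev_clk = clk
        return self.q


r = Register()
trace = [
    r.tick(0, 7),            # clock low: no update
    r.tick(1, 7),            # rising edge: capture 7
    r.tick(1, 9),            # clock stays high: no update
    r.tick(0, 9),            # falling edge: no update
    r.tick(1, 9, write=0),   # rising edge but Write = 0: hold
    r.tick(0, 9),
    r.tick(1, 9, write=1),   # rising edge with Write = 1: capture 9
]
```

Leaving `write` at its default of 1 gives the plain register from the previous slide.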
Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
§4.3 Building a Datapath
Building a Datapath
- Datapath
  - Elements that process data and addresses in the CPU
  - Registers, ALUs, muxes, memories, …
- We will build a MIPS datapath incrementally
  - Refining the overview design (top down)
Instruction Fetch
[Figure: instruction fetch datapath. Annotations: the PC is a 32-bit register; an adder increments it by 4 for the next instruction]
R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result
Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: Read memory and update register
- Store: Write register value to memory
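The address calculation for loads and stores can be sketched as follows: the 16-bit offset is sign-extended and then added to the base register value. The function names are hypothetical, and results are wrapped to an assumed 32-bit address width.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base: int, imm16: int) -> int:
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


# An offset field of 0xFFFC encodes -4, so the access lands 4 bytes below the base.
addr = effective_address(0x1000, 0xFFFC)
```

Without the sign extension, the offset 0xFFFC would be treated as +65532 and negative offsets could not be expressed.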
Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch
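The branch steps above can be sketched in a few lines: compare by subtracting and testing for zero, then form the target from the sign-extended word displacement. The function names are hypothetical, and values are wrapped to an assumed 32-bit width.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, imm16: int) -> int:
    """Target = (PC + 4) + (sign-extended displacement << 2)."""
    imm16 &= 0xFFFF
    disp = imm16 - 0x10000 if imm16 & 0x8000 else imm16   # sign-extend
    return (pc + 4 + (disp << 2)) & MASK32                 # word -> byte offset


def next_pc_beq(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """beq: take the branch only if the ALU subtraction yields zero."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32
```

A displacement of 3 from a branch at 0x40 gives 0x40 + 4 + 12 = 0x50, and a displacement of -1 (0xFFFF) branches back to the branch instruction itself.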
Branch Instructions
[Figure: branch datapath. Annotations: the shift left 2 just re-routes wires; sign extension replicates the sign-bit wire]
Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions
R-Type/Load/Store Datapath

Full Datapath
§4.4 A Simple Implementation Scheme
ALU Control
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR

Notes:
- All logic can be done with a NOR
- Equal and not-equal can also be done with an XOR (that is faster), but you need a bit more hardware
ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
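The table above is a pure function of ALUOp and funct, so it can be sketched as a lookup. The function and dictionary names are hypothetical; the bit patterns are taken directly from the table, and the funct field is ignored when ALUOp is 00 or 01 (the X entries).

```python
# funct field -> 4-bit ALU control, used only for R-type (ALUOp = 10)
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the 4-bit ALU control from the 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:      # R-type: operation depends on funct
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

Splitting the decode into a 2-bit ALUOp plus this second-level lookup keeps the main control unit small, because it never has to examine the funct field.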
The Main Control Unit
- Control signals derived from instruction

R-type       field:  0          rs      rt      rd      shamt   funct
             bits:   31:26      25:21   20:16   15:11   10:6    5:0
Load/Store   field:  35 or 43   rs      rt      address
             bits:   31:26      25:21   20:16   15:0
Branch       field:  4          rs      rt      address
             bits:   31:26      25:21   20:16   15:0

- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address: sign-extend and add
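Because the fields sit at fixed bit positions, extracting them is a matter of shifting and masking. The sketch below pulls out every field unconditionally, much as the hardware wires do, and leaves it to the control logic to decide which ones matter. The function name and dictionary keys are hypothetical.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31:26
        "rs": (instr >> 21) & 0x1F,       # bits 25:21
        "rt": (instr >> 16) & 0x1F,       # bits 20:16
        "rd": (instr >> 11) & 0x1F,       # bits 15:11 (R-type only)
        "shamt": (instr >> 6) & 0x1F,     # bits 10:6  (R-type only)
        "funct": instr & 0x3F,            # bits 5:0   (R-type only)
        "address": instr & 0xFFFF,        # bits 15:0  (load/store/branch)
    }


r_type = decode_fields(0x02324020)   # add $t0, $s1, $s2
load = decode_fields(0x8E680020)     # lw $t0, 32($s3)
```

For the R-type word the opcode is 0 and the destination is in `rd`; for the load the opcode is 35 and the destination is in `rt`, which is why the register file's write-register input needs a multiplexer.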
Datapath With Control

R-Type Instruction

Load Instruction

Branch-on-Equal Instruction
Implementing Jumps

Jump   field:  2       address
       bits:   31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
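The concatenation can be sketched with a mask and a shift. One assumption to flag: this sketch takes the top 4 bits from PC + 4, the incremented value the fetch stage has already computed, as the reading of the slide's "old PC"; the two differ only when the jump is the last instruction of a 256 MB region. The function name is hypothetical.

```python
def jump_target(pc: int, instr: int) -> int:
    """New PC = top 4 bits of (PC + 4) : 26-bit jump address : 00."""
    addr26 = instr & 0x03FFFFFF              # bits 25:0 of the instruction
    upper4 = (pc + 4) & 0xF0000000           # assumption: bits taken from PC + 4
    return upper4 | (addr26 << 2)            # shifting left by 2 appends "00"


# Jump instruction: opcode 2 in bits 31:26, word address 0x100004 in bits 25:0.
j_instr = (2 << 26) | 0x100004
target = jump_target(0x00400000, j_instr)
```

The word address 0x100004 becomes the byte address 0x00400010 once the two zero bits are appended.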
Datapath With Jumps Added
Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining
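To see why the load sets the clock period, the sketch below sums per-unit delays along the path each instruction class uses. The delay values are assumptions chosen for illustration and are not given on this slide; only the ordering of units on the load path comes from the text above.

```python
# Assumed unit delays in picoseconds (illustrative values only)
DELAY_PS = {
    "instruction_memory": 200,
    "register_read": 100,
    "alu": 200,
    "data_memory": 200,
    "register_write": 100,
}

# Units each instruction class passes through in the single-cycle datapath
PATHS = {
    "lw": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "sw": ["instruction_memory", "register_read", "alu", "data_memory"],
    "R-type": ["instruction_memory", "register_read", "alu", "register_write"],
    "beq": ["instruction_memory", "register_read", "alu"],
}


def path_delay(instr_class: str) -> int:
    """Total delay in ps along the path used by one instruction class."""
    return sum(DELAY_PS[unit] for unit in PATHS[instr_class])


# The single clock period must cover the slowest instruction.
clock_period_ps = max(path_delay(c) for c in PATHS)
critical = max(PATHS, key=path_delay)
```

Under these assumed delays every instruction, including a branch that needs far less time, is charged the full load latency, which is the inefficiency pipelining addresses.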