The Processor: Datapath and Control
![Page 1: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/1.jpg)
The Processor:
Datapath and Control
![Page 2: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/2.jpg)
Outline
Goals in processor implementation
Brief review of sequential logic design
Pieces of the processor implementation puzzle
A simple implementation of a MIPS integer instruction subset
  Datapath
  Control logic design
A multi-cycle MIPS implementation
  Datapath
  Control logic design
Microcoded control
Exceptions
Some real microprocessor datapaths and control
![Page 3: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/3.jpg)
Goals in processor implementation
Balance the rate at which instructions and data are supplied against the rate at which the execution core can consume them and update memory
[Figure: instruction supply, execution core, data supply]
![Page 4: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/4.jpg)
Goals in processor implementation
Recall from Chapter 2
  CPU Time = INST x CPI x CT (instruction count x cycles per instruction x clock cycle time)
INST is largely a function of the ISA and compiler
Objective: minimize CPI x CT within design constraints (cost, power, etc.)
Trading off CPI and CT is tricky
[Figure: multiplier and logic blocks]
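The CPU time relation on this slide can be sketched numerically. The two designs below are hypothetical and their numbers are illustrative assumptions, not measurements; the point is only that a lower CPI does not help if the cycle time grows by a larger factor.

```python
def cpu_time(inst_count, cpi, cycle_time):
    """CPU Time = INST x CPI x CT, in whatever time unit cycle_time uses."""
    return inst_count * cpi * cycle_time

# Same program on both designs, so INST is identical (hypothetical value).
inst = 1_000_000

# Hypothetical design A: every instruction takes 1 long cycle.
design_a = cpu_time(inst, cpi=1.0, cycle_time=16)
# Hypothetical design B: more cycles per instruction, but a shorter cycle.
design_b = cpu_time(inst, cpi=4.0, cycle_time=5)

print(design_a, design_b)
```

With these assumed numbers design A wins, even though design B has the shorter clock cycle.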
![Page 5: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/5.jpg)
Brief review of sequential logic design
State elements are clocked devices
  Flip-flops, etc.
Combinational elements hold no state
  ALU, caches, multiplier, multiplexers, etc.
In edge-triggered clocking, state elements are updated only on the (rising) edge of the clock pulse
![Page 6: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/6.jpg)
Brief review of sequential logic design
The same state element can be read at the beginning of a clock cycle and updated at the end
Example: incrementing the PC
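[Figure: the PC register feeds an adder whose other input is the constant 4. Timing diagram for one clock cycle: Add input = 8, Add output = 12, and the PC register changes from 8 to 12 at the clock edge]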
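The read-then-update behavior of the PC can be mimicked in a few lines of Python. This is a minimal sketch: the `Register` class and `adder` function are illustrative names, not part of the slides.

```python
class Register:
    """Edge-triggered state element: q changes only when clock_edge() is called."""

    def __init__(self, value=0):
        self.q = value  # output visible during the current cycle
        self.d = value  # input waiting to be captured at the next edge

    def clock_edge(self):
        self.q = self.d  # the only place where state is updated


def adder(a, b):
    """Combinational element: the output depends only on the current inputs."""
    return a + b


pc = Register(8)
trace = []
for _ in range(3):
    pc.d = adder(pc.q, 4)       # PC is read at the start of the cycle
    trace.append((pc.q, pc.d))  # (Add input, Add output) during this cycle
    pc.clock_edge()             # rising edge: PC takes the new value

print(trace)  # [(8, 12), (12, 16), (16, 20)]
```

During the first cycle the adder sees 8 and produces 12, but the PC itself only becomes 12 at the edge, which matches the timing diagram.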
![Page 7: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/7.jpg)
Our processor design progression
(1) Instruction fetch, execute, and operand reads from data memory all take place in a single clock cycle
(2) Instruction fetch, execute, and operand reads from data memory take place in successive clock cycles
(3) A pipelined design
![Page 8: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/8.jpg)
Pieces of the processor puzzle
Instruction fetch
Execution
Data memory
[Figure: instruction supply, execution core, data supply]
![Page 9: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/9.jpg)
Instruction fetch datapath
Memory to hold instructions
Register to hold the instruction memory address
Logic to generate the next instruction address
[Figure: instruction fetch datapath; the next instruction address is PC + 4]
![Page 10: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/10.jpg)
Execution datapath
Focus on only a subset of all MIPS instructions
  add, sub, and, or
  lw, sw
  slt
  beq, j
For all instructions except j, we
  Read operands from the register file
  Perform an ALU operation
For all instructions except sw, beq, and j, we write a result into the register file
![Page 11: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/11.jpg)
Execution datapath
Register file block diagram
Read register 1, 2: source operand register numbers
Read data 1, 2: source operands (32 bits each)
Write register: destination operand register number
Write data: data written into the register file
RegWrite: when asserted, enables the writing of Write data
![Page 12: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/12.jpg)
Execution datapath
Datapath for R-type (add, sub, and, or, slt)
R-type instruction format:

| Field | op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-11 | 10-6 | 5-0 |
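As an illustration of the R-type layout, the fields can be extracted from a 32-bit instruction word with shifts and masks. The function name `decode_r_type` is an assumption made for this sketch.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# add $4, $18, $30: rd = 4, rs = 18, rt = 30, op = 0, funct = 100000 (add)
word = (0 << 26) | (18 << 21) | (30 << 16) | (4 << 11) | (0 << 6) | 0b100000
fields = decode_r_type(word)
print(fields)
```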
![Page 13: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/13.jpg)
Execution datapath
Datapath for beq instruction
I-type instruction format:

| Field | op | rs | rt | immediate |
|---|---|---|---|---|
| Bits | 31-26 | 25-21 | 20-16 | 15-0 |

Zero ALU output indicates whether rs = rt (branch taken or not taken)
Branch target address is the sign-extended immediate, shifted left two positions and added to PC+4
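The branch target rule can be sketched as follows. The helper names are illustrative, and 32-bit wraparound is modeled with a mask, which is an assumption of this sketch rather than something the slide spells out.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """PC+4 plus the sign-extended immediate shifted left two positions."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """beq: the ALU subtracts; a zero result means rs == rt, so the branch is taken."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32


print(hex(next_pc_beq(0x100, 5, 5, 3)))  # taken: 0x110
print(hex(next_pc_beq(0x100, 5, 6, 3)))  # not taken: 0x104
```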
![Page 14: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/14.jpg)
Data memory
Used for lw, sw (I-type format)
Block diagram
  Address: memory location to be read or written
  Read data: data out of the memory on a load
  Write data: data into the memory on a store
  MemRead: indicates a read operation is to be performed
  MemWrite: indicates a write operation is to be performed
![Page 15: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/15.jpg)
Execution datapath + data memory
Datapath for lw, sw
Address is the sign-extended immediate added to the source operand read out of the register file
sw: data written to memory from the specified register
lw: data written to the register file from the specified memory address
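A small sketch of the lw/sw behavior described above. The register file is modeled as a list and the data memory as a dictionary keyed by byte address; both are simplifications assumed for this example.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm16):
    """Base register value plus the sign-extended immediate."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


regs = [0] * 32  # register file
memory = {}      # data memory: byte address -> word (assumed model)


def sw(rt, imm16, rs):
    """Store: memory[address] receives the value of register rt."""
    memory[effective_address(regs[rs], imm16)] = regs[rt]


def lw(rt, imm16, rs):
    """Load: register rt receives memory[address] (0 if never written)."""
    regs[rt] = memory.get(effective_address(regs[rs], imm16), 0)


regs[3] = 0x1000
regs[10] = 42
sw(10, 0, 3)             # sw $10, 0($3)
regs[2] = 0x1000 - 112
lw(8, 112, 2)            # lw $8, 112($2) reads the same address
print(regs[8])
```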
![Page 16: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/16.jpg)
Putting the pieces together
Single clock cycle for fetch, execute, and operand read from data memory
3 MUXes
  Register file operand or sign-extended immediate to ALU
  ALU or data memory output written to register file
  PC+4 or branch target address written to PC register
![Page 17: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/17.jpg)
Datapath for R-type instructions
Example: add $4, $18, $30
![Page 18: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/18.jpg)
Datapath for I-type ALU instructions
Example: slti $7, $4, 100
![Page 19: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/19.jpg)
Datapath for not taken beq instruction
Example: beq $28, $13, EXIT
![Page 20: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/20.jpg)
Datapath for taken beq instruction
Example: beq $28, $13, EXIT
![Page 21: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/21.jpg)
Datapath for load instruction
Example: lw $8, 112($2)
![Page 22: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/22.jpg)
Datapath for store instruction
Example: sw $10, 0($3)
![Page 23: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/23.jpg)
Control signals we need to generate
![Page 24: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/24.jpg)
ALU operation control
ALU control input codes from Chapter 4
Two steps to generate the ALU control input
  Use the opcode to distinguish R-type, lw and sw, and beq
  If R-type, use the funct field to determine the ALU control input

| ALU control input | ALU operation | Used for |
|---|---|---|
| 000 | and | and |
| 001 | or | or |
| 010 | add | add, lw, sw |
| 110 | subtract | sub, beq |
| 111 | set on less than | slt |
![Page 25: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/25.jpg)
ALU operation control
Opcode used to generate a 2-bit signal called ALUOp with the following encodings
  00: lw or sw, perform an ALU add
  01: beq, perform an ALU subtract
  10: R-type, ALU operation is determined by the funct field

| Funct | Instruction | ALU control input |
|---|---|---|
| 100000 | add | 010 |
| 100010 | sub | 110 |
| 100100 | and | 000 |
| 100101 | or | 001 |
| 101010 | slt | 111 |
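The two-level decoding can be written as a small lookup. The encodings come from the tables on these slides; the function name `alu_control` and the choice to raise an error on unknown inputs are assumptions of this sketch.

```python
# funct field -> 3-bit ALU control input, for R-type instructions
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp (from the opcode) and funct to the ALU control input."""
    if alu_op == 0b00:   # lw or sw: add to form the address
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:   # R-type: funct decides
        if funct not in FUNCT_TO_ALU_CONTROL:
            raise ValueError(f"unsupported funct {funct:06b}")
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "03b"))  # slt -> 111
```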
![Page 26: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/26.jpg)
Comparing instruction fields
Opcode, source registers, function code, and immediate fields always in same place
Destination register is
  bits 15-11 (rd) for R-type
  bits 20-16 (rt) for lw
  MUX to select the right one

| Instruction | op (31-26) | 25-21 | 20-16 | 15-0 |
|---|---|---|---|---|
| R-type | 0 | rs | rt | rd (15-11), shamt (10-6), funct (5-0) |
| beq | 4 | rs | rt | immediate (offset) |
| lw (sw) | 35 (43) | rs | rt | immediate (offset) |
![Page 27: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/27.jpg)
Datapath with instr fields and ALU control
![Page 28: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/28.jpg)
Main control unit design
![Page 29: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/29.jpg)
Main control unit design
Truth table
[Truth table figure not reproduced; rows are labeled by opcode: beq (4), R-type (0), lw (35), sw (43)]
![Page 30: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/30.jpg)
Adding support for jump instructions
J-type format

| Field | op (= 2) | target |
|---|---|---|
| Bits | 31-26 | 25-0 |

Next PC formed by shifting the 26-bit target left two bits and combining it with the 4 high-order bits of PC+4
Now the next PC will be one of
  PC+4
  beq target address
  j target address
We need another MUX and control bit
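A sketch of the jump address formation and the resulting three-way choice of next PC. The function and parameter names are illustrative; the two conditional selections stand in for the two MUXes.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc, target26):
    """Upper 4 bits of PC+4 concatenated with the 26-bit target shifted left 2."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def next_pc(pc, branch_taken, branch_addr, is_jump, target26):
    """First MUX picks PC+4 or the beq target; second MUX picks that or the j target."""
    pc_plus_4 = (pc + 4) & MASK32
    after_branch_mux = branch_addr if branch_taken else pc_plus_4
    return jump_target(pc, target26) if is_jump else after_branch_mux


print(hex(jump_target(0x40000000, 0x100)))  # 0x40000400
```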
![Page 31: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/31.jpg)
Adding support for jump instructions
![Page 32: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/32.jpg)
Evaluation of the simple implementation
All instructions take one clock cycle (CPI = 1)
Assume the following worst-case delays
  Instruction memory: 4 time units
  Data memory: 4 time units (read), 2 time units (write)
  ALU: 4 time units
  Adders: 3 time units
  Register file: 2 time units (read), 1 time unit (write)
  MUXes, sign extension, gates, and shifters: 1 time unit
Large disparity in worst-case delays among instruction types
  R-type: 4+2+1+4+1+1 = 13 time units
  beq: 4+2+1+4+1+1+1 = 14 time units
  j: 4+1+1 = 6 time units
  store: 4+2+4+2 = 12 time units
  load: 4+2+4+4+1+1 = 16 time units
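The per-instruction sums above can be reproduced by adding up element delays along each path. The slide gives only the numeric terms; the mapping of each term to a named element below (in particular which 1-unit element each "1" stands for) is an assumption of this sketch.

```python
DELAY = {
    "imem": 4,        # instruction memory
    "dmem_read": 4,   # data memory read
    "dmem_write": 2,  # data memory write
    "alu": 4,
    "reg_read": 2,    # register file read
    "reg_write": 1,   # register file write
    "small": 1,       # a MUX, sign extension, gate, or shifter
}

# Elements on each instruction's critical path, in the order of the slide's sums.
PATHS = {
    "R-type": ["imem", "reg_read", "small", "alu", "small", "reg_write"],
    "beq":    ["imem", "reg_read", "small", "alu", "small", "small", "small"],
    "j":      ["imem", "small", "small"],
    "store":  ["imem", "reg_read", "alu", "dmem_write"],
    "load":   ["imem", "reg_read", "alu", "dmem_read", "small", "reg_write"],
}


def path_delay(name):
    """Total delay of one instruction type's path, in time units."""
    return sum(DELAY[element] for element in PATHS[name])


delays = {name: path_delay(name) for name in PATHS}
cycle_time = max(delays.values())  # single-cycle clock must fit the slowest path
print(delays, cycle_time)
```

Because every instruction takes exactly one cycle, the clock period is set by the load path, and a jump that needs 6 time units still occupies all 16.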
![Page 33: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/33.jpg)
Evaluation of the simple implementation
Disparity would be worse in a real machine
  Even slower integer instructions (e.g., multiply/divide in MIPS)
  Floating point instructions
Simple instructions take as long as complex ones
![Page 34: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/34.jpg)
A multicycle implementation
Instruction fetch, register file access, etc. occur in separate clock cycles
Different instruction types take different numbers of cycles to complete
Clock cycle time should be shorter
![Page 35: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/35.jpg)
High level view of datapath
New registers store results of each step
  Not programmer visible!
Hardware can be shared
  One ALU for PC+4, branch target calculation, effective address (EA) calculation, and arithmetic operations
  One memory for instructions and data
![Page 36: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/36.jpg)
Detailed multi-cycle datapath
![Page 37: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/37.jpg)
Multi-cycle control
![Page 38: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/38.jpg)
First two cycles for all instructions
Instruction fetch (1st cycle)
  Load the instruction into the IR register
    IR = Memory[PC]
  Increment the PC
    PC = PC+4
Instruction decode and register fetch (2nd cycle)
  Read register file locations rs and rt, results into the A and B registers
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
  Calculate the branch target address and load into ALUOut
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
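The register transfers of the first two cycles can be traced in software. This is a sketch under simple assumptions: the internal registers are held in a dictionary, memory is a dictionary keyed by byte address, and the function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def instruction_fetch(state, memory):
    """Cycle 1: IR = Memory[PC]; PC = PC + 4."""
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & MASK32


def decode_and_register_fetch(state, regs):
    """Cycle 2: read rs and rt into A and B; precompute the branch target."""
    ir = state["IR"]
    state["A"] = regs[(ir >> 21) & 0x1F]
    state["B"] = regs[(ir >> 16) & 0x1F]
    # PC already holds PC+4 here, so this is the beq target address.
    state["ALUOut"] = (state["PC"] + (sign_extend16(ir & 0xFFFF) << 2)) & MASK32


# beq $28, $13, offset 3 placed at address 0x100 (opcode 4)
memory = {0x100: (4 << 26) | (28 << 21) | (13 << 16) | 3}
regs = [0] * 32
regs[28] = 7
regs[13] = 7
state = {"PC": 0x100, "IR": 0, "A": 0, "B": 0, "ALUOut": 0}

instruction_fetch(state, memory)
decode_and_register_fetch(state, regs)
print(hex(state["PC"]), state["A"], state["B"], hex(state["ALUOut"]))
```

The branch target is computed before the instruction type is known; if the instruction turns out not to be a branch, ALUOut is simply overwritten later.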
![Page 39: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/39.jpg)
Instruction fetch
IR=Mem[PC]
![Page 40: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/40.jpg)
Instruction fetch
PC=PC+4
![Page 41: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/41.jpg)
Instruction decode and register fetch
A=Reg[IR[25-21]], B=Reg[IR[20-16]]
![Page 42: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/42.jpg)
Instruction decode and register fetch
ALUOut = PC+(sign-extend (IR[15-0]) <<2)
![Page 43: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/43.jpg)
Additional cycles for R-type
Execution
  ALUOut = A op B
Completion
  Reg[IR[15-11]] = ALUOut
![Page 44: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/44.jpg)
R-type execution cycle
ALUOut = A op B
![Page 45: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/45.jpg)
R-type completion cycle
Reg[IR[15-11]] = ALUOut
![Page 46: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/46.jpg)
Additional cycles for store
Address computation
  ALUOut = A + sign-extend(IR[15-0])
Memory access
  Memory[ALUOut] = B
![Page 47: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/47.jpg)
Store address computation cycle
ALUOut = A + sign-extend (IR[15-0])
![Page 48: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/48.jpg)
Store memory access cycle
Memory[ALUOut] = B
![Page 49: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/49.jpg)
Additional cycles for load
Address computation
  ALUOut = A + sign-extend(IR[15-0])
Memory access
  MDR = Memory[ALUOut]
Read completion
  Reg[IR[20-16]] = MDR
![Page 50: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/50.jpg)
Load memory access cycle
MDR = Memory[ALUOut]
![Page 51: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/51.jpg)
Load read completion cycle
Reg[IR[20-16]] = MDR
![Page 52: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/52.jpg)
Additional cycle for beq
Branch completion
  if (A == B) PC = ALUOut
![Page 53: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/53.jpg)
Branch completion cycle for beq
if (A == B) PC = ALUOut
![Page 54: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/54.jpg)
Additional cycle for j
Jump completion
  PC = PC[31-28] || (IR[25-0] << 2)
![Page 55: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/55.jpg)
Jump completion cycle for j
PC = PC[31-28] || (IR[25-0]<<2)
![Page 56: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/56.jpg)
Control logic design
Implemented as a Finite State Machine
Inputs: 6 opcode bits
Outputs: 16 control signals
State: 4 bits for 10 states
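A next-state function for the ten states can be sketched from the per-instruction cycle descriptions on the earlier slides. The state names are invented for this sketch, the 16 control outputs are omitted, and the opcodes are the ones listed on the instruction-field slide (with 2 for j).

```python
STATES = [
    "fetch", "decode", "mem_address", "load_access", "load_completion",
    "store_access", "r_execute", "r_completion", "beq_completion", "jump_completion",
]

OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0, 2, 4, 35, 43


def next_state(state, opcode):
    """Return the state that follows `state` for an instruction with `opcode`."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        if opcode == OP_RTYPE:
            return "r_execute"
        if opcode in (OP_LW, OP_SW):
            return "mem_address"
        if opcode == OP_BEQ:
            return "beq_completion"
        if opcode == OP_J:
            return "jump_completion"
        raise ValueError(f"undefined opcode {opcode}")
    if state == "mem_address":
        return "load_access" if opcode == OP_LW else "store_access"
    if state == "load_access":
        return "load_completion"
    if state == "r_execute":
        return "r_completion"
    return "fetch"  # every completion state returns to fetch


def cycles(opcode):
    """Count the states visited by one instruction, starting at fetch."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == "fetch":
            return count


state_bits = (len(STATES) - 1).bit_length()  # bits needed to encode 10 states
print(state_bits, cycles(OP_LW), cycles(OP_SW), cycles(OP_RTYPE), cycles(OP_BEQ), cycles(OP_J))
```

The cycle counts that fall out of this table (5 for loads, 4 for stores and R-type, 3 for branches and jumps) are the ones used in the CPI evaluation later in the deck.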
![Page 57: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/57.jpg)
High-level view of FSM
![Page 58: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/58.jpg)
Instruction fetch cycle
![Page 59: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/59.jpg)
Instruction decode/register fetch cycle
![Page 60: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/60.jpg)
R-type execution cycle
![Page 61: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/61.jpg)
R-type completion cycle
![Page 62: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/62.jpg)
Memory address computation cycle
![Page 63: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/63.jpg)
Store memory access cycle
![Page 64: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/64.jpg)
Load memory access cycle
![Page 65: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/65.jpg)
Load read completion cycle
![Page 66: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/66.jpg)
beq branch completion cycle
![Page 67: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/67.jpg)
j jump completion cycle
![Page 68: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/68.jpg)
Complete FSM
![Page 69: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/69.jpg)
Evaluation of the multi-cycle design
CPI calculated based on the instruction mix
For gcc (Figure 4.54)
  23% loads (5 cycles each)
  13% stores (4 cycles each)
  19% branches (3 cycles each)
  2% jumps (3 cycles each)
  43% ALU (4 cycles each)
CPI = 0.23*5 + 0.13*4 + 0.19*3 + 0.02*3 + 0.43*4 = 4.02
Cycle time is calculated from the longest delay path assuming the same timing delays as before
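The CPI figure is a weighted average over the instruction mix, which is easy to check. The dictionary layout below is just one convenient way to hold the slide's numbers.

```python
# instruction class -> (fraction of dynamic instructions, cycles each), gcc mix
MIX = {
    "load":   (0.23, 5),
    "store":  (0.13, 4),
    "branch": (0.19, 3),
    "jump":   (0.02, 3),
    "alu":    (0.43, 4),
}

# Weighted average: sum of fraction x cycles over all classes.
cpi = sum(fraction * cycles for fraction, cycles in MIX.values())
total_fraction = sum(fraction for fraction, _ in MIX.values())

print(round(cpi, 2), round(total_fraction, 2))
```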
![Page 70: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/70.jpg)
Worst case datapath: branch target
ALUOut = PC+(sign-extend (IR[15-0]) <<2)
Delay = 7 time units (versus 16 for the simple single-cycle implementation)
![Page 71: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/71.jpg)
Evaluation of the multi-cycle design
Time per instruction (TPI) of simple and multi-cycle
  TPI(simple) = CPI(simple) x cycle time(simple) = 1 x 16 = 16
  TPI(multi-cycle) = 4.02 x 7 = 28.1
Simple single-cycle implementation is faster
Multicycle with pipelining will be considerably faster than single-cycle implementation
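The comparison above is a direct application of time per instruction = CPI x cycle time. A quick check using the slide's values (the function name is illustrative):

```python
def time_per_instruction(cpi, cycle_time):
    """Average time per instruction in time units."""
    return cpi * cycle_time


tpi_simple = time_per_instruction(1, 16)    # single-cycle: CPI = 1, cycle = 16
tpi_multi = time_per_instruction(4.02, 7)   # multi-cycle: CPI = 4.02, cycle = 7
slowdown = tpi_multi / tpi_simple           # how much slower multi-cycle is

print(tpi_simple, round(tpi_multi, 2), round(slowdown, 2))
```

Under these delay assumptions the shorter cycle does not make up for the fourfold increase in CPI, which is why the multi-cycle design only pays off once it is pipelined.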
![Page 72: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/72.jpg)
Exceptions
An exception is an event that causes a deviation from the normal execution of instructions
Types of exceptions
  Operating system call (e.g., read a file, print a file)
  Input/output device request
  Page fault (request for instruction/data not in memory; see Ch 7)
  Arithmetic error (overflow, underflow, etc.)
  Undefined instruction
  Misaligned memory access (e.g., word access to odd address)
  Memory protection violation
  Hardware error
  Power failure
An exception is not usually due to an error!
We need to be able to restart the program at the point where the exception was detected
![Page 73: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/73.jpg)
Handling exceptions
Detect the exception
Save enough information about the exception to handle it properly
Save enough information about the program to resume it after the exception is handled
Handle the exception
Either terminate the program or resume executing it depending on the exception type
![Page 74: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/74.jpg)
Detecting exceptions
Performed by hardware
Overflow: determined from the opcode and the overflow output of the ALU
Undefined instruction: determined from
  The opcode in the main control unit
  The function code and ALUOp in the ALU control logic
![Page 75: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/75.jpg)
Detecting exceptions
[Figure: datapath and control with overflow and undefined instruction detection signals]
![Page 76: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/76.jpg)
Saving exception information
Performed by hardware
We need the type of exception and the PC of the instruction when the exception occurred
In MIPS, the Cause register holds the exception type
  Need an encoding for each exception type
  Need a signal from the control unit to load it into the Cause register
and the Exception Program Counter (EPC) register holds the PC
  Need to subtract 4 from the PC register to get the correct PC (since we loaded PC+4 into the PC register during the Instruction Fetch cycle)
  Need a signal from the control unit to load it into EPC
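What the hardware saves can be sketched as two register writes. The cause encodings below are hypothetical placeholders chosen for the example (the slide only says that an encoding is needed), and the state dictionary is an assumed model of the internal registers.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical encodings for the Cause register (not taken from the slides).
CAUSE_UNDEFINED_INSTRUCTION = 0
CAUSE_OVERFLOW = 1


def save_exception_info(state, cause):
    """Load Cause with the exception type and EPC with the excepting PC."""
    state["Cause"] = cause
    # PC was already incremented during Instruction Fetch, so undo the +4.
    state["EPC"] = (state["PC"] - 4) & MASK32


# An add at address 0x100 overflows; by then the PC register holds 0x104.
state = {"PC": 0x104, "Cause": 0, "EPC": 0}
save_exception_info(state, CAUSE_OVERFLOW)
print(state["Cause"], hex(state["EPC"]))
```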
![Page 77: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/77.jpg)
Saving exception information
![Page 78: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/78.jpg)
Saving program information
Needed in order to restart the program from the point where the exception occurred
Performed by hardware and software
EPC register holds the PC of the instruction that had the exception (where we will restart the program)
The software routine that handles the exception saves any registers that it will need to the stack and restores them when it is done
![Page 79: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/79.jpg)
Handling the exception
Performed by hardware and software
Need to transfer control to a software routine to handle the exception (exception handler)
The exception handler runs in a privileged mode that allows it to use special instructions and access all of memory
  Our programs run in user mode
The hardware enables the privileged mode, loads PC with the address of the exception handler, and transitions to the Fetch state
![Page 80: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/80.jpg)
Handling the exception
Loading the PC with the exception handler address
![Page 81: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/81.jpg)
Exception handler
Stores the values of the registers that it will need to the stack
Handles the particular exception
  Operating system call: calls the subroutine associated with the call
  Underflow: sets register to zero or uses denormalized numbers
  I/O: handles the particular I/O request, e.g., keyboard input
Restores registers from the stack (if program is to be restarted)
Terminates the program, or resumes execution by loading the PC with EPC and transitioning to the Instruction Fetch state
![Page 82: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/82.jpg)
FSM modifications
![Page 83: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/83.jpg)
The Intel Pentium processor
Introduced in 1993
Uses a multi-cycle datapath with the following steps for integer instructions
  Prefetch (PF): read instruction from the instruction memory
  Decode 1 (D1): first stage of instruction decode
  Decode 2 (D2): second stage of instruction decode
  Execute (E): perform the ALU operation
  Write back (WB): write the result to the register file
Datapath usage varies by instruction type
  Simple instructions make one pass through the datapath using state machine control
  Complex instructions make multiple passes, reusing the same hardware elements under microcode control
![Page 84: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/84.jpg)
The Intel Pentium processor
The Pentium is a 2-way superscalar design: two instructions can execute simultaneously
Ideal CPI for a 2-way superscalar is 0.5
Conditions for superscalar execution
  Both must be simple instructions
  The result of the first instruction cannot be needed by the second
  The two instructions cannot write the same register
  The first instruction in program sequence cannot be a jump
[Figure: shared PF and D1 stages feed two pipes, the U pipe and the V pipe, each with its own D2, E, and WB stages]
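The four pairing conditions listed on this slide can be expressed as a predicate. This is only a sketch of those conditions as stated; the `Instr` record and its fields are invented for the example and do not model the actual Pentium pairing rules in full.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical, simplified view of one instruction."""
    simple: bool                 # True if it is a simple instruction
    dest: Optional[str]          # register written, or None
    sources: Tuple[str, ...]     # registers read
    is_jump: bool = False


def can_pair(first, second):
    """True if `first` and `second` (in program order) may issue together."""
    if not (first.simple and second.simple):
        return False             # both must be simple instructions
    if first.dest is not None and first.dest in second.sources:
        return False             # second needs the result of the first
    if first.dest is not None and first.dest == second.dest:
        return False             # both write the same register
    if first.is_jump:
        return False             # first instruction cannot be a jump
    return True


a = Instr(simple=True, dest="eax", sources=("ebx",))
b = Instr(simple=True, dest="ecx", sources=("edx",))
c = Instr(simple=True, dest="edx", sources=("eax",))
print(can_pair(a, b), can_pair(a, c))
```

When every pair of instructions passes this check, two instructions complete per cycle, which is where the ideal CPI of 0.5 comes from.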
![Page 85: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/85.jpg)
The Intel Pentium Pro processor
Introduced in 1995 as the successor to the Pentium
The basis for the Pentium II and Pentium III
Implements a 14-cycle, 3-way superscalar integer datapath
  Very high frequency is the goal
Uses out-of-order execution in that instructions may execute out of their original program order
  Completely handled by hardware transparently to software
  Instructions execute as soon as their source operands become available
  Complicates exception handling
    Some instructions before the excepting one may not have executed, while some after it may have executed
![Page 86: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/86.jpg)
The Intel Pentium Pro processor
Pentium Pro designers (and AMD designers before them) used innovative engineering to overcome the disadvantages of CISC ISAs
  Many complex x86 instructions are internally translated by hardware into RISC-like micro-ops with state machine control
  Achieves a very low CPI for simple integer operations even on programs compiled for older implementations
Combination of high frequency and low CPI gave the Pentium Pro extremely competitive integer performance versus RISC microprocessors
  Result has been that RISC CPUs have failed to gain the desktop market share that had been expected
![Page 87: The Processor: Datapath and Control](https://reader036.fdocuments.in/reader036/viewer/2022081416/56813982550346895da115de/html5/thumbnails/87.jpg)
The Intel Pentium 4 processor
20-cycle superscalar integer pipeline
Extremely high frequency (>3 GHz)
Major effort to lower power dissipation
  Clock gating: clock to a unit is turned off when the unit is not in use
  Trace cache: caches micro-ops of previously decoded complex instructions to avoid power-consuming decode operation