Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE...
![Page 1: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/1.jpg)
Electrical and Computer Engineering, University of Cyprus
22-10-2014
LAB3: IMPROVING MIPS
PERFORMANCE WITH PIPELINING
![Page 2: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/2.jpg)
Previous Labs: MIPS single-cycle with Memory System
2
![Page 3: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/3.jpg)
In this Lab: Pipelining
You are expected to:
- Understand the concept
- Be familiar with the 5 MIPS pipeline stages
- Understand the three pipeline hazards and their solutions
3
![Page 4: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/4.jpg)
PIPELINING
A technique in which the execution of several instructions is overlapped.
- Each instruction is broken into several stages.
- Stages can operate concurrently PROVIDED WE HAVE SEPARATE RESOURCES FOR EACH STAGE, i.e. each step uses a different functional unit.
Note: the execution time of a single instruction is NOT improved. The throughput of several instructions is improved.
4
![Page 5: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/5.jpg)
Major Pipeline Benefit = Performance
Ideal performance:
- Time/instruction = non-pipelined time / number of stages
- This is an asymptote, of course, but +10% is commonly achieved
Two ways to view the performance mechanism:
- Reduced CPI (i.e. the change from non-pipelined to pipelined): close to 1 instruction/cycle if you’re lucky
- Reduced cycle time (i.e. increasing pipeline depth): work is split into more stages, and simpler stages result in faster clock cycles
5
![Page 6: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/6.jpg)
Other Pipeline Benefits
- Completely hardware mechanism
- All modern machines are pipelined
- This was the key technique for advancing performance in the 80’s
- In the 90’s the move was to multiple pipelines
- Beware, no benefit is totally free/good
6
![Page 7: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/7.jpg)
7
Laundry analogy to pipelining
![Page 8: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/8.jpg)
Implementation of a RISC (Unpipelined, Multicycle)
Implementation of an integer subset of a RISC architecture in which an instruction takes at most 5 clock cycles:
- Instruction Fetch (IF)
- Instruction Decode/Register Fetch (ID)
- Execution/Effective Address Calculation (EX)
- Memory Access (MEM)
- Write-Back (WB)
8
![Page 9: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/9.jpg)
Review: Single-cycle Datapath for MIPS
[Figure: the single-cycle datapath divided into five stages. Stage 1 IFtch (PC, Instruction Memory/Imem), Stage 2 Dcd (Registers), Stage 3 Exec (ALU), Stage 4 Mem (Data Memory/Dmem), Stage 5 WB.]
Use the datapath figure to represent the pipeline: IM, Reg, ALU, DM, Reg.
9
![Page 10: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/10.jpg)
Classic 5 Stage Pipeline for a RISC Processor
10
![Page 11: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/11.jpg)
Pipelined Execution Representation
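To simplify the pipeline, every instruction takes the same number of steps, called stages.
One clock cycle per stage.

Time runs left to right; program flow runs top to bottom.

| Program flow | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 | Cycle 8 | Cycle 9 |
|---|---|---|---|---|---|---|---|---|---|
| Instruction 1 | IFtch | Dcd | Exec | Mem | WB | | | | |
| Instruction 2 | | IFtch | Dcd | Exec | Mem | WB | | | |
| Instruction 3 | | | IFtch | Dcd | Exec | Mem | WB | | |
| Instruction 4 | | | | IFtch | Dcd | Exec | Mem | WB | |
| Instruction 5 | | | | | IFtch | Dcd | Exec | Mem | WB |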
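The staggered chart can also be generated programmatically, which makes it easy to see that N instructions occupy N + 4 cycles in a 5-stage pipe. This is only a sketch: the function name `pipeline_chart` and the list-of-rows representation are illustrative choices, not part of the lab material.

```python
# Stage names as used on the slide.
STAGES = ["IFtch", "Dcd", "Exec", "Mem", "WB"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row i is shifted right by i cycles.

    Assumes an ideal pipeline: one cycle per stage and no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_chart(5):
        print(" ".join(f"{cell:<5}" for cell in row))
```

Each printed row is one instruction, and each column is one clock cycle.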
11
![Page 12: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/12.jpg)
Important Pipeline Characteristics
Latency
- Time it takes an instruction to go through the pipe
- Latency = # stages * stage-delay
- Dominant feature if there are a lot of exceptions…
Throughput
- Determined by the rate at which instructions can start/finish
- Dominant feature if there are few exceptions
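A small numeric sketch of the two quantities follows. The 200 ps stage delay is a made-up figure for illustration, and the throughput function assumes an ideal pipeline in which one instruction finishes every cycle.

```python
def latency(num_stages, stage_delay):
    """Latency = # stages * stage-delay (the slide's formula)."""
    return num_stages * stage_delay


def throughput(stage_delay):
    """Instructions finished per unit time, assuming one completes per cycle."""
    return 1.0 / stage_delay


if __name__ == "__main__":
    delay_ps = 200  # hypothetical stage delay in picoseconds
    print("latency (ps):", latency(5, delay_ps))
    print("throughput (instr/ps):", throughput(delay_ps))
```

With these numbers each instruction still takes 1000 ps to traverse the pipe, but one instruction completes every 200 ps.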
12
![Page 13: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/13.jpg)
Pipelining Lessons
- Pipelining doesn’t help the latency (execution time) of a single task; it helps the throughput of the entire workload
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Time to “fill” the pipeline and time to “drain” it reduce speedup
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages also reduce speedup
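The effect of fill/drain time and of unbalanced stages on speedup can be checked with a few lines of arithmetic. The sketch below assumes the unpipelined machine spends the sum of all stage delays on each instruction and that the pipelined clock is set by the slowest stage. The stage delays are hypothetical.

```python
def unpipelined_time(n, stage_delays):
    """Each instruction runs start to finish before the next one begins."""
    return n * sum(stage_delays)


def pipelined_time(n, stage_delays):
    """Clock period = slowest stage; (stages - 1) extra cycles fill the pipe."""
    cycle = max(stage_delays)
    return (len(stage_delays) + n - 1) * cycle


def speedup(n, stage_delays):
    return unpipelined_time(n, stage_delays) / pipelined_time(n, stage_delays)


if __name__ == "__main__":
    balanced = [200, 200, 200, 200, 200]    # hypothetical delays in ps
    unbalanced = [200, 100, 200, 200, 100]  # same slowest stage, less total work
    for n in (1, 10, 1000):
        print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
```

For a single instruction the balanced pipeline gives no speedup at all. As the instruction count grows, the balanced case approaches 5 and the unbalanced case approaches 4, because its clock is still set by the 200 ps stage.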
13
![Page 14: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/14.jpg)
Single Cycle Datapath
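[Figure: single-cycle MIPS datapath. Blocks: PC, Imem (Read Addr, Instruction), Regs (ReadReg1, ReadReg2, WriteReg, WriteData, Readdata1, Readdata2), SignExtend, ALU (Zero output), Dmem (Address, WriteData, Readdata), an adder for PC + 4, an adder with shift-left-2 for the branch target, and four muxes. Control signals: RegDst, RegWrite, ALUSrc, ALUOp/ALU control, MemRead, MemWrite, MemToReg, PCSrc. Instruction fields: 31:0, 25:21, 20:16, 15:11, 15:0.]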
14
![Page 15: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/15.jpg)
Required Changes to Datapath
- Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
- The next PC value is computed in the 3rd step, but we need to bring in the next instruction in the next cycle.
- The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then, so the PC value must be carried along with the instruction.
15
![Page 16: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/16.jpg)
Changes to Datapath (cont.)
- For the lw instruction, we need the write register address at stage 5, but the IR is by then occupied by another instruction.
- So we must carry the IR destination field along as the instruction moves through the stages. See the connection in the figure.
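To make the point concrete, the sketch below models each pipeline register as a small record that travels with its instruction. The `Latch` class, its field names, and the `advance` helper are hypothetical simplifications: a real pipeline register also holds register data, the immediate, and control bits.

```python
from dataclasses import dataclass


@dataclass
class Latch:
    """Hypothetical slice of a pipeline register (IF/ID, ID/EX, EX/MEM, MEM/WB)."""
    name: str = "nop"    # mnemonic, for tracing only
    pc_plus_4: int = 0   # PC value carried along with the instruction
    write_reg: int = 0   # destination register number carried to stage 5


def advance(latches, fetched):
    """One clock edge: every latch moves one stage down; a new fetch enters."""
    return [fetched] + latches[:-1]


if __name__ == "__main__":
    # Index 0 is IF/ID, index 3 is MEM/WB.
    pipe = [Latch(), Latch(), Latch(), Latch()]
    program = [Latch("lw", 4, 8), Latch("sw", 8, 0),
               Latch("add", 12, 10), Latch("sub", 16, 11)]
    for instr in program:
        pipe = advance(pipe, instr)
    # lw is now in MEM/WB and still knows which register to write.
    print(pipe[3].name, pipe[3].write_reg)
```

By the time lw reaches MEM/WB, three later instructions have been fetched, so the destination number can only come from the copy that moved down the pipe with lw.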
16
![Page 17: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/17.jpg)
Pipelined Datapath (with Pipeline Regs)
[Figure: pipelined datapath with pipeline registers. Stages: Fetch, Decode, Execute, Memory, Write Back. Blocks: PC, Imem, Regs, sign extend (16 to 32 bits), shift left 2, adders, ALU (Zero, ALU result), Dmem, and muxes. Pipeline register widths: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits. A 5-bit write register number is carried through the pipeline registers.]
17
![Page 18: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/18.jpg)
Pipelined Datapath (with Control Signals)
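[Figure: pipelined datapath with control signals. Blocks: PC, Instruction Memory, Register File (RN1, RN2, WN, WD, RD1, RD2), EXTND (16 to 32 bits), <<2, two ADD units, ALU (Zero), Data Memory (ADDR, WD, RD), and the IF/ID, ID/EX, EX/MEM, MEM/WB registers. Control signals: RegDst, ALUOp, ALUControl (6-bit input), ALUSrc, Branch, PCSrc, MemRead, MemWrite, RegWrite, MemtoReg. The 5-bit rs, rt, rd fields and the immediate are taken from the instruction.]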
18
![Page 19: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/19.jpg)
Control for Pipelined Datapath
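[Figure: the Control unit produces three groups of signals, EX, M, and WB. ID/EX holds EX, M, and WB; EX/MEM holds M and WB; MEM/WB holds WB.]
- EX: RegDst, ALUOp[1:0], ALUSrc
- M: MemRead, MemWrite, Branch
- WB: RegWrite, MemtoReg
Basic approach:
- Build on the single-cycle control
- Place the control unit in the ID stage
- Pass control signals to the following stages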
19
![Page 20: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/20.jpg)
Control for Pipelined Datapath
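Control lines are grouped by stage: Execution/Address Calculation stage (RegDst, ALUOp1, ALUOp0, ALUSrc), Memory access stage (Branch, MemRead, MemWrite), Write-back stage (RegWrite, MemtoReg).

| Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg |
|---|---|---|---|---|---|---|---|---|---|
| R-format | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| lw | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| sw | X | 0 | 0 | 1 | 0 | 0 | 1 | 0 | X |
| beq | X | 0 | 1 | 0 | 1 | 0 | 0 | 0 | X |

[Figure: Control unit with its EX, M, and WB signal groups passing through ID/EX, EX/MEM, and MEM/WB.]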
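The control table can be written down as data and split into the three groups that travel through the pipeline registers. This is a sketch only: the dictionary layout and the `decode` helper are illustrative, and don't-care entries (X) are represented as `None`.

```python
X = None  # don't-care entry from the table

SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc",
           "Branch", "MemRead", "MemWrite", "RegWrite", "MemtoReg")

# Rows copied from the control table, in the column order of SIGNALS.
TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (X, 0, 0, 1, 0, 0, 1, 0, X),
    "beq":      (X, 0, 1, 0, 1, 0, 0, 0, X),
}

GROUPS = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "M":  ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}


def decode(kind):
    """Return the control word for one instruction class, grouped by stage."""
    word = dict(zip(SIGNALS, TABLE[kind]))
    return {stage: {name: word[name] for name in names}
            for stage, names in GROUPS.items()}


if __name__ == "__main__":
    for kind in TABLE:
        print(kind, decode(kind))
```

In the ID stage all three groups are written into ID/EX. The EX group is used and dropped in the next stage, the M group one stage later, and only the WB group reaches MEM/WB.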
20
![Page 21: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/21.jpg)
Datapath and Control Unit
[Figure: pipelined datapath together with its control unit. The Control block in the decode stage generates the WB, M, and EX signal groups, which are carried through ID/EX, EX/MEM, and MEM/WB. Datapath blocks: PC, Instruction Memory, Register File, EXTND (16 to 32 bits), <<2, two ADD units, ALU (Zero), Data Memory. Signals shown: RegDst, ALUOp, ALUControl, ALUSrc, Branch, PCSrc, MemRead, RegWrite.]
21
![Page 22: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/22.jpg)
Tracking Control Signals - Cycle 1
[Figure: datapath and control unit in cycle 1. LW is in the first (fetch) stage.]
22
![Page 23: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/23.jpg)
Tracking Control Signals - Cycle 2
[Figure: datapath and control unit in cycle 2. Instructions in the pipe, newest first: SW, LW.]
23
![Page 24: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/24.jpg)
Tracking Control Signals - Cycle 3
[Figure: datapath and control unit in cycle 3. Instructions in the pipe, newest first: ADD, SW, LW. Control values annotated in the figure: 001, 1.]
24
![Page 25: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/25.jpg)
Tracking Control Signals - Cycle 4
[Figure: datapath and control unit in cycle 4. Instructions in the pipe, newest first: SUB, ADD, SW, LW. Control values annotated in the figure: 1, 0, 0.]
25
![Page 26: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/26.jpg)
Tracking Control Signals - Cycle 5
[Figure: datapath and control unit in cycle 5. Instructions labeled in the figure: SUB, ADD, SW, LW, with LW now in the last stage. Control values annotated in the figure: 1, 1.]
26
![Page 27: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/27.jpg)
Can pipelining get us into trouble?
Yes: Pipeline Hazards
- Hazards occur because data required for executing the current instruction may not be available.
- Structural hazards: an attempt to use the same resource in two different ways at the same time
- Control hazards: an attempt to make a decision before the condition is evaluated
  - branch instructions
  - interrupts
27
![Page 28: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/28.jpg)
Pipelining troubles (cont.)
- Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory operation before they are ready.
  - The instruction depends on the result of a prior instruction still in the pipeline
- Hazards can always be resolved by waiting:
  - pipeline control must detect the hazard
  - and take action (or delay action) to resolve it
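As an illustration of what "detect the hazard" means for data hazards, the sketch below scans a short instruction sequence for read-after-write dependences on an instruction that is still in the pipe. The `Instr` record, the `raw_hazards` function, and the default `window` of 2 are assumptions for this example. The window of 2 assumes a 5-stage pipe whose register file can be written and read within the same cycle.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: Optional[int]      # register written, or None (e.g. sw, beq)
    srcs: Tuple[int, ...]    # registers read


def raw_hazards(program, window=2):
    """List (producer_index, consumer_index, register) dependences.

    A dependence is reported when the consumer reads a register that an
    instruction at most `window` slots earlier writes. Register 0 is
    ignored because $0 is hard-wired to zero on MIPS.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            d = program[i].dest
            if d is not None and d != 0 and d in consumer.srcs:
                found.append((i, j, d))
    return found


if __name__ == "__main__":
    program = [
        Instr("sub $2, $1, $3", 2, (1, 3)),
        Instr("and $12, $2, $5", 12, (2, 5)),
        Instr("or  $13, $6, $2", 13, (6, 2)),
        Instr("add $14, $2, $2", 14, (2, 2)),
    ]
    for i, j, reg in raw_hazards(program):
        print(f"{program[j].text!r} needs ${reg} from {program[i].text!r}")
```

Under these assumptions the `and` and `or` instructions depend on `sub` while it is still in flight. The `add` is far enough behind that the register file already holds the new value.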
28
![Page 29: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/29.jpg)
Hazard detection & Forwarding Units
29
![Page 30: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/30.jpg)
Forwarding: A solution to Data Hazards
30
![Page 31: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/31.jpg)
Stalls
Forwarding will not always solve the problems of data hazards.
For example, suppose an add instruction follows a load word (lw), and the add involves the register that receives the memory data.
In this case, forwarding will not work.
The reason is that the data must be read from memory, and so it will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.
A solution to this problem is the stall.
A stall halts the instruction awaiting data, while the key instruction (a lw in this case) proceeds to the end of the MEM cycle, after which the desired data is available to the add.
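The load-use case can be detected with a simple comparison in the decode stage. The sketch below follows the condition commonly given for the textbook five-stage MIPS design. The function and parameter names are illustrative, and the actual hazard unit used in this lab may differ.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in decode must wait for a load.

    id_ex_mem_read: True if the instruction now in EX is a load (MemRead = 1)
    id_ex_rt:       register the load will write
    if_id_rs/rt:    source registers of the instruction now in decode
    """
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # lw $2, 20($1) followed by add $4, $2, $5: the add reads $2.
    print(must_stall(True, 2, 2, 5))
    # add $2, $1, $3 followed by add $4, $2, $5: forwarding handles this.
    print(must_stall(False, 2, 2, 5))
```

When the condition holds, the usual response is to hold the PC and IF/ID for one cycle and zero the control signals entering ID/EX, which inserts a bubble behind the load.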
31
![Page 32: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/32.jpg)
Other Problems With Branches
- A remaining problem is what to do about instructions following a branch.
- Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage.
- This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made.
- What if:
  - The following instructions were for the case of “branch taken” and the branch was not taken?
  - The following instructions were for “branch not taken” and it was taken?
- In either case, the wrong instructions are in the pipe and they must be eliminated (“flushed”).
- How can this problem be prevented?
32
![Page 33: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/33.jpg)
Control Hazard Approach
- One approach is to always assume the branch is (or is not) taken.
- Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the “not taken” program line (branch determination is made in ALU/EX).
- If this assumption is correct, the pipeline will continue to flow without delay.
- When the branch is taken, instructions in IF and ID/RF must be “flushed,” usually by changing the “op” code of those instructions to a “nop” and letting them proceed to the end of the pipe.
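The cost of the assume-not-taken policy can be estimated by counting two flushed instructions for every branch that turns out to be taken. The sketch below is a rough model under stated assumptions: the function name and the instruction mix are hypothetical, and data-hazard stalls are ignored.

```python
def total_cycles(branch_outcomes, other_instructions, stages=5, flush_penalty=2):
    """Cycles to run a program under an assume-not-taken policy.

    branch_outcomes:    one bool per executed branch (True = taken)
    other_instructions: count of non-branch instructions
    flush_penalty:      instructions flushed per taken branch; 2 because the
                        decision is made in the third stage
    """
    n = len(branch_outcomes) + other_instructions
    taken = sum(1 for outcome in branch_outcomes if outcome)
    fill = stages - 1  # cycles before the first instruction completes
    return fill + n + flush_penalty * taken


if __name__ == "__main__":
    # Hypothetical mix: 80 non-branch instructions and 20 branches, 8 taken.
    outcomes = [True] * 8 + [False] * 12
    print(total_cycles(outcomes, 80))
```

With this mix the program needs 120 cycles instead of the 104 it would take if every guess were correct.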
33
![Page 34: Electrical and Computer Engineering University of Cyprus 22-10-2014 LAB3: IMPROVING MIPS PERFORMANCE WITH PIPELINING.](https://reader030.fdocuments.in/reader030/viewer/2022012922/56649f4d5503460f94c6e4dc/html5/thumbnails/34.jpg)
Summary
- Pipelining is a fundamental concept in computers/nature
  - Multiple instructions in flight
  - Limited by length of longest stage
  - Latency vs. Throughput
- Hazards gum up the works
34