Computer Architecture and Engineering Lecture 7 Pipelining Ics152/fa05/lecnotes/lec4-1.pdf ·...
UC Regents Fall 2005 © UCB
CS 152 L7: Pipelining I
2005-9-20
John Lazzaro
(www.cs.berkeley.edu/~lazzaro)
CS 152 Computer Architecture and Engineering
Lecture 7 – Pipelining I
www-inst.eecs.berkeley.edu/~cs152/
TAs: David Marquardt and Udam Saini
Office Hours Change
David: W 3-4, Th 3-4, 125 Cory
Udam: W 3-5 125 Cory, Tu 10-12 345 Soda
John: Mon 9:30-10:30 AM, 315 Soda
Last Time: Performance Equation
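\[
\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]
Goal is to optimize execution time, not individual equation terms.
Cycles/Instruction: the CPI of the program. Reflects the program’s instruction mix.
Machines are optimized with respect to program workloads.
Seconds/Cycle: clock period. Optimize jointly with machine CPI.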
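As a minimal sketch of why the goal is total execution time rather than any single term, the equation can be evaluated for two machines. The instruction count, CPI values, and clock periods below are hypothetical numbers chosen for illustration, not measurements from the course.

```python
def execution_time(instructions, cpi, clock_period_s):
    """Seconds/Program = Instructions/Program * Cycles/Instruction * Seconds/Cycle."""
    return instructions * cpi * clock_period_s


# Hypothetical machines running the same 1,000,000-instruction program.
slow_clock_low_cpi = execution_time(1_000_000, cpi=1.0, clock_period_s=10e-9)
fast_clock_high_cpi = execution_time(1_000_000, cpi=3.0, clock_period_s=4e-9)

print(f"CPI 1.0 at 10 ns: {slow_clock_low_cpi * 1e3:.1f} ms")
print(f"CPI 3.0 at  4 ns: {fast_clock_high_cpi * 1e3:.1f} ms")
```

In this made-up case the machine with the faster clock still loses, because its CPI grew by more than its clock period shrank.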
Today: Introduction to Pipelining
How to apply the performance equation to our single-cycle CPU.
Why pipelining is hard: data hazards, control hazards, structural hazards.
Pipelining: an idea from assembly line production applied to CPU design
Also: Introduction to Lab 3
Note: Reading is Fundamental ...
The lectures are a gentle introduction, to prepare you to read the book ...
The book presentation of pipelined processors is sufficient to do Lab 3.
These lectures are not.
Recall: Our single-cycle processor
[Figure: single-cycle datapath. PC with +0x4 adder, Instr Mem, RegFile (rs1, rs2, ws, wd, WE, rd1, rd2), Ext, 32-bit ALU (op), Data Memory (Addr, Din, Dout, WE), MemToReg mux.]
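\[
\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]
CPI == 1. This is good.
Clock is slow. This is bad.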
Challenge: Speed up clock while keeping CPI == 1
Recall: An R-format CPU design
[Figure: R-format datapath. RegFile (5-bit rs1, rs2, ws; 32-bit rd1, rd2, wd; WE) feeding a 32-bit ALU (op), with decode Logic.]
Instruction fields: opcode | rs | rt | rd | shamt | funct
Decode fields to get: ADD $8 $9 $10
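The field decode can be sketched in a few lines, assuming the standard 32-bit MIPS R-format layout (6-bit opcode, three 5-bit register fields, 5-bit shamt, 6-bit funct). The function name `decode_r_format` is made up for this example, and the example word assumes that $8 is the destination register of the ADD.

```python
def decode_r_format(word):
    """Split a 32-bit MIPS R-format word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21, first source
        "rt": (word >> 16) & 0x1F,      # bits 20..16, second source
        "rd": (word >> 11) & 0x1F,      # bits 15..11, destination
        "shamt": (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
    }


# Assumed encoding of ADD $8,$9,$10 with $8 as the destination:
# opcode=0, rs=9, rt=10, rd=8, shamt=0, funct=0x20.
ADD_WORD = (9 << 21) | (10 << 16) | (8 << 11) | 0x20
fields = decode_r_format(ADD_WORD)
print(hex(ADD_WORD), fields)
```

In hardware these shifts and masks are just wire selections: rs and rt drive the RegFile read ports, rd drives ws, and opcode and funct go to the decode logic that produces the ALU op.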
Reminder: How data flows after posedge
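[Figure: after the clock edge, data flows from PC (with +0x4 adder) through InstrMem and the decode Logic to the RegFile read ports, then through the 32-bit ALU back to the RegFile write port.]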
Next posedge: Update state and repeat
[Figure: the state elements updated on the clock edge: RegFile (write port wd, ws, WE) and PC.]
Observation: Logic idle most of cycle
[Figure: single-cycle datapath. PC with +0x4 adder, Instr Mem, RegFile, Ext, 32-bit ALU (op), Data Memory, MemToReg mux.]
For most of cycle, ALU is either “waiting” for its inputs, or “holding” its output
Ideal: a CPU architecture where each part is always “working”.
Inspiration: Automobile assembly line
Assembly line moves on a steady clock.
Each station does the same task on each car.
[Figure: a car body shell and a car chassis arrive at a merge station, followed by a bolting station; the line advances on the clock.]
Inspiration: Automobile assembly line
Simpler station tasks → more cars per hour.
Simple tasks take less time, clock is faster.
Inspiration: Automobile assembly line
Line speed limited by slowest task.
Most efficient if all tasks take same time to do
Inspiration: Automobile assembly line
Simpler tasks, complex car → long line!
These lines go 24 x 7, and rarely shut down.
Why?
Lessons from car assembly lines
Faster line movement yields more cars per hour off the line.
Faster line movement requires more stages, each doing simpler tasks.
To maximize efficiency, all stages should take same amount of time (if not, workers in fast stages are idle)
“Filling”, “flushing”, and “stalling” assembly line are all bad news.
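The balance lesson can be put in numbers with a small sketch. It assumes the line's clock period equals the slowest stage's task time; the stage times are hypothetical minutes, and `line_stats` is a made-up helper name.

```python
def line_stats(stage_times):
    """Return (clock period, fraction of station time spent idle).

    The line advances at the pace of its slowest stage, so every faster
    stage waits for the remainder of each clock period.
    """
    period = max(stage_times)
    busy = sum(stage_times)
    idle_fraction = 1 - busy / (period * len(stage_times))
    return period, idle_fraction


unbalanced = [2.0, 8.0, 2.0, 4.0]  # hypothetical task times per stage
balanced = [4.0, 4.0, 4.0, 4.0]    # same total work, evenly split

for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
    period, idle = line_stats(stages)
    print(f"{name}: one car every {period} min, stations idle {idle:.0%} of the time")
```

Both lines do the same total work per car, but the balanced one runs its clock twice as fast and wastes no station time.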
Key Analogy: The instruction is the car
[Figure: Pipeline Stage #1 (Instruction Fetch): PC with +0x4 adder and Instr Mem load the IR. The IR is passed along from stage to stage; the copy of the IR in each of Stages #2, #3, #4 and #5 controls the hardware in that stage.]
“Data-stationary control”
Example: Decode & Register Fetch Stage
[Figure: Pipeline Stage #1 (Instr Fetch): PC, +0x4 adder, Instr Mem, IR. Stage #2 (Decode & Reg Fetch): RegFile, Ext, and registers A, B, M, IR. Stage #3: IR.]
A sample program:
ADD R4,R3,R2
OR R7,R6,R5
SUB R10,R9,R8
R’s chosen so that instructions are independent - like cars on the line.
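To see the three independent instructions move down the line, the sketch below prints which instruction occupies each stage on each clock cycle. It assumes an ideal pipeline with no stalls and takes the stage count of 5 from the stage labels on the previous slide; `occupancy` is a hypothetical helper, not part of any lab code.

```python
PROGRAM = ["ADD R4,R3,R2", "OR R7,R6,R5", "SUB R10,R9,R8"]


def occupancy(program, n_stages=5):
    """Return one row per clock cycle; row[s] is the instruction in stage s+1.

    Instruction i enters stage 1 on cycle i and advances one stage per cycle.
    Empty stages are marked with "-".
    """
    rows = []
    for cycle in range(len(program) + n_stages - 1):
        row = []
        for stage in range(n_stages):
            idx = cycle - stage  # which instruction reached this stage now
            row.append(program[idx] if 0 <= idx < len(program) else "-")
        rows.append(row)
    return rows


for cycle, row in enumerate(occupancy(PROGRAM)):
    print(f"cycle {cycle}: " + " | ".join(f"{slot:<13}" for slot in row))
```

Because no instruction reads a register that an earlier one writes, nothing in this trace ever has to wait.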
Performance Equation and Pipelining
[Figure: pipelined datapath. Instr Fetch stage (PC, +0x4 adder, Instr Mem, IR), Decode & Reg Fetch stage (RegFile, Ext, registers A, B, M, IR), Stage #3 (IR).]
\[
\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]
CPI == 1: once the pipe is full, one instruction completes per cycle.
Clock period is shorter: less work to do in each cycle.
To get shortest clock period, balance the work to do in each pipeline stage.
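A rough sketch of the payoff, under idealized assumptions: the stages are perfectly balanced, pipeline registers add no delay, and there are no hazards. The 10 ns single-cycle period and the 5-stage split are hypothetical numbers.

```python
def single_cycle_time_ns(n_instr, period_ns):
    """Every instruction takes one long clock cycle."""
    return n_instr * period_ns


def pipelined_time_ns(n_instr, n_stages, stage_period_ns):
    """The first instruction needs n_stages cycles; each later one adds 1 cycle."""
    fill_cycles = n_stages - 1
    return (fill_cycles + n_instr) * stage_period_ns


def effective_cpi(n_instr, n_stages):
    """Cycles per instruction including the time to fill the pipe."""
    return (n_stages - 1 + n_instr) / n_instr


N = 1000
print(single_cycle_time_ns(N, 10))   # hypothetical 10 ns single-cycle clock
print(pipelined_time_ns(N, 5, 2))    # same work split into 5 stages of 2 ns
print(effective_cpi(N, 5))           # approaches 1 as N grows
```

The fill cost is paid once, so for long programs the speedup approaches the number of stages.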
Hazards: An instruction is not a car ...
[Figure: pipelined datapath. Stage #1 (Instr Fetch): PC, +0x4 adder, Instr Mem, IR. Stage #2 (Decode & Reg Fetch): RegFile, Ext, registers A, B, M, IR. Stage #3: IR.]
New sample program:
ADD R4,R3,R2
OR R5,R4,R2
R4 not written yet ... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops!
An example of a “hazard”: we must (1) detect and (2) resolve all hazards to make a CPU that matches ISA.
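The "detect" half can be sketched in software as a check for read-after-write dependences between nearby instructions. This is an illustration only, not the Lab 3 hardware: it assumes the first register listed is the destination, and `window` (how many following instructions could see a stale value) is a hypothetical parameter that depends on the pipeline design.

```python
def parse(instr):
    """Parse 'OP Rd,Rs1,Rs2' into (op, dest, sources)."""
    op, regs = instr.split(maxsplit=1)
    dest, *srcs = [r.strip() for r in regs.split(",")]
    return op, dest, srcs


def raw_hazards(program, window=1):
    """List (producer_index, consumer_index, register) triples.

    A triple is reported when an instruction reads a register written by
    one of the `window` instructions just before it.
    """
    parsed = [parse(instr) for instr in program]
    hazards = []
    for j, (_, _, srcs) in enumerate(parsed):
        for i in range(max(0, j - window), j):
            dest = parsed[i][1]
            if dest in srcs:
                hazards.append((i, j, dest))
    return hazards


print(raw_hazards(["ADD R4,R3,R2", "OR R5,R4,R2"]))
print(raw_hazards(["ADD R4,R3,R2", "OR R7,R6,R5", "SUB R10,R9,R8"], window=2))
```

The first call reports the dependence on R4; the second, on the earlier independent program, reports none. In a real pipeline the same comparison is done by logic that compares register fields held in the IRs of different stages.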
Performance Equation and Hazards
[Figure: pipelined datapath. Instr Fetch stage (PC, +0x4 adder, Instr Mem, IR), Decode & Reg Fetch stage (RegFile, Ext, registers A, B, M, IR), Stage #3 (IR).]
\[
\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]
“Software slows the machine down” (Seymour Cray)
Some ways to cope with hazards make CPI > 1 (“stalling pipeline”).
Added logic to detect and resolve hazards increases clock period.
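Both effects can be folded into the performance equation in a short sketch. The stall rate (0.3 stall cycles per instruction) and the clock periods are hypothetical values picked only to show the direction of the effect.

```python
def time_per_instruction_ns(base_cpi, stall_cycles_per_instr, clock_period_ns):
    """Average time per instruction = (base CPI + stall cycles) * clock period."""
    return (base_cpi + stall_cycles_per_instr) * clock_period_ns


single_cycle = time_per_instruction_ns(1.0, 0.0, 10.0)    # slow clock, CPI == 1
ideal_pipeline = time_per_instruction_ns(1.0, 0.0, 2.0)   # no hazards at all
real_pipeline = time_per_instruction_ns(1.0, 0.3, 2.2)    # stalls + hazard logic

print(single_cycle, ideal_pipeline, real_pipeline)
```

With these made-up numbers the pipeline with hazards is slower than the ideal one but still well ahead of the single-cycle design, which is the trade the lecture is describing.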
A (simplified) 5-stage pipelined CPU
[Figure: 5-stage pipelined datapath.
Stage 1, “IF” (Instr Fetch): PC, +0x4 adder, Instr Mem.
Stage 2, “ID/RF” (Decode & Reg Fetch): RegFile, Ext.
Stage 3, “EX” (Execution): 32-bit ALU (op).
Stage 4, “MEM” (Memory): Data Memory (Addr, Din, Dout, WE), MemToReg.
Stage 5, “WB” (WriteBack): Mux, Logic.
Pipeline registers (IR, A, B, M, Y, R) sit between the stages.]
Welcome to Lab 3!
Administrivia: Upcoming deadlines ...
Friday 9/23: Lab 2 “Xilinx Checkoff”, in section. For non-150 students, “150 Lab Lecture 4”, 2-3 PM, 125 Cory.
Monday 9/26: Lab 2 final report due via the submit program, 11:59 PM.
Thursday 9/29: At 11:59 PM via email: Lab 2 peer evaluations, and Lab 3 preliminary design document due.
Lab 3 now available on the web site
Starting 9/29: Homework, Midterm, Lab
HW graded on effort
Midterm two weeks from today, in evening, no class that day.
Thursday review session. Will cover format, material, and ground rules for test.
Lab 3 design doc, checkoffs, later in week ...
Lab 3 Introduction
“Pipelining Your Processor”
Week 1 for Lab 3: Pipelining Processors
Week 2: Hazard-Free Code on the Board
A sample program:
ADD R4,R3,R2
OR R7,R6,R5
SUB R10,R9,R8
R’s chosen so that instructions are
independent - like cars on the line.
Week 3: Run TA’s “Hard Tests” on Xilinx
An example of a “hazard” -- we must
(1) detect and (2) resolve all hazards
to make a CPU that matches ISA
New sample program:
ADD R4,R3,R2
OR R5,R4,R2
Next 2 Lectures: Pipelining details ...
Control, Hazards, Forwarding