Computer Architecture Pipelines & Superscalars. Pipelines Data Hazards Code: lw $4, 0($1) add $15,...
-
Upload
yahir-spurlock -
Category
Documents
-
view
215 -
download
0
Transcript of Computer Architecture Pipelines & Superscalars. Pipelines Data Hazards Code: lw $4, 0($1) add $15,...
Pipelines
• Data Hazards• Code:
lw $4, 0($1)add $15, $1, $1sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15,100($2)
The last four instructions all depend on a result
produced by the first!
MIPS instructionshave the format
op dest, srca, srcb
Pipelines - Data Hazards
• Second compiler solution• Reorder
lw $4, 0($1)add $15, $1, $1sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15,100($2)
sub $2, $1, $3lw $4, 0($1)add $15, $1, $1and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15,100($2)
These two must not define$1 or $3!
ReadWritten
Pipelines - Data Hazards
• Second compiler solution• Reorder
sub $2, $1, $3lw $4, 0($1)add $15, $1, $1and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15,100($2)
ReadWritten
First use of $2
Pipelines - Data Hazards
• Compiler analyses dependencies• Register
definitions
• Registeruse
• Read After Write(RAW)dependency
• No dependencies
• Instruction can be moved!
sub $2, $1, $3lw $4, 0($1)add $15, $1, $1and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15,100($2)
Written
Usesof $2
Pipelines - Data Hazards
• Hardware solution• Value forwarding
• Hardware detectsdependency
• scoreboard• Forwards result
from WB to EXfor subsequentuse
• Hardware• Transparent to software!
Data Hazards - classification
• Read after Write (RAW)• Instruction 1 must write
before instruction 2 reads
• Write after Write (WAW)• Instructions 1 and 2 both write
Instruction 2 must write after 1
• Write after Read (WAR)• Instruction 1 reads
Instruction 2 writes (overwrites)• Instruction 2 must not write before 1 reads
Reordering algorithms must consider all three!
Lecture 5 - Key Points
• Data Hazards• RAW - most common• WAW• WAR
• Compiler looks for dependencies• then re-orders
• Hardware• Scoreboard
• Monitors dependencies• ensures correct operation
• Value forwarding hardware• Forwards results from EX stage
Pipelines - Exceptions
• Caused by overflow, underflow• Example
add $1, $2, $1• Overflow detected in EX stage• Causes jump to exception handler
• as branch - remainder of pipeline flushed
but• Compiler needs original $1 causing overflow
Register must not be overwritten • EX stage needs to squash WB operation
• Precise Exception problem - more later!
Pipelines - Depth
• Pipeline can’t be too deep• Hazards are frequent
many stalls in deep pipelines
0.5
1.0
1.5
2.0
2.5
1 2 4 8 16
Rel
ativ
eP
erfo
rman
ce
Pipeline Depth
TooDeep!
Pipelines - Depth
• Pipeline can’t be too deep• Hazards are frequent
many stalls in deep pipelines
0.5
1.0
1.5
2.0
2.5
1 2 4 8 16
Rel
ativ
eP
erfo
rman
ce
Pipeline Depth
TooDeep!
Superpipelined
CISC and pipelines
• High Speed CISC processors are pipelined• Overlap IF, EX
• Variable• instruction length• running time (number of microcode cycles)pipeline imbalance“backup” in pipe stagescomplicate hazard detection
• Complex addressing modesauto-increment updates address registermultiple memory accesses required
smooth pipeline flow more difficult!
Instruction Queues
• Vital performance determinant• Rate of instruction fetch
• High Performance processors• Fetch multiple instructions in each cycle
• 2 - 4 common• Use wide datapath to memory
• PowerPC 604 128 bits = 4 instructions• Despatch unit
• Examine dependencies• Determine which instructions can be
despatched
Instruction Queues
• Q “matches” fetch/despatch rates• General Strategy for matching
Producers - Consumers• Use of FIFO-style Queues• Absorb
AsynchronousDelivery / ConsumptionRates
• ProvidesElasticityin pipelines
Producer
FIFO
Consumer
DifferingInstantaneous
Rates
PowerPC organisation
PowerPC 601~1993
Boundary of theSi die
New - Look in the “Example Processors” sectionof the Web notes
3-way SuperScalar• Integer• Branch• Floating Point
A newer machine will have more functional units here!
Superscalar Processors
• Multiple Functional Units• PowerPC 604
6-way superscalar
• Despatch Unit • Sends “ready” instructions to all free units• PowerPC 604:
• potential 4 instructions/cycle (pipeline lengths are different!)
• reality: 2-3 instructions/cycle?(program dependent!)
Branch UnitLoadStore Unit3 Integer UnitsFloating Point Unit
Superscalar Processors
• Mix of functional units• Up to 8-way superscalar common now
• 2 Floating point units• Usually have ~3 cycle latency
• 3 Integer Arithmetic• Branch unit• Load / store unit• + ….?
• Marketing departments can play some games with the ‘n’ of a n-way superscalar!
Superscalar – Maximum throughput
• Instruction Issue Unit is the key!• If IIU only issues 4 instructions per cycle,• An n-way superscalar (n >> 4) can still only
complete 4 instructions / cycle!• IIU has many tasks
• Pre-fetch instructions• At least one cache line!
• Check dependencies• Has data required by this instruction been computed
yet?• Keeps register ‘scoreboard’
• Mark registers which will be written by instructions already issued
• It’s a small dataflow machine (see later!)
• Check availability of functional units