CSC 4250 Computer Architectures
September 19, 2006
Appendix A. Pipelining
Three Classes of Pipeline Hazards
Structural hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions in overlapped execution.
Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control hazards: arise from the pipelining of branches and other instructions that change the PC (the program counter).
Structural Hazards
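Causes include:
A functional unit is not fully pipelined, e.g., FP divide.
The register file has only one write port, but two writes are needed in the same cycle (this can happen when instructions with different latencies reach WB in the same cycle).
A single memory pipeline is shared by data and instructions, so an instruction that makes a data memory reference conflicts with a later instruction's fetch.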
Figure A.4. Load with One Memory Port
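The conflict in Figure A.4 can be located with a small sketch. It assumes the five-stage pipeline, one instruction issued per cycle, no stalls, and a single memory port shared by IF and MEM; the function `memory_port_conflicts` is a hypothetical name, not part of any tool.

```python
# Minimal sketch: one memory port shared by instruction fetch (IF) and data
# access (MEM). Assumes one instruction issued per cycle and no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_port_conflicts(uses_data_memory):
    """Return (cycle, mem_instr, fetch_instr) for each cycle in which a data
    access in MEM and an instruction fetch in IF both need the single port.

    uses_data_memory[i] is True if instruction i (fetched in cycle i + 1)
    makes a data memory reference, e.g. a load or a store.
    """
    conflicts = []
    for i, uses_memory in enumerate(uses_data_memory):
        if not uses_memory:
            continue
        mem_cycle = i + 1 + STAGES.index("MEM")  # cycle in which i is in MEM
        j = mem_cycle - 1  # instruction j is fetched in cycle j + 1
        if j < len(uses_data_memory):
            conflicts.append((mem_cycle, i, j))
    return conflicts


# A load followed by instructions 1, 2, 3 (compare Figure A.4).
print(memory_port_conflicts([True, False, False, False]))  # [(4, 0, 3)]
```

The sketch only finds the first-order conflict. In hardware the later fetch is stalled for a cycle, which shifts every following instruction; modeling that would need a cycle-by-cycle simulation.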
Why Allow Structural Hazards?
To reduce cost: pipelining (or duplicating) all functional units is expensive (e.g., fully pipelining the FP multiplier).
A processor that supports both an instruction and a data cache access every cycle requires twice as much memory bandwidth.
Data Hazards
Pipelining changes the order of read/write accesses to operands:
DADD R1,R2,R3
DSUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R1,R11
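DADD writes R1 in its WB stage (cycle 5), but DSUB reads R1 in its ID stage (cycle 3), so DSUB would get the old value: a data hazard.
AND has the same problem: it reads R1 in cycle 4.
What about OR? OR reads R1 in cycle 5, the same cycle in which DADD writes it. This works only if the register file is written in the first half of the clock cycle and read in the second half.
XOR reads R1 in cycle 6, after the write, so it is not affected.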
Fig. A.6. Use of DADD Result Causes Data Hazard
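Which instructions in the DADD sequence read R1 too early can be checked by computing each one's ID and WB cycles. This sketch assumes one issue per cycle, no stalls, and no forwarding; `stage_cycle` and `classify_reads` are hypothetical helpers.

```python
# Stage offsets in the five-stage pipeline: an instruction issued k cycles
# after the first one is in IF in cycle k + 1, in ID in cycle k + 2, and so on.
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def stage_cycle(k, stage):
    """Clock cycle (1-based) in which instruction k occupies `stage`."""
    return k + 1 + STAGE_OFFSET[stage]


def classify_reads(consumers):
    """Compare when each consumer reads R1 (in ID) with when DADD, issued
    first (k = 0), writes R1 (in WB). consumers: (name, k) pairs."""
    write_cycle = stage_cycle(0, "WB")  # cycle 5
    result = {}
    for name, k in consumers:
        read_cycle = stage_cycle(k, "ID")
        if read_cycle < write_cycle:
            result[name] = "hazard"  # reads the old value of R1
        elif read_cycle == write_cycle:
            # Correct only if the register file is written in the first half
            # of the cycle and read in the second half.
            result[name] = "ok (split cycle)"
        else:
            result[name] = "ok"
    return result


print(classify_reads([("DSUB", 1), ("AND", 2), ("OR", 3), ("XOR", 4)]))
```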
Minimize Data Hazard Stalls by Forwarding
The ALU result held in both the EX/MEM and MEM/WB pipeline registers is always fed back to the ALU inputs.
If the forwarding hardware detects that a previous ALU operation writes the register that is a source for the current ALU operation, the control logic selects the forwarded result as the ALU input instead of the value read from the register file.
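The forwarding decision for one ALU input can be sketched as follows. The `PipelineRegister` fields and the priority rule (prefer EX/MEM over MEM/WB, never forward R0) are assumptions modeled on a typical MIPS forwarding unit, not taken from this text; `select_alu_input` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRegister:
    """Hypothetical model of the EX/MEM or MEM/WB fields that forwarding uses."""
    reg_write: bool        # does the instruction write a register?
    dest: Optional[int]    # destination register number, if any
    result: int            # ALU result carried in this pipeline register


def select_alu_input(src, regfile_value, ex_mem, mem_wb):
    """Choose the value fed to the ALU for source register `src`.

    EX/MEM is checked first because it holds the result of the more recent
    instruction. R0 is never forwarded because in MIPS it always reads as zero.
    """
    for stage in (ex_mem, mem_wb):
        if stage.reg_write and stage.dest not in (None, 0) and stage.dest == src:
            return stage.result  # forwarded result
    return regfile_value  # no match: use the value read in ID


# DADD R1,R2,R3 is in MEM (its result, 30, sits in EX/MEM) while
# DSUB R4,R1,R5 is in EX; the register file still holds a stale R1 of 7.
ex_mem = PipelineRegister(reg_write=True, dest=1, result=30)
mem_wb = PipelineRegister(reg_write=False, dest=None, result=0)
print(select_alu_input(1, 7, ex_mem, mem_wb))  # 30
```

Checking EX/MEM first matters when both pipeline registers target the same register: the younger instruction's result must win.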
Fig. A.7. Use Forwarding Paths to Avoid Data Hazard
Fig. A.23. Extra Hardware for Forwarding to ALU
Forwarding
Generalized Forwarding
A result can be forwarded from the pipeline register that holds the output of one unit to the input of another unit, not only from the ALU back to the ALU.
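Forwarding Fails
A load causes a delay that forwarding cannot handle: the loaded value is available only at the end of the load's MEM stage, but an immediately following instruction that uses it needs it at the start of its EX stage, in the same cycle.
Pipeline Interlock
Hardware detects the hazard and stalls the pipeline until the hazard is cleared.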
MIPS: Microprocessor without Interlocking Pipeline Stages
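A sketch of the load interlock check, assuming full forwarding so that the only remaining stall is a load followed immediately by an instruction that reads the loaded register. The `Instr` record and both function names are hypothetical, and the instruction sequence is illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str
    is_load: bool
    dest: Optional[int]        # register written, or None
    srcs: Tuple[int, ...]      # registers read


def needs_load_interlock(in_ex, in_id):
    """True if the instruction in ID must stall because the instruction in EX
    is a load whose destination it reads (the data exists only after MEM)."""
    return (
        in_ex is not None
        and in_ex.is_load
        and in_ex.dest is not None
        and in_ex.dest in in_id.srcs
    )


def count_load_stalls(program):
    """Total one-cycle interlock stalls, assuming full forwarding elsewhere, so
    only a load followed immediately by a user of its result stalls."""
    return sum(
        needs_load_interlock(prev, cur) for prev, cur in zip(program, program[1:])
    )


program = [
    Instr("LD R1,0(R2)", True, 1, (2,)),
    Instr("DSUB R4,R1,R5", False, 4, (1, 5)),
    Instr("AND R6,R1,R7", False, 6, (1, 7)),
    Instr("OR R8,R1,R9", False, 8, (1, 9)),
]
print(count_load_stalls(program))  # 1
```

AND and OR need no stall here: by the time they reach EX, the loaded value can be forwarded from MEM/WB or read from the register file.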
Fig. A.8. Forwarding of Operand Required by Stores
Figure A.9. Load Instruction Causes Stall
Figure A.17. Implementation of MIPS Data Path
Figure A.18. Pipeline Data Path by Adding Pipeline Registers
Control Hazard
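A branch may change the value of the PC. A taken branch sets the PC to the branch target; an untaken branch lets execution continue with the sequential successor.
On the basic MIPS pipeline a branch costs three cycles of delay: the new PC is not known until the end of the branch's MEM stage, so the branch target is fetched in cycle 5.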
MIPS Branch Delay
Clock Number
Instr. #          1      2      3      4      5      6      7      8      9
Branch instr.     IF     ID     EX     ME     WB
Instr. i+1               IF     stall  stall  stall  stall
Branch target                                 IF     ID     EX     ME     WB
Branch target+1                                      IF     ID     EX     ME
Branch target+2                                             IF     ID     EX
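The three-cycle delay in the table follows from when the new PC becomes known. This sketch assumes the PC is updated at the end of the stage in which the branch is resolved and the target is fetched in the next cycle; `branch_penalty` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_penalty(resolve_stage, branch_fetch_cycle=1):
    """Stall cycles between fetching a branch and fetching its target.

    Assumption: the new PC is known at the end of `resolve_stage` of the branch,
    and the target is fetched in the next cycle.
    """
    resolve_cycle = branch_fetch_cycle + STAGES.index(resolve_stage)
    target_fetch_cycle = resolve_cycle + 1
    return target_fetch_cycle - branch_fetch_cycle - 1


print(branch_penalty("MEM"))  # 3 (target fetched in cycle 5, as in the table)
```

Calling `branch_penalty("ID")` gives 1, the one-cycle stall obtained when the zero test moves into ID.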
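How MIPS Reduces Branch Delay
Consider only BEQZ and BNEZ, which test a single register against zero.
Move the zero test into the ID stage (from the EX stage).
Compute both PCs (taken and untaken) early, using an additional adder in the ID stage for the branch target (previously the ALU computed it).
Result: only a one-cycle stall on branches.
However, a branch on the result of the immediately preceding ALU instruction still causes a data hazard.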
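The effect of shrinking the branch penalty from three cycles to one can be estimated with CPI = 1 + branch frequency × branch penalty. The 20% branch frequency below is a made-up illustrative value, and the sketch assumes every branch pays the full penalty and no other stalls occur; `pipeline_cpi` is a hypothetical name.

```python
def pipeline_cpi(branch_frequency, branch_penalty):
    """Ideal CPI of 1 plus the branch stall cycles per instruction."""
    return 1.0 + branch_frequency * branch_penalty


# Hypothetical workload: 20% of instructions are branches (illustrative only).
freq = 0.20
cpi_mem = pipeline_cpi(freq, 3)  # branch resolved in MEM: 3-cycle stall
cpi_id = pipeline_cpi(freq, 1)   # zero test and target adder moved to ID: 1 cycle
print(f"CPI {cpi_mem:.2f} -> {cpi_id:.2f}, speedup {cpi_mem / cpi_id:.2f}")
```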
Figure A.24. Branch Hazard Stall Reduced to One Cycle
Data Hazard in ALU Instr. followed by Branch
Clock Number
Instruction #       1      2      3      4      5      6      7
ALU instruction     IF     ID     EX     ME     WB
Branch instruction         IF     ID     ID     EX     ME     WB
Example.
ADD R1,R2,R3
BEQZ R1,name
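The one-cycle stall in the ADD/BEQZ example can be derived from the stage timing. This sketch assumes the ALU result can be forwarded to the zero test in ID starting the cycle after the ALU instruction's EX stage; `alu_to_branch_stalls` is a hypothetical name.

```python
def alu_to_branch_stalls(distance):
    """Stall cycles for a branch whose zero test in ID needs the result of an
    ALU instruction issued `distance` instructions earlier.

    Assumptions: no other stalls; the ALU result exists at the end of its EX
    stage and can be forwarded to the zero test in ID from the next cycle on.
    """
    alu_ex_cycle = 3                    # ALU instruction: IF=1, ID=2, EX=3
    ready_cycle = alu_ex_cycle + 1      # first cycle ID can use the result
    branch_id_cycle = 1 + distance + 1  # branch fetched `distance` cycles later
    return max(0, ready_cycle - branch_id_cycle)


print(alu_to_branch_stalls(1))  # 1: ADD R1,R2,R3 then BEQZ R1,name
```

With `distance=1` the branch is in ID in cycle 3 but can use R1 only in cycle 4, which is why ID appears twice in the table above.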
Delayed Branch
Heavily used in early RISC processors.
Works well when the branch delay is one cycle.
The sequential successor of the branch is in the branch delay slot. This instruction is executed whether or not the branch is taken, so execution proceeds as:
Branch instruction
Sequential successor (in the branch delay slot)
Branch target, if the branch is taken
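The delay-slot rule can be illustrated with a tiny interpreter that only tracks which instruction indices execute. The `(op, target)` program encoding, the `"BR"` opcode, and the fixed taken/untaken flag are simplifying assumptions, not MIPS encodings; `execution_order` is a hypothetical name.

```python
def execution_order(program, taken, max_steps=100):
    """Return the indices of instructions in the order they execute, with one
    branch delay slot.

    program: list of (op, target) pairs; op "BR" is a branch to index `target`,
    anything else is an ordinary instruction (target ignored).
    taken: whether every branch is taken (a simplification; a real BEQZ/BNEZ
    tests a register). max_steps guards against endless loops.
    """
    order = []
    pc = 0
    pending_target = None  # set by a taken branch, applied after the delay slot
    while pc < len(program) and len(order) < max_steps:
        order.append(pc)
        op, target = program[pc]
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction was in the delay slot: redirect only now.
            next_pc, pending_target = pending_target, None
        if op == "BR" and taken:
            pending_target = target
        pc = next_pc
    return order


prog = [("BR", 3), ("DADD", None), ("DSUB", None), ("OR", None)]
print(execution_order(prog, taken=True))   # [0, 1, 3]
print(execution_order(prog, taken=False))  # [0, 1, 2, 3]
```

With the branch taken, the delay-slot instruction (index 1) still executes before control reaches the target (index 3); index 2 is skipped.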
Figure A.14. Schedule Branch Delay Slot