Accelerating Performance
© 2006
Department of Computing Science
CMPUT 229
CISC vs. RISC
CISC: Complex Instruction Set Computer
– Complex decoders
– Lots of Circuitry
– Some Complex instructions may never be used
RISC: Reduced Instruction Set Computer
– Better use of silicon real estate
Regular
Clements, pp. 328
Instruction Usage
Fairclough* divided instructions into eight groups:
– Data movement
– Program modification (branch, call, return)
– Arithmetic
– Compare
– Logical
– Shift
– Bit manipulation
– Input/output and miscellaneous
* Fairclough, D. A., “A Unique Microprocessor Instruction Set,” IEEE Micro, May 1982, pp. 8-18.
Constants, parameters, and local storage
Tanenbaum* reported that:
• 56% of all constant values are in the -15 to +15 range
• 98% of all constant values are in the -511 to +511 range
• Thus a 5-bit immediate field covers more than half of the literals
Other researchers showed that
• 95% of subroutines require 12 words or fewer for parameter passing and local storage
• Thus providing this space in the processor reduces processor-memory bus traffic.
* Tanenbaum, Andrew S., “Implications of Structured Programming for Machine Architecture,” Communications of the ACM, Vol. 21, No. 3, March 1978, pp. 237-246.
Clements, pp. 329
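The link between field width and constant range can be checked with a short sketch. It assumes the immediate field is a two's-complement signed value, which the slide does not state; the function names are illustrative.

```python
def signed_range(bits):
    """Return (lowest, highest) value of a two's-complement field of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def bits_needed(lo, hi):
    """Smallest two's-complement width whose range contains every value in lo..hi."""
    bits = 1
    while True:
        low, high = signed_range(bits)
        if low <= lo and hi <= high:
            return bits
        bits += 1


print(signed_range(5))         # (-16, 15)
print(bits_needed(-15, 15))    # 5
print(bits_needed(-511, 511))  # 10
```

Under that assumption, a 5-bit field holds every constant in the -15 to +15 range, and a 10-bit field holds every constant in the -511 to +511 range.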
RISC Characteristics
Enough registers to reduce memory traffic
Instructions operate on three registers
Efficient parameter passing and branching
Don’t implement infrequent (complex) instructions
Aim to execute one instruction per cycle
Fixed instruction length
Clements, pp. 329
Register Windows
A window is a set of registers visible to the current subroutine
A Window Pointer (WP) register indicates the currently active window
In the Berkeley RISC, each window has 32 registers.
A call to a subroutine in the Berkeley RISC used the instruction:
CALLR Rd, address
The current value of the PC is written into register Rd of the new window.
Clements, pp. 330
Berkeley RISC Register Window
Register Name    Register Type
R0 to R9         Global registers common to all windows
R10 to R15       Used to receive parameters from parent and to pass parameters back to parent
R16 to R25       Accessed exclusively by the current subroutine
R26 to R31       Used to pass parameters to and from its own child
Clements, pp. 332
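The overlap between a parent's R26 to R31 and its child's R10 to R15 can be sketched as a mapping from logical register numbers to one shared register file. This is a simplified model, not the actual Berkeley RISC hardware: the number of windows, the direction in which the window pointer moves on a call, and all names below are assumptions.

```python
NUM_GLOBALS = 10      # R0 to R9, shared by every window
WINDOW_STEP = 16      # offset between consecutive windows (6 shared + 10 local)
NUM_WINDOWS = 8       # assumed number of windows; not stated in the slides


def physical_register(wp, r):
    """Map logical register r (0..31) in window wp to an index in the register file."""
    if not 0 <= r <= 31:
        raise ValueError("logical register must be in R0..R31")
    if r < NUM_GLOBALS:
        return r                                  # globals ignore the window pointer
    offset = (wp * WINDOW_STEP + (r - NUM_GLOBALS)) % (NUM_WINDOWS * WINDOW_STEP)
    return NUM_GLOBALS + offset


parent, child = 0, 1   # assume a call advances the window pointer by one
print(physical_register(parent, 26) == physical_register(child, 10))  # True
print(physical_register(parent, 16) == physical_register(child, 16))  # False
```

In this model a parameter written to R26 by the parent is read as R10 by the child without any copy through memory, while each subroutine's R16 to R25 stay private.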
Instruction Overlapping in a RISC Pipeline
Clements, pp. 336
Pipeline Hazards
Cause a stall in the pipeline
Branch instructions
• We don’t know which instruction to execute next
Data dependencies
• We don’t yet know the value of an operand
Data Dependency
ADD R1, R2, R3    [R1] ← [R2] + [R3]
ADD R5, R2, R4    [R5] ← [R2] + [R4]
ADD R6, R7, R5    [R6] ← [R7] + [R5]
ADD R2, R2, R4    [R2] ← [R2] + [R4]
Clements, pp. 338
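The dependency in this sequence can be found mechanically. The sketch below scans for read-after-write pairs, where a later instruction reads a register that an earlier one writes; the tuple encoding and the function name are illustrative, not from the slides.

```python
def raw_dependencies(program):
    """Find read-after-write pairs in a list of (op, dest, src1, src2) tuples.

    Returns (producer_index, consumer_index, register) triples, zero-based.
    """
    found = []
    for i, (_, dest, _, _) in enumerate(program):
        for j in range(i + 1, len(program)):
            _, later_dest, src1, src2 = program[j]
            if dest in (src1, src2):
                found.append((i, j, dest))
            if later_dest == dest:
                break   # dest is overwritten; later readers depend on instruction j
    return found


program = [
    ("ADD", "R1", "R2", "R3"),
    ("ADD", "R5", "R2", "R4"),
    ("ADD", "R6", "R7", "R5"),
    ("ADD", "R2", "R2", "R4"),
]
print(raw_dependencies(program))   # [(1, 2, 'R5')]
```

The third instruction reads R5 immediately after the second writes it, so it cannot read its operand until that result is available. The last instruction writes R2 after earlier instructions have read it, which this check deliberately does not report.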
Bubble Because of Data Dependency
Clements, pp. 338
A Probabilistic Model for Branch Penalty
Assumptions:
• Non-branch instructions execute in one cycle
• p_b: probability that an instruction is a branch
• p_t: probability that a branch instruction is taken
• b: additional cycles required if the branch is taken
• There is no penalty if a branch is not taken
• T_ave: average time to execute an instruction
Clements, pp. 339
T_ave = (1 - p_b)·NonBranchTime + p_b·BranchTime
BranchTime = p_t·TimeTaken + (1 - p_t)·TimeNotTaken
           = p_t·(1 + b) + (1 - p_t)·1
           = p_t + p_t·b + 1 - p_t = p_t·b + 1
T_ave = (1 - p_b)·1 + p_b·(p_t·b + 1)
T_ave = 1 - p_b + p_b·p_t·b + p_b
T_ave = 1 + p_b·p_t·b
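A quick numerical check of the result, written as a small sketch. The probabilities and the penalty used in the example are illustrative values, not figures from the slides, and the function names are hypothetical.

```python
def average_time(pb, pt, b):
    """Average cycles per instruction: T_ave = 1 + pb * pt * b."""
    return 1 + pb * pt * b


def average_time_expanded(pb, pt, b):
    """Same quantity computed from the unsimplified first line of the derivation."""
    branch_time = pt * (1 + b) + (1 - pt) * 1
    return (1 - pb) * 1 + pb * branch_time


# Illustrative values only (not from the slides).
pb, pt, b = 0.2, 0.8, 3
print(round(average_time(pb, pt, b), 2))            # 1.48
print(round(average_time_expanded(pb, pt, b), 2))   # 1.48
```

With these example values, taken branches add roughly half a cycle to the average instruction time.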
Branch Prediction
Idea: Guess which way a branch will go and start fetching instructions from the predicted path.
p_b: probability that an instruction is a branch
p_t: probability that a branch is taken
p_c: probability that the prediction is correct
a, b, c, d: penalties in each case
Average Branch Penalty
The average branch penalty is given by
C_ave = a·p_t·p_c + b·p_t·(1 - p_c) + c·(1 - p_t)·(1 - p_c) + d·(1 - p_t)·p_c
Each term weights one penalty by the probability of its outcome: taken or not taken, predicted correctly or incorrectly.
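The average can be sketched as a weighted sum over four outcomes. The pairing of penalties with outcomes here is an assumption: a and b for a taken branch (predicted correctly, mispredicted), c and d for a branch that is not taken (mispredicted, predicted correctly). The example penalties are illustrative values only.

```python
def average_branch_penalty(pt, pc, a, b, c, d):
    """Weight each penalty by the probability of its outcome and sum."""
    outcomes = [
        (pt * pc, a),                # taken, predicted correctly
        (pt * (1 - pc), b),          # taken, mispredicted
        ((1 - pt) * (1 - pc), c),    # not taken, mispredicted
        ((1 - pt) * pc, d),          # not taken, predicted correctly
    ]
    total_probability = sum(p for p, _ in outcomes)
    assert abs(total_probability - 1.0) < 1e-9, "the four cases must cover every branch"
    return sum(p * penalty for p, penalty in outcomes)


# Illustrative values only: correct predictions cost nothing, mispredictions cost 2 cycles.
print(round(average_branch_penalty(pt=0.8, pc=0.9, a=0, b=2, c=2, d=0), 2))   # 0.2
```

The internal check that the four probabilities sum to 1 is a useful test of any version of the formula: each branch must fall into exactly one case.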
Approaches to Branch Prediction
Static Branch Prediction:
– A given branch is predicted to be either always taken or never taken
Dynamic Branch Prediction:
– Use the past behavior of the program to predict a branch
– The processor maintains a branch prediction table
• Single-bit predictors ==> accuracy of 80%
• Five-bit predictors ==> accuracy of 98%
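A minimal sketch of a dynamic predictor follows. It assumes each table entry is an n-bit saturating counter, a common design that the slides do not confirm; the class name, the branch address, and the loop pattern are all hypothetical.

```python
class CounterPredictor:
    """Branch prediction table of n-bit saturating counters, indexed by branch address."""

    def __init__(self, bits):
        self.max_count = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)   # counts at or above this predict "taken"
        self.table = {}                    # branch address -> counter value

    def predict(self, address):
        return self.table.get(address, self.threshold) >= self.threshold

    def update(self, address, taken):
        count = self.table.get(address, self.threshold)
        count = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        self.table[address] = count


def accuracy(predictor, address, outcomes):
    """Fraction of outcomes predicted correctly, updating the table after each branch."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(address) == taken
        predictor.update(address, taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through, repeated 100 times.
history = ([True] * 9 + [False]) * 100
print(accuracy(CounterPredictor(1), 0x40, history))   # 0.801
print(accuracy(CounterPredictor(2), 0x40, history))   # 0.9
```

On this pattern the single-bit counter mispredicts twice per loop (at the exit and again on re-entry), while the wider counter mispredicts only at the exit. The numbers describe this toy trace only and are not the measured accuracies quoted on the slide.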