CISC 662 Graduate Computer Architecture
Lecture 3 - ISA
Michela Taufer
Powerpoint Lecture Notes from John Hennessy and David Patterson’s: Computer Architecture, 4th edition
Additional teaching material from:
Jelena Mirkovic (U Del) and John Kubiatowicz (UC Berkeley)
Memory Addressing
Alignment Restrictions
• Computer systems place restrictions on allowable addresses for some objects
• Access to an object of size s bytes at byte address A is aligned if A mod s = 0
• Why do machines have alignment restrictions?
– Hardware to access memory is simpler
– Programs with aligned accesses run faster
– A misaligned memory access will take multiple aligned memory references
Alignment Restrictions con’t
• Many 32-bit processors require a 4-byte integer to reside at a memory address that is evenly divisible by 4
• Any aligned 4-byte int has an address that is a multiple of 4, e.g., 0x2000 or 0x2004 -> the value can be read or written with a single memory operation
• Any unaligned 4-byte int has an address that is not a multiple of 4, e.g., 0x2001 -> the object may be split across two 4-byte blocks and is therefore read or written with two memory operations
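The alignment rule above can be checked with a little arithmetic. The following Python sketch is illustrative only: the function names and the default 4-byte block size are assumptions, not part of the lecture.

```python
def is_aligned(address: int, size: int) -> bool:
    """An access of `size` bytes at `address` is aligned if address mod size == 0."""
    return address % size == 0


def aligned_blocks_touched(address: int, size: int, block: int = 4) -> int:
    """Count the aligned `block`-byte memory blocks covered by the access.

    Each block touched stands for one aligned memory operation
    (a simplifying assumption about the memory interface).
    """
    first_block = address // block
    last_block = (address + size - 1) // block
    return last_block - first_block + 1
```

With these definitions, a 4-byte int at 0x2000 or 0x2004 touches one block, while the same int at 0x2001 spans two blocks.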
Addressing Modes
• Addressing mode = how architectures specify the address of an object they will access
• Addressing modes may:
– Reduce instruction counts
– Add to the complexity of building a computer
– Increase the average CPI
• Figure B.6 lists all the addressing modes in recent computers
• Some examples in the next slide
Addressing Modes con’t
• Register
– Add R4, R3    R4 <- R4 + R3
– When a value is in a register
• Immediate
– Add R4, #3    R4 <- R4 + 3
– For constants
• Displacement
– Add R4, 100(R1)    R4 <- R4 + M[100+R1]
– Accessing local variables
• Register indirect
– Add R4, (R1)    R4 <- R4 + M[R1]
– Accessing using a pointer or a computed address
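The four modes can be mimicked with a toy machine model. This is a minimal Python sketch, assuming registers and memory are plain dictionaries; `operand_value` and `add` are hypothetical helper names, not a real simulator API.

```python
def operand_value(mode, regs, mem, reg=None, const=0):
    """Fetch the second operand of Add according to the addressing mode."""
    if mode == "register":            # Add R4, R3
        return regs[reg]
    if mode == "immediate":           # Add R4, #3
        return const
    if mode == "displacement":        # Add R4, 100(R1) -> M[100 + R1]
        return mem[const + regs[reg]]
    if mode == "register_indirect":   # Add R4, (R1) -> M[R1]
        return mem[regs[reg]]
    raise ValueError(f"unknown addressing mode: {mode}")


def add(regs, mem, dest, mode, reg=None, const=0):
    """dest <- dest + operand, with the operand located by `mode`."""
    regs[dest] += operand_value(mode, regs, mem, reg, const)
```

For example, with R1 = 8 and M[108] = 30, `add(regs, mem, "R4", "displacement", reg="R1", const=100)` adds 30 to R4.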
Operands and Operations
Instruction layout: opcode | result | operand 1 | … | operand n
opcode:
• which operation (ADD, MULT …)
• type of operands (INT, FP)
operands:
• operand location (memory or register)
• type (INT, FP)
Examples:
ADD R1, R3, R4
ADD F1, F2, F3
SUB R1, R2, R3
FADD R1, R2, R3
Operands and operations are encoded in instructions
Frequency of Data Access
• Frequency of access to different data types helps in deciding which types are more important to support efficiently
Operations in the Instruction Set
Arithmetic: Add, multiply, subtract, divide
Logical: And, or
Control: branch, jump, procedure call and return
System: OS call, virtual memory management
FP operations: add, multiply, subtract, divide
Decimal: add, multiply, convert
String: move, compare, search
Graphics: pixel and vertex operations
Frequency of Instructions
• The most frequently executed instructions are the simple operations of an ISA
Control Flow Instructions (CFI)
• Conditional branches
• Jumps
• Procedure calls
• Procedure returns
Frequency of CFI
• Each event is different and may use different instructions and have different behaviors
How To Specify Branch Condition?
Condition code
• ALU operation sets special bits, get condition for free
• Constrains instruction ordering
Condition register
• Write 0 (false) or 1 (true) into a register after comparison
• Branching needs only BZ and BNZ instructions
Compare and branch
• Compare operands (BLT, BGT, BEQ …) and branch
• Instruction may take long to execute
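The three options can be contrasted on one test, "branch if a < b". This Python sketch is only a model: the flag names Z and N and the function names are assumptions, and overflow is ignored because Python integers are unbounded.

```python
def taken_with_condition_code(a: int, b: int) -> bool:
    """Condition code: the ALU subtraction sets flags as a side effect."""
    diff = a - b
    flags = {"Z": diff == 0, "N": diff < 0}   # special bits set by the ALU
    return flags["N"]                         # a later branch tests the N flag


def taken_with_condition_register(a: int, b: int) -> bool:
    """Condition register: a compare writes 0 or 1, then BNZ tests it."""
    r = 1 if a < b else 0                     # compare instruction result
    return r != 0                             # BNZ r


def taken_with_compare_and_branch(a: int, b: int) -> bool:
    """Compare and branch: one instruction (BLT a, b) does both steps."""
    return a < b
```

All three agree on whether the branch is taken; they differ in how many instructions are involved and where the intermediate condition lives.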
Procedure Invocation Options
Return address and some state must be saved
Caller saving:
• Calling procedure saves registers that it will need upon return
• Must be used for globally accessed variables
Callee saving:
• Called procedure saves registers that it will overwrite
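The difference between the two conventions is who performs the save and restore. Below is a rough Python model, assuming the register file is a dictionary; the procedure names and the choice of R1 as scratch register and R2 as result register are hypothetical.

```python
def clobbering_procedure(regs):
    """Hypothetical callee that overwrites R1 and saves nothing."""
    regs["R1"] = 99


def call_with_caller_saving(regs, live_after_call, procedure):
    """Caller saving: the caller saves the registers it needs upon return."""
    saved = {name: regs[name] for name in live_after_call}
    procedure(regs)
    regs.update(saved)            # restore after the call returns


def callee_saving_procedure(regs):
    """Callee saving: the callee saves what it is about to overwrite."""
    saved_r1 = regs["R1"]
    regs["R1"] = 99               # R1 used as scratch inside the procedure
    regs["R2"] = regs["R1"] + 1   # result left in R2 (assumed convention)
    regs["R1"] = saved_r1         # restore before returning
```

In both cases the caller sees its original R1 after the call; only the location of the save/restore code changes.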
Encoding The Instruction Set
Design decisions affect the size of the instruction, which in turn affects:
• Size of the compiled program
• Ease of decoding
Encoding The opcode Field
Depends on whether every operation can be combined with every addressing mode
• If it can, a separate address specifier is needed for each operand
• If it can’t, the opcode can signify the addressing mode
Instruction Set Design Trade-offs
More registers are better for compiler optimization
More addressing modes bring faster operation
More registers and addressing modes make instructions longer
Shorter instructions and instructions with similar CPI are better for pipelining
Instruction Formats
Variable
operation and number of operands | addressing mode and address 1 | … | addressing mode and address n
Works best if there are many operations and addressing modes
All addressing modes with all instructions
As few bits as possible to encode the program
Decoding might be complicated
Fixed
operation and addressing mode | address 1 | address 2 | address 3
Works best if there is a small number of operations and addressing modes
Larger programs
Always same number of bits to encode instructions
Easy decoding
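To see why a fixed format decodes easily, consider packing one instruction into a single 32-bit word. The Python sketch below assumes a hypothetical layout of an 8-bit operation field followed by three 8-bit address fields; these widths are chosen for illustration and do not correspond to any particular ISA.

```python
FIELD_BITS = 8                      # hypothetical width of every field
FIELD_MASK = (1 << FIELD_BITS) - 1


def encode_fixed(operation: int, addr1: int, addr2: int, addr3: int) -> int:
    """Pack operation | address 1 | address 2 | address 3 into 32 bits."""
    for value in (operation, addr1, addr2, addr3):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("field does not fit in 8 bits")
    return (operation << 24) | (addr1 << 16) | (addr2 << 8) | addr3


def decode_fixed(word: int):
    """Every field sits at a known bit position, so decoding is shift and mask."""
    return (
        (word >> 24) & FIELD_MASK,
        (word >> 16) & FIELD_MASK,
        (word >> 8) & FIELD_MASK,
        word & FIELD_MASK,
    )
```

A variable format would instead have to read the operation field first to learn how many address specifiers follow, which is the decoding complication noted above.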
Instruction Formats
Hybrid
operation | addressing mode 1 | address 1
operation | address spec 1 | address spec 2 | address 1
operation | address spec 1 | address 1 | address 2
CISC vs. RISC
Complex Instruction Set Computer (CISC)
• Instructions are highly specialized
• Support for a variety of instructions, addressing modes, etc.
• Different CPI and instruction size
Reduced Instruction Set Computer (RISC)
• Short, simple instructions, support for a few addressing modes
• More complex instructions must be programmed
• Same low CPI
Reduced Code Size
Important for embedded applications
Design hybrid version of instruction set with both 16-bit and 32-bit instructions
• 16-bit instructions are simpler, support fewer operations and addressing modes
Compressed code
• Instruction cache contains full instructions
• Memory contains compressed instructions
• On cache miss, instruction is fetched and decompressed
Role of Compilers
Compiler generates object code in machine language from a high-level language such as C
Instruction set is compiler’s target
In addition to generating the code, compiler optimizes the code to make it:
• Shorter – 25% to 90%
• Faster
• Susceptible to pipelining
Compilation
Compiler makes two to four passes through the code
• In each pass it performs one of the optimizations
• The optimizations are optional and may be skipped to achieve faster compilation
• Passes are sequential
– If the compiler could go back and repeat steps it might discover better optimizations, but this would increase time and complexity
Compiler design goals:
• Correctness
• Speed of compilation
Compilation
Front end per language
High-level optimizations
Global optimizer
Code generator
Front End
Transforms high-level language into common intermediate representation
When a new language becomes popular only front-end needs to be rewritten
High-Level Optimizations
Transform the code to take advantage of parallelism and increase speed of execution:
• Loop unrolling – expand the body of the loop to encompass several iterations, thus reducing the number of conditional branches
• Procedure inlining – replaces a call with the procedure body, eliminating the call and return overhead
• Prefetch insertion – prefetch array references in loops
for (i = 0; i < 100; i++) {
    g();
}
⇒
for (i = 0; i < 100; i += 2) {
    g();
    g();
}
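The saving from unrolling can be made concrete by counting how often the loop condition is evaluated. This Python sketch mirrors the loop above; the explicit `checks` counter and the function names are illustrative assumptions, and the unrolled version assumes an even iteration count.

```python
def run_rolled(n, g):
    """Original loop: one condition check (conditional branch) per call to g."""
    checks = 0
    i = 0
    while True:
        checks += 1              # evaluate i < n
        if not i < n:
            break
        g()
        i += 1
    return checks


def run_unrolled_by_2(n, g):
    """Loop unrolled by 2: one condition check per two calls (n must be even)."""
    checks = 0
    i = 0
    while True:
        checks += 1              # evaluate i < n
        if not i < n:
            break
        g()
        g()
        i += 2
    return checks
```

For n = 100 both versions call g 100 times, but the rolled loop evaluates the condition 101 times and the unrolled loop only 51 times.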
Global Optimizations
Global and local optimizations
• Global common subexpression elimination – locates several expressions that compute the same value and replaces the second with a temporary variable
• Local optimization is done only within a basic block
• Copy propagation: if A=X, replace all later references to A with X
Register allocation
• Allocate most accessed variables to registers
• Since the number of registers is limited, must find a strategy that does not result in too many transfers between memory and registers
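Both transformations can be sketched on a tiny three-address code. The Python below is a simplified illustration, not a real compiler pass: it works within a single basic block only (the global version needs data-flow analysis across blocks), ignores commutativity, and uses a hypothetical instruction form `(dest, op, a, b)` where a copy is written `(dest, "copy", src, None)`.

```python
def local_cse(block):
    """Replace a repeated (op, a, b) with a copy of the earlier result."""
    available = {}                      # (op, a, b) -> variable holding the value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        hit = available.get(key) if op != "copy" else None
        out.append((dest, "copy", hit, None) if hit is not None else (dest, op, a, b))
        # dest is redefined: drop expressions that read dest or are held in dest
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if hit is None and op != "copy" and dest not in (a, b):
            available[key] = dest
    return out


def copy_propagate(block):
    """If A = X, replace later references to A with X while the copy is valid."""
    copies = {}                         # A -> X
    out = []
    for dest, op, a, b in block:
        a = copies.get(a, a)
        b = copies.get(b, b)
        out.append((dest, op, a, b))
        # redefining dest invalidates copies to or from dest
        copies = {k: v for k, v in copies.items() if k != dest and v != dest}
        if op == "copy" and a != dest:
            copies[dest] = a
    return out
```

Running `local_cse` and then `copy_propagate` on `t1 = x + y; t2 = x + y; z = t2 * w` yields `t1 = x + y; t2 = t1; z = t1 * w`, after which `t2` is dead and could be removed by a later pass.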
Code Generator
Takes advantage of design features of a specific architecture
• Reorder instructions to improve pipeline performance
• Replace multiplication with addition and shifts
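Replacing a multiplication by a constant with shifts and additions follows from the binary expansion of the constant. Here is a minimal Python sketch, assuming a non-negative constant; the function name is hypothetical.

```python
def multiply_by_shifts(x: int, constant: int) -> int:
    """Compute x * constant using only shifts and additions.

    Each set bit k of the constant contributes x << k, e.g.
    x * 10 = (x << 3) + (x << 1) because 10 = 0b1010.
    """
    if constant < 0:
        raise ValueError("sketch handles non-negative constants only")
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            result += x << shift
        constant >>= 1
        shift += 1
    return result
```

A code generator would emit the shift and add instructions directly for a known constant; the loop here only makes the decomposition explicit.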
Which Variables → Registers
Program data allocation
• Stack
– Local scalar variables and activation records for procedures
– Best for register allocation
• Global area
– Global variables and constants
– Should be allocated to registers if accessed frequently
• Heap
– Dynamic objects accessed with pointers
– Should not be allocated to registers
Aliased variables should also not be allocated to registers
How Can Architecture Help?
Provide regularity
• Operations, data types and addressing modes should be orthogonal
Provide primitives not solutions
• Special features that match kernels or high-level languages are often unusable
Simplify trade-offs among alternatives
• Compilers strive to generate efficient code
• Specify benefits and costs of each alternative
Make use of everything that is known at compile time
Next Weeks …
Week | Date | Topics | Reading assigned | Quiz
1 | Sep 4 | Lec01 - Introduction | Chap 1; App B |
2 | Sep 9 | Lec02 – Performance and ISAs | | Q1
2 | Sep 11 | Lec03 – ISAs and Role of Compilers | App A1-A6 |
3 | Sep 16 | Lec04 - MIPS Overview | |
3 | Sep 18 | Lec05 – Pipeline | | Q2
4 | Sep 23 | Lec06 - Hazards | |
4 | Sep 25 | Lec07 – Multi-cycles | App A.7; Chap 2 |
  | Sep 29 | Homework 1 due | |
5 | Sep 30 | Homework review | | Q3