CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

29
CS 136 1 CS136, Advanced Architecture Instruction Set Architecture

Transcript of CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

Page 1: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 1

CS136, Advanced Architecture

Instruction Set Architecture

Page 2: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 2

Types of ISAs

• Stack– Implicit operands (top of stack)

– Heavy memory traffic

– Limited ability to access operands at will

– Obsolete

• Accumulator– Implicit register operand (“accumulator”)

– One memory operand

– Insufficient temporaries

– Obsolete

• General-purpose register– Multiple registers

– Several variations

Page 3: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 3

GPR Architectures

• Memory-memory– CISC idea

– Usually allows any operand to be in register as well

• Register-memory– Example: x86

– Can do one operand in register, one in memory, or 2 in regs

• Register-register– Only design used in modern machines

– Lots of registers ⇒ fast flexible operand access– Simplicity of hardware– Compiler has full flexibility in register usage

Page 4: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 4

Five Ways to Do C = A + B

STACK

PUSH APUSH BADDPOP C

ACCUM

LOAD AADD BSTORE C

MEM-MEM

ADD C,A,B

REG-MEM

LOAD R1,AADD R1,BSTORE R1,C

REG-REG

LOAD R1,ALOAD R2,BADD R3,R1,R2STORE R3,C

Page 5: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 5

Memory Addressing

• Originally just word addressing

• 8-bit bytes and byte addressing introduced on IBM 360 series

• Brief experiments with bit addressing (bad idea)

• Unaligned accesses not worth supporting

• Some machines byte-address but only load/store a word at a time– Turned out to be bad design decision

– Too many programs do string processing 1 character at a time

– May need to revisit in future (32-bit characters?)

• Modern RISC designs allow short load/store, but not short arithmetic

Page 6: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 6

Endian-ness

• The word is “Endian”, not “Indian”

• Reference to Gulliver’s Travels

• Little-Endian invented by Digital Equipment on the PDP-11– Mathematically more elegant

– Horrible for humans

– “It seemed like a good idea at the time”

– Should be banished from the face of the Earth

• Some machines can switch endianness with a control bit– This idea is even stupider than the original

Page 7: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 7

Addressing Modes

• How can an instruction reference memory?

• Early days: absolute address in instruction– Led to instruction modification

– Improvement: “Indirection” picked up absolute location, used it as final address

• Minimum necessary today: follow pointer in register– Clumsy if only option

• Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as side effect, either before or after instruction– No machine went quite this far

– But VAX came close

Page 8: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 8

Addressing Modes (cont’d)

• What’s actually useful?

• Need to follow pointers: can restrict to registers– ADD R1,(R2)

– Better: LOAD R1,R2 (like MIPS)

• Frequent stack access ⇒ register + constant useful• Immediates needed for built-in constants• Access to globals ⇒ absolute memory

addresses– (We’ll see that that’s painful)

• PC-relative modes– Used to be needed for data; not in modern systems– Still needed for calls and branches

• Absolute addresses no longer needed for branches– Can always emulate with PC-relative, since PC known– Still available on some architectures

Page 9: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 9

Operand Types and Sizes

• Type usually implies size

• Integers can safely be widened to word size– Shrink again when stored

– Takes advantage of two’s-complement representation

• Single-precision FP gives different results than double-precisions⇒Necessary to support both widths

– Some FPUs can do two SP operations in parallel

• Older machines allowed “packed” decimal (2 digits per byte)– x86 supports with DAA (Decimal Add Adjust) instruction

– Still useful in business world, though dying

• 32 bits standard these days, 64 bits coming– 128 some day?

Page 10: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 10

Operations Provided

• Only one instruction truly needed: SJ– Subtract A from B, giving C; if result is < 0, jump to D

– It’s Turing-complete!

• Practical machines need a bit more at minimum:– Arithmetic and logical (add, multiply, divide?, and, or, …)

– Data movement (load/store, move between registers)

– Control (conditional/unconditional branch, procedure call and return, trap to OS)

– System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices)

• Other builtins can be useful:– Basic floating point

» Bad x86 design idea: sin, sqrt, etc.!

– Decimal

– String

– Vector, graphics

Page 11: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 11

Control Flow

• Addressing modes are important– PC-relative means code can run at any virtual address

– Useful for dynamically linked (shared) libraries

• Pointer-following jump needed for returns– Also useful for switch statements, function pointers, virtual

functions, and shared libraries

• How to specify condition for conditional branches?– Condition code as side effect of every instruction

» Boils down to extra register

» Spurious dependencies in pipeline

– Condition register explicitly set by comparison

– Compare as part of branch

» Adds delay slots in pipeline

Page 12: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 12

Encodings

• Variable-length instructions– Highly efficient (few wasted bits)

– Allows complex specifications (e.g., x86 addressing modes)

– Usually means misaligned instruction fetch

– Greatly complicates fetch/decode units

• Fixed-length instructions– May limit number of registers

– Usually very few instruction formats

– Wastes space but gains speed (e.g., only aligned fetches)

– Limits width of immediate operands

Page 13: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 13

The Fight for Bits

• How wide should instruction be?– Wider ⇒ can encode more registers, more options– Wider ⇒ bigger programs, more memory bandwidth– Bigger programs ⇒ fewer cache hits

• Things you need to encode:– Operation code (16 to 1000 instructions)– Operands (at least one, normally two or three)– Immediate operands– Memory offsets– Branch targets– Branch conditions– Conditional operations (e.g., conditional load, add)

Page 14: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 14

Two or Three Operands?

• In favor of three:– Smaller code size

– No clobbered operands ⇒ fewer copies or reloads– Setting R0 to zero allows fewer operations

supported in ALU

• In favor of two:– Can address more registers

Page 15: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 15

How to Decide All These Questions?

• Slide rules at 50 paces?

• Analysis wars– Look at existing designs, existing programs

– “Recompile” programs for hypothetical architecture

» Analyze size of resulting program

» Run through simulator to see how it performs

– Impractical approach

» Writing compiler back ends is expensive

» Simulators are slow

– instead, make projections based on existing object code

Page 16: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 16

Example of Bad Analysis: @-(R2)

• DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement

• What happened to indirect autopredecrement?– Analyzed output of BLISS compiler on many programs

– Language didn’t provide way to express autopredecrement

– Concluded it wasn’t necessary

– Very different result if had analyzed C!

*--p1 = a[--i];

Page 17: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 17

Example of Difficult Analysis: imm16

• How big should an immediate be?

• Easy analysis: examine existing code– Calculate frequency of various widths

– Analyze tradeoff of using those bits for other purposes

• Problem: analyzed architecture affects frequency of different widths– E.g., Alpha has only 16 bits, so you’ll never see over 16!

– Alternative: look for multi-instruction sequences that effectively use more than 16 bits

» Hard to find (compiler pipeline scheduling)

» Compiler will stand on head, use sneaky tricks to avoid generating extra instructions

– Need for wider constants depends on architecture

» E.g., MIPS needs them when jumping to shared libraries

Page 18: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 18

Page 19: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 19

Interaction with Compilers

• Nearly all modern code generated by compilers

• Architect must make compiler’s job easier– Lots of registers

– Orthogonal instruction set

– Few side effects

– Instructions and addressing modes matched to language constructs

» But NOT attempt to implement them in detail!

» Primitives are better than “solutions” even when solutions are correct

– Good support for stack, globals, and pointers

– Support for both compile-time and run-time binding

– Don’t ask compiler to predict dynamic information (e.g., branch targets)

– Don’t provide features language can’t express

» Example pro and con: vector architectures

Page 20: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 20

The MIPS64 Architecture

• Extension of MIPS32

• Data path widened to 64 bits– Still 32-bit instructions

– Still only 32 registers

• Most instructions have “D” as prefix to indicate 64-bit version

Page 21: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 21

MIPS Instruction Formats

Opcode

6

rs

5

rt

5

rd

5

shamt

5

funct

6

R-Type Instruction

I-Type Instruction

Opcode

6

rs

5

rt

5 16

Immediate

J-Type Instruction

Opcode

6 26

Offset inserted into PC

Page 22: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 22

I-Type Instructions

• Encodes loads, stores (all widths), immediate ALU ops

• Also conditional branches (rt unused)

Opcode

6

rs

5

rt

5 16

Immediate

Page 23: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 23

R-Type Instructions

Opcode

6

rs

5

rt

5

rd

5

shamt

5

funct

6

• Register-register ALU operations– “funct” encodes the ALU operation: add, sub, etc.

– Opcode chooses operands, special registers, sizes, etc.

– Conditional moves

• Handles special registers, floating point, …

Page 24: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 24

J-Type Instructions

Opcode

6 26

Offset inserted into PC

• Jump, jump and link

• Trap, return from exception

Page 25: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 25

MIPS Control Flow

• Unconditional jump substitutes low bits of PC– NOT addition!

– Exceptionally bad on 64-bit architecture, where 36 bits unchanged

• No built-in stack– Subroutine call stores return in register

– Callee must save on stack if necessary

– Reduces overall cycle time

– Ultra-efficient for leaf functions

• Conditional branches only test against zero– Complex tests (e.g., <) store Z/NZ result in a register

– We’ve seen how this improves the pipeline

• Conditional moves can eliminate many branches– Feature of many modern architectures

Page 26: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 26

MIPS Floating Point

• Floating point was originally coprocessor

⇒Separate FP registers– Special instructions to move to/from integer registers

• MIPS64 (but not 32) has paired single operations– Two SP numbers pass through DP ALU simultaneously

• MIPS64 also has multiply-add in one instruction– Useful in signal processing (multimedia)

Page 27: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 27

Fallacies and Pitfalls

• PITFALL: Instruction designed to support feature in some language– Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK

– Why is this bad?

» Easy to get wrong (PDP-11 MARK instruction)

» Easy to make inefficient (VAX CALLS)

» Languages evolve, hardware doesn‘t

Page 28: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 28

Fallacies and Pitfalls (2)

• FALLACY: Typical programs exist– We wish!

• PITFALL: Ignoring the compiler– Design better code size, based on bad compiler

– Good compiler can blow your idea out of the water

• FALLACY: Flawed architectures can’t succeed– Ummm, x86?

– Every architecture has drawbacks

• FALLACY: You (YOU!) can design a flawless architecture– Always tradeoffs

– Always something new to learn

Page 29: CS 136 1 CS136, Advanced Architecture Instruction Set Architecture.

CS 136 29

Summary

• Instruction encoding is important

• Don’t forget to provide what the compiler needs– This is NOT what you think the compiler needs!

• Addresses will only get wider

• Data will only get wider– Including characters

• Cleverness to improve bandwidth (e.g., MADD)

• RISC is here to stay