18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … ·...

27
18-447 Computer Architecture Lecture 4: More ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2012, 1/23/2012

Transcript of 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … ·...

Page 1: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

18-447

Computer Architecture

Lecture 4: More ISA Tradeoffs

Prof. Onur Mutlu

Carnegie Mellon University

Spring 2012, 1/23/2012

Page 2: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Homework 0

Due now

2

Page 3: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Reminder: Homeworks for Next Two Weeks

Homework 1

Due Monday Jan 28, midnight

Turn in via AFS (hand-in directories) or box outside CIC 4th floor

MIPS warmup, ISA concepts, basic performance evaluation

Homework 2

Will be assigned next week. Stay tuned…

3

Page 4: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Reminder: Lab Assignment 1

Due next Friday (Feb 1), at the end of Friday lab

A functional C-level simulator for a subset of the MIPS ISA

Study the MIPS ISA Tutorial

TAs will cover this in Lab Sessions this week

4

Page 5: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Review of Last Lecture

ISA Principles and Tradeoffs

Elements of the ISA

Sequencing model, instruction processing style

Instructions, data types, memory organization, registers, addressing modes, orthogonality, I/O device interfacing …

What is the benefit of autoincrement addressing mode?

What is the downside of having an autoincrement addressing mode?

Is the LC-3b ISA orthogonal?

Can all addressing modes be used with all instructions?

5

Page 6: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Is the LC-3b ISA Orthogonal?

6

Page 7: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

LC-3b: Addressing Modes of ADD

7

Page 8: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

LC-3b: Addressing Modes of of JSR(R)

8

Page 9: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Another Question

Does the LC-3b ISA contain complex instructions?

9

Page 10: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Complex vs. Simple Instructions

Complex instruction: An instruction does a lot of work, e.g. many operations

Insert in a doubly linked list

Compute FFT

String copy

Simple instruction: An instruction does small amount of work, it is a primitive using which complex operations can be built

Add

XOR

Multiply

10

Page 11: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Complex vs. Simple Instructions

Advantages of Complex instructions

+ Denser encoding smaller code size better memory

utilization, saves off-chip bandwidth, better cache hit rate (better packing of instructions)

+ Simpler compiler: no need to optimize small instructions as much

Disadvantages of Complex Instructions

- Larger chunks of work compiler has less opportunity to

optimize (limited in fine-grained optimizations it can do)

- More complex hardware translation from a high level to

control signals and optimization needs to be done by hardware

11

Page 12: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

ISA-level Tradeoffs: Semantic Gap

Where to place the ISA? Semantic gap

Closer to high-level language (HLL) Small semantic gap,

complex instructions

Closer to hardware control signals? Large semantic gap,

simple instructions

RISC vs. CISC machines

RISC: Reduced instruction set computer

CISC: Complex instruction set computer

FFT, QUICKSORT, POLY, FP instructions?

VAX INDEX instruction (array access with bounds checking)

12

Page 13: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

ISA-level Tradeoffs: Semantic Gap

Some tradeoffs (for you to think about)

Simple compiler, complex hardware vs. complex compiler, simple hardware

Caveat: Translation (indirection) can change the tradeoff!

Burden of backward compatibility

Performance?

Optimization opportunity: Example of VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?

Instruction size, code size

13

Page 14: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

X86: Small Semantic Gap: String Operations

An instruction operates on a string

Move one string of arbitrary length to another location

Compare two strings

Enabled by the ability to specify repeated execution of an instruction (in the ISA)

Using a “prefix” called REP prefix

Example: REP MOVS instruction

Only two bytes: REP prefix byte and MOVS opcode byte (F2 A4)

Implicit source and destination registers pointing to the two strings (ESI, EDI)

Implicit count register (ECX) specifies how long the string is

14

Page 15: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

X86: Small Semantic Gap: String Operations

15

REP MOVS (DEST SRC)

How many instructions does this take in MIPS?

Page 16: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Small Semantic Gap Examples in VAX

FIND FIRST

Find the first set bit in a bit field

Helps OS resource allocation operations

SAVE CONTEXT, LOAD CONTEXT

Special context switching instructions

INSQUEUE, REMQUEUE

Operations on doubly linked list

INDEX

Array access with bounds checking

STRING Operations

Compare strings, find substrings, …

Cyclic Redundancy Check Instruction

EDITPC

Implements editing functions to display fixed format output

Digital Equipment Corp., “VAX11 780 Architecture Handbook,” 1977-78.

16

Page 17: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Small versus Large Semantic Gap

CISC vs. RISC

Complex instruction set computer complex instructions

Initially motivated by “not good enough” code generation

Reduced instruction set computer simple instructions

John Cocke, mid 1970s, IBM 801

Goal: enable better compiler control and optimization

RISC motivated by

Memory stalls (no work done in a complex instruction when there is a memory stall?)

When is this correct?

Simplifying the hardware lower cost, higher frequency

Enabling the compiler to optimize the code better

Find fine-grained parallelism to reduce stalls

17

Page 18: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

How High or Low Can You Go?

Very large semantic gap

Each instruction specifies the complete set of control signals in the machine

Compiler generates control signals

Open microcode (John Cocke, circa 1970s)

Gave way to optimizing compilers

Very small semantic gap

ISA is (almost) the same as high-level language

Java machines, LISP machines, object-oriented machines, capability-based machines

18

Page 19: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

A Note on ISA Evolution

ISAs have evolved to reflect/satisfy the concerns of the day

Examples:

Limited on-chip and off-chip memory size

Limited compiler optimization technology

Limited memory bandwidth

Need for specialization in important applications (e.g., MMX)

Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA

Concept of dynamic/static interface

Contrast it with hardware/software interface

19

Page 20: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

Effect of Translation

One can translate from one ISA to another ISA to change the semantic gap tradeoffs

Examples

Intel’s and AMD’s x86 implementations translate x86 instructions into programmer-invisible microoperations (simple instructions) in hardware

Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW instructions in software (code morphing software)

Think about the tradeoffs

20

Page 21: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

ISA-level Tradeoffs: Instruction Length

Fixed length: Length of all instructions the same

+ Easier to decode single instruction in hardware

+ Easier to decode multiple instructions concurrently

-- Wasted bits in instructions (Why is this bad?)

-- Harder-to-extend ISA (how to add new instructions?)

Variable length: Length of instructions different (determined by opcode and sub-opcode)

+ Compact encoding (Why is this good?)

Intel 432: Huffman encoding (sort of). 6 to 321 bit instructions. How?

-- More logic to decode a single instruction

-- Harder to decode multiple instructions concurrently

Tradeoffs Code size (memory space, bandwidth, latency) vs. hardware complexity

ISA extensibility and expressiveness

Performance? Smaller code vs. imperfect decode 21

Page 22: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

ISA-level Tradeoffs: Uniform Decode

Uniform decode: Same bits in each instruction correspond to the same meaning

Opcode is always in the same location

Ditto operand specifiers, immediate values, …

Many “RISC” ISAs: Alpha, MIPS, SPARC

+ Easier decode, simpler hardware

+ Enables parallelism: generate target address before knowing the instruction is a branch

-- Restricts instruction format (fewer instructions?) or wastes space

Non-uniform decode

E.g., opcode can be the 1st-7th byte in x86

+ More compact and powerful instruction format

-- More complex decode logic

22

Page 23: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

x86 vs. Alpha Instruction Formats

x86:

Alpha:

23

Page 24: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

MIPS Instruction Format

R-type, 3 register operands

I-type, 2 register operands and 16-bit immediate operand

J-type, 26-bit immediate operand

Simple Decoding

4 bytes per instruction, regardless of format

must be 4-byte aligned (2 lsb of PC must be 2b’00)

format and fields easy to extract in hardware

24

R-type 0 6-bit

rs 5-bit

rt 5-bit

rd 5-bit

shamt 5-bit

funct 6-bit

opcode 6-bit

rs 5-bit

rt 5-bit

immediate 16-bit

I-type

opcode 6-bit

immediate 26-bit

J-type

Page 25: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

A Note on Length and Uniformity

Uniform decode usually goes with fixed length

In a variable length ISA, uniform decode can be a property of instructions of the same length

It is hard to think of it as a property of instructions of different lengths

25

Page 26: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

A Note on RISC vs. CISC

Usually, …

RISC

Simple instructions

Fixed length

Uniform decode

Few addressing modes

CISC

Complex instructions

Variable length

Non-uniform decode

Many addressing modes

26

Page 27: 18-447 Computer Architecturecourse.ece.cmu.edu/~ece447/s13/lib/exe/fetch.php?media=onur-447 … · Java machines, LISP machines, object-oriented machines, capability-based machines

ISA-level Tradeoffs: Number of Registers

Affects:

Number of bits used for encoding register address

Number of values kept in fast storage (register file)

(uarch) Size, access time, power consumption of register file

Large number of registers:

+ Enables better register allocation (and optimizations) by compiler fewer saves/restores

-- Larger instruction size

-- Larger register file size

27