ECE 486/586 Computer Architecture Lecture # 5 (web.cecs.pdx.edu/~zeshan/ece586_lec5.pdf)
ECE 486/586
Computer Architecture
Lecture # 5
Spring 2019
Portland State University
Lecture Topics
• Quantitative Principles of Computer Design
• Fallacies and Pitfalls
• Instruction Set Principles
– Introduction
– Classifying Instruction Set Architectures
Reference:
• Chapter 1: Sections 1.9, 1.11
• Appendix A: Sections A.1, A.2
Key Principles of Computer Architecture
• Take Advantage of Parallelism
• Principle of Locality
• Focus on the Common Case
• Amdahl’s Law
• Processor Performance Equation
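The processor performance equation listed above can be sketched numerically. A minimal sketch, assuming made-up values for instruction count, CPI, and clock rate:

```python
# CPU time = Instruction Count x CPI x Clock Cycle Time
# All input values below are hypothetical, chosen only for illustration.
instruction_count = 2_000_000    # dynamic instructions executed
cpi = 1.5                        # average clock cycles per instruction
clock_rate_hz = 1_000_000_000    # 1 GHz -> cycle time of 1 ns

cycle_time_s = 1 / clock_rate_hz
cpu_time_s = instruction_count * cpi * cycle_time_s
print(cpu_time_s)                # 0.003 -> 3 ms
```

Any of the three factors can be improved independently (better compiler lowers instruction count, better microarchitecture lowers CPI, better circuit technology raises clock rate), which is why the equation is useful for reasoning about design trade-offs.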
Principle #1: Exploit Parallelism
• System Level
  – Multiple processors
  – Multiple disks
  – Multiple memory channels
  – Pipelined buses
• Processor Level
  – Pipelined instruction execution
  – Multiple functional units
• Logic Level
  – Carry lookahead adders
  – Multi-banked caches
  – Multi-ported register files
Principle #2: Exploit Locality
• Temporal
  – Recently accessed items are likely to be accessed again in the near future
  – Code: loops and function calls
  – Data: repeated access to the same variable, e.g., a loop counter
• Spatial
  – Items whose addresses are near one another tend to be referenced close together in time
  – Code: sequential instruction execution
  – Data: array elements, fields in a data structure
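Both kinds of locality appear even in a trivial loop. A toy sketch (the array and its size are arbitrary):

```python
# Toy loop annotated with the two kinds of locality (illustrative only).
data = list(range(1024))    # contiguous array: neighbouring elements sit at
                            # nearby addresses (spatial locality)
total = 0                   # 'total' is touched every iteration (temporal locality)
for i in range(len(data)):  # the loop counter 'i' is also reused each iteration
    total += data[i]        # data[i] and data[i+1] are accessed close in time
print(total)                # 523776
```

Caches exploit exactly these patterns: temporal locality motivates keeping recently used data in the cache, and spatial locality motivates fetching whole cache lines rather than single words.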
Principle # 3: Focus on Common Case
• Implication of Amdahl’s Law:
  – Speeding up 90% of the execution time by only 10% yields roughly the same overall speedup as speeding up 10% of the execution time by 10x
• Examples:
  – A typical program contains far more add/subtract instructions than divide instructions
    • Focus more on building fast adders than fast dividers
  – Most loop branches are taken
    • Use branch prediction (fetch the branch target instead of the next sequential instruction)
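The claim about the two speedups can be checked directly with Amdahl's Law, where f is the fraction of execution time affected and s is the factor by which that fraction is sped up:

```python
# Amdahl's Law: overall speedup = 1 / ((1 - f) + f / s)
def overall_speedup(fraction, speedup):
    return 1 / ((1 - fraction) + fraction / speedup)

# Speed up 90% of the execution by 10% (s = 1.1):
a = overall_speedup(0.90, 1.1)
# Speed up 10% of the execution by 10x (s = 10):
b = overall_speedup(0.10, 10)
print(round(a, 3), round(b, 3))   # 1.089 1.099 -- nearly identical
```

Both optimizations buy only about a 9-10% overall speedup, which is the point of the slide: a large improvement to a rare case is worth no more than a small improvement to the common case.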
Fallacies and Pitfalls
• Fallacy
  – A falsehood, often widely believed to be true
• Pitfall
  – An easily made mistake
  – Often a generalization of a principle that is true only in a limited context
Fallacy
• The relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite
• Problems with the above argument:
  – The processors may have the same clock rate but differ considerably in their pipelines and cache subsystems
    • Same clock rate but different CPIs
  – A processor may be tuned to one particular benchmark suite while performing poorly on other benchmarks
Fallacy
• Benchmarks remain valid indefinitely
• Why not?
  – Vulnerability to “benchmark engineering”
  – Once a benchmark becomes popular, there is tremendous pressure to improve performance by “bending” the rules for running the benchmark
  – Kernels that spend the majority of their time in a very small section of code are particularly vulnerable
    • Example: the matrix300 kernel
Fallacy
• Peak performance tracks observed performance
• Problems with the above argument:
  – Peak performance is only useful as an upper bound on the performance that a system can deliver
  – Typical performance can vary from peak performance by 10x or more
  – The gap between typical and peak performance can vary greatly from program to program
Fallacy
• Multiprocessors are a silver bullet
• Why not?
  – The switch to multiprocessors happened because of the ILP wall and the power wall, not because parallel programming was dramatically simplified
  – In the multi-core era, the burden of improving performance falls on programmers
  – Programmers must make their programs more and more parallel, an uphill task
Fallacy
• Synthetic benchmarks predict performance for real programs
• Why not?
  – Synthetic benchmarks may not capture the effects of real-world systems (loading, context switching)
    • A system may not fare as well in practice as it does on the benchmark
  – Synthetic benchmarks may under-reward performance-enhancing optimizations
    • Example: Whetstone loops run for only a few iterations, so a system that optimizes loop branch prediction won’t fare as well on the benchmark as it does in practice
Fallacy
• MIPS (Millions of Instructions per Second) is an accurate measure for comparing performance among computers
MIPS = Instruction Count / (Execution Time × 10^6) = Clock Rate / (CPI × 10^6)
• Problems:
  – What counts as an instruction? It depends on the ISA
    • One instruction on one ISA may do as much “work” as ten instructions on another
  – MIPS can vary inversely with performance
    • Hardware floating-point instructions vs. software routines: the hardware is faster but executes fewer instructions, so it scores a lower MIPS rating
  – MIPS can vary among programs on the same computer
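The inverse-variation problem can be sketched with two hypothetical machines running the same floating-point workload; all of the numbers below are invented for illustration:

```python
# Machine A uses hardware FP instructions; Machine B emulates FP in software
# with many simple integer instructions. All values are hypothetical.
def mips(instruction_count, exec_time_s):
    return instruction_count / (exec_time_s * 1e6)

# Machine A: fewer, more complex instructions (higher CPI)
ic_a, cpi_a, clock_a = 1_000_000, 3.0, 1e9
time_a = ic_a * cpi_a / clock_a            # 0.003 s

# Machine B: 10x the instructions, but simple ones with low CPI
ic_b, cpi_b, clock_b = 10_000_000, 1.2, 1e9
time_b = ic_b * cpi_b / clock_b            # 0.012 s

print(mips(ic_a, time_a), time_a)          # ~333 MIPS, finishes in 3 ms
print(mips(ic_b, time_b), time_b)          # ~833 MIPS, finishes in 12 ms
```

Machine B reports a MIPS rating 2.5x higher, yet takes 4x longer to finish the program, which is exactly why MIPS is a misleading basis for comparison across ISAs.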
Pitfall
• Comparing the performance of hand-coded assembly against compiler-generated code from a high-level language
• Potential issues:
  – Hand-coded assembly requires specialized programmers and is unlikely to be used outside of embedded systems
  – Unless the compiler can perform the same optimizations as the assembly language programmer, the performance of compiler-generated code will not match the hand-coded program
Pitfall
• Falling prey to Amdahl’s Law
  – Don’t forget to assess the potential usage/impact of a feature before embarking on the long journey to implement it
Pitfall
• A single point of failure
  – Dependability is no stronger than the weakest link in the chain
  – Make every component redundant so that no single component failure can bring down the whole system
Instruction Set Principles
• Reading:
  – Hennessy and Patterson, Appendix A
  – RISC paper (Patterson & Sequin): posted on course website
Instruction Set Architecture
• Instruction Set Architecture (ISA)
  – The traditional meaning of “computer architecture”
  – What is visible to the programmer/compiler writer
  – Independent of organization and implementation
    • E.g., the ISA does not include caches and pipelines
  – Instructions, operands, addressing modes
Instruction Set Architecture
• Compiler
  – Input: high-level language
  – Output: assembly language for the target ISA
  – Performs global and local optimizations
  – Performs register allocation
• Assembler
  – Input: assembly language
  – Output: machine code (“object file”)
• Linker
  – Inputs: object files, library files
  – Output: executable program
• Loader
  – Reads the executable from disk
  – Passes command-line arguments
  – Optionally “fixes up” absolute addresses
ISA Classification
C = A + B
• A, B, and C are memory locations; R1, R2, and R3 are registers
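The original slide tabulates the instruction sequence for C = A + B under each ISA class. A reconstruction in illustrative pseudo-assembly (mnemonics follow the style of Hennessy & Patterson, Appendix A; they are not from any specific ISA):

```
Stack        Accumulator     Register-Memory      Load-Store (Register-Register)
-----        -----------     ---------------      ------------------------------
Push A       Load  A         Load  R1, A          Load  R1, A
Push B       Add   B         Add   R3, R1, B      Load  R2, B
Add          Store C         Store R3, C          Add   R3, R1, R2
Pop  C                                            Store R3, C
```

The classes differ in where ALU operands may live: on an implicit stack, in a single implicit accumulator, in a register or memory, or (for load-store machines) only in registers.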
ISA Examples
• Stack
  – HP calculators
  – Pentium FP (x87 co-processor)
    • 8 registers organized as a stack
• Accumulator
  – PDP-8
  – 8051 microcontroller
• Load/Store (Register/Register)
  – RISC: MIPS, Alpha, ARM, PowerPC, SPARC
  – Itanium
• Register/Memory
  – IA-32 (Intel x86), Motorola 68000, IBM 360
  – PDP-11
  – VAX (really Memory/Memory)
Register and Memory Operands
ISA Comparison