ECE 486/586 Computer Architecture Lecture # 5 (web.cecs.pdx.edu/~zeshan/ece586_lec5.pdf)
ECE 486/586
Computer Architecture
Lecture # 5
Spring 2019
Portland State University
Lecture Topics
• Quantitative Principles of Computer Design
• Fallacies and Pitfalls
• Instruction Set Principles
– Introduction
– Classifying Instruction Set Architectures
Reference:
• Chapter 1: Sections 1.9, 1.11
• Appendix A: Sections A.1, A.2
Key Principles of Computer Architecture
• Take Advantage of Parallelism
• Principle of Locality
• Focus on the Common Case
• Amdahl’s Law
• Processor Performance Equation
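The processor performance equation listed above can be sketched numerically. A minimal sketch, assuming made-up values for instruction count, CPI, and clock rate:

```python
# CPU time = Instruction Count x CPI x Clock Cycle Time
# All input values below are hypothetical, chosen only for illustration.
instruction_count = 2_000_000    # dynamic instructions executed
cpi = 1.5                        # average clock cycles per instruction
clock_rate_hz = 1_000_000_000    # 1 GHz -> cycle time of 1 ns

cycle_time_s = 1 / clock_rate_hz
cpu_time_s = instruction_count * cpi * cycle_time_s
print(cpu_time_s)                # 0.003 -> 3 ms
```

Any of the three factors can be improved independently (better compiler lowers instruction count, better microarchitecture lowers CPI, better circuit technology raises clock rate), which is why the equation is useful for reasoning about design trade-offs.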
Principle #1: Exploit Parallelism
• System Level
  – Multiple processors
  – Multiple disks
  – Multiple memory channels
  – Pipelined buses
• Processor Level
  – Pipelined instruction execution
  – Multiple functional units
• Logic Level
  – Carry lookahead adders
  – Multi-banked caches
  – Multi-ported register files
Principle #2: Exploit Locality
• Temporal
  – Recently accessed items are likely to be accessed again in the near future
  – Code: loops and function calls
  – Data: repeated access to the same variable, e.g., a loop counter
• Spatial
  – Items whose addresses are near one another tend to be referenced close together in time
  – Code: sequential instruction execution
  – Data: array elements, fields in a data structure
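Both kinds of locality appear even in a trivial loop. A toy sketch (the array and its size are arbitrary):

```python
# Toy loop annotated with the two kinds of locality (illustrative only).
data = list(range(1024))    # contiguous array: neighbouring elements sit at
                            # nearby addresses (spatial locality)
total = 0                   # 'total' is touched every iteration (temporal locality)
for i in range(len(data)):  # the loop counter 'i' is also reused each iteration
    total += data[i]        # data[i] and data[i+1] are accessed close in time
print(total)                # 523776
```

Caches exploit exactly these patterns: temporal locality motivates keeping recently used data in the cache, and spatial locality motivates fetching whole cache lines rather than single words.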
Principle # 3: Focus on Common Case
• Implication of Amdahl’s Law:
  – Speeding up 90% of the execution time by only 10% yields roughly the same overall speedup as speeding up 10% of the execution time by 10x
• Examples:
  – A typical program contains far more add/subtract instructions than divide instructions
    • Focus more on building fast adders than fast dividers
  – Most loop branches are taken
    • Use branch prediction (fetch the branch target instead of the next sequential instruction)
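The claim about the two speedups can be checked directly with Amdahl's Law, where f is the fraction of execution time affected and s is the factor by which that fraction is sped up:

```python
# Amdahl's Law: overall speedup = 1 / ((1 - f) + f / s)
def overall_speedup(fraction, speedup):
    return 1 / ((1 - fraction) + fraction / speedup)

# Speed up 90% of the execution by 10% (s = 1.1):
a = overall_speedup(0.90, 1.1)
# Speed up 10% of the execution by 10x (s = 10):
b = overall_speedup(0.10, 10)
print(round(a, 3), round(b, 3))   # 1.089 1.099 -- nearly identical
```

Both optimizations buy only about a 9-10% overall speedup, which is the point of the slide: a large improvement to a rare case is worth no more than a small improvement to the common case.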
Fallacies and Pitfalls
• Fallacy
  – A falsehood, often widely believed to be true
• Pitfall
  – An easily made mistake
  – Often a generalization of a principle that is true only in a limited context
Fallacy
• The relative performance of two processors with the same ISA can be judged by clock rate or by the performance of a single benchmark suite
• Problems with the above argument:
  – The processors may have the same clock rate but differ considerably in their pipelines and cache subsystems
    • Same clock rate but different CPIs
  – A processor may be tuned to one particular benchmark suite while performing poorly on other benchmarks
Fallacy
• Benchmarks remain valid indefinitely
• Why not?
  – Vulnerability to “benchmark engineering”
  – Once a benchmark becomes popular, there is tremendous pressure to improve performance by “bending” the rules for running the benchmark
  – Kernels that spend the majority of their time in a very small section of code are particularly vulnerable
    • Example: the matrix300 kernel
Fallacy
• Peak performance tracks observed performance
• Problems with the above argument:
  – Peak performance is only useful as an upper bound on the performance that a system can deliver
  – Typical performance can vary from peak performance by 10x or more
  – The gap between typical and peak performance can vary greatly from program to program
Fallacy
• Multiprocessors are a silver bullet
• Why not?
  – The switch to multiprocessors happened because of the ILP wall and the power wall, not because parallel programming was dramatically simplified
  – In the multi-core era, the burden of improving performance falls on programmers
  – Programmers must make their programs more and more parallel, an uphill task
Fallacy
• Synthetic benchmarks predict performance for real programs
• Why not?
  – Synthetic benchmarks may not capture the effects of real-world systems (loading, context switching)
    • A system may not fare as well in practice as it does on the benchmark
  – Synthetic benchmarks may under-reward performance-enhancing optimizations
    • Example: Whetstone loops run for only a few iterations, so a system that optimizes loop branch prediction won’t fare as well on the benchmark as it does in practice
Fallacy
• MIPS (Millions of Instructions per Second) is an accurate measure for comparing performance among computers
MIPS = Instruction Count / (Execution Time × 10^6) = Clock Rate / (CPI × 10^6)
• Problems:
  – What counts as an instruction? It depends on the ISA
    • One instruction on one ISA may do as much “work” as ten instructions on another
  – MIPS can vary inversely with performance
    • Hardware floating-point instructions vs. software routines: the hardware is faster but executes fewer instructions, so it scores a lower MIPS rating
  – MIPS can vary among programs on the same computer
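The inverse-variation problem can be sketched with two hypothetical machines running the same floating-point workload; all of the numbers below are invented for illustration:

```python
# Machine A uses hardware FP instructions; Machine B emulates FP in software
# with many simple integer instructions. All values are hypothetical.
def mips(instruction_count, exec_time_s):
    return instruction_count / (exec_time_s * 1e6)

# Machine A: fewer, more complex instructions (higher CPI)
ic_a, cpi_a, clock_a = 1_000_000, 3.0, 1e9
time_a = ic_a * cpi_a / clock_a            # 0.003 s

# Machine B: 10x the instructions, but simple ones with low CPI
ic_b, cpi_b, clock_b = 10_000_000, 1.2, 1e9
time_b = ic_b * cpi_b / clock_b            # 0.012 s

print(mips(ic_a, time_a), time_a)          # ~333 MIPS, finishes in 3 ms
print(mips(ic_b, time_b), time_b)          # ~833 MIPS, finishes in 12 ms
```

Machine B reports a MIPS rating 2.5x higher, yet takes 4x longer to finish the program, which is exactly why MIPS is a misleading basis for comparison across ISAs.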
Pitfall
• Comparing the performance of hand-coded assembly against compiler-generated code from a high-level language
• Potential issues:
  – Hand-coded assembly requires specialized programmers and is unlikely to be used outside of embedded systems
  – Unless the compiler can perform the same optimizations as the assembly language programmer, the performance of compiler-generated code will not match the hand-coded program
Pitfall
• Falling prey to Amdahl’s Law
  – Don’t forget to assess the potential usage/impact of a feature before embarking on the long journey to implement it
Pitfall
• A single point of failure
  – Dependability is no stronger than the weakest link in the chain
  – Make every component redundant so that no single component failure can bring down the whole system
Instruction Set Principles
• Reading:
  – Hennessy and Patterson, Appendix A
  – RISC paper (Patterson & Sequin): posted on course website
Instruction Set Architecture
• Instruction Set Architecture (ISA)
  – The traditional meaning of “computer architecture”
  – What is visible to the programmer/compiler writer
  – Independent of organization and implementation
    • E.g., the ISA does not include caches and pipelines
  – Instructions, operands, addressing modes
Instruction Set Architecture
• Compiler
  – Input: high-level language
  – Output: assembly language for the target ISA
  – Performs global and local optimizations
  – Performs register allocation
• Assembler
  – Input: assembly language
  – Output: machine code (“object file”)
• Linker
  – Inputs: object files, library files
  – Output: executable program
• Loader
  – Reads the executable from disk
  – Passes command-line arguments
  – Optionally “fixes up” absolute addresses
ISA Classification
C = A + B
• A, B, and C are memory locations; R1, R2, and R3 are registers
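The original slide tabulates the instruction sequence for C = A + B under each ISA class. A reconstruction in illustrative pseudo-assembly (mnemonics follow the style of Hennessy & Patterson, Appendix A; they are not from any specific ISA):

```
Stack        Accumulator     Register-Memory      Load-Store (Register-Register)
-----        -----------     ---------------      ------------------------------
Push A       Load  A         Load  R1, A          Load  R1, A
Push B       Add   B         Add   R3, R1, B      Load  R2, B
Add          Store C         Store R3, C          Add   R3, R1, R2
Pop  C                                            Store R3, C
```

The classes differ in where ALU operands may live: on an implicit stack, in a single implicit accumulator, in a register or memory, or (for load-store machines) only in registers.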
ISA Examples
• Stack
  – HP calculators
  – Pentium FP (x87 co-processor)
    • 8 registers organized as a stack
• Accumulator
  – PDP-8
  – 8051 microcontroller
• Load/Store (Register/Register)
  – RISC: MIPS, Alpha, ARM, PowerPC, SPARC
  – Itanium
• Register/Memory
  – IA-32 (Intel x86), Motorola 68000, IBM 360
  – PDP-11
  – VAX (really Memory/Memory)
Register and Memory Operands
ISA Comparison