CET520 -- Gannod 1
Chapter 1 Fundamentals of Computer Design.
CET520 -- Gannod 2
A bit of history
• 1945 – no stored-program computers
• Today, less than $1,000 will buy a personal computer more powerful than a computer bought in 1980 for $1 million.
• What has contributed to this rapid increase?
– Technology
– Computer Design
CET520 -- Gannod 3
Technology vs. Design Improvements
COPYRIGHT 2003 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED
CET520 -- Gannod 4
Course Emphasis
• “by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15”
• In this course we will discuss:
– Architectural design techniques used
– Associated compiler improvements
– Quantitative approach to computer design and analysis
CET520 -- Gannod 5
Figure 1.3
Feature | Desktop | Server | Embedded
Price of system | $1000-$10,000 | $10,000-$10,000,000 | $10-$100,000
Price of microprocessor module | $100-$1000 | $200-$2000 | $0.20-$200
# sold per year | 150,000,000 | 4,000,000 | 300,000,000
System design issues | price-performance, graphics performance | throughput, availability, scalability | price, power consumption, application-specific performance
CET520 -- Gannod 6
Definitions
• Instruction Set Architecture (ISA) refers to the visible instruction set.
• Implementation has 2 components:
– Organization
– Hardware
• Organization includes high-level aspects of design.
– SPARC-2 and SPARC-20 have the same ISA but different organizations.
• Hardware refers to specifics of a machine.
– Two versions of Silicon Graphics Indy have the same ISA and the same organization, but different hardware (clock rate, cache structure).
• Architecture covers all 3 aspects of computer design: ISA, organization, and hardware.
CET520 -- Gannod 7
Computer Design
• Architects must design computer under several constraints:
– Price
– Power
– Performance
– Functional requirements
• Application software often drives functional requirements (Figure 1.4, pg. 10)
• Architect must prioritize requirements and try to optimize design in light of all constraints.
CET520 -- Gannod 8
Functional Requirements
Application Area
• General purpose
• Scientific
• Commercial servers
• Embedded computing
Level of Software Compatibility
• Programming language
• Binary
Operating System Requirements
• Address space
• Memory management
• Protection
Standards
• Floating point
• I/O bus
• OS
• Networks
• Programming language
CET520 -- Gannod 9
Cost in a System
Cabinet 6%
Processor Board 37%
I/O devices 37%
Software 20%
-- processor is single most expensive item (22%)
-- monitor is second most costly item (19%)
Cost vs. Price
-- cost is not the same as price
-- price includes direct costs (labor, scrap, warranty), gross margin (R&D, marketing, sales, building rental, etc.), discounts
-- Changing cost by $1000 could change price by $3000-$4000
CET520 -- Gannod 10
Performance
• When we say one computer has better performance than another, what do we mean?
• Different people may mean different things:
– Single user
– Manager of large, multi-user system
CET520 -- Gannod 11
Performance – key terms
• Execution time (response time)
– time to execute a job from start to finish
• Wall-clock time (elapsed time)
• CPU time
– user time
– system time
• Throughput
– number of jobs processed per time unit
• System performance refers to elapsed time on an unloaded system
• CPU performance refers to user CPU time on an unloaded system
CET520 -- Gannod 12
Performance – key formulas
• Performance for completing task X is
$\text{Performance}_X = \dfrac{1}{\text{ExecutionTime}_X}$
• Allows us to compare performance of different machines on the same task.
• Execution time for program:
$\text{ExT} = \text{IC} \times \text{CPI} \times \text{CP}$
CET520 -- Gannod 13
Improving Performance
• Improving performance is a hardware designer's main goal. Given the previous formulas, how can a designer improve performance?
CET520 -- Gannod 14
If only it were that simple...
• Unfortunately, these factors are NOT independent
– Changing instruction set to lower the instruction count may lead to an organization with a slower clock cycle time…
– small IC may not be fastest because complex instructions require more clock cycles.
• There are many tradeoffs when designing for better performance.
CET520 -- Gannod 15
Performance Equation
• IC: is a function of the instruction set architecture and compiler technology.
• CPI: is primarily a function of implementation.
• CP: is primarily a function of the hardware technology.
CET520 -- Gannod 16
Frequency vs. Period
• Execution time is based on the clock period.
• We are often given the clock rate (frequency) in MHz. What is MHz?
• What is relationship between rate (f) and clock period (cp)?
• Example
– Clock rate 500 MHz
– What’s the clock period?
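The relationship asked about here is cp = 1/f. A minimal Python sketch of the 500 MHz example follows; the function name `clock_period_ns` is an illustrative assumption and does not come from the slides.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a clock rate in MHz.

    cp = 1 / f. With f in MHz (10^6 cycles/sec) and cp in ns (10^-9 sec),
    this simplifies to cp_ns = 1000 / f_mhz.
    """
    return 1000.0 / freq_mhz


print(clock_period_ns(500))  # 2.0 (ns)
```

So a 500 MHz clock has a 2 ns period, and doubling the clock rate halves the period.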
CET520 -- Gannod 17
Benchmarking -- key terms
• Workload: set of programs (instructions) that run on the computer system.
• Benchmarks: programs chosen specifically to measure performance. Workload is meant to predict typical performance.
– Real Applications
– Modified (or scripted) applications
– Kernels – pieces of real programs
– Toy benchmarks – small, well-known programs
– Synthetic benchmarks – match average frequency of operations in real programs.
CET520 -- Gannod 18
Benchmark suites
• A benchmark suite is a collection of benchmark programs that contain a variety of applications.
– E.g., SPEC92
• Advantage: weakness of one benchmark is lessened by presence of other benchmarks.
• When we have several benchmarks we need to summarize performance of entire suite to determine which system has better performance.
CET520 -- Gannod 19
Summarizing Performance
• Typically we measure several different applications (not just 1)
• People like to have a single number to measure performance.
          | Computer A | Computer B | Computer C
Program 1 | 1 sec      | 10 sec     | 20 sec
Program 2 | 1000 sec   | 100 sec    | 20 sec
• Total Execution Time
• Weighted Execution Time
• Normalized Execution Time
CET520 -- Gannod 20
Total Execution Time
• Simply add execution times of all benchmarks in suite.
• Prev Example:
• Arithmetic mean is closely related:
$\text{AM} = \dfrac{1}{n} \sum_{i=1}^{n} \text{ExT}_i$
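As a sketch of the previous example, the totals and arithmetic means for Computers A, B, and C can be computed from the two-program table on the "Summarizing Performance" slide. The dictionary layout and function names are illustrative assumptions.

```python
# Execution times in seconds from the table: [Program 1, Program 2].
times = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}


def total_time(ext):
    """Total execution time: the sum over all benchmarks."""
    return sum(ext)


def arithmetic_mean(ext):
    """AM = (1/n) * sum of ExT_i."""
    return sum(ext) / len(ext)


for machine, ext in times.items():
    print(machine, total_time(ext), arithmetic_mean(ext))
```

With these numbers, C has the smallest total (40 sec, against 110 sec for B and 1001 sec for A). The arithmetic mean ranks the machines the same way because it is the total divided by n.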
CET520 -- Gannod 21
Weighted Execution Time
• Are all programs run an equal number of times?
• Weighted arithmetic mean:
$\text{WAM} = \sum_{i=1}^{n} w_i \times \text{ExT}_i$
• Prev Example:
– w1 = .5 w2 = .5
– w1 = .909 w2 = .091
– w1 = .999 w2 = .001
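The three weightings listed on this slide can be applied to the earlier two-program table with a short sketch. The variable and function names are assumptions made for illustration.

```python
# Execution times in seconds from the earlier table: [Program 1, Program 2].
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

# (w1, w2) pairs from the slide; each pair sums to 1.
weightings = [(0.5, 0.5), (0.909, 0.091), (0.999, 0.001)]


def weighted_mean(weights, ext):
    """WAM = sum of w_i * ExT_i."""
    return sum(w * t for w, t in zip(weights, ext))


for weights in weightings:
    row = {m: round(weighted_mean(weights, ext), 3) for m, ext in times.items()}
    print(weights, row)
```

The choice of weights changes the ranking. With equal weights C looks best (20 against 55 for B and 500.5 for A). With (.909, .091) B comes out ahead (about 18.19), and with (.999, .001) A does (about 2.0).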
CET520 -- Gannod 22
Normalized ExT
• Normalize times to a reference machine and then summarize.
• Geometric mean of normalized times:
$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{ExT}_{\text{normalized},i}}$
• Prev Example:
– Normalize to A
– Normalize to C
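A sketch of the two normalizations, again using the two-program table from the "Summarizing Performance" slide. The helper names are assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

# Execution times in seconds from the earlier table: [Program 1, Program 2].
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def geometric_mean(values):
    """GM = n-th root of the product of the values."""
    return math.prod(values) ** (1.0 / len(values))


def normalized_gm(machine, reference):
    """GM of a machine's times after dividing each by the reference machine's time."""
    ratios = [t / r for t, r in zip(times[machine], times[reference])]
    return geometric_mean(ratios)


for reference in ("A", "C"):
    print(reference, {m: round(normalized_gm(m, reference), 3) for m in times})
```

Normalized to A, the geometric means are 1.0, 1.0, and about 0.63 for A, B, and C. Normalized to C, they are about 1.58, 1.58, and 1.0. The ratio between any two machines is the same under either reference.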
CET520 -- Gannod 23
Pros and Cons of GM
• Pros:
– GM is independent of running times of individual programs
– Independent of base machine.
• Cons:
– Does not predict execution time.
– Encourages hardware and software designers to focus on benchmarks that are easiest to improve rather than the slowest ones.
CET520 -- Gannod 24
Amdahl’s Law
• One important principle in computer design is: Make the common case fast.
• Amdahl’s Law defines the speedup that can be gained by using a particular feature:
$\text{Speedup} = \dfrac{\text{ExT w/o using enhancement}}{\text{ExT using the enhancement}}$
• Not always possible to use enhancement:
$\text{ExT}_{new} = \text{ExT}_{old} \times \left[ (1 - \text{Frac}_{enh}) + \dfrac{\text{Frac}_{enh}}{\text{Speedup}_{enh}} \right]$
CET520 -- Gannod 26
Example 1
• Implementations of FP square root vary in performance. Suppose FPSQR is responsible for 20% of execution time, and all FP operations are responsible for 50% of execution time. Proposal 1: add hardware to speed up FPSQR by factor of 10. Proposal 2: make all FP instructions run 2 times faster. Which proposal should we accept?
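One way to compare the two proposals is to apply Amdahl's Law to each. Dividing the old execution time by the new one gives an overall speedup of 1 / ((1 - Frac_enh) + Frac_enh / Speedup_enh). The function name `overall_speedup` is an assumption made for this sketch.

```python
def overall_speedup(frac_enh: float, speedup_enh: float) -> float:
    """Amdahl's Law: overall speedup when a fraction frac_enh of the
    original execution time is sped up by a factor of speedup_enh."""
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)


# Proposal 1: FPSQR is 20% of execution time, made 10 times faster.
proposal_1 = overall_speedup(0.20, 10)
# Proposal 2: all FP operations are 50% of execution time, made 2 times faster.
proposal_2 = overall_speedup(0.50, 2)

print(round(proposal_1, 3), round(proposal_2, 3))
```

Proposal 1 gives 1/0.82, about 1.22, and Proposal 2 gives 1/0.75, about 1.33. Under these numbers, speeding up all FP instructions is the better choice, which fits the "make the common case fast" principle.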
CET520 -- Gannod 28
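Calculating CPI
• Calculate from Performance Equation:
$\text{CPI} = \dfrac{\text{ExT}}{\text{IC} \times \text{CP}}$
• During design, we don’t know what ExT is…
• Calculate CPI from detailed understanding of the architecture:
$\text{CPU clock cyc} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$
$\text{CPI} = \dfrac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{IC}} = \sum_{i=1}^{n} \text{CPI}_i \times \dfrac{\text{IC}_i}{\text{IC}}$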
CET520 -- Gannod 29
Example 2
• Consider 3 compilers (1,2,3) for the same machine. The machine has 3 classes (A,B,C) of instructions with the following characteristics:
Class | CPI
A     | 1
B     | 2
C     | 3
• The clock rate is f = 100 MHz
• For a particular program, the compilers generate code with the following IC values (in millions of instructions)
Compiler | ICA | ICB | ICC
1        | 6   | 2   | 1
2        | 2   | 2   | 2
3        | 10  | 1   | 1
• Which compiler generates the fastest code?
CET520 -- Gannod 30
Solution
• First, calculate the CPI for each compiler:
• Execution time for the 3 compilers:
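Both steps can be worked through with a short sketch that uses the class CPIs, instruction counts, and 100 MHz clock rate from Example 2. The constant and function names are assumptions made for illustration.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # f = 100 MHz

# Instruction counts per class, in millions, for each compiler.
IC_MILLIONS = {
    1: {"A": 6, "B": 2, "C": 1},
    2: {"A": 2, "B": 2, "C": 2},
    3: {"A": 10, "B": 1, "C": 1},
}


def analyze(ic_by_class):
    """Return (CPI, execution time in seconds) for one compiler's code."""
    ic = sum(ic_by_class.values()) * 1e6
    # CPU clock cycles = sum over classes of CPI_i * IC_i
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in ic_by_class.items()) * 1e6
    cpi = cycles / ic
    ext = cycles / CLOCK_RATE_HZ  # ExT = IC * CPI * CP, with CP = 1 / f
    return cpi, ext


results = {compiler: analyze(ic) for compiler, ic in IC_MILLIONS.items()}
for compiler, (cpi, ext) in results.items():
    print(compiler, round(cpi, 3), round(ext, 3))

fastest = min(results, key=lambda c: results[c][1])
```

The CPIs come out to about 1.44, 2.0, and 1.25, and the execution times to 0.13, 0.12, and 0.15 seconds. Compiler 2 generates the fastest code even though it has the highest CPI, because it also has the lowest instruction count.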
CET520 -- Gannod 31
Measuring Performance factors
• CP:
– Easy to do after the machine is built!
– During design, use timing estimators to measure critical paths in design
• IC:
– Compilers, profilers
– Often interested in finding instruction mix as well (simulators/execution-based monitoring)
• CPI:
– The simplistic example we did previously is not very accurate for modern processors (pipeline/cache effects)
– CPI_i = Pipeline CPI_i + Memory CPI_i
CET520 -- Gannod 32
MIPS
• Million Instructions Per Second
• Suppose we want to compare performance of task on two different implementations of the SAME architecture.
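$\text{ExT}_A = \text{IC}_A \times \text{CPI}_A \times \text{CP}_A$
$\text{ExT}_B = \text{IC}_B \times \text{CPI}_B \times \text{CP}_B$
• Same architecture => same IC
$\text{CPI}_A \times \text{CP}_A = \left(\dfrac{\text{cyc}}{\text{instr}}\right)_A \times \left(\dfrac{\text{sec}}{\text{cyc}}\right)_A = \left(\dfrac{\text{sec}}{\text{instr}}\right)_A$
• Smaller sec/instr => Larger instr/sec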
CET520 -- Gannod 33
(native) MIPS
• Note: this measure is ONLY valid for comparing SAME program on SAME architecture!!!
$\text{MIPS} = \dfrac{\text{IC}}{\text{ExT} \times 10^6}$
• Consider two implementations:
– CPI_A = 1.5
– freq_A = 400 MHz
– CPI_B = 1.8
– freq_B = 500 MHz
• Which has better native MIPS rating?
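A sketch of the comparison: substituting ExT = IC × CPI × CP (with CP = 1/f) into MIPS = IC / (ExT × 10^6) cancels IC, leaving MIPS = f / (CPI × 10^6). The function name `native_mips` is an assumption made for illustration.

```python
def native_mips(clock_rate_mhz: float, cpi: float) -> float:
    """Native MIPS rating.

    MIPS = IC / (ExT * 10^6) with ExT = IC * CPI / f, so IC cancels and
    MIPS = f / (CPI * 10^6), i.e. the clock rate in MHz divided by CPI.
    """
    return clock_rate_mhz / cpi


mips_a = native_mips(400, 1.5)
mips_b = native_mips(500, 1.8)
print(round(mips_a, 1), round(mips_b, 1))
```

Implementation A rates about 266.7 MIPS and implementation B about 277.8 MIPS, so B has the better native MIPS rating despite its higher CPI.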
CET520 -- Gannod 35
MIPS, MOPS and other FLOPS
• MIPS:
– Millions of Instructions Per Second
• MOPS:
– Millions of Operations Per Second
• FLOPS:
– FLoating point Operations Per Second
CET520 -- Gannod 36
MIPS is unreliable
Recall Example #2
– f = 100 MHz
Compiler | ICA | ICB | ICC
1        | 6   | 2   | 1
2        | 2   | 2   | 2
3        | 10  | 1   | 1
• Find MIPS rating
– C1:
– C2:
– C3:
• Conclusions using MIPS:
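The ratings can be computed with a sketch that reuses the Example 2 data (class CPIs of 1, 2, and 3 for A, B, and C). The names below are assumptions made for illustration.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # f = 100 MHz

# Instruction counts per class, in millions, for each compiler.
IC_MILLIONS = {
    1: {"A": 6, "B": 2, "C": 1},
    2: {"A": 2, "B": 2, "C": 2},
    3: {"A": 10, "B": 1, "C": 1},
}


def ext_and_mips(ic_by_class):
    """Return (execution time in seconds, MIPS rating) for one compiler."""
    ic = sum(ic_by_class.values()) * 1e6
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in ic_by_class.items()) * 1e6
    ext = cycles / CLOCK_RATE_HZ
    mips = ic / (ext * 1e6)  # MIPS = IC / (ExT * 10^6)
    return ext, mips


results = {c: ext_and_mips(ic) for c, ic in IC_MILLIONS.items()}
for compiler, (ext, mips) in results.items():
    print(compiler, round(ext, 3), round(mips, 1))

best_by_mips = max(results, key=lambda c: results[c][1])
best_by_time = min(results, key=lambda c: results[c][0])
```

The ratings are about 69.2, 50.0, and 80.0 MIPS for compilers 1, 2, and 3. MIPS ranks compiler 3 best and compiler 2 worst, yet compiler 2 has the shortest execution time (0.12 sec) and compiler 3 the longest (0.15 sec). A higher MIPS rating does not mean a faster program when the instruction counts differ.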
CET520 -- Gannod 37
Memory Hierarchy
• Computer systems usually have several different types of memory organized in a hierarchy. WHY?
CPU Registers
Cache
Memory
disk
CET520 -- Gannod 38
Locality
• Locality of reference: programs tend to reuse data and instructions that they have used before (recently)
• Two types of locality
– Temporal: recently accessed items are likely to be accessed in the near future
– Spatial: items whose addresses are near one another tend to be referenced close together in time.
• Do we expect instructions or data to have a higher degree of locality?
CET520 -- Gannod 39
Key Terms
• Cache hit: CPU finds requested data in cache
• Cache miss: requested data not in cache.
• Miss rate: fraction of cache accesses that result in miss
• Block: amount of data transferred between cache and memory
• Miss penalty: extra time taken to get requested cache block into cache.
• Page fault: requested data is not in memory.
CET520 -- Gannod 40
Example 4
• Suppose cache is 10 times faster than main memory and that cache can be used 90% of the time. How much speedup do we gain from using the cache?
• Using Amdahl’s Law:
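A worked version of the calculation, assuming that "used 90% of the time" means Frac_enh = 0.9 and that "10 times faster" means Speedup_enh = 10:

```latex
\[
\text{Speedup}
  = \frac{1}{(1 - \text{Frac}_{enh}) + \dfrac{\text{Frac}_{enh}}{\text{Speedup}_{enh}}}
  = \frac{1}{(1 - 0.9) + \dfrac{0.9}{10}}
  = \frac{1}{0.19}
  \approx 5.3
\]
```

Under that reading, the cache gives an overall speedup of about 5.3, well short of 10, because the 10% of accesses that still go to main memory dominate the remaining time.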
CET520 -- Gannod 41
CPU ExT and the Memory Hierarchy
• Need to expand our definition of CPU ExT:
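$\text{CPU ExT} = (\text{CPU clock cyc} + \text{Mem stall cyc}) \times \text{CP}$
$\text{Mem stall cyc} = \text{Number of misses} \times \text{miss penalty}$
$= \text{IC} \times \text{Mem Ref per Instr} \times \text{miss rate} \times \text{miss penalty}$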
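To make the expanded definition concrete, here is a minimal sketch. Every input value below is hypothetical and chosen only for illustration; none comes from the slides. The function name is likewise an assumption.

```python
def cpu_ext_with_stalls(ic, base_cpi, mem_refs_per_instr,
                        miss_rate, miss_penalty_cycles, clock_period_s):
    """CPU ExT = (CPU clock cycles + memory stall cycles) * CP."""
    cpu_clock_cycles = ic * base_cpi
    # Mem stall cyc = IC * Mem Ref per Instr * miss rate * miss penalty
    mem_stall_cycles = ic * mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return (cpu_clock_cycles + mem_stall_cycles) * clock_period_s


# Hypothetical workload: 1 million instructions, base CPI of 2.0,
# 1.5 memory references per instruction, 2% miss rate,
# 50-cycle miss penalty, 2 ns clock period.
with_stalls = cpu_ext_with_stalls(1_000_000, 2.0, 1.5, 0.02, 50, 2e-9)
no_stalls = cpu_ext_with_stalls(1_000_000, 2.0, 1.5, 0.0, 50, 2e-9)
print(with_stalls, no_stalls)
```

With these made-up numbers the stall cycles (1.5 million) add 75% to the 2 million base cycles, so execution time grows from about 4 ms to about 7 ms. Even a small miss rate matters when the miss penalty is large.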