CET520 -- Gannod1 Chapter 1 Fundamentals of Computer Design.

CET520 -- Gannod 1

Chapter 1

Fundamentals of Computer Design

CET520 -- Gannod 2

A bit of history

• 1945 – no stored-program computers

• Today, less than $1,000 will buy a personal computer more powerful than a computer bought in 1980 for $1 million.

• What has contributed to this rapid increase?

– Technology

– Computer Design

CET520 -- Gannod 3

Technology vs. Design Improvements

COPYRIGHT 2003 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED

CET520 -- Gannod 4

Course Emphasis

• “by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15”

• In this course we will discuss:

– Architectural design techniques used

– Associated compiler improvements

– Quantitative approach to computer design and analysis

CET520 -- Gannod 5

Figure 1.3

Feature | Desktop | Server | Embedded
Price of system | $1000-$10,000 | $10,000-$10,000,000 | $10-$100,000
Price of microprocessor module | $100-$1000 | $200-$2000 | $0.20-$200
Number sold per year | 150,000,000 | 4,000,000 | 300,000,000
System design issues | price-performance, graphics performance | throughput, availability, scalability | price, power consumption, application-specific performance

CET520 -- Gannod 6

Definitions

• Instruction Set Architecture (ISA) refers to the programmer-visible instruction set.

• Implementation has 2 components:

– Organization

– Hardware

• Organization includes high-level aspects of design.

– SPARC-2 and SPARC-20 have the same ISA but different organizations.

• Hardware refers to specifics of a machine.

– Two versions of Silicon Graphics Indy have the same ISA and the same organization, but different hardware (clock rate, cache structure).

• Architecture covers all 3 aspects of computer design: ISA, organization, and hardware.

CET520 -- Gannod 7

Computer Design

• Architects must design a computer under several constraints:

– Price

– Power

– Performance

– Functional requirements

• Application software often drives functional requirements (Figure 1.4, pg. 10)

• Architect must prioritize requirements and try to optimize design in light of all constraints.

CET520 -- Gannod 8

Functional Requirements

Application Area

• General purpose

• Scientific

• Commercial servers

• Embedded computing

Level of Software Compatibility

• Programming language

• Binary

Operating System Requirements

• Address space

• Memory management

• Protection

Standards

• Floating point

• I/O bus

• OS

• Networks

• Programming languages

CET520 -- Gannod 9

Cost in a System

• Cabinet: 6%

• Processor board: 37%

• I/O devices: 37%

• Software: 20%

– Processor is the single most expensive item (22%)

– Monitor is the second most costly item (19%)

Cost vs. Price

– Cost is not the same as price

– Price includes direct costs (labor, scrap, warranty), gross margin (R&D, marketing, sales, building rental, etc.), and discounts

– Changing cost by $1000 could change price by $3000-$4000

CET520 -- Gannod 10

Performance

• When we say one computer has better performance than another, what do we mean?

• Different people may mean different things:

– Single user

– Manager of large, multi-user system

CET520 -- Gannod 11

Performance – key terms

• Execution time (response time)

– time to execute a job from start to finish

• Wall-clock time (elapsed time)

• CPU time

– user time

– system time

• Throughput

– number of jobs processed per time unit

• System performance refers to elapsed time on an unloaded system

• CPU performance refers to user CPU time on an unloaded system

CET520 -- Gannod 12

Performance – key formulas

• Performance for completing task X is

$$\text{Performance}_X = \frac{1}{\text{ExecutionTime}_X}$$

• Allows us to compare performance of different machines on the same task.

• Execution time for program:

$$\text{ExT} = \text{IC} \times \text{CPI} \times \text{CP}$$

CET520 -- Gannod 13

Improving Performance

• Improving performance is a hardware designer's main goal. Given the previous formulas, how can a designer improve performance?

CET520 -- Gannod 14

If only it were that simple...

• Unfortunately, these factors are NOT independent

– Changing the instruction set to lower the instruction count may lead to an organization with a slower clock cycle time…

– A small IC may not be fastest because complex instructions require more clock cycles.

• There are many tradeoffs when designing for better performance.
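
A small numeric sketch of this tradeoff, using the execution-time formula ExT = IC × CPI × CP. The two designs and all of their numbers are hypothetical, chosen only to show that a lower instruction count can still give a longer execution time:

```python
def execution_time(ic, cpi, clock_period_s):
    """Return ExT = IC x CPI x CP in seconds."""
    return ic * cpi * clock_period_s

# Hypothetical design with simple instructions: more of them, but cheap and fast.
ext_simple = execution_time(ic=10_000_000, cpi=1.2, clock_period_s=2.0e-9)

# Hypothetical design with complex instructions: 20% fewer instructions,
# but a higher CPI and a slower clock.
ext_complex = execution_time(ic=8_000_000, cpi=1.8, clock_period_s=2.5e-9)

print(f"simple instructions:  {ext_simple * 1e3:.1f} ms")
print(f"complex instructions: {ext_complex * 1e3:.1f} ms")
```

With these made-up numbers the design with the smaller IC is the slower one, because the gains in IC are outweighed by the losses in CPI and CP.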

CET520 -- Gannod 15

Performance Equation

• IC: is a function of the instruction set architecture and compiler technology.

• CPI: is primarily a function of implementation.

• CP: is primarily a function of the hardware technology.

CET520 -- Gannod 16

Frequency vs. Period

• Execution time is based on the clock period.

• We are often given the clock rate (frequency) in MHz. What is MHz?

• What is the relationship between rate (f) and clock period (cp)?

• Example

– Clock rate 500 MHz – What’s the clock period?
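
One way to work the example, as a minimal sketch: 1 MHz is 10^6 cycles per second, and the clock period is the reciprocal of the clock rate (cp = 1/f). The function name below is illustrative only:

```python
def clock_period_seconds(freq_mhz):
    """Clock period cp = 1 / f, with f converted from MHz to Hz."""
    freq_hz = freq_mhz * 1e6  # 1 MHz = 10^6 cycles per second
    return 1.0 / freq_hz

cp = clock_period_seconds(500)
print(f"500 MHz -> {cp * 1e9:.1f} ns per cycle")  # 2.0 ns
```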

CET520 -- Gannod 17

Benchmarking -- key terms

• Workload: set of programs (instructions) that run on the computer system.

• Benchmarks: programs chosen specifically to measure performance. Workload is meant to predict typical performance.

– Real Applications

– Modified (or scripted) applications

– Kernels – pieces of real programs

– Toy benchmarks – small, well-known programs

– Synthetic benchmarks – match average frequency of operations in real programs.

CET520 -- Gannod 18

Benchmark suites

• A benchmark suite is a collection of benchmark programs that contain a variety of applications.

– E.g., SPEC92

• Advantage: weakness of one benchmark is lessened by presence of other benchmarks.

• When we have several benchmarks we need to summarize performance of entire suite to determine which system has better performance.

CET520 -- Gannod 19

Summarizing Performance

• Typically we measure several different applications (not just 1)

• People like to have a single number to measure performance.

Program | Computer A | Computer B | Computer C
Program 1 | 1 sec | 10 sec | 20 sec
Program 2 | 1000 sec | 100 sec | 20 sec

• Total Execution Time

• Weighted Execution Time

• Normalized Execution Time

CET520 -- Gannod 20

Total Execution Time

• Simply add execution times of all benchmarks in suite.

• Prev Example:

• Arithmetic mean is closely related:

$$\text{AM} = \frac{1}{n}\sum_{i=1}^{n} \text{ExT}_i$$
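
Applied to the three-computer example from the Summarizing Performance slide, the totals and arithmetic means could be computed as follows. This is a minimal sketch; the variable names are illustrative and only the execution times come from the slides:

```python
# Execution times in seconds: [Program 1, Program 2]
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

totals = {machine: sum(t) for machine, t in times.items()}
means = {machine: sum(t) / len(t) for machine, t in times.items()}

for machine in times:
    print(f"Computer {machine}: total = {totals[machine]} s, AM = {means[machine]} s")
```

By total execution time, C (40 s) beats B (110 s), which beats A (1001 s).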

CET520 -- Gannod 21

Weighted Execution Time

• Are all programs run an equal number of times?

• Weighted arithmetic mean:

$$\text{WAM} = \sum_{i=1}^{n} w_i \times \text{ExT}_i$$

• Prev Example:

– w1 = .5 w2 = .5

– w1 = .909 w2 = .091

– w1 = .999 w2 = .001
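
The three weightings can be applied to the earlier three-computer table with a short script. This is a sketch only; the helper name `weighted_mean` is illustrative, and the times and weights are the ones given on the slides:

```python
# Execution times in seconds: [Program 1, Program 2]
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}
weightings = [(0.5, 0.5), (0.909, 0.091), (0.999, 0.001)]

def weighted_mean(weights, values):
    """Weighted arithmetic mean: sum of w_i * ExT_i."""
    return sum(w * v for w, v in zip(weights, values))

wam = {
    w: {machine: weighted_mean(w, t) for machine, t in times.items()}
    for w in weightings
}

for w, per_machine in wam.items():
    best = min(per_machine, key=per_machine.get)
    summary = ", ".join(f"{m} = {v:.2f}" for m, v in per_machine.items())
    print(f"w = {w}: {summary} -> fastest: {best}")
```

Each weighting picks a different winner (C, then B, then A), which shows how strongly the summary depends on the assumed workload mix.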

CET520 -- Gannod 22

Normalized ExT

• Normalize times to a reference machine and then summarize.

• Geometric mean of normalized times:

$$\text{GM} = \sqrt[n]{\prod_{i=1}^{n} \text{ExT}_{\text{normalized},i}}$$

• Prev Example:

– Normalize to A

– Normalize to C
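
A sketch of both normalizations for the earlier three-computer table. The helper names are illustrative; `math.prod` requires Python 3.8 or later:

```python
import math

# Execution times in seconds: [Program 1, Program 2]
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))

def normalized_gm(reference):
    """Geometric mean of each machine's times, normalized to `reference`."""
    ref = times[reference]
    return {
        machine: geometric_mean([t / r for t, r in zip(ts, ref)])
        for machine, ts in times.items()
    }

gm_to_a = normalized_gm("A")
gm_to_c = normalized_gm("C")

print("Normalized to A:", {m: round(v, 2) for m, v in gm_to_a.items()})
print("Normalized to C:", {m: round(v, 2) for m, v in gm_to_c.items()})
```

Normalized to A the means are about 1.0, 1.0, and 0.63; normalized to C they are about 1.58, 1.58, and 1.0. The ratio between any two machines is the same under both references, and A and B tie even though their total execution times differ by roughly a factor of nine.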

CET520 -- Gannod 23

Pros and Cons of GM

• Pros:

– GM is independent of the running times of individual programs

– GM is independent of the base machine used for normalization

• Cons:

– Does not predict execution time.

– Encourages hardware and software designers to focus on the benchmarks that are easiest to improve rather than the slowest ones.

CET520 -- Gannod 24

Amdahl’s Law

• One important principle in computer design is: Make the common case fast.

• Amdahl’s Law defines the speedup that can be gained by using a particular feature:

$$\text{Speedup} = \frac{\text{ExT w/o using enhancement}}{\text{ExT using the enhancement}}$$

• Not always possible to use enhancement:

$$\text{ExT}_{new} = \text{ExT}_{old} \times \left[ (1 - \text{Frac}_{enh}) + \frac{\text{Frac}_{enh}}{\text{Speedup}_{enh}} \right]$$

CET520 -- Gannod 25

Speedup

• speedup = old ExT / new ExT
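
Substituting the Amdahl's Law expression for the new execution time into this ratio gives the overall speedup in closed form. Here Frac_enh is the fraction of the original execution time that can use the enhancement, and Speedup_enh is the speedup of that fraction alone:

$$\text{Speedup}_{overall} = \frac{\text{ExT}_{old}}{\text{ExT}_{new}} = \frac{1}{(1 - \text{Frac}_{enh}) + \dfrac{\text{Frac}_{enh}}{\text{Speedup}_{enh}}}$$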

CET520 -- Gannod 26

Example 1

• Implementations of FP square root vary in performance. Suppose FPSQR is responsible for 20% of execution time, and all FP operations are responsible for 50% of execution time. Proposal 1: add hardware to speed up FPSQR by a factor of 10. Proposal 2: make all FP instructions run 2 times faster. Which proposal should we accept?

CET520 -- Gannod 27

Solution

• Speedup_FPSQR =

• Speedup_FP =
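
One way to fill in the two speedups is to apply Amdahl's Law directly to the figures from Example 1. This is a sketch; the function name `amdahl_speedup` is illustrative:

```python
def amdahl_speedup(frac_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - Frac_enh) + Frac_enh / Speedup_enh)."""
    return 1.0 / ((1.0 - frac_enhanced) + frac_enhanced / speedup_enhanced)

# Proposal 1: FPSQR is 20% of execution time and becomes 10x faster.
speedup_fpsqr = amdahl_speedup(0.20, 10)  # 1 / 0.82

# Proposal 2: all FP operations are 50% of execution time and become 2x faster.
speedup_fp = amdahl_speedup(0.50, 2)      # 1 / 0.75

print(f"Speedup_FPSQR = {speedup_fpsqr:.2f}")
print(f"Speedup_FP    = {speedup_fp:.2f}")
```

The results are about 1.22 and 1.33, so speeding up all FP operations is the better proposal: the smaller improvement applies to a much larger fraction of the execution time.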

CET520 -- Gannod 28

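Calculating CPI

• Calculate from Performance Equation:

$$\text{CPI} = \frac{\text{ExT}}{\text{IC} \times \text{CP}}$$

• During design, we don’t know what ExT is…

• Calculate CPI from detailed understanding of the architecture:

$$\text{CPU clock cyc} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$$

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{IC}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}}$$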

CET520 -- Gannod 29

Example 2

• Consider 3 compilers (1,2,3) for the same machine. The machine has 3 classes (A,B,C) of instructions with the following characteristics:

Class | CPI
A | 1
B | 2
C | 3

• The clock rate is f=100MHz

• For a particular program, the compilers generate code with the following IC values (in millions of instructions)

Compiler | ICA | ICB | ICC
1 | 6 | 2 | 1
2 | 2 | 2 | 2
3 | 10 | 1 | 1

• Which compiler generates the fastest code?

CET520 -- Gannod 30

Solution

• First, calculate the CPI for each compiler:

• Execution time for the 3 compilers:
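
Both steps can be worked with a short script, using the class CPIs, instruction counts, and clock rate from Example 2. This is a sketch; the names `analyze` and `results` are illustrative:

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # f = 100 MHz

# Instruction counts in millions, per compiler and instruction class.
IC_MILLIONS = {
    1: {"A": 6, "B": 2, "C": 1},
    2: {"A": 2, "B": 2, "C": 2},
    3: {"A": 10, "B": 1, "C": 1},
}

def analyze(counts):
    """Return (CPI, ExT in seconds) for one compiler's instruction mix."""
    ic = sum(counts.values()) * 1e6
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items()) * 1e6
    cpi = cycles / ic              # CPI = sum(CPI_i * IC_i) / IC
    ext = cycles / CLOCK_RATE_HZ   # ExT = IC * CPI * CP, with CP = 1/f
    return cpi, ext

results = {compiler: analyze(counts) for compiler, counts in IC_MILLIONS.items()}
fastest = min(results, key=lambda c: results[c][1])

for compiler, (cpi, ext) in results.items():
    print(f"Compiler {compiler}: CPI = {cpi:.2f}, ExT = {ext:.2f} s")
print(f"Fastest code: compiler {fastest}")
```

The CPIs come out to about 1.44, 2.00, and 1.25, and the execution times to 0.13 s, 0.12 s, and 0.15 s. Compiler 2 generates the fastest code even though it has the highest CPI, because it executes the fewest total cycles.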

CET520 -- Gannod 31

Measuring Performance factors

• CP:

– Easy to do after the machine is built!

– During design, use timing estimators to measure critical paths in design

• IC:

– Compilers, profilers

– Often interested in finding instruction mix as well (simulators/execution-based monitoring)

• CPI:

– Simplistic example we did previously is not very accurate in modern processors (pipeline/cache effects)

– CPIi = Pipeline CPIi + Memory CPIi

CET520 -- Gannod 32

MIPS

• Million Instructions Per Second

• Suppose we want to compare performance of task on two different implementations of the SAME architecture.

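$$\text{ExT}_A = \text{IC}_A \times \text{CPI}_A \times \text{CP}_A$$

$$\text{ExT}_B = \text{IC}_B \times \text{CPI}_B \times \text{CP}_B$$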

• Same architecture => same IC

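$$\text{CPI}_A \times \text{CP}_A = \left(\frac{\text{cyc}}{\text{instr}}\right)_A \times \left(\frac{\text{sec}}{\text{cyc}}\right)_A = \left(\frac{\text{sec}}{\text{instr}}\right)_A$$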

• Smaller sec/instr => Larger instr/sec

CET520 -- Gannod 33

(native) MIPS

$$\text{MIPS} = \frac{\text{IC}}{\text{ExT} \times 10^6}$$

• Note: this measure is ONLY valid for comparing SAME program on SAME architecture!!!

• Consider two implementations:

– CPIA = 1.5

– freqA = 400 MHz

– CPIB = 1.8

– freqB = 500 MHz

• Which has better native MIPS rating?

CET520 -- Gannod 34

Solution

• Native MIPS_A =

• Native MIPS_B =
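
A sketch of the calculation. Since ExT = IC × CPI × CP and CP = 1/f, the MIPS definition reduces to f / (CPI × 10^6), so the instruction count cancels. The function name is illustrative:

```python
def native_mips(cpi, freq_mhz):
    """MIPS = IC / (ExT * 10^6) = f / (CPI * 10^6), with f in Hz."""
    freq_hz = freq_mhz * 1e6
    return freq_hz / (cpi * 1e6)

mips_a = native_mips(cpi=1.5, freq_mhz=400)
mips_b = native_mips(cpi=1.8, freq_mhz=500)

print(f"Native MIPS_A = {mips_a:.1f}")
print(f"Native MIPS_B = {mips_b:.1f}")
```

Implementation B has the better rating (about 277.8 versus 266.7): its faster clock more than makes up for its higher CPI.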

CET520 -- Gannod 35

MIPS, MOPS and other FLOPS

• MIPS:

– Millions of Instructions Per Second

• MOPS:

– Millions of Operations Per Second

• FLOPS:

– FLoating point Operations Per Second

CET520 -- Gannod 36

MIPS is unreliable

Recall Example #2

– f=100MHz

Compiler | ICA | ICB | ICC
1 | 6 | 2 | 1
2 | 2 | 2 | 2
3 | 10 | 1 | 1

• Find MIPS rating

– C1:

– C2:

– C3:

• Conclusions using MIPS:
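
The ratings can be computed from the Example 2 data (class CPIs of 1, 2, and 3 for classes A, B, and C) and set against the actual execution times. This is a sketch with illustrative names:

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}
CLOCK_RATE_HZ = 100e6  # f = 100 MHz

# Instruction counts in millions, per compiler and instruction class.
IC_MILLIONS = {
    "C1": {"A": 6, "B": 2, "C": 1},
    "C2": {"A": 2, "B": 2, "C": 2},
    "C3": {"A": 10, "B": 1, "C": 1},
}

def mips_and_time(counts):
    """Return (MIPS rating, ExT in seconds) for one compiler's code."""
    ic = sum(counts.values()) * 1e6
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts.items()) * 1e6
    ext = cycles / CLOCK_RATE_HZ
    return ic / (ext * 1e6), ext

ratings = {name: mips_and_time(counts) for name, counts in IC_MILLIONS.items()}
best_by_mips = max(ratings, key=lambda n: ratings[n][0])
best_by_time = min(ratings, key=lambda n: ratings[n][1])

for name, (mips, ext) in ratings.items():
    print(f"{name}: {mips:.1f} MIPS, ExT = {ext:.2f} s")
print(f"Highest MIPS: {best_by_mips}; shortest ExT: {best_by_time}")
```

MIPS ranks C3 first (80.0, versus about 69.2 for C1 and 50.0 for C2), yet C3's code has the longest execution time and C2's the shortest. The MIPS rating rewards executing many cheap instructions, not finishing the program sooner.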

CET520 -- Gannod 37

Memory Hierarchy

• Computer systems usually have several different types of memory organized in a hierarchy. WHY?

[Figure: memory hierarchy levels: CPU registers, cache, memory, disk]

CET520 -- Gannod 38

Locality

• Locality of reference: programs tend to reuse data and instructions that they have used before (recently)

• Two types of locality

– Temporal: recently accessed items are likely to be accessed in the near future

– Spatial: items whose addresses are near one another tend to be referenced close together in time.

• Do we expect instructions or data to have a higher degree of locality?

CET520 -- Gannod 39

Key Terms

• Cache hit: CPU finds requested data in cache

• Cache miss: requested data not in cache.

• Miss rate: fraction of cache accesses that result in miss

• Block: amount of data transferred between cache and memory

• Miss penalty: extra time taken to get requested cache block into cache.

• Page fault: requested data is not in memory.

CET520 -- Gannod 40

Example 4

• Suppose cache is 10 times faster than main memory and that cache can be used 90% of the time. How much speedup do we gain from using the cache?

• Using Amdahl’s Law:
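
A sketch of the calculation, treating the cache as the enhancement: it applies to 90% of the time and is 10 times faster. The function name is illustrative:

```python
def amdahl_speedup(frac_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - Frac_enh) + Frac_enh / Speedup_enh)."""
    return 1.0 / ((1.0 - frac_enhanced) + frac_enhanced / speedup_enhanced)

cache_speedup = amdahl_speedup(frac_enhanced=0.90, speedup_enhanced=10)
print(f"Speedup from cache = {cache_speedup:.2f}")  # 1 / 0.19
```

The overall speedup is about 5.3, well short of 10, because the 10% of the time that cannot use the cache now dominates.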

CET520 -- Gannod 41

CPU ExT and the Memory Hierarchy

• Need to expand our definition of CPU ExT:

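$$\text{CPU ExT} = (\text{CPU clock cyc} + \text{Mem stall cyc}) \times \text{CP}$$

$$\text{Mem stall cyc} = \text{Number of misses} \times \text{miss penalty} = \text{IC} \times \text{Mem Ref per Instr} \times \text{miss rate} \times \text{miss penalty}$$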

CET520 -- Gannod 42

Example 5

• A machine has CPI 2 when all memory accesses hit cache. 40% of the instructions access data. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the machine be if the hit rate were 100%?

• ExTideal =

• ExTcache =
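
A sketch of one way to work this example. It rests on an assumption the slide does not state: every instruction also makes one memory access for its own fetch, so there are 1 + 0.4 = 1.4 memory references per instruction. IC and CP are the same in both cases, so the ratio of execution times is the ratio of effective CPIs. The names below are illustrative:

```python
BASE_CPI = 2.0               # CPI when every access hits the cache
DATA_REFS_PER_INSTR = 0.40   # 40% of instructions access data
MISS_RATE = 0.02
MISS_PENALTY_CYCLES = 25

def cpi_with_cache(instr_fetches_per_instr):
    """Effective CPI = base CPI + memory stall cycles per instruction."""
    mem_refs_per_instr = instr_fetches_per_instr + DATA_REFS_PER_INSTR
    stall_cpi = mem_refs_per_instr * MISS_RATE * MISS_PENALTY_CYCLES
    return BASE_CPI + stall_cpi

# ASSUMPTION: one instruction-fetch access per instruction (1.4 refs/instr).
cpi_cache = cpi_with_cache(instr_fetches_per_instr=1.0)
speedup_if_all_hits = cpi_cache / BASE_CPI

print(f"ExT_ideal = IC x {BASE_CPI:.1f} x CP")
print(f"ExT_cache = IC x {cpi_cache:.1f} x CP")
print(f"Ideal machine is {speedup_if_all_hits:.2f}x faster")
```

Under this assumption the stall cycles add 0.7 to the CPI, so the machine with a perfect cache would be 1.35 times faster. If only data accesses could miss, `cpi_with_cache(0.0)` gives a CPI of 2.2 and a ratio of 1.10 instead.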