1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance

1

CS/COE0447

Computer Organization & Assembly Language

CHAPTER 4

Assessing and Understanding Performance

2

Program Performance

• Program performance is measured in terms of time!

• Program execution time depends on
  – Number of instructions executed to complete a job
  – How many clock cycles are needed to execute a single instruction
  – The length of the clock cycle (clock cycle time)

3

Clock, Clock Cycle Time

• Circuits in computers are “clocked”
• At each rising (or falling) clock edge, some specified actions are started; they usually complete before the next rising (or falling) edge
• Instructions typically require more than one cycle to execute

[Figure: a function block (made of circuits) driven by a clock signal, with one clock cycle time marked]

4

Program Performance

• time = (# of clock cycles) × (clock cycle time)
• # of clock cycles = (# of instructions executed) × (average cycles per instruction)
• time = (# of instructions executed) × (average clock cycles per instruction) × (clock cycle time)
• In units: time (s) = cycles × (s / cycle)
• cycles = instructions × (cycles / instruction), on average
• SO: time (s) = instructions × (cycles / instruction, ave) × (s / cycle)
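The three-factor relationship on this slide can be sketched in a few lines of Python. The function name `cpu_time` and the sample numbers (one billion instructions, an average CPI of 2, a 1 ns cycle) are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count, avg_cpi, clock_cycle_time_s):
    """Execution time in seconds: instructions x cycles/instruction x seconds/cycle."""
    clock_cycles = instruction_count * avg_cpi
    return clock_cycles * clock_cycle_time_s


# Hypothetical program: 1e9 instructions, average CPI of 2, 1 ns cycle (1 GHz clock).
print(cpu_time(1e9, 2.0, 1e-9), "seconds")
```

Halving any one of the three factors halves the time, provided the other two stay the same.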

5

Example 1

• You have a machine with a CPU running at 1GHz. The same company releases its 2GHz CPU with 100% compatibility with the existing 1GHz CPU, and you are considering upgrading. What is the expected performance improvement from doing so? Assume that programs have 40% memory-access instructions, and each memory access takes 10ns on average. All other instructions take exactly one cycle for execution. Answer: in class
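One way to set up this example is sketched below. It is not the in-class answer. It assumes that the 10 ns memory-access latency does not change when the CPU clock doubles, and that only the one-cycle instructions get faster. The function name and parameters are made up for illustration.

```python
def avg_time_per_instruction_ns(clock_ghz, mem_fraction=0.4, mem_access_ns=10.0):
    """Average time per instruction in ns.

    Assumption: memory accesses cost a fixed number of ns, while every
    other instruction takes exactly one clock cycle.
    """
    cycle_ns = 1.0 / clock_ghz
    return (1.0 - mem_fraction) * cycle_ns + mem_fraction * mem_access_ns


old = avg_time_per_instruction_ns(1.0)  # 1 GHz CPU
new = avg_time_per_instruction_ns(2.0)  # 2 GHz CPU
speedup = old / new
print(f"{old:.2f} ns -> {new:.2f} ns per instruction, speedup {speedup:.3f}")
```

Under these assumptions the gain is small, because most of the time per instruction is spent in the part the faster clock does not touch.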

6

From Wikipedia

• Amdahl's law, named after computer architect Gene Amdahl, is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.

7

Amdahl’s Law (cont)

• The law is concerned with the speedup achievable from an improvement to a computation that affects a proportion P of that computation where the improvement has a speedup of S.

• Amdahl's law states that the overall speedup of applying the improvement will be:

    1 / ((1 - P) + P/S)
• Our example: P = .6 and S = 2
• 1 / ((1 - .6) + (.6/2)) = 1.43
• This is the maximum speedup possible
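As a quick check of the formula, here is a minimal Python sketch. The function name `amdahl_speedup` is an assumption for illustration; the inputs are the slide's own P and S.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a proportion p of the computation is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# The slide's example: P = .6 and S = 2
print(round(amdahl_speedup(0.6, 2), 2))  # prints 1.43
```

Letting S grow without bound shows the ceiling set by the unimproved part: the speedup approaches 1 / (1 - P).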

8

Example 2

• If a computer issues 30 network requests per second and each request is on average 64 KB, will a 100 Mbit Ethernet link be sufficient? (printer, accessing files, …)

• KB = 10^3 bytes
• Byte = 8 bits
• Mbit = 10^6 bits
• A 100 Mbit Ethernet: 10^8 bit/s “bitrate”

9

Answer

• Ethernet: 10^8 bit/s
• KB = Kilobyte; Kilo = 10^3; byte = 8 bits
• 30 request/s * 64 KB/request * (10^3 * 8) bit/KB (the units cancel to leave bit/s)
• 30 * 64 * 8 * 10^3 = 3 * 6.4 * 8 * 10^5 = 1.536 * 10^7 < 10^8 (or use a calculator to compute it exactly)

So, yes, it is sufficient
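The same unit conversion can be written out as a short Python sketch. The names `required_bitrate` and `LINK_BPS` are illustrative, and the calculation ignores protocol overhead, as the slide does.

```python
def required_bitrate(requests_per_s, kb_per_request):
    """Bits per second needed, with KB = 10^3 bytes and 1 byte = 8 bits as on the slide."""
    return requests_per_s * kb_per_request * 10**3 * 8


LINK_BPS = 100 * 10**6  # 100 Mbit Ethernet = 10^8 bit/s

need = required_bitrate(30, 64)
print(need, "bit/s needed; sufficient:", need < LINK_BPS)
```

The required rate comes out to roughly 15% of the link's nominal bitrate.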

10

Why Performance Evaluation?

[Figure: diagram with two stages, DESIGN and EVALUATION]

11

Defining Performance

• What do you mean when you say a computer has better performance than another?

• We need a “metric” for comparison
  – One metric may not fully characterize a system
    • a number of metrics may be relevant
  – Important metrics for computer systems
    • Response time (a.k.a. execution time)
    • Throughput

12

Response Time vs. Throughput

• Which has higher performance?
  – Time to deliver 1 passenger
  – Time to deliver 400 passengers
• Time for 1 job is called
  – Response time or execution time
• Jobs per day is called
  – Throughput or bandwidth

Plane              DC to Paris   Top Speed   Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph     470          286,700
BAC/Sud Concorde   3 hours       1350 mph    132          178,200

13

Some Definitions

• Throughput is in units of things per second
  – Bigger is better
• If we are primarily concerned with response time
  – Performance = 1 / execution time
  – Bigger is better (shorter execution time)
• “Machine A is N times faster than B”
  – N = performance (A) / performance (B) = execution time (B) / execution time (A)

14

Response Time vs. Throughput

• Time of Concorde vs. Boeing 747?
  – Concorde is (6.5 hours / 3 hours) times faster
  – 2.2 times faster
• Throughput of Boeing 747 vs. Concorde
  – 286,700 pmph / 178,200 pmph
  – 1.6 times higher

• Boeing 747 is 1.6 times (or 60%) higher in terms of throughput

• Concorde is 2.2 times (or 120%) faster in terms of flying time (response time)

• We will focus primarily on execution time for a single job for the remaining discussions
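The two comparisons can be reproduced with a short Python sketch. It reads "pmph" as passengers × mph, which is an assumption that happens to reproduce the table's throughput figures. The dictionary layout and function names are illustrative.

```python
planes = {
    # name: (DC-to-Paris time in hours, top speed in mph, passengers)
    "Boeing 747": (6.5, 610, 470),
    "Concorde": (3.0, 1350, 132),
}


def throughput_pmph(name):
    """Passengers x mph, the reading of 'pmph' assumed here."""
    _, speed_mph, passengers = planes[name]
    return speed_mph * passengers


def times_faster(slow_time, fast_time):
    """'N times faster' as execution time (B) / execution time (A)."""
    return slow_time / fast_time


response_ratio = times_faster(planes["Boeing 747"][0], planes["Concorde"][0])
throughput_ratio = throughput_pmph("Boeing 747") / throughput_pmph("Concorde")

print(f"Concorde response-time advantage: {response_ratio:.1f}x")
print(f"Boeing 747 throughput advantage:  {throughput_ratio:.1f}x")
```

Which plane "wins" depends entirely on the metric chosen, which is the point of the slide.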

15

Regarding Time

• Straightforward definition of time
  – Total time to complete a task, including disk accesses, memory accesses, other I/O activities, operating system overheads, …
  – Terms for this: “Real time”, “response time”, “elapsed time”
• Alternative: time spent by CPU only on your program (since multiple processes may run at the same time)
  – “CPU execution time” or “CPU time”
  – Often divided into system CPU time (OS) and user CPU time (user program)
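The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` for elapsed time and `os.times` for user and system CPU time. The `measure` helper and the sample workload are hypothetical, and the numbers printed will vary by machine.

```python
import os
import time


def measure(work):
    """Run work() and return elapsed, user CPU, and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    work()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user_cpu": cpu_end.user - cpu_start.user,
        "system_cpu": cpu_end.system - cpu_start.system,
    }


def sample_work():
    # Hypothetical workload: some computation, then a sleep that uses no CPU.
    sum(i * i for i in range(200_000))
    time.sleep(0.1)


print(measure(sample_work))
```

The sleep adds to elapsed time but not to CPU time, so elapsed time should come out noticeably larger than user plus system CPU time.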

16

Clock

17

Measuring Time

• In terms of seconds
• CPU time: computers are constructed using digital circuitry driven by a “clock”
  – Constant rate
  – Determines when events take place
• Clock cycle time = length of a clock cycle (clock period) = 1 / clock rate
  – 1ns if 1GHz clock
  – 0.5ns if 2GHz clock
  – 0.25ns if 4GHz clock

18

Measuring Time w/ Clocks

• CPU execution time for a program
  – = Clock cycles for a program × clock cycle time
  – = Clock cycles for a program / clock rate
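A small Python sketch of the clock relationships on this slide and the previous one. The function names are illustrative, and the 3-billion-cycle program is a made-up example.

```python
def clock_cycle_time_ns(clock_rate_ghz):
    """Clock period in ns: 1 / clock rate, with the rate given in GHz."""
    return 1.0 / clock_rate_ghz


def cpu_time_s(clock_cycles, clock_rate_hz):
    """CPU execution time in seconds: clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


for ghz in (1, 2, 4):
    print(f"{ghz} GHz clock -> {clock_cycle_time_ns(ghz)} ns cycle time")

# Hypothetical program that needs 3e9 clock cycles on a 2 GHz machine.
print(cpu_time_s(3e9, 2e9), "seconds")
```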

19

Measuring Time w/ Clocks, cont’d

• Total clock cycles for a program
  – = Instructions for a program (= instruction count) × average clock cycles per instruction (CPI)
• Time = (# of instr.) × CPI × (clock cycle time)
• Looking at the units:
  – s = inst * cycle/inst * s/cycle

20

Workload

• A set of programs run on a computer is a workload
  – Actual collection of applications
  – Synthetic programs (for experimentation)

• To evaluate two computer systems, a user would simply compare the execution time of the workload on the two computers

21

Benchmarks

• A set of applications relevant for performance evaluation
• SPEC (Standard Performance Evaluation Corporation)
  – CPU benchmarks
  – Server benchmarks
  – Graphics benchmarks
  – …
• EEMBC (Embedded Microprocessor Benchmark Consortium)
  – Automotive
  – Consumer
  – Network
  – Telecom
  – Office
  – …

22

Summarizing Performance

• A is 10 times faster than B for program 1
• B is 10 times faster than A for program 2

• Although the above statements are correct individually, they present a confusing picture!

                   Computer A   Computer B
Program 1 (sec)    1            10
Program 2 (sec)    1000         100
Total time (sec)   1001         110
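The table's numbers can be checked with a short Python sketch. The data structure is illustrative; the times are the ones from the table.

```python
times = {
    # program: execution time in seconds on each computer
    "Program 1": {"A": 1, "B": 10},
    "Program 2": {"A": 1000, "B": 100},
}

for program, t in times.items():
    faster, slower = sorted(t, key=t.get)  # machine names, ordered by time ascending
    ratio = t[slower] / t[faster]
    print(f"{program}: {faster} is {ratio:g} times faster than {slower}")

# Total execution time gives a single, consistent comparison.
totals = {m: sum(t[m] for t in times.values()) for m in ("A", "B")}
total_ratio = totals["A"] / totals["B"]
print(totals, f"-> B is {total_ratio:.1f} times faster on total time")
```

The per-program ratios point in opposite directions, while total time favors B because Program 2 dominates the workload.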

23

Summarizing Performance, cont’d

• Arithmetic mean (AM) = (Σ Time_i) / N
• Weighted AM = Σ (Time_i × W_i), where Σ W_i = 1
• AM is a special case of weighted AM where W_i = 1/N
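A minimal Python sketch of both means, applied to the execution times from the previous slide's table. The function names are illustrative, and the 0.9 / 0.1 weights are hypothetical (they stand for a workload in which Program 1 runs far more often than Program 2).

```python
def weighted_mean(times, weights):
    """Weighted arithmetic mean; the weights must sum to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(t * w for t, w in zip(times, weights))


def arithmetic_mean(times):
    """AM as the special case of the weighted mean with W_i = 1/N."""
    n = len(times)
    return weighted_mean(times, [1.0 / n] * n)


times_a = [1, 1000]   # Computer A: Program 1, Program 2 (seconds)
times_b = [10, 100]   # Computer B

print("AM:", arithmetic_mean(times_a), arithmetic_mean(times_b))

weights = [0.9, 0.1]  # hypothetical workload mix
print("Weighted AM:", weighted_mean(times_a, weights), weighted_mean(times_b, weights))
```

Changing the weights changes how much each program counts, so the weights should reflect how often each program actually runs.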

24

SPEC Benchmark

• SPEC CPU2000 benchmark
  – 12 integer benchmarks
  – 14 floating-point benchmarks
• To get a SPECmark
  – Run each program on the target machine
  – Get the performance ratio by dividing the pre-provided execution time (based on an old SUN workstation) by the execution time obtained
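The ratio step can be sketched as follows. All times below are hypothetical placeholders, not real SPEC reference or measured times. The slide stops at the per-program ratio; SPEC combines the ratios with a geometric mean, and that step is included here as an addition beyond the slide.

```python
import math


def spec_ratio(reference_time_s, measured_time_s):
    """Performance ratio: pre-provided reference time / time on the target machine."""
    return reference_time_s / measured_time_s


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) times in seconds for three benchmark programs.
runs = [(1400, 700), (1800, 600), (1100, 550)]

ratios = [spec_ratio(ref, measured) for ref, measured in runs]
print("ratios:", ratios)
print("geometric mean of ratios:", round(geometric_mean(ratios), 3))
```

A ratio above 1 means the target machine ran that program faster than the reference workstation.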

25

Amdahl’s Law (in terms of time)

• An optimization is usually applicable to only a limited portion of program execution
  – E.g., a larger cache; improved CPU frequency; improved FSB frequency; …
• Time_improved = Time_unaffected + Time_affected / (Improvement Factor)

• “Make the common case fast!”

26

Amdahl’s Law - example

• A program runs in 100 seconds on a computer, with multiply operations responsible for 80 seconds of this time

• How much do I have to improve the speed of multiplication, if I want my program to run 5 times faster?

• Time_improved = Time_unaffected + Time_affected / (Improvement Factor)
• Target: Time_improved = 100 s / 5 = 20 s; Time_unaffected = 100 s - 80 s = 20 s
• 20 s = 20 s + 80 s / n
• 0 = 80 s / n
• There is no amount by which we can improve multiply to achieve a fivefold increase in performance!
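The same reasoning can be turned into a small Python helper that solves for the improvement factor n and reports when no such n exists. The function name is illustrative, and the fourfold case is an extra, hypothetical target added for contrast.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Factor n by which the affected part must be sped up, or None if no n works.

    Solves total/target = unaffected + affected/n for n.
    """
    unaffected_s = total_s - affected_s
    target_s = total_s / target_speedup
    remaining_s = target_s - unaffected_s  # time budget left for the affected part
    if remaining_s <= 0:
        return None  # the unaffected part alone already uses up the target time
    return affected_s / remaining_s


print(required_improvement(100, 80, 5))  # None: the slide's fivefold target
print(required_improvement(100, 80, 4))  # 16.0: hypothetical fourfold target
```

The 20 s that multiplication does not touch caps the overall speedup below 100 / 20 = 5, however fast multiplication becomes.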

27

Fallacies and Pitfalls

• Pitfall: Expecting the improvement of one aspect of a computer to increase performance by an amount proportional to the size of the improvement

• Pitfall: Using a subset of the performance equation as a performance metric

28

To Summarize…

• Performance evaluation is an important stage of an engineering process

• We are interested in measuring computer performance
  – Software improvement
  – Hardware improvement
  – …

• Defining performance
  – Need relevant metric!

• Latency vs. throughput

29

To Summarize…, cont’d

• Response time = time to finish a given single job

• Throughput = # of jobs done in a second

• Time = # of clock cycles × clock cycle time

• # of clock cycles = # of instructions × CPI

30

To Summarize…, cont’d

• Best workload is one that comes from real applications

• Benchmarks are a set of applications to aid performance evaluation

• Summarizing results
  – Arithmetic mean (AM)
  – Weighted mean

• Amdahl’s law
  – Specifies overall performance improvement due to a limited-scope optimization