Lecture 2: Performance Evaluation


Transcript of Lecture 2: Performance Evaluation

Page 1: Lecture 2: Performance Evaluation

Lecture 2: Performance Evaluation

Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI

Page 2: Lecture 2: Performance Evaluation

What Does Performance Mean?

Response time
– A simulation program finishes in 5 minutes
Throughput
– A web server serves 5 million requests per second
Other metrics
– MIPS (million instructions per second)
– MFLOPS
– Clock frequency

Page 3: Lecture 2: Performance Evaluation

Execution Time

Processor design is concerned with the processor time consumed by program execution. Shorter execution time =>
– Shorter response time
– Higher throughput
Execution time = #inst × CPI × Cycle time
– What affects #inst, CPI, and cycle time?
– Almost all designs can be interpreted in terms of these three factors
Any other metric is meaningful only if it is consistent with execution time
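The execution time equation can be tried out with a minimal Python sketch. The instruction count, CPI, and clock rate below are hypothetical values chosen only for illustration; they do not come from the lecture.

```python
def execution_time(inst_count: int, cpi: float, cycle_time_s: float) -> float:
    """Execution time = #inst x CPI x cycle time, in seconds."""
    return inst_count * cpi * cycle_time_s

# Hypothetical program: 2 billion instructions, CPI of 1.5, 1 GHz clock (1 ns cycle).
base = execution_time(2_000_000_000, 1.5, 1e-9)

# Same program and CPI, but the cycle time is halved (clock frequency doubled).
faster_clock = execution_time(2_000_000_000, 1.5, 0.5e-9)

print(f"base: {base:.2f} s, faster clock: {faster_clock:.2f} s")
```

In practice the three factors are rarely independent: a change that shortens the cycle time may raise CPI, so each factor has to be re-measured after a design change.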

Page 4: Lecture 2: Performance Evaluation

Performance of Computers

Performance is defined for a program and a machine.
How to compare computers? Need benchmark programs:
– Real applications: scientific programs, compilers, text-processing software, image processing
– Modified applications: providing portability and focus
– Kernels: good to isolate performance of individual features
  Lmbench: measures latency and bandwidth of memory, file system, networking, etc.
– Toy benchmarks
– Synthetic benchmarks: matching average execution profile

Page 5: Lecture 2: Performance Evaluation

Performance Comparison

“X is n times faster than Y”:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} = n$$

n: speedup if we are considering an enhancement, optimization, etc.
What does “improving” mean?
– Improve performance: decrease execution time, increase throughput
– Improve execution time: decrease execution time
– Degrade performance: the reverse of the above; gives a speedup below 1 (a slowdown, loosely called negative speedup)
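A small Python sketch of the “n times faster” ratio. The function name `speedup` and the timings are hypothetical, chosen only to show both an improvement and a degradation.

```python
def speedup(time_y: float, time_x: float) -> float:
    """Return n such that X is n times faster than Y (n = time_Y / time_X)."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x

# Hypothetical timings in seconds.
n = speedup(time_y=15.0, time_x=10.0)         # X finishes sooner, so n > 1
slowdown = speedup(time_y=10.0, time_x=15.0)  # X finishes later, so n < 1

print(n, slowdown)
```

Note that a degraded design gives a ratio between 0 and 1, not a negative number.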

Page 6: Lecture 2: Performance Evaluation

Benchmark Suite

A benchmark suite is a collection of benchmarks covering a variety of applications
– Alleviating the weakness of a single benchmark
– More representative for computer designers to evaluate their designs
– Benchmarks test both computers and compilers, and the OS in many cases
Desktop benchmarks: CPU, memory, and graphics performance
Server benchmarks: throughput-oriented, I/O and OS intensive
Embedded benchmarks: measuring the ability to meet deadlines and save power

Page 7: Lecture 2: Performance Evaluation

Summarizing Performance

Given the performance of a set of programs, how to evaluate the performance of machines?

              A      B      C
P1 (secs)     1      10     20
P2 (secs)     1000   100    20
Total (secs)  1001   110    40

Which computer is the “best” one?

Page 8: Lecture 2: Performance Evaluation

Arithmetic Mean

$$\frac{1}{n}\sum_{i=1}^{n} \text{Time}_i$$

Total execution time / (number of programs)
– Simple and intuitive
– Representative if the user runs the programs an equal number of times
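As a quick check, the arithmetic mean can be computed for machines A, B, and C using the P1 and P2 times from the summarizing-performance table. This is a minimal sketch; the dictionary layout and helper name are illustrative choices.

```python
# Execution times in seconds for programs P1 and P2 on each machine.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

def arithmetic_mean(values):
    """Total execution time divided by the number of programs."""
    return sum(values) / len(values)

means = {machine: arithmetic_mean(t) for machine, t in times.items()}
best = min(means, key=means.get)  # lowest mean time wins

print(means, best)
```

With equal run counts for P1 and P2, the ranking by arithmetic mean matches the ranking by total time.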

Page 9: Lecture 2: Performance Evaluation

Weighted Arithmetic Mean

Give (different) weights to different programs
– Considering the frequencies of programs in the workload

$$\sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i, \qquad \sum_{i=1}^{n} \text{Weight}_i = 1$$
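The effect of weights can be sketched with the P1 and P2 times of machines A, B, and C from the summarizing-performance table. The weights below (P1 making up 99.9% of the workload) are a hypothetical assumption, chosen to show that the ranking can change with the workload mix.

```python
import math

# Execution times in seconds for programs P1 and P2 on each machine.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

# Hypothetical workload: P1 runs far more often than P2.
weights = [0.999, 0.001]

def weighted_mean(values, weights):
    """Sum of weight_i * time_i, with the weights required to sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * v for w, v in zip(weights, values))

weighted = {m: weighted_mean(t, weights) for m, t in times.items()}
best = min(weighted, key=weighted.get)

print(weighted, best)
```

Under this assumed mix machine A comes out ahead, whereas the unweighted mean favors C.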

Page 10: Lecture 2: Performance Evaluation

Geometric Means

Based on relative performance to a reference machine

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

Relative performance is consistent with different reference machines
– If C is 2x faster than B (using B as the reference), B is 2x faster than A (A as the reference), then C is 4x faster than A (A as the reference)

$$\frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right)$$
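The consistency property can be checked numerically. The sketch below reuses the P1 and P2 times of machines A, B, and C from the summarizing-performance table; the helper `geometric_mean` is an illustrative implementation (`math.prod` requires Python 3.8 or later).

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))

# Execution times in seconds for programs P1 and P2.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]
times_c = [20.0, 20.0]

# Geometric mean of per-program ratios equals the ratio of geometric means.
lhs = geometric_mean([c / a for c, a in zip(times_c, times_a)])
rhs = geometric_mean(times_c) / geometric_mean(times_a)

# Chaining through an intermediate reference (B) gives the same answer.
c_vs_b = geometric_mean([c / b for c, b in zip(times_c, times_b)])
b_vs_a = geometric_mean([b / a for b, a in zip(times_b, times_a)])

print(lhs, rhs, c_vs_b * b_vs_a)
```

This is why the choice of reference machine does not change the ranking when geometric means are used.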

Page 11: Lecture 2: Performance Evaluation

Harmonic Mean

Given speedups s_1, s_2, …, s_n, the average speedup by harmonic mean is

n / (1/s_1 + 1/s_2 + … + 1/s_n)

Why not arithmetic mean?
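One way to see the answer is a small numerical sketch. It assumes, hypothetically, two programs that each took 10 seconds before the enhancement and were sped up by 2x and 10x; under that equal-baseline assumption the harmonic mean matches the true overall speedup, while the arithmetic mean overstates it.

```python
def harmonic_mean(speedups):
    """n / (1/s_1 + ... + 1/s_n) for positive speedups."""
    return len(speedups) / sum(1.0 / s for s in speedups)

# Hypothetical baseline: both programs take the same time before the enhancement.
old_times = [10.0, 10.0]
speedups = [2.0, 10.0]

new_times = [t / s for t, s in zip(old_times, speedups)]
overall = sum(old_times) / sum(new_times)   # true speedup of the whole workload

arithmetic = sum(speedups) / len(speedups)  # naive average of the speedups

print(harmonic_mean(speedups), overall, arithmetic)
```

If the baseline times differ, a weighted harmonic mean (weights proportional to the old execution times) would be needed instead.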

Page 12: Lecture 2: Performance Evaluation

Amdahl’s Law

We know about performance: defining, measuring, and summarizing
How to maximize performance gains from the beginning in our design?
Principle: Make the Common Case Fast!

Page 13: Lecture 2: Performance Evaluation

Amdahl’s Law

Predict the overall speedup from the “local speedup” of an enhancement, provided the fraction of time the enhancement can be used is known.
– “Local speedup” is related to design and optimization objectives, such as doubling the CPU frequency or halving the cache latency

Page 14: Lecture 2: Performance Evaluation

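Amdahl’s Law

$$\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$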
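A direct consequence of the formula is that the overall speedup is bounded by 1 / (1 − Fraction_enhanced), however large the local speedup becomes. The sketch below illustrates this with a hypothetical enhancement that applies to 50% of the execution time.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the time is sped up by a local factor."""
    if not 0.0 <= fraction_enhanced <= 1.0 or speedup_enhanced <= 0:
        raise ValueError("fraction must be in [0, 1] and speedup positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

fraction = 0.5                  # hypothetical: enhancement usable half the time
bound = 1.0 / (1.0 - fraction)  # limit as the local speedup grows without bound

for local in (2, 10, 100, 1000):
    print(local, round(amdahl_speedup(fraction, local), 3), "bound:", bound)
```

The unenhanced fraction dominates quickly, which is the quantitative reason behind making the common case fast.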

Page 15: Lecture 2: Performance Evaluation

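Equation Based on Instruction Types

$$\text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$

$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$

$$CPI = \sum_{i=1}^{n} \text{Instruction frequency}_i \times CPI_i$$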
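These equations can be sketched in Python as follows. The three instruction classes, their counts, and their CPIs are hypothetical values picked for illustration only.

```python
def cpu_clock_cycles(inst_counts, cpis):
    """Sum over instruction types of IC_i * CPI_i."""
    return sum(ic * cpi for ic, cpi in zip(inst_counts, cpis))

def average_cpi(inst_counts, cpis):
    """Sum over instruction types of frequency_i * CPI_i."""
    total = sum(inst_counts)
    return sum((ic / total) * cpi for ic, cpi in zip(inst_counts, cpis))

# Hypothetical mix: three instruction classes, counts in instructions.
inst_counts = [50_000_000, 30_000_000, 20_000_000]
cpis = [1.0, 2.0, 3.0]
clock_cycle_time_s = 1e-9  # assumed 1 GHz clock

cycles = cpu_clock_cycles(inst_counts, cpis)
cpu_time = cycles * clock_cycle_time_s
cpi = average_cpi(inst_counts, cpis)

print(cycles, cpu_time, cpi)
```

Multiplying the average CPI by the total instruction count and the cycle time gives the same CPU time, which ties this form back to Execution time = #inst × CPI × Cycle time.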

Page 16: Lecture 2: Performance Evaluation

Make Design Choice Using CPU Time Equation

Assume we need to improve the performance of a graphics engine:

            FP     FPSQR   Other
Frequency   25%    2%      75%
CPI         4.0    20      1.33

Alternative 1: CPI_FPSQR 20 → 2
Alternative 2: CPI_FP 4 → 2.5

Which one is better? Calculate speedups.
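One way to work this out is sketched below. It rests on two assumptions not stated on the slide: that the 2% FPSQR frequency is included in the 25% FP frequency (since 25% + 75% already sums to 100%), and that instruction count and clock cycle time are unchanged, so the speedup equals the ratio of CPIs.

```python
# Values from the slide.
freq_fp, cpi_fp = 0.25, 4.0          # all FP instructions (assumed to include FPSQR)
freq_fpsqr, cpi_fpsqr = 0.02, 20.0   # FP square root only
freq_other, cpi_other = 0.75, 1.33   # everything else

# Original average CPI: sum of frequency_i * CPI_i (about 2.0).
cpi_original = freq_fp * cpi_fp + freq_other * cpi_other

# Alternative 1: FPSQR CPI drops from 20 to 2; remove the cycles saved.
cpi_alt1 = cpi_original - freq_fpsqr * (cpi_fpsqr - 2.0)

# Alternative 2: CPI of all FP instructions drops from 4 to 2.5.
cpi_alt2 = freq_fp * 2.5 + freq_other * cpi_other

# Same instruction count and clock, so speedup is the CPI ratio.
speedup_alt1 = cpi_original / cpi_alt1   # roughly 1.22
speedup_alt2 = cpi_original / cpi_alt2   # roughly 1.23

print(round(speedup_alt1, 3), round(speedup_alt2, 3))
```

Under these assumptions, improving all FP instructions modestly edges out the much larger improvement to the rarely used FPSQR.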

Page 17: Lecture 2: Performance Evaluation

Amdahl’s Law

Choice one: Speed up FP square root by 10x
Choice two: Speed up all FP instructions by 1.6x
20% of the time is used by FP square root, 50% by all FP instructions
Which choice is better?
Implication: Optimize for the common case first
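The two choices can be compared by plugging the slide's numbers into Amdahl's law. This is a minimal sketch; the function name `amdahl_speedup` is an illustrative choice.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speedup_overall = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Choice one: FP square root uses 20% of the time and is sped up 10x.
choice_one = amdahl_speedup(0.20, 10.0)   # 1 / 0.82

# Choice two: all FP instructions use 50% of the time and are sped up 1.6x.
choice_two = amdahl_speedup(0.50, 1.6)    # 1 / 0.8125

print(round(choice_one, 3), round(choice_two, 3))
```

The smaller local speedup wins because it applies to a larger share of the execution time.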

Page 18: Lecture 2: Performance Evaluation

SPEC CPU Benchmark

SPEC: Standard Performance Evaluation Corporation
CPU-intensive benchmark for evaluating processor performance of workstations
Four generations: SPEC89, SPEC92, SPEC95, and SPEC2000
Two types of programs: INT and FP
Emphasizing memory system performance in SPEC2000

Page 19: Lecture 2: Performance Evaluation

SPEC CPU2000 Profiling

Dynamic instruction mix

Instruction     Int avg   FP avg
Load int        26%       15%
Store int       10%       2%
Load fp         -         15%
Store fp        -         7%
Add             19%       23%
All fp inst     -         41%
Cond br.        12%       4%
All ctrl inst   16%       4%

Page 20: Lecture 2: Performance Evaluation

Other SPEC Benchmarks

SPECviewperf and SPECapc: 3D graphics performance
SPEC JVM98: performance of client-side Java virtual machine
SPEC JBB2000: server-client Java application
SPEC WEB99: evaluating WWW servers
SPEC HPC96: parallel and distributed computing

Page 21: Lecture 2: Performance Evaluation

Server Benchmarks

SPEC CPU2000, WEB99, SFS97
TPC: measuring the ability of a system to handle transactions
– TPC-C: online transaction processing (OLTP) benchmark (for bank systems)
– TPC-H: ad hoc decision-making support
– TPC-R: decision-making support with standard queries
– TPC-W: simulating a business-oriented transactional web server

Page 22: Lecture 2: Performance Evaluation

Embedded Benchmark

EEMBC (Embedded Microprocessor Benchmark Consortium) benchmarks
– Based on kernel performance
– Five classes: automotive/industrial, consumer, networking, office automation, and telecommunications
Embedded benchmarks are not mature