SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn...

33
SYSTEM-LEVEL SYSTEM-LEVEL PERFORMANCE PERFORMANCE METRICS FOR METRICS FOR MULTIPROGRAM MULTIPROGRAM WORKLOADS WORKLOADS Presented Presented by Ankit Patel by Ankit Patel Authors: Stijn Everman Authors: Stijn Everman Lieven Lieven Eeckhout Eeckhout

Transcript of SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn...

Page 1: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

SYSTEM-LEVEL SYSTEM-LEVEL PERFORMANCE PERFORMANCE METRICS FOR METRICS FOR

MULTIPROGRAM MULTIPROGRAM WORKLOADSWORKLOADS

Presented Presented by Ankit Patelby Ankit Patel

Authors: Stijn EvermanAuthors: Stijn Everman Lieven EeckhoutLieven Eeckhout

Page 2: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Summary of this paperSummary of this paper

Creates theoretical foundation for Creates theoretical foundation for performance measurement of a performance measurement of a given system, from a mathematical given system, from a mathematical standpointstandpoint

From whose perspective should we From whose perspective should we measure the performance of a given measure the performance of a given system?system? UserUser SystemSystem Combination of both Combination of both

Page 3: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Current performance Current performance measurementmeasurement

Researchers have reached the Researchers have reached the consensus that the performance metric consensus that the performance metric of choice for assessing a single of choice for assessing a single program’s performance is its execution program’s performance is its execution timetime

For single-threaded programs, For single-threaded programs, execution time is proportional to CPI execution time is proportional to CPI (Cycles Per Instructions) or inversely (Cycles Per Instructions) or inversely proportional to IPC (Instructions Per proportional to IPC (Instructions Per Cycle)Cycle)

Page 4: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Performance for Performance for multithreaded programsmultithreaded programs

Only CPI calculation is poor Only CPI calculation is poor performance metrics.performance metrics.

It should use total execution time It should use total execution time while measuring performance.while measuring performance.

Page 5: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

How should I measure How should I measure system performance?system performance?

Page 6: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

System-level performance System-level performance criteriacriteria

The criteria for evaluating The criteria for evaluating multiprogram computer systems are multiprogram computer systems are based on the user’s perspective and based on the user’s perspective and the system’s perspective.the system’s perspective.

What is User’s perspective?What is User’s perspective? How fast a single program is executedHow fast a single program is executed

What is system’s perspective?What is system’s perspective? ThroughputThroughput

Page 7: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Its Time For Some Its Time For Some TerminologiesTerminologies

Page 8: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

TerminologiesTerminologies

Turnaround timeTurnaround time: Quantifies the time : Quantifies the time between submitting a job and its between submitting a job and its completion.completion.

Response timeResponse time: Measures the time : Measures the time between submitting a job and receiving between submitting a job and receiving its first response; this metric is its first response; this metric is important for interactive applications.important for interactive applications.

Throughput:Throughput: quantifies the number of quantifies the number of programs completed per unit of time.programs completed per unit of time.

at

Page 9: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Continues….Continues….

Single-program modeSingle-program mode: A single : A single program has exclusive access to the program has exclusive access to the computer system. It has all system computer system. It has all system resources at its disposal and is never resources at its disposal and is never interrupted or preempted during its interrupted or preempted during its execution. execution.

Multiprogram mode: Multiprogram mode: Multiple Multiple programs are coexecuting on the programs are coexecuting on the computer system.computer system.

Page 10: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Its Time For Some Its Time For Some Mathematics and few more Mathematics and few more

terminologiesterminologies

Page 11: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Turnaround TimeTurnaround Time

Normalized Turnaround Time(NTT):Normalized Turnaround Time(NTT):

Average NTTAverage NTT

Max NTTMax NTT

Page 12: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

System throughputSystem throughput

Normalized Progress:Normalized Progress:

System ThroughputSystem Throughput

Page 13: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Practical Practical (Why I say practical???)(Why I say practical???)

Adjusted ANTT:Adjusted ANTT:

Adjusted STP:Adjusted STP:

Page 14: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

IPC Throughput (…keep this in IPC Throughput (…keep this in mind…):mind…):

Weighted Speedup:Weighted Speedup:

Harmonic Average (Hmean):Harmonic Average (Hmean):

Page 15: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Co-executing Co-executing programs in programs in multiprogram mode multiprogram mode experience equal experience equal relative progress with relative progress with respect to single-respect to single-program modeprogram mode

Fairness:Fairness:

Proportional Progress (Proportional Progress (for different for different

prioritiespriorities):):

Page 16: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

So….fairness becomes…So….fairness becomes…

Page 17: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Enough theories……..Enough theories……..

How can I apply this in real How can I apply this in real world performance world performance

measurements?measurements?

Page 18: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

OK……..OK……..

Then lets do a case study ….Then lets do a case study ….

Page 19: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Case study: Evaluating SMT Case study: Evaluating SMT fetch policiesfetch policies

What should be used in performance measurements?What should be used in performance measurements? Researchers should use multiple metrics for Researchers should use multiple metrics for

characterizing multiprogram system performance.characterizing multiprogram system performance. Combination of ANTT and STP provides a clear picture Combination of ANTT and STP provides a clear picture

of overall system performance as a balance between of overall system performance as a balance between user-oriented program turnaround time and system-user-oriented program turnaround time and system-oriented throughput.oriented throughput.

Involves user level single-threaded workloads, does not Involves user level single-threaded workloads, does not affect the general applicability of the multiprogram affect the general applicability of the multiprogram performance metrics.performance metrics.

ANTT-STP characterization is applicable to multithreaded ANTT-STP characterization is applicable to multithreaded and full-system workloads.and full-system workloads.

Used ANTT and STP metrics to evaluate performance and Used ANTT and STP metrics to evaluate performance and for multithreaded full-system workloads, used the cycle-for multithreaded full-system workloads, used the cycle-count-based equations.count-based equations.

Page 20: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Ooops…Ooops…I have to introduce few I have to introduce few more terminologies !!!more terminologies !!!

Page 21: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Six SMT fetch policiesSix SMT fetch policies Icount: Icount:

Strive to have an equal # of instructions from all co-executing Strive to have an equal # of instructions from all co-executing programsprograms

Stall fetch:Stall fetch: Stalls the fetch of a program that experiences a long-latency Stalls the fetch of a program that experiences a long-latency

load until data returns from memory.load until data returns from memory. Predictive stall fetch:Predictive stall fetch:

Extends the stall fetch policy by predicting long-latency loads in Extends the stall fetch policy by predicting long-latency loads in the front-end pipelinethe front-end pipeline

MLP-aware stall fetch:MLP-aware stall fetch: Predicts long latency loads and their associated memory-level Predicts long latency loads and their associated memory-level

parallelismparallelism Flush:Flush:

Flushes on long-latency loadsFlushes on long-latency loads MLP-aware flush:MLP-aware flush:

Extends the MLP aware stall fetch policy by flushing instructions Extends the MLP aware stall fetch policy by flushing instructions if more than m instructions have been fetched since the first if more than m instructions have been fetched since the first burst of long-latency loads.burst of long-latency loads.

Page 22: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

…….And this was the last .And this was the last theory …I promise !!!theory …I promise !!!

Page 23: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Simulation environmentSimulation environment

Software used: SimPointSoftware used: SimPoint 36 two program workload36 two program workload 30 four program workload30 four program workload Simulation points are shosen for SPEC Simulation points are shosen for SPEC

2000 benchmarks (200 million 2000 benchmarks (200 million instructions each)instructions each)

Four-wide superscaler, out-of-order Four-wide superscaler, out-of-order SMT processor with an aggressive SMT processor with an aggressive hardware data prefetcher with eight hardware data prefetcher with eight stream buffersstream buffers

Page 24: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.
Page 25: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

MLPaware flush policy MLPaware flush policy outperforms Icount for outperforms Icount for both the two- and four-both the two- and four-program workloadsprogram workloads

That is, it achieves a That is, it achieves a higher system higher system throughput and a throughput and a lower average lower average normalized turnaround normalized turnaround time, while achieving a time, while achieving a comparable fairness comparable fairness level.level.

Page 26: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

The same is true when we The same is true when we compare MLP-aware flush compare MLP-aware flush against flush for the two-against flush for the two-program workloads; for program workloads; for the four-program the four-program workloads, MLP-aware workloads, MLP-aware flush achieves a much flush achieves a much lower normalized lower normalized turnaround time than turnaround time than flush at a comparable flush at a comparable system throughput.system throughput.

MLP-aware stall fetch MLP-aware stall fetch achieves a smaller ANTT, achieves a smaller ANTT, whereas predictive stall whereas predictive stall fetch achieves a higher fetch achieves a higher STP.STP.

Page 27: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.
Page 28: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Interesting…….Interesting…….

So what are you trying to So what are you trying to conclude here???conclude here???

Page 29: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

What does this show?What does this show?

Delicate balance between user-oriented Delicate balance between user-oriented and system-oriented views of and system-oriented views of performance. performance.

If user-perceived performance is the If user-perceived performance is the primary objective, MLP-aware stall fetch primary objective, MLP-aware stall fetch is the better fetch policy. is the better fetch policy.

If system perceived performance is the If system perceived performance is the primary objective, predictive stall fetch is primary objective, predictive stall fetch is the policy of choice.the policy of choice.

Page 30: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

While I was introducing While I was introducing terminologies, IPC terminologies, IPC

throughput,throughput, I said I said

…………keep this in mindkeep this in mind…….…….

remember?remember?

Page 31: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

IPC Throughput as IPC Throughput as performance measurement performance measurement

is misleadingis misleadingUsing IPC throughput as a performance metric, you would conclude that the MLP-aware flush policy is comparable to the flush policy. However, it achieves a significantly higher system throughput (STP). Thus, IPC throughput is a potentially misleading performance metric.

Page 32: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

SummarySummary

Gives theoretical foundation for Gives theoretical foundation for measuring system performancemeasuring system performance

Don’t judge the system performance Don’t judge the system performance for multicore systems merely based for multicore systems merely based on IPC throughput or CPIon IPC throughput or CPI

Use quantitative approach for Use quantitative approach for performance measurements for performance measurements for multicore systems. Few of those are multicore systems. Few of those are mentioned in this papermentioned in this paper

Page 33: SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Everman Lieven Eeckhout Lieven Eeckhout.

Questions, Comments, Questions, Comments, Concerns ???Concerns ???