Transcript of Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Supercomputers – David Bailey (1991)

Page 1:

Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Supercomputers – David Bailey (1991)

Eileen Kraemer

August 25, 2002

Page 2:

1. Quote 32-bit performance results, not 64-bit results.

32-bit arithmetic is generally faster, but 64-bit arithmetic is often needed for the types of applications run on supercomputers.
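One reason 64-bit arithmetic is needed is accumulated rounding error. A minimal sketch in pure Python (simulating 32-bit rounding with the struct module; the repeated-sum workload is an illustrative assumption, not from the talk):

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack('f', struct.pack('f', x))[0]

def accumulate(n, step, single_precision):
    """Repeatedly add `step`; round to 32 bits after each add if requested."""
    acc = 0.0
    for _ in range(n):
        acc = acc + step
        if single_precision:
            acc = to_f32(acc)
    return acc

n, step = 100_000, 0.1
err32 = abs(accumulate(n, to_f32(step), True) - n * step)
err64 = abs(accumulate(n, step, False) - n * step)
print(err32, err64)   # the 32-bit accumulation drifts far more
```

The 32-bit run may well be faster, but for long accumulations of this kind the answer it produces can be visibly wrong, which is why quoting only 32-bit rates misrepresents supercomputer workloads.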

Page 3:

2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.

Although the application typically spends a good deal of time in the inner kernel, the kernel tends to exhibit greater parallelism than the overall application.

Thus, representing the speedup of the inner kernel as representative of the speedup of the overall application is misleading.

See: Amdahl’s Law
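The effect can be sketched with Amdahl's Law (the fractions below are illustrative assumptions, not figures from the talk):

```python
def amdahl_speedup(p, parallel_fraction):
    """Speedup on p processors when only `parallel_fraction` of the
    serial run time can be parallelized (Amdahl's Law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)

# A kernel that is 99% parallel scales well on its own...
kernel_speedup = amdahl_speedup(100, 0.99)
# ...but if the whole application is only 80% parallel
# (kernel plus serial setup, I/O, etc.), the speedup collapses.
app_speedup = amdahl_speedup(100, 0.80)
print(round(kernel_speedup, 1), round(app_speedup, 1))   # ~50.3 vs ~4.8
```

Quoting the ~50x kernel number as "application performance" hides the ~10x gap that the serial portions impose.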

Page 4:

3. Quietly employ assembly code and other low-level language constructs.

The compiler for a parallel supercomputer may not take full advantage of the system's hardware. Using assembly code or other low-level constructs permits better use of the underlying hardware.

However, the use of such low-level constructs should be reported when providing performance results.

Page 5:

4. Scale up the problem size with the number of processors, but omit any mention of this fact.

For a fixed problem size, as you add more processors, the benefit of each additional processor drops off: overhead grows relative to the amount of computation done, and speedup is thus less than linear.

Scaling up the size of the problem as you add processors improves the ratio of useful work to overhead.

Failing to state how you’ve measured speedup is misleading.
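The two ways of measuring speedup can be contrasted directly. A sketch comparing Amdahl's fixed-size speedup with Gustafson's scaled-speedup model (the 5% serial fraction is an illustrative assumption):

```python
def fixed_size_speedup(p, s):
    """Amdahl: fixed problem size; s is the serial fraction of the
    one-processor run time."""
    return 1.0 / (s + (1.0 - s) / p)

def scaled_speedup(p, s):
    """Gustafson: problem size grows with p; s is the serial fraction
    of the parallel run time."""
    return p - s * (p - 1)

s = 0.05
for p in (16, 256):
    print(p, round(fixed_size_speedup(p, s), 1), round(scaled_speedup(p, s), 1))
```

At 256 processors the fixed-size speedup is under 20x while the scaled figure is over 240x; quoting the latter without saying the problem was scaled invites the reader to assume the former.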

Page 6:

5. Quote performance results projected to a full system.

Such projections assume linear scaling, which is rarely true in practice.
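A sketch of how badly a linear projection can miss, using Amdahl's Law with an assumed 2% serial fraction (all numbers hypothetical):

```python
def amdahl(p, s):
    """Amdahl's Law speedup on p processors with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / p)

measured_64 = amdahl(64, 0.02)                    # pretend this was measured
linear_projection_1024 = measured_64 * (1024 / 64)  # "just multiply by 16"
model_1024 = amdahl(1024, 0.02)                   # what the model predicts
print(round(linear_projection_1024, 1), round(model_1024, 1))  # ~453 vs ~47.7
```

The linear projection overstates the modeled full-system speedup by nearly an order of magnitude, even before real-world effects like communication contention are considered.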

Page 7:

6. Compare your results against scalar, unoptimized code on Crays.

You should compare your parallel version of a code to the best serial implementation that is known.

Similarly, you should compare your parallel version of a code to the best implementation on whatever architecture you’re comparing to – not to the naïve version or worst version.
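The choice of baseline changes the headline number; a sketch with hypothetical run times:

```python
# Hypothetical run times (seconds) for the same problem.
parallel_time     = 1.0   # your parallel code
naive_serial_time = 64.0  # unoptimized scalar code on the comparison machine
best_serial_time  = 8.0   # best known serial implementation

flattering_speedup = naive_serial_time / parallel_time  # 64x
honest_speedup     = best_serial_time / parallel_time   # 8x
print(flattering_speedup, honest_speedup)
```

Both are "speedups", but only the comparison against the best serial implementation says anything about whether the parallel machine is actually worth using.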

Page 8:

7. When direct run time comparisons are required, compare with an old code on an obsolete system.

Same idea here …

Page 9:

8. If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best serial implementation.

The parallel version run on a single processor is typically slower than the serial version, due to added overhead, so its operation count is inflated.
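The resulting inflation is simple arithmetic (all numbers below are hypothetical):

```python
def mflops(op_count, seconds):
    """Millions of floating-point operations per second."""
    return op_count / seconds / 1e6

parallel_time = 2.0    # seconds, measured on the parallel machine
serial_ops    = 1.0e9  # flop count of the best serial algorithm
parallel_ops  = 4.0e9  # the parallel algorithm does redundant work

honest   = mflops(serial_ops, parallel_time)    # 500 MFLOPS
inflated = mflops(parallel_ops, parallel_time)  # 2000 MFLOPS
print(honest, inflated)
```

Both rates come from the same run time; the 4x difference is entirely an artifact of counting the parallel algorithm's redundant operations as useful work.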

Page 10:

9. Quote performance in terms of processor utilization, parallel speedups, or MFLOPS per dollar.

Run time or MFLOPS, though likely more informative, don't make your codes look quite so impressive.
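A hypothetical comparison shows how the choice of metric flips the winner (all figures invented for illustration):

```python
# name: (run_time_seconds, price_dollars, sustained_mflops)
systems = {
    "1024-node parallel machine": (120.0, 5_000_000, 8000.0),
    "conventional vector machine": (60.0, 10_000_000, 4000.0),
}

for name, (t, price, rate) in systems.items():
    # The parallel machine "wins" on MFLOPS per dollar...
    # ...while losing on the metric the user feels: run time.
    print(name, "run time:", t, "s, MFLOPS/$:", rate / price)
```

Here the parallel machine delivers 4x the MFLOPS per dollar while taking twice as long to finish the job; a report that quotes only the first number is technically true and practically misleading.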

Page 11:

10. Mutilate the algorithm used in the parallel implementation to match the architecture.

For example, restructure the algorithm to achieve a higher MFLOPS rate, even at the cost of a longer run time.

Page 12:

11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.

Again, you should be comparing “your best” to “their best”.

Page 13:

12. If all else fails, show pretty pictures and animated videos, and don’t talk about performance.

… you get the idea ….