Daniel dauwe ece 561 Benchmarking Results Trial 2
![Page 1: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/1.jpg)
Benchmarking ECE 561
Sudeep Pasricha
Daniel Dauwe
1/9/2014
![Page 2: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/2.jpg)
Presentation Outline
• Project Goals
• Tools for Benchmarking:
  – Performance counters, PAPI
  – HPC Toolkit, Phoronix Test Suite
  – Power Measurement
• How testing was accomplished
• List of additional data points for application to processor affinity
  – A simple continuation of Ryan’s test work
• Results from Memory/Cache Interference Testing for multiple applications run simultaneously pinned to specific cores
![Page 3: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/3.jpg)
Project Goals
• Benchmarking Processors
  – Monitor both performance counters and the system's power usage
  – Gather more data on application affinity for performance on a particular processor architecture
    • Memory Intensive Applications
    • CPU Intensive Applications
  – Analyze the interaction/interference of multiple applications run simultaneously on different cores of the same processor
• This data collection is intermediate work for future unspecified projects
![Page 4: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/4.jpg)
Performance Counters and PAPI
• Performance counters
  – Counters built into processor hardware that record the number of occurrences of user-specified events in hardware
• PAPI: Performance Application Programming Interface
  – PAPI was developed in the hope of identifying bottlenecks in current architectural development of high performance computing
  – A standardized list of performance counters available for most processors
  – PAPI makes it easier to have consistent tests across multiple processor architectures
![Page 5: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/5.jpg)
What do the Performance Counter Measurements mean?
• Can mean different things based on which counters are being monitored. Examples:
  – PAPI_L1_DCA: Level 1 data cache accesses
  – PAPI_FAD_INS: Floating point add instructions
  – PAPI_L2_DCM: Level 2 data cache misses
• The raw count data provided by the Performance Counter will need to be meaningfully interpreted by the user
![Page 6: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/6.jpg)
Matching Performance counters to Processor Architectures
• Performance Counters used for these tests:
  – PAPI_TOT_INS: Total Instructions Executed
  – PAPI_L2_TCM: Data and Instruction Level 2 Cache Misses
• These should be pretty universally available across different processor architectures
• Future inclusion of other tests may require other Performance Counters, but available Performance Counters vary greatly between processor architectures…
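One way the two raw counts could be turned into a comparable figure is L2 misses per thousand instructions. The sketch below is illustrative only: the function name and the sample counts are hypothetical, not values from these tests, and the slides do not say this metric was actually used.

```python
def misses_per_kilo_instruction(l2_misses: int, instructions: int) -> float:
    """Return L2 cache misses per 1,000 executed instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * l2_misses / instructions


# Hypothetical raw counts standing in for PAPI_L2_TCM and PAPI_TOT_INS readings.
sample_l2_tcm = 54_000
sample_tot_ins = 12_000_000
print(misses_per_kilo_instruction(sample_l2_tcm, sample_tot_ins))
```

Normalizing by instruction count makes runs of different lengths comparable, which a raw miss count alone does not.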
![Page 7: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/7.jpg)
HPC Toolkit
• “An Integrated suite of tools for measurement and analysis of program performance”
• Essentially:
  – HPC Toolkit makes it easier to interface with the local machine's performance counters
  – Makes collecting program performance data easier
![Page 8: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/8.jpg)
Phoronix Test Suite
• Phoronix provides many test applications capable of testing many aspects of processor performance
  – Phoronix tests are responsible for all of the benchmarking data gathered for this presentation
• However, many other groups write application suites useful for benchmarking
  – SPEC CPU2000 / 2006
  – PARSEC
• Several resources such as “OpenBenchmarking.org” provide a substantial number of results from tests run from these suites on many processor architectures
  – This could prove to be a useful resource; however, they do not include information about power usage
![Page 9: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/9.jpg)
Applications used for testing Cross-Core cache interference
• C-Ray
  – A Ray Tracing Program
  – CPU Intensive
  – Many Floating Point Calculation Operations
  – Relatively Little Memory Access
• Ramspeed
  – Integer and Floating Point Writes and Reads to memory
  – Memory Intensive
  – More interaction with the caches
![Page 10: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/10.jpg)
Monitoring Power Usage
• “Watts Up? PRO” power meter
  – Measures power consumption from a single standard power outlet
  – Has a USB port to interface with a computer and dump recorded power measurements
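Since the result charts report energy, a dumped power log has to be integrated over time at some point. A minimal sketch of that step follows, assuming evenly spaced samples in watts. The one-second default interval and the function name are assumptions for illustration, not details taken from the slides or the meter's documentation.

```python
def energy_joules(power_watts, interval_s=1.0):
    """Integrate evenly spaced power samples (W) into energy (J).

    Uses the trapezoidal rule; interval_s is the assumed time between samples.
    """
    if len(power_watts) < 2:
        return 0.0
    total = 0.0
    for earlier, later in zip(power_watts, power_watts[1:]):
        total += 0.5 * (earlier + later) * interval_s
    return total
```

For example, `energy_joules([20.0, 22.0, 24.0])` integrates three hypothetical readings taken over two seconds.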
![Page 11: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/11.jpg)
How tests were run
• A minimalist Ubuntu operating system allows the processor's attention to be dedicated to the test applications
  – Terminal-based user interface
  – Unnecessary background processes not included in the operating system
• Power usage and selected performance counters are recorded and saved while the various test applications are run
• For testing interference between programs:
  – “taskset” was used to pin the applications to specific processor cores
  – The applications were run concurrently while performance counter results were measured
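The pinning step could be scripted roughly as follows. This is a sketch, not the project's actual scripts: the helper names are hypothetical, and it only relies on the standard `taskset -c <core> <command>` form and Python's `subprocess` module.

```python
import subprocess


def pinned_command(core, argv):
    """Prefix argv so that taskset restricts the program to a single core."""
    return ["taskset", "-c", str(core), *argv]


def run_concurrently(jobs):
    """Start every (core, argv) job at once, wait for all, return exit codes."""
    procs = [subprocess.Popen(pinned_command(core, argv)) for core, argv in jobs]
    return [proc.wait() for proc in procs]
```

A call such as `run_concurrently([(0, ["./cpu_test"]), (1, ["./memory_test"])])` would start one program on core 0 and another on core 1; both paths are placeholders for whatever launches the actual benchmarks.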
![Page 12: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/12.jpg)
Measuring Memory Interference between Applications
• How this is tested:
  – Simultaneously pin different types of applications to run only on specific cores in the processor
  – Then use performance counters and the power meter to measure the interference
• Interference could be defined as:
  – An increase in the number of cache misses
  – An increase in application execution time
  – Possibly an increase in power consumption
• Test plan:
  – Tests were run:
    • First on an AMD Turion II Dual-Core M520 Processor (2 cores, 5 P-states)
    • Later also on an Intel Pentium Dual Core CPU (2 cores, 4 P-states)
  – Run control tests with each application running alone (pinned to a single core)
  – Run the tests together and analyze the differences
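Under any of the definitions above, interference comes down to comparing a joint run against its control run. A minimal sketch of that comparison is shown here, assuming one measurement per P-state in each list. The function name and the idea of reporting a percentage are illustrative choices, not something the slides specify.

```python
def percent_increase(control, interference):
    """Percent change of each interference value relative to its control value."""
    if len(control) != len(interference):
        raise ValueError("need one interference value per control value")
    return [100.0 * (i - c) / c for c, i in zip(control, interference)]
```

The same function applies whether the inputs are execution times, cache miss counts, or energy readings, as long as the control values are nonzero.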
![Page 13: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/13.jpg)
Control Results: Intel Pentium dual CPU T2330
[Charts, x-axis 0 to 3]
• Intel Pentium Dual Core: C-Ray L2 Cache Miss Control Results (series: CPU Control Test)
• Intel Pentium Dual Core: C-Ray Execution Time Control Results (series: CPU Control Test)
• Intel Pentium Dual Core: Ramspeed Execution Time Control Results (series: Memory Control Test)
• Intel Pentium Dual Core: Ramspeed L2 Cache Miss Control Results (series: Memory Control Test)
• Intel Pentium Dual Core: C-Ray Power Usage Control Results (series: CPU Control Energy)
• Intel Pentium Dual Core: Ramspeed Power Usage Control Results (series: Memory Control Energy)
![Page 14: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/14.jpg)
Control Results: AMD Turion II Dual Core Mobile M520
[Charts, x-axis 0 to 4]
• AMD Turion II Dual-Core: C-Ray Execution Time Control Results (series: CPU Control Test)
• AMD Turion II Dual-Core: C-Ray L2 Cache Miss Control Results (series: CPU Control Test)
• AMD Turion II Dual-Core: Ramspeed Execution Time Control Results (series: Memory Control Test)
• AMD Turion II Dual-Core: Ramspeed L2 Cache Miss Control Results (series: Memory Control Test)
• AMD Turion II Dual-Core: C-Ray Power Usage Control Results (series: CPU Control Energy)
• AMD Turion II Dual-Core: Ramspeed Power Usage Control Results (series: Memory Control Energy)
![Page 15: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/15.jpg)
Taking a Closer Look at the AMD Control Results from the previous slide:
• It seems suspect that the control test should produce the same execution time across all P-states. This result for the C-Ray execution control test was consistent over multiple runs on the AMD Turion II processor, but a test run on a secondary Intel Pentium Dual Core processor produced results that were closer to what seems realistic:
[Charts]
• C-Ray Execution Time (AMD First Run), x-axis 0 to 4 (series: Control Test, Interference Test)
• C-Ray Execution Time (AMD Second Run), x-axis 0 to 4 (series: CPU Control Test, CPU Interference Test)
• C-Ray Execution Time (Intel Run), x-axis 0 to 3 (series: CPU Control Test, CPU Interference Test)
![Page 16: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/16.jpg)
Interference Results (Joint Pinning Results on C-Ray):
Intel Pentium dual CPU T2330
• The third column of data represents Adjusted interference results
[Charts, x-axis 0 to 3]
• C-Ray Execution Time Interference (Ramspeed test on second core) (series: CPU Control Test, Original CPU Interference Test, Adjusted CPU Interference Test)
• C-Ray L2 Cache Misses Interference (Ramspeed test on second core) (series: CPU Control Test, Original CPU Interference Test, Adjusted CPU Interference Test)
• Power usage for C-Ray and Ramspeed tests run together (series: CPU Control Energy, 1 CPU and 1 Memory Interference Test Energy)
![Page 17: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/17.jpg)
Interference Results (Joint Pinning Results on Ramspeed):
Intel Pentium dual CPU T2330
[Charts, x-axis 0 to 3]
• Ramspeed Execution Time Interference (C-Ray test on second core) (series: Memory Control Test, Memory Interference Test)
• Ramspeed L2 Cache Misses Interference (C-Ray test on second core) (series: Memory Control Test, Memory Interference Test)
![Page 18: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/18.jpg)
Interference Results (2 CPU Intensive Application Pinning Results):
Intel Pentium dual CPU T2330
[Charts, x-axis 0 to 3]
• C-Ray Execution Time Interference (C-Ray test on second core) (series: CPU Control Test, CPU Interference Test, CPU Interference Test)
• C-Ray L2 Cache Misses Interference (C-Ray test on second core) (series: CPU Control Test, CPU Interference Test, CPU Interference Test)
• Power usage for 2 C-Ray tests running on separate cores (series: CPU Control Energy, 2 CPU Interference Test Energy)
![Page 19: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/19.jpg)
Interference Results (2 Memory Intensive Application Pinning Results):
Intel Pentium dual CPU T2330
[Charts, x-axis 0 to 3]
• Ramspeed Execution Time Interference (Ramspeed test on second core) (series: Memory Control Test, Memory Interference Test, Memory Interference Test)
• Ramspeed L2 Cache Misses Interference (Ramspeed test on second core) (series: Memory Control Test, Memory Interference Test, Memory Interference Test)
• Power usage for 2 Ramspeed tests running on separate cores (series: Memory Control Energy, 2 Memory Interference Test Energy)
![Page 20: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/20.jpg)
Interference between simultaneous applications: Future Tests
The foundation scripts have been written, so in the future it will be very easy to add support for testing:
– Interference of 1 type of application pinned to N cores for a processor with a substantial number of cores (i.e., > 2)
– Interference from 2 CPU intensive or 2 Memory intensive test applications
– Memory interference with M applications mapped to N cores (N > 2)
– A larger sample size, which might produce more interesting results
– Which application-to-core mappings provide the best performance for specific architectures/cache sizes
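A search over application-to-core mappings would first need to enumerate the candidates. One possible sketch, assuming each application is pinned to exactly one core and that several applications may share a core, is given below. The function name is hypothetical and this is not taken from the foundation scripts.

```python
from itertools import product


def core_mappings(apps, n_cores):
    """Yield every assignment of each application to one of n_cores cores."""
    for cores in product(range(n_cores), repeat=len(apps)):
        yield dict(zip(apps, cores))
```

With M applications and N cores this yields N to the power M mappings, so an exhaustive sweep is only practical for small M and N.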
![Page 21: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/21.jpg)
Presentation Outline
• Project Goals
• Tools for Benchmarking:
  – Performance counters, PAPI
  – HPC Toolkit, Phoronix Test Suite
  – Power Measurement
• How testing was accomplished
• List of additional data points for application to processor affinity
  – A simple continuation of Ryan’s test work
• Results from Interference Testing for applications pinned to specific cores
![Page 22: Daniel dauwe ece 561 Benchmarking Results Trial 2](https://reader036.fdocuments.in/reader036/viewer/2022081603/558cfabad8b42a206f8b471c/html5/thumbnails/22.jpg)
Thank You For Your Attention