PAPI 3.0.8.1 on Blue Gene L
Using network performance counters to layout tasks for
improved performance
Presentation overview
- Project objectives
- PAPI explanation
- Blue Gene L explanation
- Current state of research
Project objectives
- Upgrade PAPI on BG/L
- Provide an interface for network counters
- Allow Lawrence Livermore National Lab users to also have access to PAPI
- Use network counters to place tasks optimally on BG/L
PAPI – Intro
Courtesy of http://icl.cs.utk.edu/papi/
PAPI – Intro
- PAPI is useful for profiling your own programs
- Many tools are based on PAPI
  - PapiEx – Command line measurement tool
  - PerfSuite – Aggregate measurement and statistical profiling package and API
  - HPCToolkit – Statistical profiling package
  - Many more!
PAPI – Supported platforms
- IBM – POWER3, 604, 604e, POWER4
- Cray T3E, Cray X1
- AMD – Athlon, Opteron
- Intel – P1 to P4, Itanium I and II
- UltraSparc I, II & III
- MIPS R10K, R12K, R14K
- Alpha
PAPI – Generic Interface
- Call sequence for generic interface
  - PAPI_library_init – Initialize memory for PAPI’s data structures
  - PAPI_create_eventset – Create an empty list of events
  - PAPI_add_event – Add events to be counted
  - PAPI_start – Begin counting all events within the specified eventset
  - PAPI_stop – Stop all counters and read their current values
PAPI – Events: Presets
- Presets – list of predefined events implemented on all systems where they can be supported
- Not all presets are available on every architecture (e.g. BG/L has no cache lower than L3 – thus L1 cache hit preset not applicable)
- Native events form the basic building blocks for PAPI presets
PAPI – Events: Presets
Courtesy of http://icl.cs.utk.edu/papi/
PAPI – Events: Native
- In addition to the predefined PAPI preset events, the PAPI library also exposes a majority of the events native to each platform
- Native events can be added to eventsets in the same manner as presets
PAPI – Events: Native
PAPI – Internals
- An array of eventsets forms the main portion of PAPI's internals
PAPI – Other features
- Multiplexing – Used when there are not enough hardware counters for the requested events
- Thread safe – Profiling is thread safe
- Overflow detection – Hardware counters have limited space
PAPI – PAPI2 vs PAPI3
- PAPI 3 significantly reduced overheads for starting, stopping and reading the counters
Courtesy of http://icl.cs.utk.edu/papi/
PAPI – PAPI2 vs PAPI3
- Better native event support in PAPI3
- Better thread support in PAPI3
- Overflow and Profiling enhancements in PAPI3
- Myriad bug fixes and code cleanup in PAPI3
PAPI – PAPI2 vs PAPI3
- Overlapping eventsets supported in PAPI2
- Minor changes in the API – mostly dereferencing variables
Blue Gene L – Intro
- 65,536 nodes connected in a 64 x 32 x 32 3D torus
- Nodes made up of PowerPC 440 embedded processors
- Smaller than most supercomputers
- Consumes less power
Blue Gene L
Blue Gene L – Networks
- 3D torus network (node to node)
- Tree network (broadcasts)
Blue Gene L – HW counters
- 48 universal performance counters
- 4 floating point unit counters
- Counters are 32-bit: virtual counters must be used to prevent overflow
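One way to picture a virtual counter is as a software total that absorbs the wraparound of a 32-bit hardware register. The sketch below is only an illustration of that idea, not the BG/L or PAPI implementation; the `VirtualCounter` class is hypothetical, and it assumes the raw register is read at least once per wrap period.

```python
COUNTER_BITS = 32
COUNTER_MOD = 1 << COUNTER_BITS  # a 32-bit register wraps at 2**32


class VirtualCounter:
    """Hypothetical software counter layered over a wrapping 32-bit register."""

    def __init__(self, initial_raw=0):
        self.last_raw = initial_raw % COUNTER_MOD
        self.total = 0  # Python ints do not overflow

    def update(self, raw):
        """Fold a new raw reading into the running total and return the total."""
        raw %= COUNTER_MOD
        # Modular subtraction gives the right delta even if the register wrapped
        # once since the previous reading (assumption: never more than once).
        delta = (raw - self.last_raw) % COUNTER_MOD
        self.total += delta
        self.last_raw = raw
        return self.total


vc = VirtualCounter(initial_raw=0xFFFFFFF0)
first = vc.update(0xFFFFFFFF)   # 15 events since the initial reading
second = vc.update(0x00000010)  # register wrapped; 17 more events, total 32
```

If the register could wrap more than once between readings, the delta would be undercounted, so the sampling interval matters.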
Blue Gene L – HW counters
Research – Overall goals
- Network hardware counters are new
- Use network counters to determine traffic between tasks
- Try to optimize placement of tasks to minimize communication latency
- Given counts and distances: cost = counts * distance. Minimize over all nodes
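The cost expression can be sketched in a few lines. This is a minimal illustration under assumptions not stated on the slide: "counts" is taken to be a traffic count per pair of tasks, "distance" is taken to be the hop count on the 64 x 32 x 32 torus with wraparound links, and the names `torus_distance` and `layout_cost` are hypothetical.

```python
TORUS_DIMS = (64, 32, 32)  # torus shape from the Blue Gene L intro slide


def torus_distance(a, b, dims=TORUS_DIMS):
    """Hop count between coordinates a and b, using wraparound in each dimension."""
    hops = 0
    for p, q, size in zip(a, b, dims):
        d = abs(p - q) % size
        hops += min(d, size - d)  # go whichever way around the ring is shorter
    return hops


def layout_cost(counts, placement, dims=TORUS_DIMS):
    """Sum of counts * distance over all communicating task pairs."""
    return sum(
        n * torus_distance(placement[src], placement[dst], dims)
        for (src, dst), n in counts.items()
    )


# Assumed example data: traffic counts per task pair and a task -> node mapping.
counts = {(0, 1): 100, (1, 2): 10}
placement = {0: (0, 0, 0), 1: (63, 0, 0), 2: (10, 5, 0)}
cost = layout_cost(counts, placement)  # 100 * 1 + 10 * 16 = 260
```

Tasks 0 and 1 are one hop apart because x = 0 and x = 63 are neighbours across the wraparound link.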
Research – Counting
- First goal: determine what is being counted
Research – Networks
- For each MPI call – determine which network counters are being used
  - Tree is supposed to be for broadcasts
  - Torus is supposed to be for point to point communication
- Ambiguities in the specification
Research – Future decisions
- How to profile a target application
  - Manually insert PAPI instrumentation: a lot of work
  - Instrument binaries with counting code
- What information to store
  - All counts on each node: a lot of data
  - Sample of all nodes: not as accurate (what if the tasks behave / communicate differently?)
Research – Future decisions
- How to use collected information
  - Profile an application to obtain counter feedback to determine optimized static task layout
  - Dynamically migrate tasks in response to counters
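The static-layout option could, for instance, be approached with a simple local search over a profiled traffic table. The sketch below is one possible heuristic and not the method chosen in this project: it repeatedly swaps the nodes of two tasks whenever the swap lowers the counts * distance cost. All names (`improve_layout`, `layout_cost`, `torus_distance`) and the example data are hypothetical, and counts are assumed to be non-negative integers.

```python
from itertools import combinations


def torus_distance(a, b, dims):
    """Hop count between two torus coordinates, with wraparound."""
    return sum(min(abs(p - q) % s, s - abs(p - q) % s) for p, q, s in zip(a, b, dims))


def layout_cost(counts, placement, dims):
    """Sum of counts * distance over all communicating task pairs."""
    return sum(n * torus_distance(placement[s], placement[t], dims)
               for (s, t), n in counts.items())


def improve_layout(counts, placement, dims):
    """Greedy pairwise-swap search; returns (new placement, its cost)."""
    placement = dict(placement)  # leave the caller's mapping untouched
    best = layout_cost(counts, placement, dims)
    improved = True
    while improved:  # stops once a full pass finds no cheaper swap
        improved = False
        for s, t in combinations(sorted(placement), 2):
            placement[s], placement[t] = placement[t], placement[s]
            cost = layout_cost(counts, placement, dims)
            if cost < best:
                best, improved = cost, True
            else:  # undo a swap that did not help
                placement[s], placement[t] = placement[t], placement[s]
    return placement, best


# Assumed toy example: four tasks on an 8-node ring (8 x 1 x 1 torus).
dims = (8, 1, 1)
counts = {(0, 1): 50, (2, 3): 50, (0, 2): 1}
start = {0: (0, 0, 0), 1: (4, 0, 0), 2: (1, 0, 0), 3: (5, 0, 0)}
start_cost = layout_cost(counts, start, dims)  # 50*4 + 50*4 + 1*1 = 401
layout, final_cost = improve_layout(counts, start, dims)
```

A search like this only finds a local minimum and recomputes the full cost for every candidate swap, so it is suited to small examples rather than a full 65,536-node machine.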