Using Dyninst to Dynamically Control Memory Reference Tracing
![Page 1: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/1.jpg)
University of Maryland
Using Dyninst to Dynamically Control Memory Reference Tracing
Jeff Odom
![Page 2: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/2.jpg)
Sigma Goals
Collaboration between IBM and UMD
Family of tools to understand caches
  – Focus on detailed statistics
  – Complement existing hardware counters
Ability to handle real applications
  – MPI and OpenMP programs
  – Fortran and C
Provide hints about restructuring
  – Padding (both inter and intra data structures)
  – Blocking
![Page 3: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/3.jpg)
Approach
Run instrumented program
  – Capture full information about memory use
  – Produce compact trace
    • Extracts loops and memory strides
Post execution tools
  – Detailed simulator
    • Full discrete event simulator
  – Memory profiler
    • Share of accesses due to each data structure
  – Cache Prediction Tool
    • Predict cache misses using symbolic equations
![Page 4: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/4.jpg)
Representing Program Execution
Capture full execution behavior
  – Record all basic blocks and memory addresses
  – Produces large traces (due to looping)
Trace compression
  – Maintain pattern buffer
  – Scan for repeating patterns
    • Extract memory strides
  – Repeat algorithms for nested loops
[Figure: example trace layout. Labels shown: basic-block records BLK1, BLK2, BLK3; address records ADR; repeat record RPT; fields Count, Length, Base, Stride.]
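The stride-extraction idea from this slide can be illustrated with a small sketch. It is not Sigma's actual algorithm or trace format: the function names and the `(base, stride, count)` record are hypothetical, and the sketch only collapses runs of constant-stride addresses, leaving out the pattern buffer and nested-loop handling.

```python
def compress_strides(addresses, min_run=3):
    """Collapse runs of constant-stride addresses into (base, stride, count).

    Hypothetical record layout for illustration only; addresses that do not
    belong to a run of at least `min_run` are emitted as (address, 0, 1).
    """
    records = []
    i = 0
    n = len(addresses)
    while i < n:
        base = addresses[i]
        if i + 1 == n:
            # Last address, nothing left to form a run with.
            records.append((base, 0, 1))
            break
        stride = addresses[i + 1] - base
        count = 2
        # Extend the run while the same stride keeps repeating.
        while i + count < n and addresses[i + count] - addresses[i + count - 1] == stride:
            count += 1
        if count >= min_run:
            records.append((base, stride, count))
            i += count
        else:
            records.append((base, 0, 1))
            i += 1
    return records


def expand(records):
    """Rebuild the original address sequence from the compressed records."""
    out = []
    for base, stride, count in records:
        out.extend(base + k * stride for k in range(count))
    return out


trace = [100, 104, 108, 112, 500, 300, 308, 316]
compressed = compress_strides(trace)
print(compressed)
```

Because `expand` reproduces the input exactly, this form of compression is lossless; regular access patterns shrink to a few records while irregular ones do not shrink at all.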
![Page 5: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/5.jpg)
Not Enough
A few seconds generates gigabytes
  – Regularity of data critical to compression

| Application | Original program run time | Trace size (bytes) |
| --- | --- | --- |
| seis_s | 5.9 s | 1,900,591,649 |
| cg.S | 1.2 s | 2,154,062,238 |

Lossy tracing
  – Statistically “rebuild” trace from sampled set
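To put the two rows above in perspective, the following sketch divides trace size by the run time of the original program. It assumes the Trace Size figures are in bytes, which the slide does not state, and the resulting rate is relative to uninstrumented run time, not to the time spent tracing.

```python
# (original run time in seconds, trace size assumed to be in bytes)
runs = {
    "seis_s": (5.9, 1_900_591_649),
    "cg.S": (1.2, 2_154_062_238),
}


def trace_rate(runtime_s, trace_bytes):
    """Trace bytes produced per second of original program run time."""
    return trace_bytes / runtime_s


rates = {app: trace_rate(*values) for app, values in runs.items()}
for app, rate in rates.items():
    print(f"{app}: about {rate / 1e6:.0f} MB of trace per second of run time")
```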
![Page 6: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/6.jpg)
Sampling
Leverages Sigma
  – Most scientific apps are loop based
  – Regular data access gives better compression
Time step boundary
  – Outermost loop
  – Non-uniform memory access OK
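A minimal sketch of sampling at time-step boundaries follows. The slides do not say how sampled steps are chosen or how the trace is statistically rebuilt, so random selection of outermost-loop iterations and linear scaling of the observed counts are both assumptions, and all names are hypothetical.

```python
import random


def choose_sampled_steps(num_steps, rate, seed=0):
    """Pick the time-step indices (outermost-loop iterations) to trace."""
    rng = random.Random(seed)
    k = max(1, round(num_steps * rate))  # always trace at least one step
    return set(rng.sample(range(num_steps), k))


def estimate_total(per_step_counts, sampled_steps):
    """Scale counts seen in the sampled steps up to the whole run."""
    observed = sum(per_step_counts[s] for s in sampled_steps)
    return observed * len(per_step_counts) / len(sampled_steps)


# Example: 1000 time steps, each issuing 500 references, sampled at 1%.
counts = [500] * 1000
steps = choose_sampled_steps(len(counts), 0.01)
print(len(steps), estimate_total(counts, steps))
```

Tracing whole time steps keeps each sampled piece internally regular, so it still compresses well, while the scaling step is what makes the result lossy.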
![Page 7: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/7.jpg)
Sigma + Dyninst
Dyninst natural choice
  – Vary sample rate without recompilation
  – Adaptive/progressive rate during execution
Leverage existing Sigma infrastructure
  – Only generate trace
  – Offline simulation step unchanged
![Page 8: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/8.jpg)
DynSigma
Mutator parses executable, inserts instrumentation, generates aux files
  – Instructions/module
  – Stack/global variables
  – Functions/line #
Group points by basic block (NEW)
  – Find load/store instrumentation via BPatch_basicBlock::findPoint()
Mutatee generates trace
  – Inserted Sigma library
![Page 9: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/9.jpg)
Sample Application
Seismic simulation from SPEC-HPC 2002
  – Models multiple seismic processes
  – Process results pipelined
Variable time steps
  – Different data pattern for each process
C & Fortran
  – Fortran: data processing
  – C: dynamic memory management, IO
![Page 10: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/10.jpg)
L1 cache memtime by data structure
[Chart: time (s) on a logarithmic axis from 0.000001 to 10, by data structure. Series: Full, 0.1%, 0.5%, 1%, 5%, 10%.]
![Page 11: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/11.jpg)
L2 cache memtime by data structure
[Chart: time (s) on a logarithmic axis from 0.00000001 to 10, by data structure. Series: Full, 0.1%, 0.5%, 1%, 5%, 10%.]
![Page 12: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/12.jpg)
L1 + L2 memtime by data structure
[Chart: time (s) on a logarithmic axis from 0.000001 to 10, by data structure. Series: Full, 0.1%, 0.5%, 1%, 5%, 10%.]
![Page 13: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/13.jpg)
L1 + L2 memtime by data structure
Phase values shown: init 0.091, process 5.713, report 0.095
[Chart: time (s) on a logarithmic axis from 0.000001 to 10, by data structure. Series: Full, 1%, 1% init.]
![Page 14: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/14.jpg)
Why go to all the trouble?
How about just one time step?

| Data structure | Full | 1% | Single time step: First | Single time step: Middle | Single time step: Last |
| --- | --- | --- | --- | --- | --- |
| sa | 3.46 | 3.86 | 2.46 | 6.32 | 9.96 |
| otr | 1.18 | 1.50 | 4.14 | 0.291 | 0.22 |
| ra | 0.0512 | 0.0973 | 0 | 0 | 0.299166 |
| sxyz@coord | 0.00153 | 0.00181 | 0 | 0 | 0.004883 |
| rxyz@coord | 0.001459 | 0.002038 | 0 | 0 | 0.005640 |
| xn@dgen | 0.001311 | 0.001934 | 0 | 0 | 0.005895 |
| ityp@dgen | 0.000202 | 0.000394 | 0 | 0 | 0.001211 |
| ref@dgen | 0.000184 | 0.000344 | 0 | 0 | 0.001050 |
| z0@dgen | 0.000159 | 0.000031 | 0 | 0 | 0.000899 |
| name@sys | 2.77E-05 | 2.48E-05 | 0 | 0 | 0 |
| kra@sys | 9.54E-06 | 2.48E-05 | 0 | 0 | 0 |
| lenra@sys | 9.54E-06 | 9.54E-06 | 9.46E-06 | 9.46E-06 | 9.46E-06 |
![Page 15: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/15.jpg)
Size does matter

| Sample | Trace Size | Time (m:ss) | Correlation |
| --- | --- | --- | --- |
| 0.10% | 0.90 MB | 0:31 | 0.999506 |
| 0.50% | 4.56 MB | 1:07 | 0.999516 |
| 1.00% | 9.55 MB | 1:58 | 0.999021 |
| 1.00% w/ init | 9.79 MB | 2:00 | 0.999433 |
| 5.00% | 46.8 MB | 8:22 | 0.999581 |
| 10.00% | 116 MB | 18:00 | 0.995556 |
| Full | 1,813 MB | 43:03 | |
| Uninstrumented | | 0:06 | |

Note: includes 0:12 mutator overhead
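The slides do not define how the Correlation column is computed. One plausible reading, assumed here, is a Pearson correlation between the per-data-structure values from the full trace and those from a sampled trace. The sketch below applies that assumption to the Full and 1% columns of the per-data-structure table on the "Why go to all the trouble?" slide; the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Values copied from the Full and 1% columns, in table row order.
full = [3.46, 1.18, 0.0512, 0.00153, 0.001459, 0.001311,
        0.000202, 0.000184, 0.000159, 2.77e-05, 9.54e-06, 9.54e-06]
one_percent = [3.86, 1.50, 0.0973, 0.00181, 0.002038, 0.001934,
               0.000394, 0.000344, 0.000031, 2.48e-05, 2.48e-05, 9.54e-06]

r = pearson(full, one_percent)
print(f"correlation between Full and 1% columns: {r:.6f}")
```

A coefficient computed this way is dominated by the few largest data structures, so a high value says little about how well the small ones are estimated.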
![Page 16: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/16.jpg)
Conclusions
Compressed traces may be very large for short runtimes
Sampling single time step no good
Concentrate on main processing loop
Small (1%) samples accurate enough
![Page 17: Using Dyninst to Dynamically Control Memory Reference Tracing](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813fe5550346895daad278/html5/thumbnails/17.jpg)
Ongoing & Future Work
Measure another application
Determining time steps at runtime
  – Extending code coverage with counters
Adaptive sampling rates
  – Multi-pass memory profiling
Irregular accesses
  – Sampling
Multithreaded applications