Stochastic Program Execution Tracing
Jeff Odom, UMD
University of Maryland
SIGMA Goals
IBM/UMD tools to understand caches
– Focus on detailed statistics
– Complement existing hardware counters
Ability to handle real applications
– MPI and OpenMP programs
– Fortran and C
Provide hints about restructuring
– Padding (both inter- and intra-data-structure)
– Blocking
UMD effort funded by PERC2
Original SIGMA Approach
Static instrumentation
– Capture full information about memory use
– Produce compact trace
  • Extracts loops and memory strides
Post-execution tools
– Detailed simulator
  • Full discrete-event simulator
– Memory profiler
  • Portion of accesses attributed to each data structure
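Attributing accesses to data structures amounts to finding which object's address range contains each traced address. The C++ sketch below is illustrative only: `Extent`, `attribute`, and the `<unknown>` bucket are hypothetical names, and the extents are assumed to be known already (a real profiler would have to derive them from symbol tables and allocation calls).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical data-structure extent: [start, start + size) in the address space.
struct Extent {
    std::string name;
    unsigned long start;
    unsigned long size;
};

// Count how many traced addresses fall inside each known data structure.
// Addresses outside every extent are counted under "<unknown>".
std::map<std::string, std::size_t>
attribute(const std::vector<Extent>& extents, const std::vector<unsigned long>& addrs) {
    std::map<std::string, std::size_t> counts;
    for (unsigned long a : addrs) {
        std::string owner = "<unknown>";
        for (const Extent& e : extents) {
            if (a >= e.start && a - e.start < e.size) {  // a lies in [start, start + size)
                owner = e.name;
                break;
            }
        }
        ++counts[owner];
    }
    return counts;
}
```

Dividing each count by the total number of accesses gives the portion of accesses per data structure.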
Representing Program Execution
Capture full execution behavior
– Record all basic blocks and memory addresses
– Produces large traces (due to looping)
Trace compression
– Maintain pattern buffer
– Scan for repeating patterns
  • Extract memory strides
– Repeat algorithms for nested loops
[Figure: example compressed trace built from BLK, ADR, and RPT records, with fields labeled Count, Length, Base, and Stride]
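The stride-extraction step can be sketched for a single access point's address stream: consecutive addresses with a constant difference collapse into one (base, stride, count) record. This is a simplified, hypothetical illustration; `StrideRun` and `compress` are made-up names, and neither the pattern buffer nor the nested-loop repeat (RPT) records are modeled.

```cpp
#include <cstddef>
#include <vector>

// One compressed record: addresses base, base + stride, ..., base + (count - 1) * stride.
struct StrideRun {
    long base;
    long stride;
    std::size_t count;
};

// Greedy compression: extend each run while the address difference stays constant.
std::vector<StrideRun> compress(const std::vector<long>& addrs) {
    std::vector<StrideRun> runs;
    std::size_t i = 0;
    while (i < addrs.size()) {
        StrideRun r{addrs[i], 0, 1};          // a lone trailing address gets stride 0
        if (i + 1 < addrs.size()) {
            r.stride = addrs[i + 1] - addrs[i];
            r.count = 2;
            while (i + r.count < addrs.size() &&
                   addrs[i + r.count] - addrs[i + r.count - 1] == r.stride)
                ++r.count;
        }
        runs.push_back(r);
        i += r.count;
    }
    return runs;
}
```

A regular stream such as 100, 104, 108, 112, 300, 304, 308 shrinks to two records, while an irregular stream such as 5, 9, 2, 5 yields one three-field record per pair of addresses, which is larger than the raw trace.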
Trace Compression Isn’t Enough
Tracing even a few seconds of execution…
– Slows execution considerably
– Generates gigabytes of trace data
| Program | Original time (s) | Slowdown | Trace size (KB) |
|---|---|---|---|
| seis | 8 | 4463× | 1,934,667 |
| BT | 8 | 6000× | 74,221 |
| swim | 396 | 777× | 29 |
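The slowdown column is the traced run time divided by the original run time. For seis, the numbers can be cross-checked against the full SIGMA figures reported later in this transcript (9:55:04 and 1,889.32 MB):

$$8\ \text{s} \times 4463 = 35{,}704\ \text{s} = 9\ \text{h}\ 55\ \text{min}\ 4\ \text{s}$$

$$1{,}934{,}667\ \text{KB} \div 1024 \approx 1{,}889.32\ \text{MB}$$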
Sampling
We want…
– Shorter execution times
– Smaller traces
We need…
– Representative traces
– Where to sample?
Timestep boundary
– Outermost loop
– Requires manual identification (for now)
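One way to turn a sampling rate into a choice of timesteps is to spread the traced timesteps evenly. The sketch below is an assumption, not dynSIGMA's actual policy: `traceTimestep` and `countTraced` are hypothetical, the rate is an integer fraction num/den to avoid floating-point rounding, and a stochastic variant would instead draw a random number for each timestep.

```cpp
// Hypothetical sampling policy: trace a fraction num/den of timesteps, spread evenly.
// Timestep t is traced when floor((t+1)*num/den) != floor(t*num/den); for
// num <= den this selects exactly floor(n*num/den) of the first n timesteps.
bool traceTimestep(unsigned long t, unsigned long num, unsigned long den) {
    return ((t + 1) * num) / den != (t * num) / den;
}

// Number of the first n timesteps that the policy would trace.
unsigned long countTraced(unsigned long n, unsigned long num, unsigned long den) {
    unsigned long traced = 0;
    for (unsigned long t = 0; t < n; ++t)
        if (traceTimestep(t, num, den)) ++traced;
    return traced;
}
```

The rates used in the seis results (1%, 2.5%, 5%, 10%) correspond to num/den = 1/100, 25/1000, 1/20 and 1/10; with 1/100, 10 of the first 1000 timesteps are traced (steps 99, 199, …).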
Dyninst + SIGMA = dynSIGMA
Dyninst adds flexibility
– Vary sample rate without recompilation
– Adaptive/progressive rate during execution
– Target application runs at native speed when instrumentation is turned off
Leverage existing SIGMA infrastructure
– dynSIGMA only generates the trace
– Offline simulation/profiling steps unchanged
Dual application framework
– Mutatee (the target application) generates the trace
– Mutator (the controlling Dyninst tool) toggles instrumentation
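The division of labor can be modeled in a single C++ program. This is a hypothetical illustration, not DyninstAPI code: `Mutatee`, `runWithSampling`, and the `tracing` flag are made up. In the real framework the mutator inserts and removes instrumentation from a separate process, so untraced timesteps carry no checking cost; the flag here only keeps the sketch self-contained.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the target application: each memory access calls onAccess(),
// which records the address only while tracing is switched on.
struct Mutatee {
    bool tracing = false;          // toggled by the "mutator"
    std::vector<long> trace;       // collected addresses
    void onAccess(long addr) {
        if (tracing) trace.push_back(addr);
    }
    // One timestep touching n consecutive 4-byte elements starting at base.
    void runTimestep(long base, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            onAccess(base + 4 * static_cast<long>(i));
    }
};

// Stand-in for the mutator: enables tracing for one timestep in every `every`
// (every must be > 0), toggling only at timestep boundaries.
void runWithSampling(Mutatee& m, std::size_t timesteps, std::size_t every) {
    for (std::size_t t = 0; t < timesteps; ++t) {
        m.tracing = (t % every == 0);
        m.runTimestep(1000 * static_cast<long>(t), 8);
    }
    m.tracing = false;             // leave instrumentation off afterwards
}
```

With 10 timesteps and `every = 5`, only timesteps 0 and 5 are traced, giving 16 recorded addresses.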
Memtime
Simple but effective metric of application memory performance
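$$\mathit{memtime} = \sum_{i=1}^{n} h_i\, l_i + T_{\mathit{miss}}\, T_{\mathit{latency}}$$
where
– $n$: number of cache levels
– $h_i$: hits at level $i$
– $l_i$: cache latency at level $i$
– $T_{\mathit{miss}}$: TLB misses
– $T_{\mathit{latency}}$: TLB miss penalty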
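Read literally, the formula is a latency-weighted sum of hits plus a TLB term. A minimal C++ sketch, assuming per-level hit counts and latencies are already available (the function and parameter names are illustrative, not SIGMA's):

```cpp
#include <cstddef>
#include <vector>

// memtime = sum_i h_i * l_i + T_miss * T_latency
// hits[i] and latency[i] describe cache level i; if the vectors differ in
// length, the extra entries are ignored. Units follow the latencies.
double memtime(const std::vector<double>& hits,
               const std::vector<double>& latency,
               double tlbMisses, double tlbPenalty) {
    double t = 0.0;
    for (std::size_t i = 0; i < hits.size() && i < latency.size(); ++i)
        t += hits[i] * latency[i];
    return t + tlbMisses * tlbPenalty;
}
```

For example, hits of {1000, 100} at latencies {1, 10}, plus 5 TLB misses with a penalty of 30, give 1000 + 1000 + 150 = 2150 (made-up numbers).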
Characteristic Pattern
Local and global data objects are given canonical names
The vector of the objects' memtimes is the characteristic data pattern
Characteristic patterns are compared using simple linear correlation
Can also be applied to function objects
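"Simple linear correlation" is read here as the Pearson correlation coefficient. A sketch, assuming patterns are stored as maps from canonical object name to memtime and that an object absent from one run counts as zero memtime (both assumptions, as are the names `Pattern` and `correlation`):

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Characteristic pattern: memtime per canonically named data object.
using Pattern = std::map<std::string, double>;

// Pearson correlation of two patterns, aligned by canonical name.
// Returns NaN if either aligned vector is constant (or both patterns are empty).
double correlation(const Pattern& a, const Pattern& b) {
    std::set<std::string> names;
    for (const auto& kv : a) names.insert(kv.first);
    for (const auto& kv : b) names.insert(kv.first);

    std::vector<double> x, y;
    for (const auto& n : names) {
        auto ia = a.find(n);
        auto ib = b.find(n);
        x.push_back(ia == a.end() ? 0.0 : ia->second);  // missing object: memtime 0
        y.push_back(ib == b.end() ? 0.0 : ib->second);
    }

    const double k = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= k;
    my /= k;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}
```

A value of 1 means one pattern is an exact positive linear function of the other; the accuracy results later in the transcript are reported as 1 − correlation, so smaller is better.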
Example Application: seis
Seismic simulation from SPEChpc2002
– Models multiple seismic processes
– Process results pipelined
Variable timesteps
– Different data pattern for each process
C & Fortran
– Fortran: data processing
– C: dynamic memory management, I/O
Space & Time Gains From Sampling
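| Sampling rate | Trace size (MB) | Time (h:m:s) | Correlation |
|---|---|---|---|
| 1.00% | 13.51 | 9:04 | 0.996139 |
| 2.50% | 33.14 | 40:00 | 0.997124 |
| 5.00% | 66.33 | 1:12:48 | 0.997307 |
| 10.00% | 133.17 | 2:16:00 | 0.997131 |
| Full (SIGMA) | 1,889.32 | 9:55:04 | |
| Original seis | | 0:08 | |

Times for the sampled runs include 0:12 of instrumentation overhead.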
Challenge of Irregularity
Compression requires regular accesses
Sampling may hide poor compression
– Each sample may compress poorly
– Offset by low sampling rate
Sampling may not be accurate enough
– Control flow sampled as well
– Sample boundary requires manual definition
Hybrid Traces
Accuracy may be more important than execution time, but storage capacity may be limited
Modeling data access at particular points can be more accurate than timestep sampling
Many codes are mostly regular, but irregular patterns spoil compression
Modified Linear Regression
Establish linear pattern (min 3 points) at each memory access location
Look for repetitions of pattern with higher-level strides
Once input no longer matches pattern, treat further input as irregular until new pattern discovered
Irregular sequence modeled using a uniform distribution
Pattern matching done locally at each instrumentation (memory access) point
– Original SIGMA pattern matches globally
Example: 0, 1, 2, 5, 9, 10, 11, 12, 2, 5
Becomes: 0 + x + 10y + {5,9,2,5}, with x ∈ {0, 1, 2} and y ∈ {0, 1}; the leftover values 5, 9, 2, 5 are irregular
Becomes: 0 + x + 10y + {l:2, h:9}, i.e., the irregular values are replaced by a uniform distribution with low 2 and high 9
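The example can be reproduced with a simplified, single-pattern version of the procedure. This is a hypothetical sketch, not the authors' code: it establishes a pattern from the first three collinear points, accepts a repetition only when a run of exactly the same length and stride starts at base + k·(outer stride), records everything else as irregular, and summarizes the irregular values by their minimum and maximum (the uniform model). It handles one access point's stream and never looks for a second pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Regular part: base + stride*x + outerStride*y, 0 <= x < runLength, 0 <= y < reps.
// Irregular part: leftover values, modeled as uniform on [lo, hi].
struct Summary {
    long base = 0, stride = 0, outerStride = 0;
    std::size_t runLength = 0, reps = 0;
    std::vector<long> irregular;
    long lo = 0, hi = 0;
};

// Length of the maximal run starting at i whose consecutive differences equal s.
static std::size_t runFrom(const std::vector<long>& a, std::size_t i, long s) {
    std::size_t n = 1;
    while (i + n < a.size() && a[i + n] - a[i + n - 1] == s) ++n;
    return n;
}

Summary summarize(const std::vector<long>& a) {
    Summary r;
    std::size_t i = 0;
    // 1. Establish a linear pattern from at least 3 points; earlier points are irregular.
    while (i + 2 < a.size() && a[i + 1] - a[i] != a[i + 2] - a[i + 1])
        r.irregular.push_back(a[i++]);
    if (i + 2 < a.size()) {
        r.base = a[i];
        r.stride = a[i + 1] - a[i];
        r.runLength = runFrom(a, i, r.stride);
        r.reps = 1;
        i += r.runLength;
    }
    // 2. Accept repetitions of exactly runLength points at a higher-level stride.
    while (i < a.size()) {
        bool match = r.reps > 0 && runFrom(a, i, r.stride) == r.runLength;
        if (match && r.reps == 1 && a[i] != r.base)
            r.outerStride = a[i] - r.base;      // first repetition fixes the outer stride
        match = match && r.outerStride != 0 &&
                a[i] == r.base + static_cast<long>(r.reps) * r.outerStride;
        if (match) { ++r.reps; i += r.runLength; }
        else r.irregular.push_back(a[i++]);
    }
    // 3. Model the leftovers as uniform on [min, max].
    if (!r.irregular.empty()) {
        auto mm = std::minmax_element(r.irregular.begin(), r.irregular.end());
        r.lo = *mm.first;
        r.hi = *mm.second;
    }
    return r;
}
```

For 0, 1, 2, 5, 9, 10, 11, 12, 2, 5 it finds base 0, stride 1, run length 3 and outer stride 10 with two repetitions, irregular values 5, 9, 2, 5, and bounds l = 2, h = 9, matching the slide. The run 9, 10, 11, 12 is rejected because it is four points long, which is why 9 lands in the irregular set.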
Experiment Setup
NAS Parallel Benchmarks 3.2 Serial Version, Class S
IBM XL C 8.0, XL Fortran 10.1
DyninstAPI 5.0, including
– Liveness analysis
  • Up to 90% runtime reduction by excluding one dead SPR (MQ) from register save/restore
  • Additional 3% improvement by also excluding other dead GPRs/FPRs
– Transactional instrumentation
Instrumentation always on (no sampling)
Transactional Instrumentation
Reduces
– Memory allocation
– Insertion time
Atomic operation
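BPatch_thread *thr;                 // a thread of the mutatee, obtained earlier
BPatch_process *proc;
proc = thr->getProcess();           // insertion sets are managed per process
proc->beginInsertionSet();          // start batching: later insertions are queued
…
thr->insertSnippet(…);              // queued, not yet written into the mutatee
thr->insertSnippet(…);
…
proc->finalizeInsertionSet(true);   // insert all queued snippets; true = atomic (all or none)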
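The all-or-nothing behavior of `finalizeInsertionSet(true)` can be modeled without Dyninst. The sketch below is a hypothetical stand-in (`Op` and `InsertionSet` are made-up types, not DyninstAPI classes): operations are queued, then applied together, and with `atomic` set a single failure rolls back everything already applied. It does not model the memory-allocation or insertion-time savings.

```cpp
#include <functional>
#include <utility>
#include <vector>

// One queued operation: apply() returns false on failure; undo() reverses it.
struct Op {
    std::function<bool()> apply;
    std::function<void()> undo;
};

class InsertionSet {
    std::vector<Op> pending;
public:
    void add(Op op) { pending.push_back(std::move(op)); }

    // Apply every queued op. With atomic == true, the first failure stops the
    // batch and the ops already applied are undone in reverse order.
    bool finalize(bool atomic) {
        std::vector<Op*> done;
        bool ok = true;
        for (Op& op : pending) {
            if (op.apply()) {
                done.push_back(&op);
            } else {
                ok = false;
                if (atomic) break;
            }
        }
        if (!ok && atomic)
            for (auto it = done.rbegin(); it != done.rend(); ++it) (*it)->undo();
        pending.clear();
        return ok;
    }
};
```

With one succeeding and one failing operation queued, `finalize(true)` leaves no trace of either, while `finalize(false)` keeps the one that succeeded.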
Trace Size
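| | BT | CG | EP | FT | LU | MG | SP |
|---|---|---|---|---|---|---|---|
| Original size (KB) | 16,732 | 489,817 | 648,323 | 344 | 1,011 | 495 | 1,405 |
| Reduction with irregular compression (KB) | (20) | 289,551 | 98,620 | 0 | (53) | (90) | 78 |

Parenthesized values are negative reductions (the trace grew).
[Figure: percentage trace-size reduction with irregular compression for BT, CG, EP, FT, LU, MG, SP; y-axis from −30% to 70%]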
Accuracy
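| Benchmark | Memtime, original (s) | Memtime, new (s) | 1 − correlation |
|---|---|---|---|
| BT | 1.2139 | 1.2139 | 2.3E-8 |
| CG | 0.2442 | 0.2403 | 5.7E-8 |
| EP | 2.2881 | 2.2898 | 9.4E-7 |
| LU | 0.3205 | 0.3206 | 8.2E-8 |
| MG | 0.0558 | 0.0558 | 1.3E-5 |
| SP | 0.5162 | 0.5161 | 4.0E-8 |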
Future Work
Larger datasets (NPB Classes B, C)
– Some results already gathered for Class W
Distributions other than uniform
Irregular control flow
– Example: an upper-triangular matrix does not need to iterate over all M×N values
– Uses edge instrumentation
  • BPatch_basicBlock::getIncomingEdges
  • BPatch_basicBlock::getOutgoingEdges
  • BPatch_edge::getPoint
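The upper-triangular case shows why control flow matters: the inner loop's trip count shrinks with each row, so the address stream is a sequence of runs of different lengths. A small hypothetical C++ sketch (`upperTriangleAccesses` is a made-up name; row-major n × n matrix, addresses counted in elements) lists the accesses:

```cpp
#include <cstddef>
#include <vector>

// Element offsets touched when visiting only the upper triangle (j >= i)
// of a row-major n x n matrix.
std::vector<std::size_t> upperTriangleAccesses(std::size_t n) {
    std::vector<std::size_t> addrs;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j)   // inner trip count shrinks as i grows
            addrs.push_back(i * n + j);
    return addrs;
}
```

For n = 4 it visits 10 of the 16 elements (n(n+1)/2 in general), and the row runs have lengths 4, 3, 2, 1, so a model that expects every repetition to have the same run length would not capture them directly.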
Questions?