TAU for Accelerating AI Applications
![Page 1: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/1.jpg)
Sameer Shende, University of Oregon
Join the Conversation #OpenPOWERSummit
TAU for Accelerating AI Applications
![Page 2: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/2.jpg)
TAU Performance System® http://tau.uoregon.edu
• Tuning and Analysis Utilities (20+ year project)
• Comprehensive performance profiling and tracing
  • Integrated, scalable, flexible, portable
  • Targets all parallel programming/execution paradigms
• Integrated performance toolkit
  • Instrumentation, measurement, analysis, visualization
  • Widely-ported performance profiling / tracing system
  • Performance data management and data mining
  • Open source (BSD-style license)
• Integrates with application frameworks
![Page 3: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/3.jpg)
TAU Performance System®
![Page 4: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/4.jpg)
Performance Engineering using TAU
• How much time is spent in each application routine and outer loops? Within loops, what is the contribution of each statement? What is the time spent in OpenMP loops? Kernels on the GPU?
• How many instructions are executed in these code regions? Floating point, Level 1 and 2 data cache misses, hits, branches taken?
• What is the memory usage of the code? When and where is memory allocated/de-allocated? Are there any memory leaks? What is the memory footprint of the application? What is the memory high water mark?
• How much energy does the application use in Joules? What is the peak power usage?
• What are the I/O characteristics of the code? What is the peak read and write bandwidth of individual calls, total volume?
• How does the application scale? What is the efficiency, runtime breakdown of performance across different core counts?
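The energy and peak-power questions above come down to simple arithmetic over periodic power samples. The following is a minimal sketch, not TAU's implementation: the function name `energy_and_peak` and the readings are hypothetical, and the trapezoidal rule is an assumed integration choice.

```python
def energy_and_peak(samples):
    """samples: list of (time_seconds, power_watts) pairs, sorted by time.

    Returns (energy_joules, peak_watts). Energy is the integral of power
    over time, approximated here with the trapezoidal rule.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    peak = max(p for _, p in samples)
    return energy, peak


# Hypothetical power readings, one per second (not measured data).
readings = [(0.0, 100.0), (1.0, 120.0), (2.0, 110.0), (3.0, 90.0)]
energy_j, peak_w = energy_and_peak(readings)
print(f"energy = {energy_j:.1f} J, peak = {peak_w:.1f} W")
```

A shorter sampling period gives a closer approximation of the true energy, at the cost of more measurement overhead.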
![Page 5: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/5.jpg)
Instrumentation
• Source instrumentation using a preprocessor
  • Add timer start/stop calls in a copy of the source code.
  • Use Program Database Toolkit (PDT) for parsing source code.
  • Requires recompiling the code using TAU shell scripts (tau_cc.sh, tau_f90.sh)
  • Selective instrumentation (filter file) can reduce runtime overhead and narrow instrumentation focus.
• Compiler-based instrumentation
  • Use system compiler to add a special flag to insert hooks at routine entry/exit.
  • Requires recompiling using TAU compiler scripts (tau_cc.sh, tau_f90.sh…)
• Runtime preloading of TAU’s Dynamic Shared Object (DSO)
  • No need to recompile code! Use tau_exec ./app with options.
![Page 6: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/6.jpg)
Profiling and Tracing
• Profiling shows you how much (total) time was spent in each routine
• Tracing shows you when the events take place on a timeline
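The relationship between the two views can be sketched in a few lines: a trace keeps every timestamped event, while a profile keeps only per-routine aggregates. The event tuples and the helper `profile_from_trace` below are hypothetical illustrations, not TAU's trace or profile formats.

```python
from collections import defaultdict

# Hypothetical trace: (timestamp in seconds, "enter" or "exit", routine name).
trace = [
    (0.0, "enter", "load"),
    (2.0, "exit", "load"),
    (2.0, "enter", "train"),
    (7.0, "exit", "train"),
    (7.0, "enter", "load"),
    (8.0, "exit", "load"),
]


def profile_from_trace(events):
    """Collapse a trace into {routine: (call count, total time)}."""
    starts = defaultdict(list)   # pending entry timestamps per routine
    calls = defaultdict(int)
    total = defaultdict(float)
    for timestamp, kind, name in events:
        if kind == "enter":
            starts[name].append(timestamp)
            calls[name] += 1
        else:
            total[name] += timestamp - starts[name].pop()
    return {name: (calls[name], total[name]) for name in calls}


profile = profile_from_trace(trace)
print(profile)
```

The profile is much smaller than the trace, but it can no longer say that the second `load` call happened after `train`.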
![Page 7: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/7.jpg)
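Inclusive vs. Exclusive values
■ Inclusive
  ■ Information of all sub-elements aggregated into a single value
■ Exclusive
  ■ Information cannot be subdivided further

int foo()
{
  int a;
  a = 1 + 1;
  bar();
  a = a + 1;
  return a;
}

For foo, the inclusive value covers the whole body including the call to bar(); the exclusive value covers only foo's own statements.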
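For the foo/bar example, the exclusive value of a routine can be derived by subtracting the inclusive values of its direct callees from its own inclusive value. The sketch below assumes a hypothetical dictionary-based call tree and made-up timings in microseconds.

```python
def exclusive_time(node):
    """Exclusive time = inclusive time minus the inclusive time of direct callees."""
    return node["inclusive"] - sum(child["inclusive"] for child in node["children"])


# Hypothetical measurements: foo takes 7 us in total, 5 us of which are inside bar.
bar = {"name": "bar", "inclusive": 5.0, "children": []}
foo = {"name": "foo", "inclusive": 7.0, "children": [bar]}

for routine in (foo, bar):
    print(routine["name"], routine["inclusive"], exclusive_time(routine))
```

For a leaf routine such as `bar`, the inclusive and exclusive values are the same.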
![Page 8: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/8.jpg)
Performance Data Measurement

Direct via Probes
• Exact measurement
• Fine-grain control
• Calls inserted into code

Call START('potential')
// code
Call STOP('potential')

Indirect via Sampling
• No code modification
• Minimal effort
• Relies on debug symbols (-g)
![Page 9: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/9.jpg)
Sampling
• Running program is periodically interrupted to take measurement
  • Timer interrupt, OS signal, or HWC overflow
  • Service routine examines return-address stack
  • Addresses are mapped to routines using symbol table information
• Statistical inference of program behavior
  • Not very detailed information on highly volatile metrics
  • Requires long-running applications
• Works with unmodified executables

void foo(int i);

int main()
{
  int i;
  for (i = 0; i < 3; i++)
    foo(i);
  return 0;
}

void foo(int i)
{
  if (i > 0)
    foo(i - 1);
}

[Figure: timeline of main, foo(0), foo(1), foo(2) with measurement (sample) points t1 through t9]
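The statistical nature of sampling can be illustrated with a small simulation. The timeline below is hypothetical (it is not taken from the slide's figure), and `sample` is an illustrative helper: it checks which routine is on top of the call stack at each periodic sample point and attributes one period of time to it.

```python
from collections import Counter

# Hypothetical timeline: (start, end, routine on top of the call stack), in ms.
timeline = [
    (0, 2, "main"),
    (2, 5, "foo"),
    (5, 6, "main"),
    (6, 12, "foo"),
    (12, 13, "main"),
    (13, 22, "foo"),
    (22, 24, "main"),
]


def sample(timeline, period, offset=0):
    """Count which routine is active at each periodic sample point."""
    hits = Counter()
    end = timeline[-1][1]
    t = offset
    while t < end:
        for start, stop, name in timeline:
            if start <= t < stop:
                hits[name] += 1
                break
        t += period
    return hits


hits = sample(timeline, period=3)
estimate_ms = {name: count * 3 for name, count in hits.items()}
print(hits, estimate_ms)
```

With a 3 ms period this particular timeline happens to be estimated exactly (6 ms in main, 18 ms in foo). A coarser 5 ms period attributes 10 ms to main and 15 ms to foo, which shows why short runs and volatile behavior are estimated poorly.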
![Page 10: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/10.jpg)
Instrumentation
• Measurement code is inserted such that every event of interest is captured directly
  • Can be done in various ways
• Advantage:
  • Much more detailed information
• Disadvantage:
  • Processing of source-code / executable necessary
  • Large relative overheads for small functions

void foo(int i);

int main()
{
  int i;
  for (i = 0; i < 3; i++)
    foo(i);
  return 0;
}

void foo(int i)
{
  if (i > 0)
    foo(i - 1);
}

[Figure: timeline of main, foo(0), foo(1), foo(2) with measurement points t1 through t14, one at each Start("main"), Stop("main"), Start("foo"), and Stop("foo") call]
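Direct instrumentation of the main/foo example can be sketched with a decorator that records a Start and a Stop event around every call. This is an illustration of the idea only; the `instrument` decorator and the `events` list are hypothetical and are not TAU's API.

```python
import functools
import time

events = []  # (timestamp, "Start" or "Stop", routine name)


def instrument(func):
    """Wrap func so that every call records a Start and a Stop event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        events.append((time.perf_counter(), "Start", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            events.append((time.perf_counter(), "Stop", func.__name__))
    return wrapper


@instrument
def foo(i):
    if i > 0:
        foo(i - 1)  # the recursive call goes through the wrapper as well


@instrument
def main():
    for i in range(3):
        foo(i)
    return 0


main()
print(len(events), "events recorded")
```

One call to main plus six calls to foo (1 + 2 + 3 across the three loop iterations) yield 14 events, matching t1 through t14 on the slide. Each event costs a timer read and a list append, which is why the relative overhead is large for a routine as small as foo.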
![Page 11: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/11.jpg)
Using TAU’s Runtime Preloading Tool: tau_exec
• Preload a wrapper that intercepts the runtime system call and substitutes with another
  • MPI, CUDA, OpenACC, pthread, OpenCL
  • OpenMP
  • POSIX I/O
  • Memory allocation/deallocation routines
  • Wrapper library for an external package
• No modification to the binary executable!
• Enable other TAU options (communication matrix, OTF2, event-based sampling)
• For Python: tau_python replaces python. Similar to tau_exec.
![Page 12: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/12.jpg)
TAU Execution Command (tau_exec)
• Uninstrumented execution
  • % ./a.out
• Track GPU operations
  • % tau_exec -cupti ./a.out
  • % tau_exec -cupti -um ./a.out (for Unified Memory)
  • % tau_exec -opencl ./a.out
  • % tau_exec -openacc ./a.out
• Track MPI performance
  • % tau_exec ./a.out
• Track OpenMP and MPI performance (MPI enabled by default)
  • % export TAU_OMPT_SUPPORT_LEVEL=full;
    % export TAU_OMPT_RESOLVE_ADDRESS_EAGERLY=1
  • % tau_exec -T ompt,tr6,mpi -ompt ./a.out
• Track I/O operations
  • % tau_exec -io -T pthread ./a.out
• Use event-based sampling (compile with -g)
  • % tau_exec -ebs ./a.out
  • Also -ebs_source=<PAPI_COUNTER> -ebs_period=<overflow_count>
![Page 13: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/13.jpg)
Configuration tags for tau_exec

% ./configure -pdt=<dir> -mpi -papi=<dir>; make install
Creates in $TAU:
  Makefile.tau-papi-mpi-pdt (configuration parameters in stub makefile)
  shared-papi-mpi-pdt/libTAU.so

% ./configure -pdt=<dir> -mpi; make install
Creates:
  Makefile.tau-mpi-pdt
  shared-mpi-pdt/libTAU.so

To explicitly choose preloading of shared-<options>/libTAU.so, change:
% mpirun -np 256 ./a.out
to
% mpirun -np 256 tau_exec -T <comma_separated_options> ./a.out

% mpirun -np 256 tau_exec -T papi,mpi,pdt ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so

% mpirun -np 256 tau_exec -T papi ./a.out
Preloads $TAU/shared-papi-mpi-pdt/libTAU.so by matching.

% aprun -n 256 tau_exec -T papi,mpi,pdt -s ./a.out
Does not execute the program. Just displays the library that it will preload if executed without the -s option.

NOTE: -mpi configuration is selected by default. Use -T serial for sequential programs.
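The tag matching described above can be pictured as a subset test between the requested tags and the tags encoded in each shared-<options> directory name. The sketch below is an assumption about the general idea, not tau_exec's actual logic: the helper `pick_tau_library` is hypothetical, and picking the first match in list order is an assumed tie-break.

```python
def pick_tau_library(requested_tags, available_dirs):
    """Return the libTAU.so path of the first directory containing all requested tags.

    Assumption: directory names look like "shared-papi-mpi-pdt" and the
    first match in list order wins. TAU's real selection rules may differ.
    """
    wanted = set(requested_tags)
    prefix = "shared-"
    for directory in available_dirs:
        if not directory.startswith(prefix):
            continue
        tags = set(directory[len(prefix):].split("-"))
        if wanted <= tags:
            return f"$TAU/{directory}/libTAU.so"
    return None


# The two configurations built in the example above.
available = ["shared-papi-mpi-pdt", "shared-mpi-pdt"]
print(pick_tau_library(["papi", "mpi", "pdt"], available))
print(pick_tau_library(["papi"], available))
```

To see what a real installation would preload, the -s option shown above is the authoritative check.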
![Page 14: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/14.jpg)
ParaProf Profile Browser
% paraprof
![Page 15: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/15.jpg)
ParaProf Profile Browser
![Page 16: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/16.jpg)
ParaProf 3D Window
![Page 17: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/17.jpg)
Tensorflow mnist example with tau_python
![Page 18: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/18.jpg)
![Page 19: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/19.jpg)
ParaProf: Thread Statistics Window
![Page 20: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/20.jpg)
TAU: Tracking Data Transfers
![Page 21: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/21.jpg)
Thread Profile on Thread 341
![Page 22: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/22.jpg)
ParaProf: Thread Statistics Window Thread 0
![Page 23: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/23.jpg)
ParaProf 3D Window for Caffe: Googlenet
![Page 24: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/24.jpg)
ParaProf Manager Window
![Page 25: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/25.jpg)
ParaProf Node Window
![Page 26: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/26.jpg)
Thread Statistics Window: Caffe Googlenet
![Page 27: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/27.jpg)
TAU’s Static Analysis System: Program Database Toolkit (PDT)
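[Figure: PDT workflow. Application / library source is parsed by the C / C++ parser or the Fortran parser (F77/90/95) into IL; the C / C++ IL analyzer and the Fortran IL analyzer produce Program Database Files; these are read through DUCTAPE by the TAU instrumentor for automatic source instrumentation.]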
![Page 28: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/28.jpg)
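PDT: automatic source instrumentation
[Figure: application source passes through the TAU source analyzer to produce a parsed program; tau_instrumentor takes the parsed program and an instrumentation specification file and emits an instrumented copy of the source.]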
![Page 29: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/29.jpg)
Runtime Environment Variables

| Environment Variable | Default | Description |
|---|---|---|
| TAU_TRACE | 0 | Setting to 1 turns on tracing |
| TAU_CALLPATH | 0 | Setting to 1 turns on callpath profiling |
| TAU_TRACK_MEMORY_FOOTPRINT | 0 | Setting to 1 turns on tracking memory usage by sampling periodically the resident set size and high water mark of memory usage |
| TAU_TRACK_POWER | 0 | Tracks power usage by sampling periodically. |
| TAU_CALLPATH_DEPTH | 2 | Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and context events have just parent information (e.g., Heap Entry: foo) |
| TAU_SAMPLING | 1 | Setting to 1 enables event-based sampling. |
| TAU_TRACK_SIGNALS | 0 | Setting to 1 generates debugging callstack info when a program crashes |
| TAU_COMM_MATRIX | 0 | Setting to 1 generates communication matrix display using context events |
| TAU_THROTTLE | 1 | Setting to 0 turns off throttling. Throttles instrumentation in lightweight routines that are called frequently |
| TAU_THROTTLE_NUMCALLS | 100000 | Specifies the number of calls before testing for throttling |
| TAU_THROTTLE_PERCALL | 10 | Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive time per call |
| TAU_CALLSITE | 0 | Setting to 1 enables callsite profiling that shows where an instrumented function was called. Also compatible with tracing. |
| TAU_PROFILE_FORMAT | Profile | Setting to “merged” generates a single file. “snapshot” generates xml format |
| TAU_METRICS | TIME | Setting to a comma separated list generates other metrics. (e.g., ENERGY,TIME,P_VIRTUAL_TIME,PAPI_FP_INS,PAPI_NATIVE_<event>:<subevent>) |
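The throttling rule stated for TAU_THROTTLE_NUMCALLS and TAU_THROTTLE_PERCALL can be written as a small predicate. This is a sketch of the rule as described in the table, not TAU's source code; the function name `should_throttle` is hypothetical.

```python
def should_throttle(num_calls, inclusive_usec,
                    numcalls_limit=100000, percall_limit_usec=10):
    """Apply the rule from the table: throttle a routine if it is called more
    than numcalls_limit times and takes less than percall_limit_usec of
    inclusive time per call.

    inclusive_usec is the routine's total inclusive time in microseconds.
    """
    if num_calls <= numcalls_limit:
        return False
    return inclusive_usec / num_calls < percall_limit_usec


# 200000 calls at 2 usec each: lightweight and frequent, so throttled.
print(should_throttle(200000, 400000.0))
# 200000 calls at 20 usec each: frequent but not lightweight.
print(should_throttle(200000, 4000000.0))
# 50000 calls: below the call-count threshold.
print(should_throttle(50000, 100.0))
```

Raising either limit through the environment variables makes throttling less likely to trigger for a given routine.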
![Page 30: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/30.jpg)
Runtime Environment Variables

| Environment Variable | Default | Description |
|---|---|---|
| TAU_TRACE | 0 | Setting to 1 turns on tracing |
| TAU_TRACE_FORMAT | Default | Setting to “otf2” turns on TAU’s native OTF2 trace generation (configure with -otf=download) |
| TAU_EBS_UNWIND | 0 | Setting to 1 turns on unwinding the callstack during sampling (use with tau_exec -ebs or TAU_SAMPLING=1) |
| TAU_EBS_RESOLUTION | line | Setting to “function” or “file” changes the sampling resolution to function or file level respectively. |
| TAU_TRACK_LOAD | 0 | Setting to 1 tracks system load on the node |
| TAU_SELECT_FILE | Default | Setting to a file name enables selective instrumentation based on exclude/include lists specified in the file. |
| TAU_OMPT_SUPPORT_LEVEL | basic | Setting to “full” improves resolution of OMPT TR6 regions on threads 1..N-1. Also, “lowoverhead” option is available. |
| TAU_OMPT_RESOLVE_ADDRESS_EAGERLY | 0 | Setting to 1 is necessary for event based sampling to resolve addresses with OMPT TR6 (-ompt=download-tr6) |
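These variables are read from the process environment at run time, so a launcher script can set them per run instead of exporting them globally. The sketch below only builds the environment and command line; the helper name `tau_sampling_env` is hypothetical, and the chosen values are one possible combination from the table.

```python
import os


def tau_sampling_env(base=None):
    """Return a copy of the environment with event-based sampling settings added."""
    env = dict(os.environ if base is None else base)
    env.update({
        "TAU_SAMPLING": "1",               # enable event-based sampling
        "TAU_EBS_UNWIND": "1",             # unwind the callstack at each sample
        "TAU_EBS_RESOLUTION": "function",  # report at function level, not line level
    })
    return env


env = tau_sampling_env()
command = ["tau_exec", "-ebs", "./a.out"]
# To launch (requires TAU on PATH and an ./a.out built with -g):
#   import subprocess
#   subprocess.run(command, env=env, check=True)
print(" ".join(command))
```

Copying the environment rather than modifying `os.environ` keeps the settings local to the profiled child process.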
![Page 31: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/31.jpg)
Runtime Environment Variables (contd.)

| Environment Variable | Default | Description |
|---|---|---|
| TAU_TRACK_MEMORY_LEAKS | 0 | Tracks allocations that were not de-allocated (needs -optMemDbg or tau_exec -memory) |
| TAU_EBS_SOURCE | TIME | Allows using PAPI hardware counters for periodic interrupts for EBS (e.g., TAU_EBS_SOURCE=PAPI_TOT_INS when TAU_SAMPLING=1) |
| TAU_EBS_PERIOD | 100000 | Specifies the overflow count for interrupts |
| TAU_MEMDBG_ALLOC_MIN/MAX | 0 | Byte size minimum and maximum subject to bounds checking (used with TAU_MEMDBG_PROTECT_*) |
| TAU_MEMDBG_OVERHEAD | 0 | Specifies the number of bytes for TAU’s memory overhead for memory debugging. |
| TAU_MEMDBG_PROTECT_BELOW/ABOVE | 0 | Setting to 1 enables tracking runtime bounds checking below or above the array bounds (requires -optMemDbg while building or tau_exec -memory) |
| TAU_MEMDBG_ZERO_MALLOC | 0 | Setting to 1 enables tracking zero byte allocations as invalid memory allocations. |
| TAU_MEMDBG_PROTECT_FREE | 0 | Setting to 1 detects invalid accesses to deallocated memory that should not be referenced until it is reallocated (requires -optMemDbg or tau_exec -memory) |
| TAU_MEMDBG_ATTEMPT_CONTINUE | 0 | Setting to 1 allows TAU to record and continue execution when a memory error occurs at runtime. |
| TAU_MEMDBG_FILL_GAP | Undefined | Initial value for gap bytes |
| TAU_MEMDBG_ALIGNMENT | sizeof(int) | Byte alignment for memory allocations |
| TAU_EVENT_THRESHOLD | 0.5 | Define a threshold value (e.g., .25 is 25%) to trigger marker events for min/max |
![Page 32: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/32.jpg)
Download TAU from U. Oregon
http://www.hpclinux.com [OVA file]
http://tau.uoregon.edu for more information
Free download, open source, BSD license
![Page 33: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/33.jpg)
Performance Research Lab, UO, Eugene
www.uoregon.edu
![Page 34: TAU for Accelerating AI Applications · TAU_TRACK_POWER 0 Tracks power usage by sampling periodically. TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no](https://reader033.fdocuments.in/reader033/viewer/2022052000/6011d14ca54c3b3ae15f6514/html5/thumbnails/34.jpg)
Support Acknowledgments
• US Department of Energy (DOE)
  • ANL
  • Office of Science contracts, ECP
  • SciDAC, LBL contracts
  • LLNL-LANL-SNL ASC/NNSA contract
  • Battelle, PNNL and ORNL contract
• Department of Defense (DoD)
  • PETTT, HPCMP
• National Science Foundation (NSF)
  • SI2-SSI, Glassbox
• NASA
• CEA
• IBM
• Partners:
  • The Ohio State University
  • ParaTools, Inc.
  • University of Tennessee, Knoxville
  • T.U. Dresden, GWT
  • Jülich Supercomputing Center