Paradyn/Condor Week 2004
MATE: Monitoring, Analysis and Tuning Environment
Anna Morajko, Tomàs Margalef and Emilio Luque
Universitat Autònoma de Barcelona
April 2004
Content
1. Introduction
2. Dynamic Performance Tuning
3. MATE
4. Tuning Techniques
5. Conclusions and future work
Introduction
Application performance
• Demand for high-performance computation
• The main goal of parallel/distributed applications: solve a given problem as fast as possible
• Performance is one of the most important issues
• Developers must optimize application performance to provide efficient and useful applications
Introduction
Application performance optimization
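Steps:
• monitoring
• analysis
• tuning
[Figure: the performance optimization cycle. Application development produces the source; instrumentation of the application gives a monitored execution that yields performance data (measurements); performance analysis relates bottlenecks to the source code and proposes solutions; tuning applies the resulting modifications (changes) to the source.]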
Introduction
Application performance optimization
• Difficulties in finding bottlenecks and determining their solutions for parallel/distributed applications
  – Many tasks that cooperate with each other
• High degree of expertise required
• Application behavior may change depending on the input data or the environment
• Difficult task especially for non-expert users
Introduction
Our goals
• Investigate if it is possible to optimize performance of parallel/distributed applications dynamically without user intervention
• Investigate the applicability of dynamic tuning
• Create a tool that is able to dynamically optimize applications:
  – automatically improve application performance
  – improve the application execution during run time
  – tune without recompiling and rerunning
  – adapt application to existing conditions
• Practically evaluate profitability of dynamic tuning
Introduction
Dynamic automatic tuning
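[Figure: dynamic automatic tuning. The user takes part only in application development (source). During execution a tool closes the loop: monitoring (instrumentation, events) produces performance data, performance analysis identifies the problem and its solution, and tuning applies the modifications to the running application.]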
Dynamic Performance Tuning
Requirements
• No user intervention
• No source recompilation
• Performance analysis on the fly
  – Global analysis
  – Decisions taken in a short time
  – Not complex analysis and modifications
• Run time monitoring
• Run time tuning
  – Modifications performed carefully
• Parallel/distributed application control
• Low intrusion
Dynamic Performance Tuning
Key question
What can be tuned in an application?
• Limited information about the application
• Tuning layers
• Approaches to tuning
• Application knowledge
Dynamic Performance Tuning
Tuning layers
• Application-specific code
• Standard and custom libraries (API + code)
• Operating system libraries (API + code)
• Hardware
[Figure: layer stack: hardware, operating system kernel, OS API, library code, API, application code.]
Dynamic Performance Tuning
Application
• Application code changes
  – Different bottlenecks that depend on the application implementation
Libraries
• Library code changes
• API usage
  – Standard
    • C/C++ library -> memory management, dynamic containers
  – Custom
    • PVM, MPI -> communication
OS
• Kernel code changes
• API usage
  – Adjustment of options (e.g. TCP/IP socket), I/O request grouping
Toward the lower layers, more bottlenecks are common to a wider group of applications.
Dynamic Performance Tuning
Approaches to tuning
• Cooperative
  – Application must be prepared for tuning
  – Application-specific knowledge is provided
• Automatic (black-box)
  – Tuning of any application
  – No application-specific knowledge is required
  – Knowledge about the bottleneck is required
  – No changes are introduced into the application source code
[Figure: the layer stack (hardware, operating system kernel, OS API, library code, API, application code) annotated with two opposite directions: "More automatic, more generic" and "More cooperative, more application-specific information available".]
Dynamic Performance Tuning
Knowledge representation
• Measure points
  – Where the instrumentation must be inserted to provide measurements
• Performance model
  – Determines minimal execution time of the entire application
• Tuning points/actions/synchronization
  – What and when can be changed in the application
    • point – element that may be changed
    • action – what to invoke on a point
    • synchronization – when a tuning action can be invoked to ensure application correctness
[Figure: the performance model takes measurements as input, applies formulas and conditions for optimal behavior, and produces optimal values.]
Dynamic Performance Tuning
Application knowledge
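[Figure: the application knowledge (measure points, performance model, tuning point/action/sync) placed next to the layer stack (hardware, operating system kernel, OS API, library code, API, application code), with two labels: "Provided by the user" and "Provided automatically by a tuning system".]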
Dynamic Performance Tuning
Manipulation of a running application
• monitoring – collect information about the behavior of a running application
• tuning – insert tuning code into a running application that improves its performance
Dynamic instrumentation – DynInst
Dynamic Performance Tuning
Dynamic modifications of a running application with DynInst
• Function replacement
• Function invocation
• One-time function invocation
• Function call elimination
• Function parameter changes
• Variable changes
MATE
MATE – Monitoring, Analysis and Tuning Environment
• prototype implementation in C++
• for PVM-based applications
• Sun Solaris 2.x / SPARC
MATE
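[Figure: MATE architecture across Machine 1, Machine 2 and Machine 3: application tasks (Task1, Task2, Task3), each with DMLib; a pvmd and an AC on the task machines; and the Analyzer. Arrow labels: instr., modif., events.]
• Application Controller - AC
• Dynamic Monitoring Library - DMLib
• Analyzer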
MATE: Application Controller
Services
• Distributed application control
  – Startup/exit of tasks (Tasker)
  – Startup/exit of PVM daemons, slave ACs (Hoster)
  – Clock synchronization
• Application model management (Task Manager)
• Performance monitoring (Monitors)
  – Manage monitoring instrumentation
  – Provide monitoring API for Analyzer
• Performance tuning (Tuners)
  – Manage tuning instrumentation
  – Provide tuning API for Analyzer
MATE: Application Controller
Monitors
• Instrumentation management via DynInst
  – Dynamically load DMLib
  – Generate monitoring snippets that call appropriate library functions
  – Insert/remove snippets in/from requested points
• API
  – AddEventTrace(tid, eventId, funcName, instrPlace, attrs)
  – RemoveEventTrace(tid, eventId)
[Figure: the Analyzer on Machine 2 sends add event/remove event requests to the AC Monitor on Machine 1, which instruments Task1 and Task2 (each with DMLib) via DynInst.]
MATE: Application Controller
Tuners
• Tuning via DynInst
  – Generate tuning snippet according to the request
  – Insert tuning snippet
• API
  – LoadLibrary(tid, path)
  – SetVariableValue(tid, params, brkpt)
  – ReplaceFunction(…)
  – InsertFunctionCall(…)
  – OneTimeFunctionCall(…)
  – RemoveFunctionCall(…)
  – FunctionParamChange(…)
[Figure: the Analyzer on Machine 2 sends apply-tuning requests to the AC Tuner on Machine 1, which tunes Task1 and Task2 via DynInst.]
MATE: Dynamic Monitoring Library
Services
• Register event
  – What – event type (id, place)
  – When – global timestamp
  – Where – task identifier
  – Requested attributes – e.g. function call parameters, return value
• Deliver event to the Analyzer
• API
  – DMLib_InitLogger(tid, analyzerHost, port, clockDiff)
  – DMLib_OpenEvent(id, nAttrs)
  – DMLib_AddIntAttr(value)
  – DMLib_AddFloatAttr(value)
  – DMLib_AddCharAttr(value)
  – DMLib_AddStringAttr(value)
  – DMLib_CloseEvent()
  – DMLib_DoneLogger()
[Figure: in Task1 on Machine 1, the entry of pvm_send(p1, p2) is instrumented with a snippet that calls DMLib_OpenEvent(), DMLib_AddIntAttr() twice and DMLib_CloseEvent(); DMLib (the API implementation) sends the resulting event record to the Analyzer over TCP/IP.]
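The open / add-attribute / close sequence of the DMLib API can be sketched as follows. This is an illustration only, not DMLib's implementation: the `EventRecord` and `EventLogger` names, the space-separated record format, the place encoding, and the rule that the global timestamp is the local clock plus `clockDiff` are all assumptions made for the example. Only the order of the calls mirrors the API listed on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical event record: what (id, place), when (global timestamp),
// where (task id), plus the requested attributes.
struct EventRecord {
    int id = 0;
    int place = 0;               // assumed encoding: 0 = function entry, 1 = function exit
    std::int64_t timestamp = 0;  // global time in microseconds
    int tid = 0;
    std::vector<std::string> attrs;
};

// Hypothetical logger that mirrors the open / add / close call order.
class EventLogger {
public:
    // clock_diff is the assumed offset between the local and the global clock.
    EventLogger(int tid, std::int64_t clock_diff) : tid_(tid), clock_diff_(clock_diff) {}

    void open_event(int id, int place, std::int64_t local_time) {
        assert(!open_ && "close the previous event first");
        current_ = EventRecord{};
        current_.id = id;
        current_.place = place;
        current_.timestamp = local_time + clock_diff_;
        current_.tid = tid_;
        open_ = true;
    }

    void add_int_attr(long value) {
        assert(open_ && "no event is open");
        current_.attrs.push_back(std::to_string(value));
    }

    // Serializes the record; a real library would send it to the analyzer.
    std::string close_event() {
        assert(open_ && "no event is open");
        open_ = false;
        std::ostringstream out;
        out << current_.id << ' ' << current_.place << ' ' << current_.timestamp
            << ' ' << current_.tid << ' ' << current_.attrs.size();
        for (const std::string& attr : current_.attrs) out << ' ' << attr;
        return out.str();
    }

private:
    int tid_;
    std::int64_t clock_diff_;
    bool open_ = false;
    EventRecord current_;
};

// What a snippet at the entry of a traced two-argument call might do.
std::string trace_call_entry(EventLogger& log, int event_id,
                             std::int64_t local_time, int p1, int p2) {
    log.open_event(event_id, /*place=*/0, local_time);
    log.add_int_attr(p1);
    log.add_int_attr(p2);
    return log.close_event();
}
```

For a logger created with task id 262149 and a clock offset of 500, `trace_call_entry(log, 1, 1000, 7, 42)` returns the record `1 0 1500 262149 2 7 42`.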
MATE: Analyzer
Services
• Automatic performance analysis on the fly
  – Request for events
  – Collect incoming events
  – Find bottlenecks among events by applying the performance model
  – Find solutions that overcome the bottlenecks
  – Send tuning requests
• The Analyzer is provided with application knowledge about performance problems
• The information related to one problem is called a tuning technique
• A tuning technique describes a complete performance optimization scenario
MATE: Analyzer
Tunlets
• Each technique is implemented in MATE as a tunlet
• A tunlet contains specific code (analysis logic) related to one concrete performance problem
  – measure points – what events are needed
  – performance model – how to determine bottlenecks and solutions
  – tuning actions/points/synchronization – what to change, where, when
• A tunlet is a C/C++ library dynamically loaded to the Analyzer process
[Figure: a tunlet inside the Analyzer, made of measure points, a performance model, and tuning point/action/sync.]
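The three parts of a tunlet can be pictured as one small interface. The sketch below is hypothetical: the `Tunlet` base class, the `MeasurePoint`, `Event` and `TuningRequest` types, the `wait_for_work` and `distribute_work` function names, the `TheFactor` variable and the halving rule are invented for illustration and are not MATE's DTAPI.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Measure point: where instrumentation is needed (hypothetical type).
struct MeasurePoint {
    std::string function;
    std::string place;  // "entry" or "exit"
};

// One measurement delivered to the tunlet (hypothetical type).
struct Event {
    int tid;
    std::string function;
    double value;
};

// Tuning point, action and synchronization (hypothetical type).
struct TuningRequest {
    bool needed;             // false when nothing has to change
    std::string variable;    // tuning point
    double new_value;        // tuning action: set the variable to this value
    std::string sync_point;  // synchronization: function at whose entry to apply it
};

class Tunlet {
public:
    virtual ~Tunlet() = default;
    virtual std::vector<MeasurePoint> measure_points() const = 0;
    virtual TuningRequest on_event(const Event& event) = 0;
};

// Toy performance model: when a worker reports idle time above a limit,
// halve the work factor so that later work tuples are smaller.
class IdleTimeTunlet : public Tunlet {
public:
    IdleTimeTunlet(double idle_limit, double initial_factor)
        : idle_limit_(idle_limit), factor_(initial_factor) {
        assert(idle_limit >= 0.0 && initial_factor > 0.0);
    }

    std::vector<MeasurePoint> measure_points() const override {
        return { MeasurePoint{"wait_for_work", "exit"} };
    }

    TuningRequest on_event(const Event& event) override {
        const double min_factor = 0.125;  // arbitrary lower bound for the example
        const bool relevant = (event.function == "wait_for_work");
        if (!relevant || event.value <= idle_limit_ || factor_ <= min_factor) {
            return TuningRequest{false, "", 0.0, ""};
        }
        factor_ /= 2.0;
        return TuningRequest{true, "TheFactor", factor_, "distribute_work"};
    }

private:
    double idle_limit_;
    double factor_;
};
```

Keeping the analysis logic behind a single `on_event` call is one way to let an analyzer drive several tunlets uniformly; whether MATE structures its tunlets this way is not stated on the slide.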
MATE: Analyzer
[Figure: Analyzer internals. Components: Event Collector thread, Event Repository, Application model, Controller, tunlets on top of the DTAPI, and AC Proxy. Labelled flows, all via TCP/IP: events from the DMLibs, metadata from the ACs, instrumentation requests to the monitor, and tuning requests to the tuner.]
Tuning Example
Workload balancing (App layer)
• Imbalance problem:
  – Heterogeneous computing and communication powers
  – Varying amount of distributed work
• Goal:
  – Minimize the idle time by balancing the work among the processes, considering the efficiency of the machines
• Balancing -> faster machines process more work than slower ones
• The work cannot be statically balanced before program execution (different input data, network load, machine power and load)
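One simple reading of "faster machines process more work" is a share proportional to each machine's measured speed. The sketch below assumes exactly that; `proportional_shares` and the speed values are hypothetical, and the slide does not say that MATE computes shares this way.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Splits total_units among machines in proportion to their relative speeds.
// Units lost to rounding down are given to the fastest machine.
std::vector<std::size_t> proportional_shares(std::size_t total_units,
                                             const std::vector<double>& speeds) {
    assert(!speeds.empty());
    const double speed_sum = std::accumulate(speeds.begin(), speeds.end(), 0.0);
    assert(speed_sum > 0.0);

    std::vector<std::size_t> shares(speeds.size());
    std::size_t assigned = 0;
    std::size_t fastest = 0;
    for (std::size_t i = 0; i < speeds.size(); ++i) {
        // The conversion to an integer type rounds the share down.
        shares[i] = static_cast<std::size_t>(total_units * (speeds[i] / speed_sum));
        assigned += shares[i];
        if (speeds[i] > speeds[fastest]) fastest = i;
    }
    shares[fastest] += total_units - assigned;
    return shares;
}
```

With 100 work units and relative speeds 1, 1 and 2, the shares are 25, 25 and 50. Because speeds and load change at run time, such shares would have to be recomputed from fresh measurements, which is the reason the slide rules out static balancing.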
Tuning Example
Workload balancing (App layer)
• Many scheduling methods -> Factoring Scheduling method
  – Work is divided into different-size tuples according to the factor
• Application must be tunable:
  – a well-known variable represents the factor
  – the factor must be checked before each iteration of the work distribution
  – the work tuples are calculated using the factoring scheduling method and according to the current factor value
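A tunable work distribution along these lines might look like the sketch below. It assumes one particular meaning of the factor: each batch hands out the fraction `TheFactor` of the remaining work, split evenly among the workers, so 0.5 gives the classic factoring rule and 1.0 gives a single static split. The variable name `TheFactor`, the function `distribute_work` and the clamping bounds are hypothetical; the slide does not define how the application under study encodes its factor.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical well-known variable that an external tuner could overwrite
// while the application is running.
double TheFactor = 0.5;

// Returns the sizes of the work tuples in the order they are handed out.
std::vector<std::size_t> distribute_work(std::size_t total_units, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::size_t> tuples;
    std::size_t remaining = total_units;
    while (remaining > 0) {
        // Re-read the factor before every batch so that a change made from
        // outside takes effect on the next iteration.
        const double factor = std::min(1.0, std::max(0.01, TheFactor));
        const double raw = std::ceil(factor * static_cast<double>(remaining) /
                                     static_cast<double>(workers));
        const std::size_t tuple_size =
            std::max<std::size_t>(1, static_cast<std::size_t>(raw));
        // One tuple per worker in this batch, never more than what is left.
        for (std::size_t w = 0; w < workers && remaining > 0; ++w) {
            const std::size_t size = std::min(tuple_size, remaining);
            tuples.push_back(size);
            remaining -= size;
        }
    }
    return tuples;
}
```

With `TheFactor = 0.5`, 100 units and 4 workers, the tuple sizes are 13, 6, 3, 2 and 1, four of each. With `TheFactor = 1.0` the work is split once into four tuples of 25.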
Tuning Example
Example application
• Forest Fire propagation – Xfire
• High computation cost
[Figure: bar chart of application execution time [sec] (axis 0 to 4500) for scenarios 1 to 3, with series "No tuning" and "Tuning". Data labels as extracted: 1967, 3919, 1885, 1953, 2071, 3768.]
Scenarios:
1) homogeneous and dedicated
2) heterogeneous and dedicated
3) heterogeneous and non-dedicated
Benefits:
1) Up to 2%
2) Up to 49%
3) Up to 48%
Conclusions
• The principal conclusion: dynamic tuning works, is applicable, effective and useful in certain conditions
• Limits of such tuning -> incomplete application information
• Classification of layers where tuning can be performed (OS, libraries, apps)
• Approaches to tuning: automatic and cooperative
• Application knowledge representation:
  – measure points, performance model, tuning point/action/sync
Conclusions
• Working prototype environment – MATE – that automatically monitors, analyses and tunes running applications
• Practical experiments conducted with MATE and parallel/distributed applications prove that it automatically adapts application behavior to existing conditions during run time!
Future work
• Global and local analysis
  – Scalability (problems with global analysis)
  – Some problems can be treated locally
• Performance analysis
  – How tuning techniques influence other techniques
  – Other approaches than performance model
• Metrics
  – Complementary information provided by metrics
• Provision of the application knowledge
  – Tunlet provided externally in a declarative manner
• Instrumentation evaluation
  – Prediction of monitoring and tuning instrumentation cost
Future work
• Tuning techniques
  – OS layer
    • TCP/IP options (e.g. sending without delay – Nagle’s algorithm)
    • I/O operations (e.g. read/write operations, I/O buffer size)
  – Library layer
    • Investigation of problems in MPI, numerical libraries
  – Application layer
    • Automatic selection of algorithm (e.g. sorting algorithm)
• Recommendations
  – Provision of good explanation to the user
• Towards grid
Thesis, March 2004
Thank you very much