Pablo and Autopilot: Performance Tuning in Distributed Computing Environments
Ruth Aydt
Pablo Research Group
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www-pablo.cs.uiuc.edu
Pablo Research Group - Department of Computer Science - UIUC
Presentation Outline
• Requirements for successful performance tuning
• Pablo toolkit components - how we got here
• Autopilot
  – Basic concepts
  – Component interactions
  – Fuzzy Logic decision infrastructure
• Pablo-provided monitor/control programs
  – Autodriver
  – Virtue
• Case study of Parallel Rocket Simulation Code
• Current Work
Requirements for Successful Performance Tuning in a Distributed Environment:
• top to bottom and end to end real-time performance data capture
• “appropriate” performance data detail and granularity… just enough but not too much!
• tools to help correlate and interpret captured data
• dynamic policy selection in response to current resource availability and application demands
Pablo Toolkit Components: a Decade of Performance Monitoring and Analysis Tools
Pablo Trace Library and Extensions
• Libraries linked with application to trace “generic events” and also loops, message passing, procedure calls, Unix I/O, MPI I/O, HDF routines
• Standard function names (e.g. read) replaced with tracing version (e.g. traceREAD) by preprocessor for C codes. For Fortran, calls bracketed by traceReadBegin / traceReadEnd manually
• Timestamped event data written to buffer and flushed periodically to per-processor files
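The bracket-and-buffer pattern in these bullets can be sketched in Python. This is a hypothetical illustration, not the Pablo Trace Library API: the names `TraceBuffer` and `traced_read`, the flush threshold, and the per-processor file naming are all assumptions made for the example.

```python
import os
import time


class TraceBuffer:
    """Hypothetical per-processor trace buffer (not the Pablo API)."""

    def __init__(self, processor, directory, flush_threshold=4):
        # One output file per processor, as the slide describes.
        self.path = os.path.join(directory, f"trace.{processor}.txt")
        self.flush_threshold = flush_threshold
        self.events = []

    def record(self, name, start, duration):
        # Store a timestamped event; flush once the buffer fills up.
        self.events.append((start, name, duration))
        if len(self.events) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Append buffered events to this processor's own file.
        with open(self.path, "a") as out:
            for start, name, duration in self.events:
                out.write(f"{start:.6f} {name} {duration:.6f}\n")
        self.events.clear()


def traced_read(buffer, fileobj, nbytes):
    """Tracing stand-in for read: brackets the real call with timestamps."""
    start = time.time()
    data = fileobj.read(nbytes)
    buffer.record("read", start, time.time() - start)
    return data
```

An application would call `traced_read` wherever it previously called `read`, and call `flush` once more at exit so that a partially filled buffer is not lost.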
Pablo I/O, MPI I/O, HDF Analysis
• Produce reports from I/O event data
• Sample MPI-IO summary report shown:
Pablo Self-Defining Data Format
• A performance data metaformat that specifies both data record structures and data record instances
• Unlimited set of event types supported depending on the “interesting” performance data
• SDDF library provides classes to read and write files in SDDF format
• General-purpose tools can be written using the library and the Record/Field names in the SDDF files
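As a rough sketch of how a general-purpose tool can rely on record and field names, the Python below pairs the field names from a record descriptor with the values of one record instance. The field names and the instance line are taken from the sample SDDF file in this presentation; `parse_instance` is a hypothetical helper that handles only flat numeric records and is not part of the SDDF library.

```python
import re

# Field names and types copied from the "Read" record descriptor
# in the sample SDDF file shown in this presentation.
READ_FIELDS = [
    ("Seconds", float),
    ("Event Identifier", int),
    ("Processor Number", int),
    ("Duration", float),
    ("File ID", int),
    ("Number Bytes", int),
]


def parse_instance(line, fields):
    """Parse one flat record instance such as '"Read" { 1.0, 2 };;'.

    Hypothetical helper for illustration only: it handles scalar numeric
    fields and is not part of the SDDF library.
    """
    match = re.fullmatch(r'\s*"([^"]+)"\s*\{(.*)\}\s*;;\s*', line)
    if match is None:
        raise ValueError("not a record instance: " + line)
    name, body = match.groups()
    values = [v.strip() for v in body.split(",")]
    if len(values) != len(fields):
        raise ValueError("field count does not match descriptor")
    return name, {fname: ftype(v) for (fname, ftype), v in zip(fields, values)}


name, record = parse_instance(
    '"Read" { 0.019991, 700011, 0, 0.000203, 3, 3 };;', READ_FIELDS
)
print(name, record["Duration"], record["Number Bytes"])
```

Because the lookup is by field name, a tool written this way keeps working if other record types or extra fields appear in the file.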
Sample SDDF File showing Data Structure and Data Instance
SDDFA
#337:
// "description" "IO Read"
"Read" {
    // "Seconds" "Timestamp"
    double "Seconds";
    // "Event ID" "Corresponding event"
    // "700009" "read"
    // "700011" "fread"
    int "Event Identifier";
    // "Node" "Processor number"
    int "Processor Number";
    // "Duration" "Event duration in seconds"
    double "Duration";
    // "File ID" "Unique file identifier"
    int "File ID";
    // "Number Bytes" "Number of bytes read"
    int "Number Bytes";
};;

"Read" { 0.019991, 700011, 0, 0.000203, 3, 3 };;
SDDFStatistics Analysis Program for SDDF Files
SvPablo
• A graphical source code browser and performance capture/correlation tool
• Allows user to select loops and procedures to instrument in C, F77, F90 code. Automatic instrumentation for HPF via PGI performance interface.
• Collects performance data and later displays it relative to source code line
• Option for real-time data transmission via Autopilot tagged sensors (more later)
SvPablo GUI
Virtue
• A collaborative virtual environment for direct software manipulation
  – Hierarchical graph representations that show software structure, dynamics, and performance
  – Manipulation tools for augmented interactions with the virtual environment
  – Annotation tools for distributed, collaborative exploration and recording
• Uses OpenGL and EVL CAVE library for 3-d effects in CAVE, ImmersaDesk, and desktop environments
Autopilot: Performance Tuning in Distributed Computing Environments
Autopilot Toolkit
• Provides a framework for the capture and analysis of real-time application and infrastructure data in a multi-threaded distributed environment
• Offers the ability to control volume of performance data through
  – selective registration and property matching
  – analysis and data reduction at point of collection
  – constant, periodic, or on-demand transmission of data
  – ability to dynamically enable/disable data collection
• Includes a control interface to allow steering of infrastructure policies and applications, either interactively or via automated decision procedures
Basic Autopilot Concepts
• Sensors: provide data to remote processes, allowing real-time monitoring
  – intrinsic (procedural - push)
  – extrinsic (threaded - push)
  – transfer data when requested by remote process (pull)
• Sensor Attached Functions: transform sensed data via user-defined functions before it is recorded by the sensor, providing an important data-reduction technique
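A minimal Python sketch of a sensor with an attached function follows. It is not the Autopilot class interface: `Sensor`, `windowed_mean`, the `deliver` callback standing in for remote transmission, and the convention that returning `None` suppresses a sample are all assumptions made for illustration.

```python
class Sensor:
    """Hypothetical push sensor with an optional attached function."""

    def __init__(self, deliver, attached_function=None):
        self.deliver = deliver  # callback standing in for remote transmission
        self.attached_function = attached_function

    def record(self, value):
        # The attached function transforms (and may suppress) a sample
        # before it is recorded; returning None means "send nothing".
        if self.attached_function is not None:
            value = self.attached_function(value)
        if value is not None:
            self.deliver(value)


def windowed_mean(window):
    """Build an attached function that emits one mean per `window` samples."""
    samples = []

    def reduce(value):
        samples.append(value)
        if len(samples) < window:
            return None
        mean = sum(samples) / len(samples)
        samples.clear()
        return mean

    return reduce


received = []
sensor = Sensor(received.append, attached_function=windowed_mean(4))
for reading in [10, 20, 30, 40, 50, 60, 70, 80]:
    sensor.record(reading)
print(received)
```

Eight raw readings become two transmitted values, which is the kind of reduction at the point of collection that the bullet describes.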
Basic Autopilot Concepts
• Actuators: provide remote processes the ability to invoke local functions or update data, allowing remote steering
  – synchronous (application controls when updates are made; requests may be held in pending buffer)
  – asynchronous (updates are made when request received from external agent)
• Properties: key-value pairs that are associated with and used to identify a sensor or actuator, allowing remote processes to be selective about the sensors and actuators they connect to
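The difference between synchronous and asynchronous actuators can be sketched in Python as below. The `Actuator` class and its method names are hypothetical, and keeping only the most recent pending request is a design choice made for this example; the source does not say how Autopilot resolves multiple pending requests.

```python
class Actuator:
    """Hypothetical actuator guarding one application variable."""

    def __init__(self, initial, synchronous):
        self.value = initial
        self.synchronous = synchronous
        self.pending = []  # requests held until the application asks

    def request(self, new_value):
        # Called on behalf of a remote actuator client.
        if self.synchronous:
            self.pending.append(new_value)  # hold in the pending buffer
        else:
            self.value = new_value  # asynchronous: apply on receipt

    def apply_pending(self):
        # Called by the application at a point where updates are safe.
        # Assumption for this sketch: the most recent request wins.
        if self.pending:
            self.value = self.pending[-1]
            self.pending.clear()
        return self.value


sync_actuator = Actuator(initial=10, synchronous=True)
sync_actuator.request(5)
print(sync_actuator.value)            # still the old value
print(sync_actuator.apply_pending())  # updated at the application's choosing

async_actuator = Actuator(initial=10, synchronous=False)
async_actuator.request(5)
print(async_actuator.value)           # updated as soon as the request arrived
```

A synchronous actuator suits values that must not change in the middle of an iteration; the application calls `apply_pending` between iterations.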
Basic Autopilot Concepts
• Sensor Client: a process that connects to one or more sensors with matching properties and receives data from those sensors
• Actuator Client: a process that connects to one or more actuators with matching properties and sends data to those actuators, causing application variables controlled by the actuators to be updated or functions to be invoked
Basic Autopilot Concepts
• Autopilot Manager: a daemon process that is responsible for handling registration requests from sensors and actuators, and matching sensor client and actuator client requests to registered sensors and actuators.
* AutopilotManager daemons may be run on multiple hosts throughout the computational grid, allowing sensors, actuators, and clients to tailor data transfer volumes to appropriate levels for local and distant tasks.
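Registration and property matching can be sketched with a small in-process registry in Python. This is an assumption-laden illustration rather than the Autopilot Manager itself: the real manager is a daemon, the class name, property keys, and string handles here are hypothetical, and the matching rule (every requested key-value pair must be present) is one plausible reading of "matching properties".

```python
class AutopilotManagerSketch:
    """Hypothetical in-process registry; the real manager is a daemon."""

    def __init__(self):
        self.registered = []  # list of (properties, handle) pairs

    def register(self, properties, handle):
        # Sensors and actuators register together with their properties.
        self.registered.append((dict(properties), handle))

    def find(self, wanted):
        # A registration matches when every requested key-value pair is present.
        return [
            handle
            for properties, handle in self.registered
            if all(properties.get(k) == v for k, v in wanted.items())
        ]


manager = AutopilotManagerSketch()
manager.register({"kind": "sensor", "name": "cpu_load", "host": "origin"}, "sensor-1")
manager.register({"kind": "sensor", "name": "cpu_load", "host": "octane"}, "sensor-2")
manager.register({"kind": "actuator", "name": "cache_policy", "host": "origin"}, "actuator-1")

print(manager.find({"kind": "sensor", "name": "cpu_load"}))
print(manager.find({"host": "origin"}))
```

The returned handles stand in for whatever reference a client then uses to talk to the sensor or actuator directly; a narrower property set gives the client fewer connections and therefore less data.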
Tagged Sensors, Actuators, Clients
• Information about the structure of the data is forwarded when a client first connects to a matching sensor or actuator, allowing the client to perform verification checks and ignore unwanted data.
• Tagged data sets map naturally into what we normally think of as event trace records.
• Sometimes called “SDDF-enabled” because the buffer contents can easily be translated to SDDF
Autopilot and Nexus/Globus
• Autopilot uses the Nexus component of the Globus toolkit (http://www-globus.org) to provide
  – communication substrate & multithreading capabilities
• Nexus creates a global address space that encompasses all processes executing on a distributed network
• Nexus Remote Service Requests (RSRs) used by Autopilot classes to transmit messages, ensuring optimal underlying transfer protocol
• Nexus multi-threaded handlers used by Autopilot classes to process RSRs
• Most Nexus details hidden by Autopilot classes
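Autopilot Component Interactions
[Diagram: Instrumented Task, Autopilot Manager, and Monitor/Control Task, with these numbered interactions]
1. sensors and actuators register with their properties (Instrumented Task to Autopilot Manager)
2. clients request matching sensors and actuators (Monitor/Control Task to Autopilot Manager)
3. global pointers returned for matches (Autopilot Manager to Monitor/Control Task)
4. sensor and actuator controls and actuator data (Monitor/Control Task to Instrumented Task)
5. sensor data (Instrumented Task to Monitor/Control Task)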
Instrumented Tasks
• May contain multiple sensors and/or actuators
• Many instrumented tasks may be active at any given time
• May register sensors and actuators with multiple Autopilot Managers running on different hosts
• May be application code or infrastructure resource monitor (lmon)
Monitor/Control Tasks
• May contain multiple sensor clients and/or actuator clients
• Many monitor/control tasks may be active at any given time
• May query multiple Autopilot Managers running on different hosts
• May implement “human in the loop” (Autodriver, Virtue) or automated fuzzy logic decision server (PPFS II)
• May be monitor only, writing collected data to a file or displaying it
Fuzzy Logic Decision Infrastructure
[Diagram labels: Knowledge Repository; Fuzzy Logic Rule Base; Fuzzy Logic Decision Process; Fuzzifier; Defuzzifier; Inputs; Outputs; System; Sensors; Actuators; Instrumented Task(s); Monitor/Control Task(s)]
Sample Fuzzy Logic Rule Base for Temperature Control
rulebase FurnaceRules;

// decide what to do based on roomtemp which falls into 3 ranges
var roomtemp(0,100) {
    set trapez cold  ( 0, 50, 0, 20 );
    set trapez medium( 50, 70, 10, 10 );
    set trapez hot   ( 80, 100, 20, 0 );
};
[Plot: truth values (0 to 1) of the cold, medium, and hot sets over roomtemp from 0 to 100]
Sample Fuzzy Logic Rule Base for Temperature Control (continued)
// control the furnace value in a range of 0-1, with 0 = off
var furnace(0,1) {
    set triangle off ( 0, 0, 0.1 );
    set triangle half( 0.5, 0.1, 0.1 );
    set triangle full( 1, 0.1, 0 );
};

// the rules
if ( roomtemp == cold ) { furnace = full; }
if ( roomtemp == medium ) { furnace = half; }
if ( roomtemp == hot ) { furnace = off; }
Fuzzy Logic Decision Infrastructure
• Autopilot sensors provide a stream of room temperature readings. After fuzzification, this stream defines the value of the roomtemp fuzzy variable.
• Rules whose conditions are non-zero all contribute to determining the value of the output fuzzy variable furnace. After defuzzification, the value of furnace defines the action taken by the Autopilot actuator.
• Fuzzy logic handles noisy data and conflicting goals.
• Fuzzy logic separates data sets (definition of fuzzy variables) and rules (assertions and consequents) allowing each to be independently adjusted for a particular computing environment without re-coding the decision procedure.
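The fuzzify, apply rules, defuzzify sequence for the FurnaceRules rule base can be sketched in Python under stated assumptions. The presentation does not define the parameter order of `trapez` and `triangle`; this sketch assumes `trapez(left_top, right_top, left_width, right_width)` and `triangle(peak, left_width, right_width)`. The defuzzifier is a simple weighted average of the output-set peaks, chosen for brevity; it is not necessarily the method Autopilot uses.

```python
def trapezoid(x, left_top, right_top, left_width, right_width):
    """Truth value of x for a trapezoidal set (assumed parameter meaning)."""
    if left_top <= x <= right_top:
        return 1.0
    if x < left_top:
        if left_width == 0:
            return 0.0
        return max(0.0, 1.0 - (left_top - x) / left_width)
    if right_width == 0:
        return 0.0
    return max(0.0, 1.0 - (x - right_top) / right_width)


# Input sets copied from the roomtemp variable in the rule base.
ROOMTEMP_SETS = {
    "cold": (0, 50, 0, 20),
    "medium": (50, 70, 10, 10),
    "hot": (80, 100, 20, 0),
}
# Peaks of the furnace triangles (first parameter of each triangle).
FURNACE_PEAKS = {"off": 0.0, "half": 0.5, "full": 1.0}
# The three rules: input set -> output set.
RULES = {"cold": "full", "medium": "half", "hot": "off"}


def furnace_setting(roomtemp):
    """Fuzzify roomtemp, fire every rule, and defuzzify to one number."""
    weighted = 0.0
    total = 0.0
    for temp_set, furnace_set in RULES.items():
        strength = trapezoid(roomtemp, *ROOMTEMP_SETS[temp_set])
        weighted += strength * FURNACE_PEAKS[furnace_set]
        total += strength
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return weighted / total


for temp in (10, 60, 90):
    print(temp, round(furnace_setting(temp), 3))
```

At a room temperature of 60, both the cold and medium rules fire, so the output falls between half and full instead of switching abruptly. Changing the set boundaries in `ROOMTEMP_SETS` retunes the controller without touching `furnace_setting`, which mirrors the separation of data sets and rules described in the last bullet.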
Autodriver Monitor and Control Architecture
[Diagram components: Autodriver Java GUI; Java Remote Method Invocation; Autodriver-Autopilot Adapter Task (Unix); Autopilot Manager; Instrumented Task]
Autodriver Startup
• User specifies hosts for Autopilot Manager and, if remote, Adapter
• Main window displays currently registered sensors and actuators
• User selects sensors and/or actuators they are interested in
Autodriver Field Selection
• When a tagged sensor is selected, a new window showing the list of fields in that sensor is displayed
• The user selects the field(s) they want to view
Autodriver Numeric Display
• Data can be displayed as numeric values
• The user can choose to save the data values to a file for later analysis
Autodriver Plot Display
• Using the ptplot package from Berkeley, values can be plotted as connected or unconnected points
• Multiple fields can be plotted to a single window
• User can control the number of points to display in the window and zoom in on an area of the graph
Autodriver Actuator Interaction
• User may enter a value for the selected actuator and transmit it to the remote process
• Interface may be customized for non-numeric data entry, such as a pull-down menu choice of LRU or MRU for an actuator controlling cache replacement policy
Virtue Monitor and Control Architecture
[Diagram components: Autopilot Manager; Virtue; Instrumented Task; tagged sensor data; actuator controls]
Virtue Display and Control
• Each sphere in the ring represents a workstation
• lmon collects processor utilization data and makes it available via sensors
• Virtue maps the data to the display
• Data transmission frequency can be adjusted via slider connected to lmon actuator
Case study: Rocket Simulation Code
• Code developed by DOE ASCI Center for Simulation of Advanced Rockets (CSAR) at UIUC
• 40,000 lines of Fortran, MPI for communication between processes, runs on SGI Origin
• 200+ hours on 128 PEs to simulate 1/2 second of burn
• Ultimately want to model 2 minutes for complete booster burn-off
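A back-of-the-envelope projection shows why tuning matters at this scale. The short Python calculation below assumes that run time grows linearly with simulated burn time, which the presentation does not state, and uses 200 hours as a lower bound for the "200+ hours" figure.

```python
hours_per_measured_run = 200       # lower bound from the slide ("200+ hours")
simulated_seconds_measured = 0.5   # burn time covered by that run
simulated_seconds_target = 2 * 60  # complete booster burn-off

# Assumption: cost scales linearly with simulated time.
scale = simulated_seconds_target / simulated_seconds_measured
projected_hours = hours_per_measured_run * scale

print(scale)                        # how many times longer the target burn is
print(projected_hours)              # projected hours on 128 PEs
print(projected_hours / (24 * 365)) # the same figure in years
```

Under that assumption the full burn needs at least 240 times the measured run, or 48,000 hours on 128 PEs, which is more than five years of continuous execution.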
[Flowchart: Init; Fluids Code (10 fluid iterations); Interpolation; Solids Code (Do 3:1: 3 for coarse grain mesh, 1 for fine grain); Multigrid Solution for Each of the Meshes; Convergence Test (check against a residual; best case, converge on first try; N/Y branches); Output (saves data, advances time step). Annotation: could modify iterations with actuator.]
Execution Environment
[Diagram components and where each runs]
• CSAR code instrumented via SvPablo: running on SGI Origin at NCSA
• Virtue: running on SGI Octane and ImmersaDesk in Pablo group
• Autopilot Manager: running on SPARC in Pablo group
• lmon gathering network data: running on systems around the country
Wide-area Network Performance Data
• Network latency statistics gathered via modified traceroute and made available via Autopilot sensors
• Edge color represents latency -- warm colors for high latency
• Cutting plane shows max value of intersected edges
Time Tunnel in Display Hierarchy
• Time tunnel is second level in Virtue display hierarchy, showing application behavior on a single parallel system
• Notice long delays for some MPI allreduce calls (shown in white)
Application Phases and Communication Patterns
View from “inside” Time Tunnel
• User can fly around within the virtual environment to get different views
• MPI profiling wrappers provide MPI call information via Autopilot sensors
• SvPablo provides code region information via Autopilot sensors
Call Graph in Display Hierarchy
• For each processor in the time tunnel, you can “drill-down” to the procedure call graph
• SvPablo provides call graph layout and dynamic updates via Autopilot sensors
Call Graph Close-Up View
• Color mapped to inclusive procedure execution time
• Size mapped to number of times procedure called
• Magic lens exposes the procedure names
Source Code Text Billboard
• The user can select a procedure in the call graph display and “drill-down” to the final level, which is the source code for the procedure
Current Efforts
• SvPablo: version with output via Autopilot sensors generally available
• Virtue: new displays and controls for interacting with Autopilot sensors and actuators
• Autodriver: integrated event definition, recognition, adaptation, and notification
• Trace Library and Extensions: rework to use Autopilot as infrastructure, providing “automatic” instrumentation of I/O, MPI I/O, and HDF calls with corresponding well-defined sensor data structures
Current Efforts
• Integrate sensors and actuators into Globus infrastructure
• Provide translators from
  – (appropriate) tagged sensor data to NetLogger format
  – NetLogger format to SDDF
  – SDDF to XML
  – XML to SDDF
• Continue to explore analysis, visualization, and control techniques in dynamic, distributed environments
Pablo Group Participants
• Randy Ribler*
• Huseyin Simitci
• Jim Oly
• Nancy Tran
• Guoyi Wang
• Don Schmidt
• Jeff Vetter*
• Luiz DeRose*
• Ying Zhang
• Mario Pantano*
• Eric Shaffer
• Shannon Whitmore
• Ben Schaeffer
• Dan Wells
• Deb Israel
• and lots of others who have been part of the Pablo group over the years
* postdocs previously with the Pablo group
• Professor Dan Reed, Pablo Project Director