
Detecting and Solving Memory Problems in Linux Clusters

Chris Gottbrath, Product Manager


What is a Memory Bug?

• A Memory Bug is a mistake in the management of heap memory

• Failure to check for error conditions

• Relying on nonstandard behaviour

• Leaking: Failure to free memory

• Dangling references: Failure to clear pointers

• Memory Corruption: Writing to memory the program does not own / overrunning array bounds


Heap Memory

• Heap is managed by the program
  – C: malloc() and free()
  – C++: new and delete
  – Fortran90: Allocatable arrays

• malloc() usage is something like:
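A minimal sketch of typical usage, assuming C99. The function names `make_ramp` and `sum_ramp` are illustrative and do not come from the slides. The pattern to note is: request memory, check the result, use it, and release it exactly once.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate an array of n doubles on the heap and fill it with 0, 1, 2, ...
 * Returns NULL if the allocation fails. The caller owns the block and
 * must release it with free(). */
double *make_ramp(size_t n)
{
    assert(n > 0);                       /* illustrative precondition */
    double *values = malloc(n * sizeof *values);
    if (values == NULL) {                /* always check for failure */
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        values[i] = (double)i;
    }
    return values;
}

/* Build a ramp, sum it, and give the memory back to the heap.
 * Returns -1.0 if the allocation failed. */
double sum_ramp(size_t n)
{
    double *values = make_ramp(n);
    if (values == NULL) {
        fprintf(stderr, "out of memory\n");
        return -1.0;
    }
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
    }
    free(values);                        /* one free() per malloc() */
    return total;
}
```

Calling `sum_ramp(5)` allocates five doubles, sums 0 through 4, and frees the block before returning.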


What is a Memory Leak?

[Figure: diagram of ptr and heap blocks in three panels: Normal Allocation, Correct Behavior, Leaked Block]
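One common way a block gets leaked can be sketched in C as follows. This is an illustration under assumptions, not code from the presentation; `copy_twice_leaky` and `copy_twice_fixed` are hypothetical names. The leak happens when the only pointer to a block is overwritten before the block is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Leaky version: overwriting ptr discards the only reference to the
 * first block, so that block can never be released. */
size_t copy_twice_leaky(const char *text)
{
    char *ptr = malloc(strlen(text) + 1);
    if (ptr == NULL) return 0;
    strcpy(ptr, text);

    ptr = malloc(strlen(text) + 1);      /* BUG: first block is now leaked */
    if (ptr == NULL) return 0;
    strcpy(ptr, text);

    size_t len = strlen(ptr);
    free(ptr);                           /* frees only the second block */
    return len;
}

/* Corrected version: release the first block before reusing ptr. */
size_t copy_twice_fixed(const char *text)
{
    char *ptr = malloc(strlen(text) + 1);
    if (ptr == NULL) return 0;
    strcpy(ptr, text);
    free(ptr);                           /* first block returned to the heap */

    ptr = malloc(strlen(text) + 1);
    if (ptr == NULL) return 0;
    strcpy(ptr, text);

    size_t len = strlen(ptr);
    free(ptr);
    return len;
}
```

Both functions return the same value, which is why leaks of this kind tend to go unnoticed until a long-running job exhausts memory.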


What is a Dangling Pointer?

[Figure: diagram of ptr and heap blocks in three panels: Normal Allocation, Correct Behavior, Dangling Pointer; block labels include Heap Block and Unrelated Heap Block]
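A dangling pointer and the usual defence against it might look like this in C. This is a sketch with hypothetical names (`release`, `dangling_demo`); the key idea is to clear every reference when the block is freed, so later code can test for NULL instead of touching freed memory.

```c
#include <stdlib.h>

/* Free the block that *pp points to and clear the pointer. */
void release(int **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Returns 1 when every reference was cleared after the free. */
int dangling_demo(void)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) return 0;
    *ptr = 42;

    int *alias = ptr;    /* a second reference to the same block */
    release(&ptr);       /* the block is freed and ptr is now NULL */

    /* alias still holds the old address: it is dangling. Dereferencing
     * it here would be undefined behavior, and the allocator may
     * already have handed the memory to an unrelated allocation. */
    alias = NULL;        /* clear the remaining reference as well */

    return ptr == NULL && alias == NULL;
}
```

Clearing `ptr` alone is not enough when aliases exist, which is part of why dangling references are hard to find by inspection.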


Memory Problems in Clusters

• Moving an application to a cluster increases the problem complexity
  – Distributed algorithms are more complex
  – Application data set size may push available memory even when everything is functioning correctly
  – Porting to cluster may involve moving to a new architecture/OS
• The Cluster Environment is different
  – Many potentially useful memory tools aren't designed for use in a cluster
    • May simply fail
    • May require extreme 'workarounds'
  – Report based tools need cluster-aware filtering mechanisms


3/29/06

What is MemoryScape?

• What is MemoryScape?
  – Streamlined
  – Lightweight
  – Intuitive
  – Collaborative
  – Memory Debugging
• Features
  – Shows
    • Memory Errors
    • Memory Status
    • Memory Leaks
    • Bounds Violations
  – MPI Memory Debugging
  – Remote Memory Debugging
• Tech
  – Low Overhead
  – No Instrumentation
• Interface
  – Inductive
  – Collaboration
  – Multi-process


What is TotalView?

• Source Code Debugger
  – C, C++, Fortran 77, Fortran90, UPC
    • Complex Language Features
  – Wide Compiler and Platform Support
  – Multi-Threaded Debugging
  – Parallel Debugging
    • MPI, PVM, Others
  – Remote Debugging
  – Memory Debugging capabilities
    • Integrated into the debugger
  – Powerful and Easy GUI
    • Visualization
  – CLI for Scripting


Architecture for Cluster Debugging

• Cluster Architecture
  – Single Front End (TotalView)
    • GUI and debug engine
  – Debugger Agents (tvdsvr)
    • Low overhead, 1 per node
    • Traces multiple rank processes
  – TotalView communicates directly with tvdsvrs
    • Not using MPI
    • Optimized Protocol
• Provides: Robust, Scalable, Minimal Interaction

[Figure: architecture diagram showing the Interface Node and the Compute Nodes. TotalView starts a set of lightweight debugger servers.]


Memory Status

• Multiple Reports
  – Memory Statistics
  – Interactive Graphical Display
  – Source Code Display
  – Backtrace Display
• Allow the user to
  – Understand Program Memory Usage Behavior
  – Discover Allocation Layout
  – Look for Inefficient Allocation
  – Look for Memory Leaks


Leak Detection

• MemoryScape Leak Detection
  – Based on Conservative Garbage Collection
  – Can be performed at any point in runtime
    • Helps localize leaks in time
  – Multiple Reports
    • Backtrace Report
    • Source Code Structure
    • Graphical Memory Location
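The general idea behind leak detection based on conservative garbage collection can be sketched as follows. This is a toy illustration, not MemoryScape's implementation. Every name here (`tracked_alloc`, `count_leaks`, `leak_demo`) is hypothetical, the roots are passed in explicitly rather than found by scanning stacks and globals, and the code assumes a flat address space where a stored pointer has the same bits as its `uintptr_t` conversion (true on mainstream Linux platforms). Any word that looks like an address inside a tracked block counts as a reference; blocks that nothing references are reported as leaks.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 16

/* Bookkeeping record for one heap block (hypothetical layout). */
struct block {
    void  *start;
    size_t size;
    int    reachable;
};

static struct block blocks[MAX_BLOCKS];
static size_t nblocks;

/* Allocate zeroed memory and remember the block. */
void *tracked_alloc(size_t size)
{
    if (nblocks == MAX_BLOCKS) return NULL;
    void *p = calloc(1, size);
    if (p != NULL) {
        blocks[nblocks].start = p;
        blocks[nblocks].size = size;
        blocks[nblocks].reachable = 0;
        nblocks++;
    }
    return p;
}

/* Conservative test: does this word, read as an address, fall inside a
 * tracked block? It may be an integer that only looks like a pointer. */
static struct block *block_containing(uintptr_t word)
{
    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t lo = (uintptr_t)blocks[i].start;
        if (word >= lo && word - lo < blocks[i].size) return &blocks[i];
    }
    return NULL;
}

/* Mark every block reachable from the bytes in [region, region + nbytes). */
static void mark_from(const void *region, size_t nbytes)
{
    const unsigned char *bytes = region;
    for (size_t off = 0; off + sizeof(uintptr_t) <= nbytes;
         off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, bytes + off, sizeof word);
        struct block *b = block_containing(word);
        if (b != NULL && !b->reachable) {
            b->reachable = 1;
            mark_from(b->start, b->size);  /* follow pointers in the block */
        }
    }
}

/* Count tracked blocks that no root reaches, directly or indirectly. */
size_t count_leaks(const void *roots, size_t root_bytes)
{
    size_t leaks = 0;
    for (size_t i = 0; i < nblocks; i++) blocks[i].reachable = 0;
    mark_from(roots, root_bytes);
    for (size_t i = 0; i < nblocks; i++) leaks += !blocks[i].reachable;
    return leaks;
}

/* Demo: the root refers to a, a refers to b, and nothing refers to c. */
size_t leak_demo(void)
{
    void **a = tracked_alloc(sizeof *a);
    void **b = tracked_alloc(sizeof *b);
    void **c = tracked_alloc(sizeof *c);
    size_t leaks = (size_t)-1;             /* reported if allocation failed */
    if (a != NULL && b != NULL && c != NULL) {
        *a = b;
        void *roots[1] = { a };
        leaks = count_leaks(roots, sizeof roots);
    }
    free(a); free(b); free(c);
    nblocks = 0;
    return leaks;
}
```

Because the scan needs only the block table and the current memory contents, it can run at any point in the program's execution, which matches the slide's point about localizing leaks in time. The conservative test can miss a leak when a stray integer happens to look like a pointer, but it never reports a block that is still referenced from the scanned memory.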


Array Bounds Violations

• Heap Guard Blocks
  – Before and/or After
  – All Allocations or just a few
  – Variable Size
  – Check at Any Time
  – Reports
    • By Memory Address
    • Only Corrupted
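The guard block technique itself can be sketched in a few lines of C. This is an illustration under assumptions, not MemoryScape's code: the names (`guarded_malloc`, `guarded_check`, `guarded_free`, `overrun_demo`), the guard sizes, and the fill byte are all made up for the example. Each allocation is padded with a known byte pattern before and after the user's region, and a later check looks for damage to that pattern.

```c
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16     /* stored size, then the leading guard bytes */
#define GUARD_SIZE  8      /* trailing guard bytes */
#define GUARD_BYTE  0xA5   /* arbitrary fill pattern */

/* Layout: [size | leading guard][user bytes][trailing guard] */
void *guarded_malloc(size_t size)
{
    unsigned char *raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL) return NULL;
    memcpy(raw, &size, sizeof size);
    memset(raw + sizeof size, GUARD_BYTE, HEADER_SIZE - sizeof size);
    memset(raw + HEADER_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE;
}

/* Returns 0 if both guards are intact, -1 if the leading guard was
 * overwritten, 1 if the trailing guard was overwritten. */
int guarded_check(const void *user)
{
    const unsigned char *raw = (const unsigned char *)user - HEADER_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof size);
    for (size_t i = sizeof size; i < HEADER_SIZE; i++) {
        if (raw[i] != GUARD_BYTE) return -1;
    }
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (raw[HEADER_SIZE + size + i] != GUARD_BYTE) return 1;
    }
    return 0;
}

void guarded_free(void *user)
{
    if (user != NULL) free((unsigned char *)user - HEADER_SIZE);
}

/* Demo: write one byte past a 4-byte buffer, then check the guards.
 * Returns 1 when the overrun is detected. */
int overrun_demo(void)
{
    char *buf = guarded_malloc(4);
    if (buf == NULL) return -2;
    int before = guarded_check(buf);   /* guards untouched so far */
    buf[4] = 'x';                      /* off-by-one lands in the trailing guard */
    int after = guarded_check(buf);
    guarded_free(buf);
    return before == 0 && after == 1;
}
```

Larger guards catch writes that land further from the block at the cost of more memory per allocation, which is one reason a variable guard size is useful.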


MemoryScape Technology

• Based on TotalView Technologies HIA Tech
  – Heap Interposition Agent
  – Also seen in TotalView
• Advantages of HIA Technology
  – Use it with your existing builds
    • No Source Code or Binary Instrumentation
  – Programs run nearly full speed
    • Low performance overhead
  – Efficient memory usage
    • Low memory overhead
  – Support wide range of platforms and compilers


The Agent and Interposition

[Figure: diagram with the labels Heap Interposition Agent (HIA), Malloc API, User Code and Libraries, Allocation Table, Deallocation Table, Process, and MemoryScape]
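The bookkeeping an interposition layer performs can be sketched as below. This is a simplified illustration, not the actual HIA: a real agent interposes on the malloc API itself so that unmodified programs are tracked, whereas this sketch uses explicitly named wrappers (`agent_malloc`, `agent_free`, both hypothetical) to stay self-contained. Every call passes through to the real allocator and is recorded in an allocation table or a deallocation table.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 64

struct record {
    void  *addr;
    size_t size;
};

static struct record live[TABLE_SIZE];    /* allocation table */
static struct record freed[TABLE_SIZE];   /* deallocation table */
static size_t nlive, nfreed;

/* Forward to malloc() and record the new block. */
void *agent_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && nlive < TABLE_SIZE) {
        live[nlive].addr = p;
        live[nlive].size = size;
        nlive++;
    }
    return p;
}

/* Move the block's record to the deallocation table, then free it. */
void agent_free(void *p)
{
    for (size_t i = 0; i < nlive; i++) {
        if (live[i].addr == p) {
            if (nfreed < TABLE_SIZE) freed[nfreed++] = live[i];
            live[i] = live[--nlive];      /* fill the gap with the last entry */
            break;
        }
    }
    free(p);
}

/* Total bytes currently allocated through the agent. */
size_t agent_live_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < nlive; i++) total += live[i].size;
    return total;
}

/* Number of blocks released through the agent so far. */
size_t agent_freed_count(void)
{
    return nfreed;
}

/* Demo: two allocations, one released; reports the bytes still live. */
size_t agent_demo(void)
{
    void *a = agent_malloc(100);
    void *b = agent_malloc(28);
    agent_free(a);
    size_t bytes = agent_live_bytes();    /* only the 28-byte block remains */
    agent_free(b);
    return bytes;
}
```

Because the tables live inside the process and are updated only on allocation and deallocation calls, the program's other code runs unmodified, which is consistent with the low-overhead, no-instrumentation claims on the previous slide.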


Graphical User Interface

• Inductive Task Based Approach
  – Walks the user through specific tasks
  – Easy to pick up and use
  – Sidebar for secondary tasks
  – Homepage-like summary report


Script Mode

• MemoryScape Supports Automation
  – MemoryScape lets users run tests and check programs for memory leaks without having to be in front of the program
  – Simple command line program called memscript
    • Doesn't start up the GUI
    • Can be run from within a script or test harness
  – The user defines
    • What configuration options are active
    • What things MemoryScape is looking for
    • What actions MemoryScape should take for each type of event that may occur


Multi-Process and Multi-Thread

• Memory debug many processes at the same time
  – MPI
  – Client-Server
  – Fork-Exec
  – Compare two runs
• Remote applications
• Multi-threaded applications


• The TotalView Technologies Solution: A new approach to debugging for the next wave of HPC development

• Defines five core technologies required to develop the next generation of multi-threaded, multi-process applications

• Comprehensive, integrated software development tools to improve development productivity and quality


For more info

• General Info
  – See the TotalView Technologies Website: www.totalviewtech.com
• Documentation
  – See the TotalView Technologies Website: www.totalviewtech.com
  – Full documentation in HTML and PDF format
  – Order hard-copy documentation
• Webcasts
  – See the TotalView Technologies Website: www.totalviewtech.com
• Training
  – Onsite MemoryScape Training will be available soon.
• Contact us
  – Sales: [email protected]
  – Support: [email protected]


MemoryScape Supports

• Linux
  – RedHat and SuSE
  – x86 and x86-64
  – Power and ia64
• UNIX
  – Solaris AMD and SPARC
  – AIX
• Apple
  – Power and Intel
• GCC
• Vendor Compilers
  – Sun Studio
  – Intel C/C++
  – Intel Fortran
  – XL C/C++
  – XL Fortran
• See platforms document on the www.totalviewtech.com site for details


TotalView Debugger Supported Compilers, Distros and Architectures

• Platform Support
  – Linux x86, x86-64, ia64, Power
  – Mac Power and Intel
  – Solaris Sparc and AMD64
  – AIX, Tru64, IRIX
  – Cray X1, XT3, IBM BGL
• Languages / Compilers
  – C/C++, Fortran, UPC, Assembly
  – Many Commercial & Open Source Compilers
• Parallel Environments
  – MPI (MPICH1 & 2, LAM, Open MPI, poe, MPT, Quadrics, MVAPICH, & many others)
  – UPC