Distributed Simulation and Real Time Applications 2009 October 25, 28 - Singapore Benchmarking...
Distributed Simulation and Real Time Applications 2009 October 25, 28 - Singapore
Benchmarking Memory Management Capabilities within ROOT-Sim
Roberto Vitali, Alessandro Pellegrini, Francesco Quaglia
Motivations for the Work
• We have designed and implemented a fully featured Memory Management Subsystem for optimistic PDES platforms (Di-DyMeLoR [PADS 2009]):
  - Targeted at C-based platforms hosted by CISC architectures (i386, x86-64)
  - Capable of supporting incremental logging with arbitrary granularity (based on transparent code instrumentation)
  - Allows Simulation Objects’ Memory Maps to change dynamically, via standard malloc/free services
• In this work we provide accurate benchmarking results for assessing the effectiveness of such a subsystem
  - This entails the definition and implementation of an adequate benchmark application
Motivations for the Work (2)
• The significance of this study lies in the fact that the benchmarking results available in the literature for Memory Management Subsystems with incremental capabilities:
  - Did not cope with dynamic memory maps
  - Have only been targeted at RISC systems (no coverage of complex instruction sets)
  - Are about 10 years old (no coverage of current technological trends)
Objectives
• Show the actual overhead that memory update tracking mechanisms add to execution in a parallel and distributed optimistic simulation environment built on current technology.
• Develop an effective benchmark to assess the performance of dynamic Memory Management Subsystems, since no such benchmark exists in our context.
Work Path
• The most widely known benchmark for PDES Systems is PHOLD
• Traditionally, it has been used to evaluate PDES platforms as a whole (e.g., to evaluate the effects of the selected Synchronization Scheme)
• We have provided extra specifications to PHOLD in order to explicitly cope with the evaluation of memory management capabilities in Optimistic Systems
• The implementation reflects standard libraries’ code in the execution of memory access tasks: e.g., writing in contiguous memory regions is performed by exploiting string instructions, such as stos.
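As an illustration of the last point, a contiguous region can be filled with a single string instruction. The sketch below is an assumption about what such library-style code may look like, not the benchmark's actual source: `fill_bytes` is a hypothetical name, the inline assembly uses GCC/Clang syntax on x86, and other targets fall back to `memset`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with value.
 * On x86 with GCC/Clang it issues one "rep stosb"; elsewhere it uses memset. */
static void fill_bytes(void *dst, unsigned char value, size_t n)
{
    assert(dst != NULL || n == 0);
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* rep stosb stores AL at [DI], advancing DI and decrementing CX until CX is 0 */
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(n)
                      : "a"(value)
                      : "memory");
#else
    memset(dst, value, n);
#endif
}
```

A single `rep stos` instruction writes a number of bytes that is only known at run time (it is held in the count register), which is why this kind of instruction is a relevant case for a memory-write tracker.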
Reference Operating Architecture
• Rome Optimistic Simulator (ROOT-Sim):
  - Based on ANSI-C/POSIX technology and the MPI standard
  - Transparent support of housekeeping operations typical of optimistic simulation environments (e.g., object mapping and scheduling)
  - Based on the notion of event handlers and event injection services
Reference Operating Architecture (2)
[Figure: architecture diagram; the only recoverable label is "Memory Management Subsystem"]
Memory Management Subsystem
• Dynamic Memory Logger and Restorer (DyMeLoR):
  - Based on ANSI-C wrapped malloc/free services
  - Provides log/restore facilities for dynamic-memory-based object states, transparently to the application-level programmer
  - Supports contiguity of the dynamic memory chunks belonging to the same object
  - Allocation/deallocation operations are guaranteed to be Piece-Wise-Deterministic
Memory Management Subsystem (2)
• Based on Static Software Instrumentation:
  - Compile-time disassembling and rewriting of the application-level executable generated by standard compilers (e.g., gcc)
  - Transparent injection of memory-access tracing routines
  - Disassembling data are cached in tables generated at compile time, to reduce memory-access tracing overhead
Memory Management Subsystem (3)
Statically inserts calls to the update tracking routine – generates data tables
Traces the execution of those instructions involving a memory update
Keeps track of intra-checkpoint memory updates
Allows faster interception of memory updates
Parser / Modifier
• By parsing the application-level code’s byte stream, it identifies all those instructions involving a memory write
• Prepends to each of them a call to the update_tracker module
• Extracts all relevant information
• Builds the Disassembling Table
• Corrects all the static and dynamic references.
update_tracker
• Written in Assembly language to optimize performance
• When called, it exploits the Disassembling Table generated at compile time together with the CPU registers to compute the actual memory-write destination address
• Triggers the Memory Map Manager to keep track of the new memory update
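The address computation above can be sketched in C (the real update_tracker is written in Assembly). The slides do not show the layout of the Disassembling Table, so `insn_entry`, `cpu_regs` and `write_address` are hypothetical names; the only thing taken as given is the standard x86 operand form base + index * scale + displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_REGS = 16, NO_REG = -1 };

/* Hypothetical snapshot of the general-purpose registers at call time. */
typedef struct {
    uint64_t gpr[NUM_REGS];
} cpu_regs;

/* Hypothetical Disassembling Table entry, built at compile time for one
 * memory-writing instruction. */
typedef struct {
    int     base_reg;   /* register index, or NO_REG */
    int     index_reg;  /* register index, or NO_REG */
    uint8_t scale;      /* 1, 2, 4 or 8 */
    int32_t disp;       /* signed displacement */
} insn_entry;

/* Destination address = base + index * scale + displacement. */
static uint64_t write_address(const insn_entry *e, const cpu_regs *r)
{
    assert(e != NULL && r != NULL);
    /* sign-extend the displacement, then rely on modular unsigned arithmetic */
    uint64_t addr = (uint64_t)(int64_t)e->disp;
    if (e->base_reg != NO_REG)
        addr += r->gpr[e->base_reg];
    if (e->index_reg != NO_REG)
        addr += r->gpr[e->index_reg] * e->scale;
    return addr;
}
```

Because the static part of the operand is resolved at compile time, only the register reads and one multiply-add remain on the run-time path, which is consistent with the stated goal of reducing tracing overhead.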
Memory Map Manager
• For each simulation object a metadata table is maintained (different entries handle different chunk sizes, between 32 B and 32 KB)
• Each entry keeps information about a block of contiguous preallocated memory chunks
• Block structures include a status bitmap, to keep track of allocated chunks, and a dirty bitmap, to keep track of updated chunks
[Figure: metadata layout; recoverable labels: malloc_area, base_state_address, state_layout_info, status bitmap, dirty bitmap, chunk, preallocated block of contiguous chunks]
Memory Map Manager (2)
• When triggered, it identifies the area and the chunks involved in the write operation
• Marks the involved chunks as dirty, and updates all the relevant metadata
• All operations concerning areas outside the object’s state (e.g., global variables) are simply discarded
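A minimal sketch of this matching step follows. The field names and the single 64-bit word per bitmap are assumptions made for brevity, not the actual ROOT-Sim layout; `mark_dirty` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical malloc_area: a preallocated block of equally sized chunks. */
typedef struct {
    unsigned char *base;     /* start of the block */
    size_t   chunk_size;     /* e.g. 32 B ... 32 KB */
    size_t   num_chunks;     /* at most 64 in this sketch */
    uint64_t status_bitmap;  /* bit i set: chunk i is allocated */
    uint64_t dirty_bitmap;   /* bit i set: chunk i updated since last log */
} malloc_area;

/* Record a write of len bytes at addr. Returns how many chunks were newly
 * marked dirty; 0 also covers writes outside the object's state, which
 * are discarded. */
static int mark_dirty(malloc_area *areas, size_t n_areas,
                      uintptr_t addr, size_t len)
{
    assert(areas != NULL || n_areas == 0);
    int marked = 0;
    if (len == 0)
        return 0;
    for (size_t a = 0; a < n_areas; a++) {
        malloc_area *m = &areas[a];
        uintptr_t lo = (uintptr_t)m->base;
        uintptr_t hi = lo + m->chunk_size * m->num_chunks;
        if (addr + len <= lo || addr >= hi)
            continue;                          /* no overlap with this area */
        uintptr_t start = addr < lo ? lo : addr;
        uintptr_t end = addr + len > hi ? hi : addr + len;
        size_t first = (start - lo) / m->chunk_size;
        size_t last = (end - 1 - lo) / m->chunk_size;
        for (size_t c = first; c <= last; c++) {
            uint64_t bit = UINT64_C(1) << c;
            if ((m->status_bitmap & bit) && !(m->dirty_bitmap & bit)) {
                m->dirty_bitmap |= bit;
                marked++;
            }
        }
    }
    return marked;
}
```

For example, with 32-byte chunks a 4-byte write starting at offset 30 straddles chunks 0 and 1, so both bits are set in the dirty bitmap.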
Memory Map Manager (3)
• Incremental State Log Operations:
  - A log operation results in packing the information to be logged into a contiguous memory buffer
  - A malloc_area, together with its Status Bitmap, is only copied if it was updated since the last log/restore operation
  - Dirty Bitmaps are only copied if at least one chunk has been updated since the last log/restore operation
  - Every dirty chunk is also copied into the log buffer
  - Periodically, a full snapshot of the state is taken
  - Logs are organized into a chain, ordered by the Logical Simulation Time at which they were taken
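The packing step can be sketched as follows. This is a simplified illustration under stated assumptions, not the DyMeLoR code: bitmaps are single 64-bit words, an area counts as "updated" only when at least one allocated chunk is dirty, and `state_log` and `take_log` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *base;
    size_t   chunk_size;
    size_t   num_chunks;     /* at most 64 in this sketch */
    uint64_t status_bitmap;  /* allocated chunks */
    uint64_t dirty_bitmap;   /* chunks updated since the last log */
} malloc_area;

typedef struct {
    int    is_full;          /* 1 for a full snapshot, 0 for incremental */
    size_t size;             /* bytes packed into data */
    unsigned char *data;     /* contiguous log buffer, owned by the caller */
} state_log;

static size_t popcount64(uint64_t x)
{
    size_t n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Chunks to copy: all allocated ones for a full log, dirty ones otherwise. */
static uint64_t chunks_to_log(const malloc_area *m, int full)
{
    return full ? m->status_bitmap : (m->dirty_bitmap & m->status_bitmap);
}

static state_log take_log(malloc_area *areas, size_t n_areas, int full)
{
    state_log log = { full, 0, NULL };
    size_t cap = 0;
    for (size_t a = 0; a < n_areas; a++) {
        uint64_t bits = chunks_to_log(&areas[a], full);
        if (full || bits)
            cap += sizeof(uint32_t) + 2 * sizeof(uint64_t)
                 + popcount64(bits) * areas[a].chunk_size;
    }
    log.data = malloc(cap ? cap : 1);
    assert(log.data != NULL);
    unsigned char *p = log.data;
    for (size_t a = 0; a < n_areas; a++) {
        malloc_area *m = &areas[a];
        uint64_t bits = chunks_to_log(m, full);
        if (!full && bits == 0)
            continue;                       /* untouched area: not logged */
        uint32_t idx = (uint32_t)a;
        memcpy(p, &idx, sizeof idx);   p += sizeof idx;
        memcpy(p, &m->status_bitmap, sizeof m->status_bitmap);
        p += sizeof m->status_bitmap;
        memcpy(p, &bits, sizeof bits); p += sizeof bits;
        for (size_t c = 0; c < m->num_chunks; c++) {
            if (bits & (UINT64_C(1) << c)) {
                memcpy(p, m->base + c * m->chunk_size, m->chunk_size);
                p += m->chunk_size;
            }
        }
        m->dirty_bitmap = 0;                /* start a new tracking interval */
    }
    log.size = (size_t)(p - log.data);
    return log;
}
```

A caller would request `full = 1` every k-th log operation and `full = 0` otherwise, then append the returned log to the chain together with the current simulation time.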
Memory Map Manager (4)
• Incremental Restore State Operations:
  - When a Restore Operation needs to be executed at simulation time T, the log chain is traversed backward to find the most recent log with timestamp less than or equal to T
  - The Restore Operation is performed by an iterative procedure which scans the logs along the chain
  - The operation halts as soon as the memory map is completely restored (i.e., when a full log is encountered)
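The backward scan can be sketched as below. The fixed-size state and the `log_node` layout are assumptions chosen to keep the example short; `restore_state` is a hypothetical name. The key point is that a chunk already restored from a more recent log must not be overwritten by an older one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NCHUNKS = 8, CHUNK = 16 };

/* Hypothetical log chain node. */
typedef struct log_node {
    double   lvt;                        /* simulation time of the log */
    int      is_full;                    /* 1: full snapshot */
    uint64_t mask;                       /* chunks stored in this log */
    unsigned char data[NCHUNKS][CHUNK];  /* rows valid where mask bit is set */
    struct log_node *prev;               /* next older log */
} log_node;

/* Restore state to time T, starting from the most recent log `tail`.
 * Returns 0 on success, -1 if no full log was reached. */
static int restore_state(unsigned char state[NCHUNKS][CHUNK],
                         const log_node *tail, double T)
{
    assert(state != NULL);
    const log_node *l = tail;
    while (l != NULL && l->lvt > T)      /* skip logs taken after T */
        l = l->prev;

    uint64_t done = 0;                   /* chunks already restored */
    for (; l != NULL; l = l->prev) {
        uint64_t todo = l->mask & ~done; /* newer copies take precedence */
        for (int c = 0; c < NCHUNKS; c++)
            if (todo & (UINT64_C(1) << c))
                memcpy(state[c], l->data[c], CHUNK);
        done |= todo;
        if (l->is_full)
            return 0;                    /* memory map completely restored */
    }
    return -1;
}
```

The number of logs scanned is bounded by the interleaving step between full logs, which is why that step directly affects restore latency.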
The Benchmark
• The Benchmark is derived from PHOLD:
  - Fictitious events are executed, involving the advancement of the local simulation clock
  - Upon event execution, a new event is scheduled, destined for any object in the system
  - Each Simulation Object’s state contains a set of N pointers for accessing N distinct linked lists of buffers relying on dynamic memory allocation
The Benchmark (2)
• Different lists keep track of buffers with different sizes, between a min and a max
• Denoting as size(i) the exact size of the buffers inside the i-th list, at setup time the S bytes of the state are allocated according to the following rule:
  - $S/N$ bytes are destined for buffer allocation inside each list
  - $S / (N \cdot size(i))$ buffers are allocated for the i-th list, and linked together
• There is a bias towards the number of buffers associated with smaller sizes, to mimic the common scenario where applications tend to allocate a large number of small buffers
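The setup rule can be checked with a few lines of C. `buffers_in_list` and `list_size` are hypothetical helpers; the doubling of sizes from one list to the next is an assumption (the slides only state that sizes lie between a min and a max).

```c
#include <assert.h>
#include <stddef.h>

/* Buffers allocated in the i-th list at setup time: each of the N lists
 * receives S / N bytes, split into buffers of size_i bytes. */
static size_t buffers_in_list(size_t S, size_t N, size_t size_i)
{
    assert(N > 0 && size_i > 0);
    return S / (N * size_i);   /* integer division: any remainder is unused */
}

/* Assumed size rule: sizes double from one list to the next. */
static size_t list_size(size_t min_size, size_t i)
{
    return min_size << i;
}
```

With S = 1 MiB, N = 8 and sizes from 32 B to 4 KB, the smallest-size list holds 4096 buffers and the largest-size list holds 32, which shows the bias towards small buffers.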
The Benchmark (3)
• The benchmark logic provides two events:
  - BUFFER_ALLOCATE[size]: upon its execution, a new buffer is allocated and linked to the i-th local list, the one with size(i) = size
  - BUFFER_DEALLOCATE[null]: upon its execution at time t:
    - A size value is randomly selected from the pool of size(i) possibilities
    - A random buffer in the list associated with size(i) = size (if any) gets released
    - A new BUFFER_ALLOCATE[size] event is scheduled for any simulation object, at the same time t
    - A new BUFFER_DEALLOCATE[null] event is scheduled for the same simulation object, at time t + inc
The Benchmark (4)
• The differentiation in the two types of events implies that we are migrating buffers across the different simulation objects, with exponentially distributed migration rate
• At any simulation time, the total memory used by the simulation objects is constant, thus reflecting the space complexity of the simulation model that the current benchmark configuration mimics
The Benchmark (5)
• Read/write accesses to the buffers in the objects’ states are associated with the execution of the fictitious events
• The benchmark is able to emulate read-intensive vs write-intensive applications:
  - The more write-intensive the event, the larger the number of chunks updated
  - This makes it possible to observe how the costs of memory-write tracking and log/restore operations scale with respect to the ROOT-Sim implementation
The Benchmark (6)
• An additional parameter (x ≤ S) indicates the total number of bytes to be read/written
• A breadth-first visit of the lists has been adopted:
  - When executing an event we randomly select a list to start the visit from
  - All the content of the buffer at the head of the list is touched in read/write mode
  - Other lists are accessed according to a circular policy, moving to the next buffers on subsequent accesses
  - The visit continues until exactly x bytes have been touched
• The breadth-first visit mimics a worst-case scenario for the log/restore facilities offered by ROOT-Sim: write operations are not localized in a few malloc_areas
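The visit policy can be sketched as follows. The `buffer` type and `breadth_first_touch` are hypothetical names, the random choice of the starting list is left to the caller, and the cap on the number of lists is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_LISTS = 16 };

typedef struct buffer {
    struct buffer *next;
    size_t size;
    unsigned char *data;
} buffer;

/* Touch up to x bytes, visiting the lists round-robin from list `start`:
 * first the head of every list, then the second buffers, and so on.
 * Returns the bytes touched (less than x only if the lists run out). */
static size_t breadth_first_touch(buffer *heads[], size_t n_lists,
                                  size_t start, size_t x, int write)
{
    assert(n_lists > 0 && n_lists <= MAX_LISTS && start < n_lists);
    buffer *cur[MAX_LISTS];
    size_t live = 0;                 /* lists with unvisited buffers left */
    for (size_t i = 0; i < n_lists; i++) {
        cur[i] = heads[i];
        if (cur[i] != NULL)
            live++;
    }
    size_t touched = 0;
    volatile unsigned char sink = 0; /* keeps the read loop from being elided */
    for (size_t i = start; touched < x && live > 0; i = (i + 1) % n_lists) {
        buffer *b = cur[i];
        if (b == NULL)
            continue;                /* this list is exhausted */
        size_t n = b->size < x - touched ? b->size : x - touched;
        if (write) {
            memset(b->data, 0xAB, n);
        } else {
            for (size_t k = 0; k < n; k++)
                sink ^= b->data[k];
        }
        touched += n;
        cur[i] = b->next;
        if (cur[i] == NULL)
            live--;
    }
    (void)sink;
    return touched;
}
```

Because consecutive accesses land in different lists, and different lists hold different buffer sizes, the writes are spread over many malloc_areas instead of clustering in one.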
Measures Performed
• Test Platform:
  - Quad-Core machine equipped with four 2.4GHz/4MB-Cache 64-bit Intel processors
  - 4 GB of RAM
  - One ROOT-Sim simulation kernel per processor
• Four simulation objects (one per core)
• The tests require each simulation object to execute at least 10,000 buffer allocations, scattered over 8 different buffer chains with sizes ranging from 32 bytes to 4 KB
• The parameter x has been varied in order to generate read/write operations spanning between 20% and 80% of the whole size of the simulation object
Measures Performed (2)
• Event Latency, Checkpoint Latency, Restore Latency and Memory Usage (per checkpoint) have been measured
• Different interleaving steps between full and incremental logs have been selected, taking full logs every 5 or every 20 log operations
• Similar measurements have been performed by excluding software instrumentation and the related incremental log capabilities:
  - By linking a previous memory map manager, with a similar structure except that no memory-write tracking is supported
Experimental Results: Events
• The tracking mechanisms used to identify regions involved in update operations add overhead to event execution
• Nevertheless, this overhead is relatively limited for write operations spanning up to 40% of the state
• When the state increases in size, the relative overhead is reduced
[Figure: event latency plots; panel titles: 10 KB, 100 KB, 1024 KB]
Experimental Results: Log
• The event processing overhead of the instrumented software is, moreover, counterbalanced by reduced checkpoint latency
• Such a checkpointing capability is highly relevant for applications that are not Piece-Wise-Deterministic
[Figure: checkpoint latency plots; panel titles: 10 KB, 100 KB, 1024 KB]
Experimental Results: Restore
• The non-instrumented configuration typically provides gains in state restore operations
• State restore latency directly depends on the interleaving between full logs and incremental logs along the log chain
• The performance decrease can be controlled via proper selection of a non-oversized interleaving step
[Figure: restore latency plots; panel titles: 10 KB, 100 KB, 1024 KB]
Experimental Results: Memory
• Memory requirements for each log operation in the instrumented case are definitely lower than those observed for non-instrumented software
• This further strengthens the case for the fully featured incremental version of the software when applications have very large memory requirements for the objects’ states
[Figure: memory usage plots; panel titles: 10 KB, 100 KB, 1024 KB]
Summary
• We have developed a synthetic benchmark to assess the memory management capabilities offered by the optimistic parallel simulation environment ROOT-Sim (based on C technology)
• We have focused on incremental log/restore aspects and on software instrumentation techniques
• This has been done to evaluate the efficiency and effectiveness of support for high-performance simulation systems, which are important in contexts with, e.g., temporal constraints
• The targeted system is representative of platforms hosted by modern CISC machines
Planned future work
• Evaluation of ROOT-Sim with differentiated application programming patterns
  - Use of a large spectrum of simulation models from the real world
  - Use of unoptimized vs optimized machine code for memory read and write operations (tradeoffs between programmer skills and automatic compiler optimizations)
Thanks!!
Questions?