Slide 1: Memory Management for High-Performance Applications
Emery Berger
University of Massachusetts, Amherst
Slide 2: High-Performance Applications
- Web servers, search engines, scientific codes
- Written in C or C++ (still…)
- Run on one server box or a cluster of them
[Diagram: several server boxes, each with multiple CPUs, RAM, and a RAID drive]
[Diagram: the software stack (compiler, runtime system, operating system) atop hardware]
- Needs support at every level
Slide 3: New Applications, Old Memory Managers
- Applications and hardware have changed: multiprocessors are now commonplace; applications are object-oriented and multithreaded
- Increased pressure on the memory manager (malloc, free)
- But memory managers have not kept up: inadequate support for modern applications
Slide 4: Current Memory Managers Limit Scalability
[Chart: speedup vs. number of processors (1 to 14); the ideal line climbs linearly while the actual line collapses]
- As we add processors, the program slows down
- Caused by heap contention
- Larson server benchmark on a 14-processor Sun
Slide 5: The Problem
- Current memory managers are inadequate for high-performance applications on modern architectures
- They limit scalability, application performance, and robustness
Slide 6: This Talk
- Building memory managers: the Heap Layers framework [PLDI 2001]
- Problems with current memory managers: contention, false sharing, space
- Solution: a provably scalable memory manager, Hoard [ASPLOS-IX]
- An extended memory manager for servers: Reap [OOPSLA 2002]
Slide 7: Implementing Memory Managers
- Memory managers must be space efficient and very fast
- Heavily optimized code: hand-unrolled loops, macros, monolithic functions
- Hard to write, reuse, or extend
Slide 8: Building Modular Memory Managers
- Classes: overhead, rigid hierarchy
- Mixins: no overhead, flexible hierarchy
Slide 9: A Heap Layer
A heap layer is a mixin with malloc and free methods:

template <class SuperHeap>
class GreenHeapLayer : public SuperHeap {…};

[Diagram: GreenHeapLayer stacked on RedHeapLayer]
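To make the idiom concrete, here is a minimal sketch of a complete layer in this style: an allocation-counting mixin composed over a malloc-based bottom heap. The names (MallocHeap, CountingHeapLayer) are illustrative, not classes from the Heap Layers distribution.

#include <cstddef>
#include <cstdlib>

// Bottom layer: obtains memory from the system allocator.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p)           { std::free(p); }
};

// A mixin layer: adds allocation counting to any superheap.
template <class SuperHeap>
class CountingHeapLayer : public SuperHeap {
public:
    void* malloc(std::size_t sz) {
        ++count_;                       // record the request
        return SuperHeap::malloc(sz);   // delegate to the superheap
    }
    std::size_t allocations() const { return count_; }
private:
    std::size_t count_ = 0;
};

// Layers compose by instantiating the template chain:
using CountingMallocHeap = CountingHeapLayer<MallocHeap>;

Because the superheap is a template parameter rather than a fixed base class, the same mixin can be layered over any heap, which is what makes the hierarchy flexible at no runtime cost.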
Slide 10: Example: A Thread-Safe Heap Layer
- LockedHeap protects its superheap with a lock
- LockedMallocHeap = LockedHeap layered over mallocHeap
[Diagram: LockedHeap stacked on mallocHeap]
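A hedged sketch of such a layer, assuming a std::mutex for the lock (the real Heap Layers LockedHeap may differ, e.g., in its choice of lock):

#include <cstddef>
#include <mutex>

// A locking mixin: serializes access to whatever superheap it wraps.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
    void* malloc(std::size_t sz) {
        std::lock_guard<std::mutex> guard(lock_);
        return SuperHeap::malloc(sz);
    }
    void free(void* p) {
        std::lock_guard<std::mutex> guard(lock_);
        SuperHeap::free(p);
    }
private:
    std::mutex lock_;
};

// Composition, as on the slide:
// using LockedMallocHeap = LockedHeap<MallocHeap>;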
Slide 11: Empirical Results
- Heap Layers vs. the originals: KingsleyHeap vs. the BSD allocator; LeaHeap vs. DLmalloc 2.7
- Competitive runtime and memory efficiency
[Chart: runtime normalized to the Lea allocator across cfrac, espresso, lindsay, LRUsim, perl, roboop, and their average, for Kingsley, KingsleyHeap, Lea, and LeaHeap]
[Chart: space normalized to the Lea allocator, same benchmarks and allocators]
Slide 12: Overview
- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- An extended memory manager for servers: Reap
Slide 13: Problems with General-Purpose Memory Managers
Previous work for multiprocessors:
- Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]: impractical
- Multiple heaps [Larson 98, Gloger 99]: reduce contention but, as we show, cause other problems: a P-fold or even unbounded increase in space, and allocator-induced false sharing
Slide 14: Multiple-Heap Allocator: Pure Private Heaps
One heap per processor:
- malloc gets memory from its local heap
- free puts memory on its local heap
- Used by STL, Cilk, and ad hoc allocators
[Diagram: an interleaved trace on processors 0 and 1 (x1 = malloc(1), x2 = malloc(1), free(x1), free(x2), x3 = malloc(1), x4 = malloc(1), free(x3), free(x4)); key: in use by processor 0 vs. free on heap 1]
Slide 15: Problem: Unbounded Memory Consumption
- Producer-consumer: processor 0 allocates, processor 1 frees
- With pure private heaps, the freed memory lands on processor 1's heap, where processor 0 can never reuse it
- Unbounded memory consumption: crash!
[Diagram: processor 0 calls x1 = malloc(1), x2 = malloc(1), x3 = malloc(1), …; processor 1 calls free(x1), free(x2), free(x3), …]
Slide 16: Multiple-Heap Allocator: Private Heaps with Ownership
- free returns memory to the original (owning) heap
- Bounded memory consumption: no crash!
- Used by Ptmalloc (Linux) and LKmalloc
[Diagram: the producer-consumer trace again (x1 = malloc(1), free(x1), x2 = malloc(1), free(x2)); freed memory flows back to the allocating processor's heap]
Slide 17: Problem: P-fold Memory Blowup
- Occurs in practice: round-robin producer-consumer, where processor i mod P allocates and processor (i+1) mod P frees
- Footprint = 1 (2GB), but space = 3 (6GB): exceeds the 32-bit address space, crash!
[Diagram: processors 0, 1, and 2; x1 = malloc(1), x2 = malloc(1), x3 = malloc(1), with free(x1), free(x2), free(x3) each issued by the next processor around]
Slide 18: Problem: Allocator-Induced False Sharing
- False sharing: non-shared objects end up on the same cache line
- The bane of parallel applications; extensively studied
- All of these allocators cause false sharing!
[Diagram: CPU 0 and CPU 1, each with its own cache, connected by a bus; x1 = malloc(1) on processor 0 and x2 = malloc(1) on processor 1 land on one cache line, which thrashes back and forth between the caches]
Slide 19: So What Do We Do Now?
Where do we put free memory?
- On a central heap: heap contention
- On our own heap (pure private heaps): unbounded memory consumption
- On the original heap (private heaps with ownership): P-fold blowup
And how do we avoid false sharing?
Slide 20: Overview
- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- An extended memory manager for servers: Reap
Slide 21: Hoard: Key Insights
- Bound local memory consumption: explicitly track utilization and move free memory to a global heap; this provably bounds memory consumption
- Manage memory in large chunks: avoids false sharing and reduces heap contention
Slide 22: Overview of Hoard
- Manage memory in page-sized heap blocks: avoids false sharing
- Allocate from the local heap block: avoids heap contention
- On low utilization, move the heap block to the global heap: avoids space blowup
[Diagram: processors 0 through P-1, each with local heap blocks, above a shared global heap]
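A minimal sketch of that eviction rule, assuming an illustrative emptiness threshold and much-simplified bookkeeping (Hoard's real invariant, per-size-class superblocks, and locking are more involved):

#include <cstddef>
#include <vector>

// Simplified bookkeeping: one page-sized block of same-sized objects.
struct HeapBlock {
    std::size_t inUse    = 0;     // bytes currently allocated from this block
    std::size_t capacity = 4096;  // page-sized
};

struct GlobalHeap {
    std::vector<HeapBlock*> blocks;   // reusable by any processor
};

struct LocalHeap {
    std::vector<HeapBlock*> blocks;   // owned by one processor
    std::size_t inUse    = 0;         // total bytes allocated
    std::size_t capacity = 0;         // total bytes held
};

constexpr double kEmptyFraction = 0.25;   // illustrative threshold

// Called after a free: if the local heap's utilization has dropped far
// enough, hand one mostly-empty block back to the global heap so other
// processors can reuse it. This is what bounds the space blowup.
void maybeRelease(LocalHeap& local, GlobalHeap& global) {
    if (local.inUse >= kEmptyFraction * local.capacity) return;
    for (std::size_t i = 0; i < local.blocks.size(); ++i) {
        HeapBlock* b = local.blocks[i];
        if (b->inUse < kEmptyFraction * b->capacity) {
            local.blocks.erase(local.blocks.begin() + i);
            local.capacity -= b->capacity;
            global.blocks.push_back(b);   // now visible to every processor
            return;
        }
    }
}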
Slide 23: Summary of Analytical Results
Space consumption: near-optimal worst case
- Hoard: O(n log(M/m) + P), where P « n
- Optimal: O(n log(M/m)) [Robson 70], ≈ bin-packing
- Private heaps with ownership: O(P n log(M/m))
Provably low synchronization
(n = memory required, M = largest object size, m = smallest object size, P = number of processors)
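Reading the three bounds as one chain (a hedged restatement in the slide's notation; since P « n, Hoard's additive P term keeps it within a constant factor of optimal):

\[
\underbrace{O\big(n \log (M/m)\big)}_{\text{optimal}}
\;\le\;
\underbrace{O\big(n \log (M/m) + P\big)}_{\text{Hoard}}
\;\ll\;
\underbrace{O\big(P \, n \log (M/m)\big)}_{\text{private heaps with ownership}}
\]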
Slide 24: Empirical Results
- Measured runtime on a 14-processor Sun
- Allocators: Solaris (the system allocator), Ptmalloc (GNU libc), mtmalloc (Sun's "MT-hot" allocator)
- Micro-benchmarks: Threadtest (no sharing), Larson (sharing, server-style), Cache-scratch (mostly reads and writes; tests for false sharing)
- Experience with real applications is similar
Slide 25: Runtime Performance: threadtest
speedup(x, P) = runtime(Solaris allocator, one processor) / runtime(x on P processors)
- Many threads, no sharing
- Hoard achieves linear speedup
Slide 26: Runtime Performance: Larson
- Many threads, sharing (server-style)
- Hoard achieves linear speedup
Slide 27: Runtime Performance: False Sharing
- Many threads, mostly reads and writes of heap data
- Hoard achieves linear speedup
Slide 28: Hoard in the "Real World"
- Open-source code at www.hoard.org; 13,000 downloads; runs on Solaris, Linux, Windows, IRIX, …
- Widely used in industry: AOL, British Telecom, Novell, Philips
- Reports of 2x-10x, "impressive" improvements in performance
- Used in a search server, telecom billing systems, scene rendering, real-time messaging middleware, a text-to-speech engine, telephony, and a JVM
- A scalable, general-purpose memory manager
Slide 29: Overview
- Building memory managers: the Heap Layers framework
- Problems with memory managers: contention, space, false sharing
- Solution: a provably scalable allocator, Hoard
- An extended memory manager for servers: Reap
Slide 30: Custom Memory Allocation
- Programmers often replace malloc/free: to attempt to increase performance, to provide extra functionality (e.g., for servers), and rarely to reduce space
- Our empirical study of custom allocators [OOPSLA 2002]: the Lea allocator is often as fast or faster; custom allocation is ineffective, except for regions
Slide 31: Overview of Regions
- Separate allocation areas; deletion only en masse
- API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of whole chunks
+ Convenient: one call frees all memory
- Risky: accidental deletion, too much space
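The API above maps onto a very small implementation. A hedged sketch (chunk size, alignment handling, and oversized requests are simplified assumptions):

#include <cstddef>
#include <cstdlib>

// Each region owns a chain of fixed-size chunks; allocation bumps a
// pointer within the newest chunk, and deletion frees the whole chain.
struct Region {
    struct Chunk {
        Chunk*      next;
        std::size_t used;
        char        data[4096];
    };
    Chunk* head = nullptr;
};

void regioncreate(Region& r) { r.head = nullptr; }

void* regionmalloc(Region& r, std::size_t sz) {
    // (Alignment and requests larger than a chunk are ignored for brevity.)
    if (!r.head || r.head->used + sz > sizeof(r.head->data)) {
        Region::Chunk* c =
            static_cast<Region::Chunk*>(std::malloc(sizeof(Region::Chunk)));
        c->next = r.head;
        c->used = 0;
        r.head  = c;
    }
    void* p = r.head->data + r.head->used;
    r.head->used += sz;               // pointer-bumping: no per-object free
    return p;
}

void regiondelete(Region& r) {
    // One call frees all memory: release every chunk in the chain.
    for (Region::Chunk* c = r.head; c != nullptr; ) {
        Region::Chunk* n = c->next;
        std::free(c);
        c = n;
    }
    r.head = nullptr;
}

Everything allocated from r lives until regiondelete(r), which is both the convenience and the risk the slide lists.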
Slide 32: Why Regions?
- Apparently faster and more space-efficient
- Servers need memory management support: avoid resource leaks; tear down memory associated with terminated connections or transactions
- The current approach (e.g., Apache): regions
Slide 33: Drawbacks of Regions
- Can't reclaim memory within regions: a problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs (unbounded memory consumption)
- The current situation for Apache: vulnerable to denial of service; limits the runtime of connections; limits module programming
Slide 34: Reap: A Hybrid Allocator
- Reap = region + heap: adds individual object deletion and a heap
- API: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
+ Fast, with cheap deletion
+ Adapts to how it is used (region style or heap style)
+ Can reduce memory consumption
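A hedged sketch of the idea, using the slide's API names over an assumed implementation (the real Reap adapts its policy dynamically and supports reapdelete(); this stub only shows the region-plus-free-list mechanism):

#include <cstddef>
#include <cstdlib>

// A reap: bump allocation like a region, plus a free list so that
// reapfree() can recycle individual objects heap-style.
struct Reap {
    struct Header { std::size_t size; Header* next; };  // precedes each object
    char*   bump  = nullptr;   // next free byte in the current chunk
    char*   limit = nullptr;   // end of the current chunk
    Header* freed = nullptr;   // recycled objects
    // (Chunk bookkeeping for reapdelete(), alignment, and error
    //  handling are omitted for brevity.)
};

void* reapmalloc(Reap& r, std::size_t sz) {
    // Heap style first: reuse a freed object if one is big enough.
    for (Reap::Header** h = &r.freed; *h != nullptr; h = &(*h)->next) {
        if ((*h)->size >= sz) {
            Reap::Header* hit = *h;
            *h = hit->next;
            return hit + 1;                 // payload follows the header
        }
    }
    // Region style otherwise: bump-allocate a header plus payload.
    std::size_t need = sizeof(Reap::Header) + sz;
    if (r.bump == nullptr || r.bump + need > r.limit) {
        std::size_t chunk = need > 4096 ? need : 4096;
        r.bump  = static_cast<char*>(std::malloc(chunk));
        r.limit = r.bump + chunk;
    }
    auto* h = reinterpret_cast<Reap::Header*>(r.bump);
    r.bump += need;
    h->size = sz;
    return h + 1;
}

void reapfree(Reap& r, void* p) {
    auto* h = reinterpret_cast<Reap::Header*>(p) - 1;
    h->next = r.freed;                      // individual deletion: recycle
    r.freed = h;
}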
Slide 35: Using Reap as Regions
[Chart: normalized runtime on the region-based benchmarks lcc and mudlle for the original (region) allocator, Win32, DLmalloc, WinHeap, Vmalloc, and Reap; one bar is cut off at 4.08]
- Reap's performance nearly matches regions
Slide 36: Reap: Best of Both Worlds
- Combining new/delete with regions is usually impossible: incompatible APIs, and code is hard to rewrite
- Using Reap, we incorporated new/delete code into Apache: "mod_bc" (an arbitrary-precision calculator), changing 20 lines (out of 8000)
- Benchmark: computing the 1000th prime. With Reap: 240K; without Reap: 7.4MB
Slide 37: Open Questions
- A Grand Unified Memory Manager? Hoard + Reap; integration with garbage collection
- Effective custom allocators? Exploit sizes, lifetimes, locality, and sharing
- Challenges of newer architectures: NUMA, SMT/CMP, 64-bit, predication
Slide 38: Current Work: Robust Performance
- Currently there is no VM-GC communication: BAD interactions under memory pressure
- Our approach (with Eliot Moss and Scott Kaplan): Cooperative Robust Automatic Memory Management
[Diagram: the garbage collector/allocator and the virtual memory manager exchange information: the LRU queue and memory pressure flow one way, empty pages the other, for reduced impact]
Slide 39: Current Work: Predictable VMM
- Recent work on scheduling for QoS (e.g., proportional-share), but under memory pressure the VMM is effectively the scheduler: paged-out processes may never recover, and intermittent processes may wait a long time
- Scheduler-faithful virtual memory (with Scott Kaplan and Prashant Shenoy): based on page value rather than order
Slide 40: Conclusion
Memory management for high-performance applications:
- Heap Layers framework [PLDI 2001]: reusable components, no runtime cost
- Hoard scalable memory manager [ASPLOS-IX]: high-performance, provably scalable and space-efficient
- Reap hybrid memory manager [OOPSLA 2002]: provides speed and robustness for server applications
- Current work: robust memory management for multiprogramming
Slide 41: The Obligatory URL Slide
http://www.cs.umass.edu/~emery
Slide 42: If You Can Read This, I Went Too Far
(backup slides follow)
Slide 43: Hoard: Under the Hood
[Diagram: Hoard composed from heap layers. A MallocOrFreeHeap sits atop a SelectSizeHeap, which selects a heap based on size. Small objects go through a FreeToHeapBlock layer to a PerProcessorHeap built from LockedHeap and HeapBlockManager layers: malloc from the local heap, free to the owning heap block. Empty heap blocks are managed by a LockedHeap/HeapBlockManager pair over a SuperblockHeap, which gets memory from and returns it to the global heap. Large objects (> 4K) go to a SystemHeap.]
Slide 44: Custom Memory Allocation
- Very common practice: Apache, gcc, lcc, STL, database servers, …
- Language-level support in C++: replace new/delete, bypassing the general-purpose allocator
- Reduces runtime: often. Expands functionality: sometimes. Reduces space: rarely
- "Use custom allocators"
Slide 45: Drawbacks of Custom Allocators
Avoiding the memory manager means:
- More code to maintain and debug
- Can't use memory debuggers
- Not modular or robust: mix memory from the custom and general-purpose allocators → crash!
- An increased burden on programmers
Slide 46: Overview
- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps: a generalization of regions & heaps
Slide 47: (I) Per-Class Allocators
Recycle freed objects from a per-class free list:

a = new Class1; b = new Class1; c = new Class1;
delete a; delete b; delete c;
a = new Class1; b = new Class1; c = new Class1;

[Diagram: the Class1 free list threading through the freed objects a, b, c]
+ Fast: linked-list operations
+ Simple: identical semantics, C++ language support
- Possibly space-inefficient
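As a concrete illustration, a minimal per-class allocator can be written with C++'s class-level operator new/delete; the payload and sizes below are hypothetical:

#include <cstddef>
#include <cstdlib>

class Class1 {
public:
    static void* operator new(std::size_t sz) {
        if (freeList_) {                    // fast path: pop a recycled object
            FreeNode* n = freeList_;
            freeList_ = n->next;
            return n;
        }
        return std::malloc(sz);             // slow path: general-purpose heap
    }
    static void operator delete(void* p) {
        auto* n = static_cast<FreeNode*>(p);
        n->next = freeList_;                // push onto the class's free list
        freeList_ = n;
    }
private:
    struct FreeNode { FreeNode* next; };
    inline static FreeNode* freeList_ = nullptr;
    double payload_[4];                     // hypothetical object data
};

Freed instances stay on the list forever, which is exactly the "possibly space-inefficient" drawback noted above.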
Slide 48: (II) Custom Patterns
- Tailor-made to fit allocation patterns
- Example: 197.parser (a natural-language parser) allocates from a fixed char[MEMORY_LIMIT] array:

a = xalloc(8); b = xalloc(16); c = xalloc(8);
xfree(b); xfree(c);
d = xalloc(8);

[Diagram: a, b, c placed consecutively in the array; after the frees, d reuses b's slot; end_of_array marker]
+ Fast: pointer-bumping allocation
- Brittle: fixed memory size, requires stack-like lifetimes
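A hedged sketch of this pattern (the limit and the rollback behavior of xfree() are assumptions about the general pattern, not a reconstruction of 197.parser's actual code):

#include <cstddef>

constexpr std::size_t MEMORY_LIMIT = 1 << 20;  // fixed memory size (brittle)
static char pool[MEMORY_LIMIT];
static std::size_t top = 0;   // high-water mark: everything below is live

void* xalloc(std::size_t sz) {
    void* p = pool + top;     // pointer-bumping allocation
    top += sz;                // (no overflow check: brittle by design)
    return p;
}

void xfree(void* p) {
    // Only works for stack-like lifetimes: rolls the arena back,
    // implicitly freeing everything allocated after p as well.
    top = static_cast<std::size_t>(static_cast<char*>(p) - pool);
}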
Slide 49: (III) Regions
- Separate allocation areas; deletion only en masse
- API: regioncreate(r), regionmalloc(r, sz), regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of whole chunks
+ Convenient: one call frees all memory
- Risky: accidental deletion, too much space
Slide 50: Overview
- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps: a generalization of regions & heaps
Slide 51: Custom Allocators Are Faster…
[Chart: normalized runtime on the custom-allocation benchmarks 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, with non-region, region, and overall averages; series: Custom, Win32]
Slide 52: Not So Fast…
[Chart: the same benchmarks with DLmalloc added as a third series: Custom, Win32, DLmalloc]
- With DLmalloc included, the general-purpose Lea allocator is competitive with most custom allocators
Slide 53: The Lea Allocator (DLmalloc 2.7.0)
- Optimized for common allocation patterns
- Per-size quicklists, ≈ per-class allocation
- Deferred coalescing (combining adjacent free objects)
- A highly optimized fast path
- Space-efficient
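A hedged sketch of the quicklist idea (DLmalloc's real bins, chunk headers, and coalescing policy differ; here sizes are caller-supplied to keep the sketch short):

#include <cstddef>
#include <cstdlib>

constexpr std::size_t kBins = 32;
constexpr std::size_t kGranularity = 8;   // bin i serves sizes up to i*8

struct FreeNode { FreeNode* next; };
static FreeNode* quicklist[kBins];        // one list per size class

// Assumes sz > 0; real DLmalloc records sizes in chunk headers instead
// of taking them from the caller.
void* quick_malloc(std::size_t sz) {
    std::size_t bin = (sz + kGranularity - 1) / kGranularity;
    if (bin < kBins && quicklist[bin]) {  // fast path: pop a recycled chunk
        FreeNode* n = quicklist[bin];
        quicklist[bin] = n->next;
        return n;
    }
    return std::malloc(bin * kGranularity);  // slow path
}

void quick_free(void* p, std::size_t sz) {
    std::size_t bin = (sz + kGranularity - 1) / kGranularity;
    if (bin < kBins) {                    // deferred coalescing: just push
        auto* n = static_cast<FreeNode*>(p);
        n->next = quicklist[bin];
        quicklist[bin] = n;
    } else {
        std::free(p);                     // large chunks go straight back
    }
}

Deferred coalescing means a free is usually just a push and a matching malloc just a pop; combining adjacent free objects happens later, off the fast path.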
Slide 54: Space Consumption Results
[Chart: normalized space on the custom-allocation benchmarks, with region, non-region, and overall averages; series: Original, DLmalloc]
Slide 55: Overview
- Introduction: perceived benefits and drawbacks
- Three main kinds of custom allocators
- Comparison with general-purpose allocators
- Advantages and drawbacks of regions
- Reaps: a generalization of regions & heaps
Slide 56: Why Regions?
- Apparently faster and more space-efficient
- Servers need memory management support: avoid resource leaks; tear down memory associated with terminated connections or transactions
- The current approach (e.g., Apache): regions
Slide 57: Drawbacks of Regions
- Can't reclaim memory within regions: a problem for long-running computations, producer-consumer patterns, and off-the-shelf "malloc/free" programs (unbounded memory consumption)
- The current situation for Apache: vulnerable to denial of service; limits the runtime of connections; limits module programming
Slide 58: Reap: A Hybrid Allocator
- Reap = region + heap: adds individual object deletion and a heap
- API: reapcreate(r), reapmalloc(r, sz), reapfree(r, p), reapdelete(r)
+ Fast, with cheap deletion
+ Adapts to how it is used (region style or heap style)
+ Can reduce memory consumption
Slide 59: Using Reap as Regions
[Chart: normalized runtime on the region-based benchmarks lcc and mudlle for the original (region) allocator, Win32, DLmalloc, WinHeap, Vmalloc, and Reap; one bar is cut off at 4.08]
- Reap's performance nearly matches regions
Slide 60: Reap: Best of Both Worlds
- Combining new/delete with regions is usually impossible: incompatible APIs, and code is hard to rewrite
- Using Reap, we incorporated new/delete code into Apache: "mod_bc" (an arbitrary-precision calculator), changing 20 lines (out of 8000)
- Benchmark: computing the 1000th prime. With Reap: 240K; without Reap: 7.4MB
Slide 61: Conclusion
- Empirical study of custom allocators: the Lea allocator is often as fast or faster; custom allocation is ineffective, except for regions
- Reaps nearly match region performance without regions' drawbacks
- Take-home message: stop using custom memory allocators!
Slide 62: Software
http://www.cs.umass.edu/~emery
(part of the Heap Layers distribution)