Reconsidering Custom Memory Allocation
UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE
Reconsidering
Custom Memory Allocation
Emery Berger, Ben Zorn, Kathryn McKinley
Custom Memory Allocation
Very common practice
Apache, gcc, lcc, STL, database servers…
Language-level support in C++
Widely recommended
Programmers replace new/delete, bypassing system allocator
Reduce runtime – often
Expand functionality – sometimes
Reduce space – rarely
“Use custom allocators”
Drawbacks of Custom Allocators
Avoiding system allocator:
More code to maintain & debug
Can’t use memory debuggers
Not modular or robust:
Mix memory from custom and general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
(I) Per-Class Allocators
Recycle freed objects from a free list
[Diagram: Class1 free list holding objects a, b, c]
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
+ Fast
+ Linked list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
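A minimal sketch of the per-class pattern on this slide, assuming a Class1 with a single hypothetical payload member (the slide does not show the class body). The class-level operator new and operator delete here are illustrative and are not taken from any of the benchmarked programs.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Class1 {
public:
    double payload = 0.0;  // hypothetical member, not from the slides

    // Allocation: pop a recycled object if one is available,
    // otherwise fall back to the general-purpose allocator.
    static void* operator new(std::size_t sz) {
        if (sz == sizeof(Class1) && free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(sz);
    }

    // Deallocation: push the dead object onto the free list.
    // The memory is never returned to the system allocator, which is
    // why this scheme can be space-inefficient.
    static void operator delete(void* p, std::size_t sz) noexcept {
        static_assert(sizeof(Class1) >= sizeof(FreeNode),
                      "object must be large enough to hold a list link");
        if (p == nullptr) return;
        if (sz != sizeof(Class1)) {  // e.g. a larger derived class
            ::operator delete(p);
            return;
        }
        free_list_ = new (p) FreeNode{free_list_};
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
};

Class1::FreeNode* Class1::free_list_ = nullptr;
```

With this in place the slide's sequence (three allocations, three deletions, three allocations) is served entirely from the free list on the second round, in last-freed-first-reused order.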
(II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)
char[MEMORY_LIMIT]
a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);
[Diagram: array holding blocks a, b, c, d, with the end_of_array pointer shown after each call]
+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
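A sketch of a stack-like bump allocator in the spirit of the xalloc/xfree sequence above. This is not the 197.parser source: the header layout, the 16-byte alignment, the value of MEMORY_LIMIT, and the last_block variable are all assumptions made for illustration. A freed block is only marked; space is reclaimed when the topmost block is free, at which point end_of_array moves back over every trailing free block.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;  // assumed fixed arena size
constexpr std::size_t ALIGN = 16;              // assumed alignment
constexpr std::size_t NONE = static_cast<std::size_t>(-1);

struct alignas(16) Header {
    std::size_t prev;  // offset of the previous block's header, or NONE
    bool in_use;
};

alignas(16) static unsigned char memory[MEMORY_LIMIT];
static std::size_t end_of_array = 0;   // offset where the next block starts
static std::size_t last_block = NONE;  // offset of the topmost block's header

static Header* header_at(std::size_t off) {
    return reinterpret_cast<Header*>(memory + off);
}

// Pointer-bumping allocation: place a header at end_of_array and advance it.
void* xalloc(std::size_t n) {
    if (n > MEMORY_LIMIT) return nullptr;
    std::size_t need = sizeof(Header) + (n + ALIGN - 1) / ALIGN * ALIGN;
    if (need > MEMORY_LIMIT - end_of_array) return nullptr;  // fixed memory size
    new (memory + end_of_array) Header{last_block, true};
    last_block = end_of_array;
    end_of_array += need;
    return memory + last_block + sizeof(Header);
}

// Mark the block free, then pop every free block sitting at the top.
void xfree(void* p) {
    if (p == nullptr) return;
    Header* h = reinterpret_cast<Header*>(
        static_cast<unsigned char*>(p) - sizeof(Header));
    assert(h->in_use && "double free");
    h->in_use = false;
    while (last_block != NONE && !header_at(last_block)->in_use) {
        end_of_array = last_block;
        last_block = header_at(last_block)->prev;
    }
}
```

Under these assumptions, running the slide's sequence leaves d in the space b used to occupy: xfree(b) reclaims nothing because c is still on top, and xfree(c) then releases both. A block freed out of order below a long-lived block stays unusable, which is the brittleness the slide points to.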
(III) Regions
Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
+ Fast
+ Pointer-bumping allocation
+ Deletion of chunks
+ Convenient
+ One call frees all memory
- Risky
- Dangling references
- Too much space
Increasingly popular custom allocator
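A minimal region sketch using the three call names from the slide. The Region and Chunk types, the 4096-byte CHUNK_SIZE, and the use of std::malloc for chunk storage are assumptions for illustration; real region libraries differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t CHUNK_SIZE = 4096;  // assumed chunk granularity
constexpr std::size_t ALIGN = alignof(std::max_align_t);

// Chunk header; the usable bytes follow it directly in memory.
struct alignas(std::max_align_t) Chunk {
    Chunk* next;
    std::size_t used;
    std::size_t capacity;
};

struct Region {
    Chunk* chunks;  // most recent chunk first
};

void regioncreate(Region& r) { r.chunks = nullptr; }

// Pointer-bumping allocation inside the current chunk;
// a new chunk is obtained only when the current one is full.
void* regionmalloc(Region& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = (sz + ALIGN - 1) / ALIGN * ALIGN;
    if (r.chunks == nullptr || r.chunks->capacity - r.chunks->used < sz) {
        std::size_t cap = sz > CHUNK_SIZE ? sz : CHUNK_SIZE;
        void* raw = std::malloc(sizeof(Chunk) + cap);
        if (raw == nullptr) return nullptr;
        r.chunks = new (raw) Chunk{r.chunks, 0, cap};
    }
    unsigned char* base = reinterpret_cast<unsigned char*>(r.chunks + 1);
    void* p = base + r.chunks->used;
    r.chunks->used += sz;
    return p;
}

// One call frees all memory. Any pointer into the region dangles afterwards.
void regiondelete(Region& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
}
```

There is deliberately no per-object free: memory handed out by regionmalloc stays allocated until regiondelete, which is the source of both the speed and the "too much space" drawback listed above.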
Custom Allocators Are Faster…
[Figure: Runtime, custom allocator benchmarks. Bar chart of normalized runtime (axis 0 to 1.75) comparing Custom and Win32 on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
As good as and sometimes much faster than Win32
Not So Fast…
[Figure: Runtime, custom allocator benchmarks. Bar chart of normalized runtime (axis 0 to 1.75) comparing Custom, Win32, and DLmalloc on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
DLmalloc: as fast or faster for most benchmarks
The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns
Per-size quicklists ≈ per-class allocation
Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath
Space-efficient
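To illustrate why per-size quicklists behave much like per-class allocation, here is a small sketch of size-class free lists in front of std::malloc. It is not DLmalloc's code: the quick_alloc and quick_free names, the 16-byte granule, the eight size classes, and the sized-free interface are assumptions, and deferred coalescing is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t GRANULE = 16;     // assumed size-class spacing
constexpr std::size_t NUM_CLASSES = 8;  // classes cover 1..128 bytes

struct FreeNode { FreeNode* next; };
static FreeNode* quicklists[NUM_CLASSES] = {};

// Requests of 1..16 bytes map to class 0, 17..32 to class 1, and so on.
static std::size_t size_class(std::size_t n) {
    return n == 0 ? 0 : (n - 1) / GRANULE;
}

void* quick_alloc(std::size_t n) {
    std::size_t c = size_class(n);
    if (c >= NUM_CLASSES) return std::malloc(n);  // large: no quicklist
    if (FreeNode* node = quicklists[c]) {         // fast path: pop
        quicklists[c] = node->next;
        return node;
    }
    return std::malloc((c + 1) * GRANULE);        // slow path
}

// The caller passes the size back so no per-object header is needed.
void quick_free(void* p, std::size_t n) {
    if (p == nullptr) return;
    std::size_t c = size_class(n);
    if (c >= NUM_CLASSES) { std::free(p); return; }
    quicklists[c] = new (p) FreeNode{quicklists[c]};  // push
}
```

Every object of a given class has the same size, so each quicklist ends up recycling objects of one class much as a hand-written per-class free list would, without any change to the program.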
Space Consumption: Mixed Results
[Figure: Space, custom allocator benchmarks. Bar chart of normalized space (axis 0 to 1.75) comparing Custom and DLmalloc on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Regions – Pros and Cons
+ Fast, convenient, etc.
+ Avoid resource leaks (e.g., Apache)
Tear down memory for terminated connections
- No individual object deletion
Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)
Apache: vulnerable to DoS, memory leaks
Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & heap
reapcreate(r)
reapmalloc(r, sz)
reapfree(r,p)
reapdelete(r)
+ Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
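A rough sketch of the reap idea using the four call names from the slide. It is not the Heap Layers implementation: the Reap, Chunk, and ObjHeader types, the first-fit free list, and the chunk size are assumptions chosen to keep the example short. Allocation is pointer-bumping until something is freed; freed objects are then reused in heap style, and reapdelete still releases everything at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t CHUNK_SIZE = 4096;  // assumed chunk granularity
constexpr std::size_t ALIGN = alignof(std::max_align_t);

struct alignas(std::max_align_t) Chunk {
    Chunk* next;
    std::size_t used;
    std::size_t capacity;
};

// Per-object header: this bookkeeping is what makes individual frees possible.
struct alignas(std::max_align_t) ObjHeader {
    std::size_t size;
    ObjHeader* next_free;
};

struct Reap {
    Chunk* chunks;
    ObjHeader* free_list;
};

static std::size_t round_up(std::size_t n) {
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}

void reapcreate(Reap& r) { r.chunks = nullptr; r.free_list = nullptr; }

void* reapmalloc(Reap& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = round_up(sz);
    // Heap style: first fit among individually freed objects.
    for (ObjHeader** link = &r.free_list; *link; link = &(*link)->next_free) {
        if ((*link)->size >= sz) {
            ObjHeader* h = *link;
            *link = h->next_free;
            return h + 1;
        }
    }
    // Region style: bump the pointer, adding a chunk when needed.
    std::size_t need = sizeof(ObjHeader) + sz;
    if (r.chunks == nullptr || r.chunks->capacity - r.chunks->used < need) {
        std::size_t cap = need > CHUNK_SIZE ? need : CHUNK_SIZE;
        void* raw = std::malloc(sizeof(Chunk) + cap);
        if (raw == nullptr) return nullptr;
        r.chunks = new (raw) Chunk{r.chunks, 0, cap};
    }
    unsigned char* base = reinterpret_cast<unsigned char*>(r.chunks + 1);
    ObjHeader* h = new (base + r.chunks->used) ObjHeader{sz, nullptr};
    r.chunks->used += need;
    return h + 1;
}

// Individual object deletion: push onto the reap's free list.
void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    ObjHeader* h = static_cast<ObjHeader*>(p) - 1;
    h->next_free = r.free_list;
    r.free_list = h;
}

// Region-style teardown: one call frees all memory.
void reapdelete(Reap& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
    r.free_list = nullptr;
}
```

A program that never calls reapfree pays only the per-object header relative to a plain region, while a malloc/free-style program gets its memory recycled instead of growing without bound.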
Reap Runtime
[Figure: Runtime, custom allocation benchmarks. Bar chart of normalized runtime (axis 0 to 1.75) comparing Custom, Win32, DLmalloc, and Reap on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Reap Space
[Figure: Space, custom allocator benchmarks. Bar chart of normalized space (axis 0 to 1.75) comparing Custom, DLmalloc, and Reap on 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Reap: Best of Both Worlds
Allows mixing of regions and new/delete
Case study:
New Apache module “mod_bc”
bc: C-based arbitrary-precision calculator
Changed 20 lines out of 8000
Benchmark: compute 1000th prime
With Reap: 240K
Without Reap: 7.4MB
Conclusions and Future Work
Empirical study of custom allocators
Lea allocator often as fast or faster
Non-region custom allocation ineffective
Reap: region performance without drawbacks
Future work:
Reduce space with per-page bitmaps
Combine with scalable general-purpose allocator (e.g., Hoard)
Software
http://www.cs.umass.edu/~emery
(Reap: part of Heap Layers distribution)
http://g.oswego.edu
(DLmalloc 2.7.0)
If You Can Read This,
I Went Too Far
Experimental Methodology
Comparing to general-purpose allocators
Same semantics: no problem
E.g., disable per-class allocators
Different semantics: use emulator
Uses general-purpose allocator
Adds bookkeeping to support region semantics
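One way such an emulator could look, as a hedged sketch: every object comes from the general-purpose allocator, and the region only remembers which pointers belong to it so they can all be freed together. The EmulatedRegion type, the emu_ function names, and the use of std::vector for bookkeeping are assumptions; the slides do not describe the emulator's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Bookkeeping only: the region owns no memory of its own.
struct EmulatedRegion {
    std::vector<void*> objects;
};

// Each request is forwarded to the general-purpose allocator
// (std::malloc here) and recorded in the region.
void* emu_regionmalloc(EmulatedRegion& r, std::size_t sz) {
    r.objects.push_back(nullptr);  // reserve the slot before allocating
    void* p = std::malloc(sz != 0 ? sz : 1);
    if (p == nullptr) {
        r.objects.pop_back();
        return nullptr;
    }
    r.objects.back() = p;
    return p;
}

// Region semantics: deleting the region frees every object it recorded.
void emu_regiondelete(EmulatedRegion& r) {
    for (void* p : r.objects) std::free(p);
    r.objects.clear();
}
```

Swapping std::malloc and std::free for another allocator is then enough to rerun a region-based program against that allocator while keeping its deletion behavior unchanged.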
Why Did They Do That?
Recommended practice
Premature optimization
Microbenchmarks vs. actual performance
Drift
Not bottleneck anymore
Improved competition
Modern allocators are better
Reaps as Regions: Runtime
[Figure: Runtime, region-based benchmarks. Bar chart of normalized runtime (axis 0 to 1.75) comparing Custom, Win32, DLmalloc, and Reap on lcc and mudlle.]
Reap performance nearly matches regions
Using Reap as Regions
[Figure: Runtime, region-based benchmarks. Bar chart of normalized runtime (axis 0 to 2.5) comparing Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap on lcc and mudlle; one off-scale bar is labeled 4.08.]
Reap performance nearly matches regions
Drawbacks of Regions
Can’t reclaim memory within regions
Bad for long-running computations, producer-consumer patterns, “malloc/free” programs
Unbounded memory consumption
Current situation for Apache:
vulnerable to denial-of-service
limits runtime of connections
limits module programming
Use Custom Allocators?
Strongly recommended by practitioners
Little hard data on performance/space improvements
Only one previous study [Zorn 1992]
Focused on just one type of allocator
Custom allocators: waste of time
Small gains, bad allocators
Different allocators better? Trade-offs?
Kinds of Custom Allocators
Three basic types of custom allocators
Per-class
Fast
Custom patterns
Fast, but very special-purpose
Regions
Fast, possibly more space-efficient
Convenient
Variants: nested, obstacks
Optimization Opportunity
[Figure: Time spent in memory operations. Stacked bar chart of % of runtime (axis 0 to 100), split into Memory Operations and Other, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, and the average.]