Reconsidering Custom Memory Allocation

30
UNIVERSITY OF MASSACHUSETTS DEPARTMENT OF COMPUTER SCIENCE Reconsidering Custom Memory Allocation Emery Berger, Ben Zorn, Kathryn McKinley

description

This talk presents an extensive experimental study that shows that a good general-purpose allocator is better than almost all commonly-used custom allocators, with one exception: regions (a.k.a., pools, arenas, zones). However, it shows that regions consume much more memory than necessary. The talk then introduces reaps (regions + heaps), which combine the flexibility and space efficiency of heaps with the performance of regions.

Transcript of Reconsidering Custom Memory Allocation

Page 1: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE

Reconsidering

Custom Memory Allocation

Emery Berger, Ben Zorn, Kathryn McKinley

Page 2: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 2

Custom Memory Allocation

Very common practice

Apache, gcc, lcc, STL,

database servers…

Language-level

support in C++

Widely recommended

Programmers replace

new/delete, bypassing

system allocator

Reduce runtime – often

Expand functionality – sometimes

Reduce space – rarely

“Use custom

allocators”

Page 3: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 3

Drawbacks of Custom Allocators

Avoiding system allocator:

More code to maintain & debug

Can’t use memory debuggers

Not modular or robust:

Mix memory from custom

and general-purpose allocators → crash!

Increased burden on programmers

Are custom allocators really a win?

Page 4: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 4

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 5: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 5

Class1 free list

(1) Per-Class Allocators

a

b

c

a = new Class1;

b = new Class1;

c = new Class1;

delete a;

delete b;

delete c;

a = new Class1;

b = new Class1;

c = new Class1;

Recycle freed objects from a free list

+ Fast+ Linked list operations

+ Simple

+ Identical semantics

+ C++ language support

- Possibly space-inefficient

Page 6: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 6

(II) Custom Patterns

Tailor-made to fit allocation patterns

Example: 197.parser (natural language parser)

char[MEMORY_LIMIT]

a = xalloc(8);b = xalloc(16);c = xalloc(8);xfree(b);xfree(c);d = xalloc(8);

a b cd

end_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_arrayend_of_array

+ Fast+ Pointer-bumping allocation

- Brittle

- Fixed memory size

- Requires stack-like lifetimes

Page 7: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 7

(III) Regions

+ Fast

+ Pointer-bumping allocation

+ Deletion of chunks

+ Convenient

+ One call frees all memory

regionmalloc(r, sz)

regiondelete(r)

Separate areas, deletion only en masse

regioncreate(r) r

- Risky

- Dangling

references

- Too much space

Increasingly popular custom allocator

Page 8: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 8

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 9: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 9

Custom Allocators Are Faster…

Runtime - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

helcc

mud

lle

No

rma

lize

d R

un

tim

e

Custom Win32

non-regions regions

As good as and sometimes much faster than Win32

Page 10: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 10

Not So Fast…

Runtime - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rma

lize

d R

un

tim

e

Custom Win32 DLmalloc

non-regions regions

DLmalloc: as fast or faster for most benchmarks

Page 11: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 11

The Lea Allocator (DLmalloc 2.7.0)

Mature public-domain general-purpose allocator

Optimized for common allocation patterns

Per-size quicklists ≈ per-class allocation

Deferred coalescing(combining adjacent free objects)

Highly-optimized fastpath

Space-efficient

Page 12: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 12

Space Consumption: Mixed Results

Space - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rmalized

Sp

ace

Custom DLmalloc

regionsnon-regions

Page 13: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 13

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 14: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 14

Regions – Pros and Cons

+ Fast, convenient, etc.

+ Avoid resource leaks (e.g., Apache)

Tear down memory for terminated connections

- No individual object deletion

Unbounded memory consumption(producer-consumer, long-running computations,

off-the-shelf programs)

Apache: vulnerable to DoS, memory leaks

Page 15: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 15

Reap = region + heap

Adds individual object deletion & heap

Reap Hybrid Allocator

reapmalloc(r, sz)

reapdelete(r)

reapcreate(r)r

reapfree(r,p)

+ Can reduce memory consumption

+ Fast

+ Adapts to use (region or heap style)

+ Cheap deletion

Page 16: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 16

Reap Runtime

Runtime - Custom Allocation Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rma

lize

d r

un

tim

e

Custom Win32 DLmalloc Reap

non-regions regions

Page 17: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 17

Reap Space

Space - Custom Allocator Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

197.

pars

er

boxe

d-sim

c-br

eeze

175.

vpr

176.

gcc

apac

he lcc

mud

lle

No

rma

lize

d S

pa

ce

Custom DLmalloc Reap

non-regions regions

Page 18: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 18

Reap: Best of Both Worlds

Allows mixing of regions and new/delete

Case study:

New Apache module “mod_bc”

bc: C-based arbitrary-precision calculator

Changed 20 lines out of 8000

Benchmark: compute 1000th prime

With Reap: 240K

Without Reap: 7.4MB

Page 19: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 19

Conclusions and Future Work

Empirical study of custom allocators

Lea allocator often as fast or faster

Non-region custom allocation ineffective

Reap: region performance without drawbacks

Future work:

Reduce space with per-page bitmaps

Combine with scalable general-purpose

allocator (e.g., Hoard)

Page 20: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 20

Software

http://www.cs.umass.edu/~emery

(Reap: part of Heap Layers distribution)

http://g.oswego.edu

(DLmalloc 2.7.0)

Page 21: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 21

If You Can Read This,

I Went Too Far

Page 22: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 22

Backup Slides

Page 23: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 23

Experimental Methodology

Comparing to general-purpose

allocators

Same semantics: no problem

E.g., disable per-class allocators

Different semantics: use emulator

Uses general-purpose allocator

Adds bookkeeping to support

region semantics

Page 24: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 24

Why Did They Do That?

Recommended practice

Premature optimization

Microbenchmarks vs. actual performance

Drift

Not bottleneck anymore

Improved competition

Modern allocators are better

Page 25: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 25

Reaps as Regions: Runtime

Runtime - Region-Based Benchmarks

0

0.25

0.5

0.75

1

1.25

1.5

1.75

lcc mudlle

No

rma

lize

d R

un

tim

e

Custom Win32 DLmalloc Reap

Reap performance nearly matches regions

Page 26: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 26

Using Reap as Regions

Runtime - Region-Based Benchmarks

0

0.5

1

1.5

2

2.5

lcc mudlle

No

rma

lize

d R

un

tim

e

Original Win32 DLmalloc WinHeap Vmalloc Reap

4.08

Reap performance nearly matches regions

Page 27: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 27

Drawbacks of Regions

Can’t reclaim memory within regions

Bad for long-running computations,

producer-consumer patterns,

“malloc/free” programs

unbounded memory consumption

Current situation for Apache:

vulnerable to denial-of-service

limits runtime of connections

limits module programming

Page 28: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 28

Use Custom Allocators?

Strongly recommended by practitioners

Little hard data on performance/space improvements

Only one previous study [Zorn 1992]

Focused on just one type of allocator

Custom allocators: waste of time

Small gains, bad allocators

Different allocators better? Trade-offs?

Page 29: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 29

Kinds of Custom Allocators

Three basic types of custom allocators

Per-class

Fast

Custom patterns

Fast, but very special-purpose

Regions

Fast, possibly more space-efficient

Convenient

Variants: nested, obstacks

Page 30: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE 30

Optimization Opportunity

Time Spent in Memory Operations

0

20

40

60

80

100

197.pa

rser

boxe

d-sim

c-br

eeze

175.vp

r

176.gc

c

apac

he lcc

mud

lle

Avera

ge

% o

f ru

nti

me

Memory Operations Other