Scalable Locality-Conscious Multithreaded
Memory Allocation
Scott Schneider
Christos D. Antonopoulos
Dimitrios S. Nikolopoulos
The College of William and Mary
The 2006 International Symposium on Memory Management
June 10, 2006
Outline
Introduction
Related Work
Streamflow design: data structures and operations
Experimental Evaluation
Conclusions
Introduction
Multithreading is becoming more common
Sophistication of system software trails hardware
Synchronization mechanisms used in system software can greatly affect performance
Related Work
Hoard (Emery Berger et al., ASPLOS 2000): lock-based, per-processor and global heaps
Michael's allocator (Maged Michael, PLDI 2004): lock-free
Tcmalloc (Sanjay Ghemawat, part of Google's perftools): lock-based
Streamflow
Promote scalability and reduce latency:
  Lock-free algorithms and data structures
  Synchronization-free in the common case
  Decoupled remote object deallocation
Promote locality:
  Favors locally recycled objects in private heaps
  Thread-local heaps reduce false sharing
  Removal of per-object headers
  Custom page manager
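Removing object headers means free() needs some other way to find an object's metadata. The slides do not say how Streamflow does this. The sketch below uses one common technique, assumed here purely for illustration: align every pageblock to its own size, so that masking an object's address yields its pageblock header. `PAGEBLOCK_SIZE`, `struct pageblock_hdr`, `pageblock_of`, and `pageblock_new` are all hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pageblock size; must be a power of two for masking to work. */
#define PAGEBLOCK_SIZE ((uintptr_t)16384)

/* Hypothetical header stored once at the start of every pageblock,
   replacing per-object headers. */
struct pageblock_hdr {
    int owner_id;     /* ID of the thread whose heap owns this pageblock */
    size_t obj_size;  /* object size served by this pageblock */
};

/* Recover the header from any object pointer inside the pageblock,
   assuming pageblocks are aligned to PAGEBLOCK_SIZE. */
static struct pageblock_hdr *pageblock_of(void *obj)
{
    uintptr_t addr = (uintptr_t)obj;
    return (struct pageblock_hdr *)(addr & ~(PAGEBLOCK_SIZE - 1));
}

/* Allocate one size-aligned pageblock for demonstration (C11 aligned_alloc). */
static struct pageblock_hdr *pageblock_new(int owner, size_t obj_size)
{
    struct pageblock_hdr *pb = aligned_alloc(PAGEBLOCK_SIZE, PAGEBLOCK_SIZE);
    assert(pb != NULL);
    pb->owner_id = owner;
    pb->obj_size = obj_size;
    return pb;
}
```

This design pays for header removal with an alignment constraint on pageblocks. That constraint is one reason an allocator may want its own page manager.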
Design: Data Structures
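[Figure: per-thread heaps (Thread 1 ... Thread n), each an array of object size classes (1-4, 5-8, 9-12, 13-16, ..., 2045-2048 bytes) serving malloc/free. Each size class keeps a list of pageblocks (Page blk 1, Page blk 2, ..., Page blk k) delimited by Active Head and Active Tail. Each pageblock header holds Next, Prev, Remotely Freed, and ID fields, and its objects are either freed or unallocated. Later builds label the two levels "heaps" and "pageblocks".]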
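The diagram can be rendered as C declarations. This is a sketch only. The field names follow the slide labels, but their exact meaning is assumed. For example, "ID" is read here as the owning thread's identifier, and `freed`/`unallocated` are taken as a free list plus a bump pointer. Only the 4-byte size-class granularity up to 2048 bytes comes directly from the figure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Size classes from the slide: 1-4, 5-8, ..., 2045-2048 bytes. */
#define CLASS_GRANULARITY 4
#define MAX_SMALL_SIZE    2048
#define NUM_CLASSES       (MAX_SMALL_SIZE / CLASS_GRANULARITY) /* 512 */

/* Free objects are linked through their first word (no per-object header).
   On 64-bit targets the smallest usable class must hold a pointer. */
struct free_obj { struct free_obj *next; };

/* Pageblock header; field names modeled on the slide, semantics assumed. */
struct pageblock {
    struct pageblock *next, *prev;             /* links within a size class */
    _Atomic(struct free_obj *) remotely_freed; /* pushed by other threads */
    int id;                                    /* owning thread */
    struct free_obj *freed;                    /* locally freed objects */
    char *unallocated;                         /* bump pointer, never-used space */
    char *end;                                 /* one past the last usable byte */
    size_t obj_size;
};

/* One entry per size class in a thread-private heap. */
struct size_class {
    struct pageblock *active_head, *active_tail;
};

struct thread_heap {
    int thread_id;
    struct size_class classes[NUM_CLASSES];
};

/* Map a request size (1..MAX_SMALL_SIZE) to its size-class index. */
static size_t size_class_index(size_t n)
{
    assert(n >= 1 && n <= MAX_SMALL_SIZE);
    return (n - 1) / CLASS_GRANULARITY;
}

/* Object size served by a given size-class index. */
static size_t class_object_size(size_t idx)
{
    return (idx + 1) * CLASS_GRANULARITY;
}
```

For example, a 13-byte request maps to index 3, the 13-16 class, and receives a 16-byte object.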
Design: Allocation
[Figure: the heap, size-class, and pageblock diagram from Design: Data Structures, stepping through a malloc call]
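The allocation path can be sketched from the slide labels, under stated assumptions. The owner thread serves a request from one pageblock by trying three sources: the private `freed` list, then the `unallocated` region, then the shared `remotely_freed` list. It drains the shared list with a single atomic exchange. The order of the three sources and all names here are assumptions. The only grounded claim is that the common case needs no synchronization.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct free_obj { struct free_obj *next; };

/* Hypothetical pageblock; remotely_freed is the only field other threads touch. */
struct pageblock {
    _Atomic(struct free_obj *) remotely_freed;
    struct free_obj *freed;  /* owner-private free list */
    char *unallocated;       /* next never-used byte */
    char *end;
    size_t obj_size;
};

static void pageblock_init(struct pageblock *pb, char *mem, size_t len,
                           size_t obj_size)
{
    assert(obj_size >= sizeof(struct free_obj));
    atomic_init(&pb->remotely_freed, NULL);
    pb->freed = NULL;
    pb->unallocated = mem;
    pb->end = mem + len;
    pb->obj_size = obj_size;
}

/* Owner-thread allocation from one pageblock; returns NULL when exhausted. */
static void *pageblock_alloc(struct pageblock *pb)
{
    /* 1. Reuse a locally freed object: no atomic operation needed. */
    if (pb->freed != NULL) {
        struct free_obj *o = pb->freed;
        pb->freed = o->next;
        return o;
    }
    /* 2. Carve a fresh object from the unallocated region. */
    if ((size_t)(pb->end - pb->unallocated) >= pb->obj_size) {
        void *o = pb->unallocated;
        pb->unallocated += pb->obj_size;
        return o;
    }
    /* 3. Take every remotely freed object at once with one atomic exchange. */
    struct free_obj *r = atomic_exchange(&pb->remotely_freed, NULL);
    if (r != NULL) {
        pb->freed = r->next;
        return r;
    }
    return NULL; /* a full allocator would move on to another pageblock */
}
```

Only step 3 executes an atomic instruction, and it runs only after both private sources are empty.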
Design: Local Free
[Figure: the heap, size-class, and pageblock diagram from Design: Data Structures, stepping through a free call]
pageblock belongs to current thread
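When the pageblock belongs to the current thread, the free is purely local. Only the owner touches its private free list, so a plain pointer push suffices. The sketch below assumes the hypothetical `id` and `freed` fields from the data-structure labels. A LIFO push means the next allocation in that size class gets back the most recently freed, and likely still cached, object. That matches the slide's "favors locally recycled objects".

```c
#include <assert.h>
#include <stddef.h>

/* Free objects are linked through their first word. */
struct free_obj { struct free_obj *next; };

/* Hypothetical owner-side view of a pageblock. */
struct pageblock {
    int id;                  /* ID of the owning thread */
    struct free_obj *freed;  /* owner-private free list */
};

/* Local free: the caller is the owner, so no lock or atomic is required. */
static void local_free(struct pageblock *pb, int self, void *ptr)
{
    assert(pb->id == self);  /* precondition: this pageblock is ours */
    struct free_obj *o = ptr;
    o->next = pb->freed;     /* LIFO push: the next malloc in this class */
    pb->freed = o;           /* reuses the most recently freed object */
}
```

A complete free() would first locate the pageblock, then compare its `id` with the caller's. It takes this path on a match and the remote path otherwise.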
Design: Remote Free
[Figure: the heap, size-class, and pageblock diagram from Design: Data Structures, stepping through a free call by a non-owner thread]
pageblock does not belong to current thread
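When the pageblock belongs to another thread, the object must be handed back without taking a lock. The sketch below assumes remote frees are pushed onto the pageblock's "Remotely Freed" list with a compare-and-swap loop, and that the owner later detaches the whole list with one atomic exchange. Because non-owners only push and the owner only takes the entire list, this pattern avoids the ABA hazard of a general lock-free stack pop. The names and memory orderings are illustrative choices, not taken from the slides. The demo uses POSIX threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct free_obj { struct free_obj *next; };

struct pageblock {
    int id;                                    /* owning thread */
    _Atomic(struct free_obj *) remotely_freed; /* shared, push-only LIFO */
};

/* Remote free: any non-owner thread pushes the object with a CAS loop. */
static void remote_free(struct pageblock *pb, void *ptr)
{
    struct free_obj *o = ptr;
    struct free_obj *head = atomic_load_explicit(&pb->remotely_freed,
                                                 memory_order_relaxed);
    do {
        o->next = head; /* on CAS failure, head is reloaded and we relink */
    } while (!atomic_compare_exchange_weak_explicit(
                 &pb->remotely_freed, &head, o,
                 memory_order_release, memory_order_relaxed));
}

/* Owner side: detach everything freed remotely so far in one step. */
static struct free_obj *drain_remote(struct pageblock *pb)
{
    return atomic_exchange_explicit(&pb->remotely_freed, NULL,
                                    memory_order_acquire);
}

/* Demo: several threads concurrently free disjoint objects into one block. */
#define NTHREADS 4
#define PER_THREAD 1000
static struct free_obj objs[NTHREADS][PER_THREAD];
static struct pageblock demo_pb;

static void *freer(void *arg)
{
    struct free_obj *mine = arg;
    for (int i = 0; i < PER_THREAD; i++)
        remote_free(&demo_pb, &mine[i]);
    return NULL;
}

/* Returns how many objects the owner recovers after all threads finish. */
static size_t run_demo(void)
{
    pthread_t t[NTHREADS];
    atomic_init(&demo_pb.remotely_freed, NULL);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, freer, objs[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    size_t n = 0;
    for (struct free_obj *o = drain_remote(&demo_pb); o != NULL; o = o->next)
        n++;
    return n;
}
```

With correct lock-free pushes, `run_demo()` recovers every object exactly once (`NTHREADS * PER_THREAD`). The owner's fast path never waits on the remote freers.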
Design: Page Manager
Manages pageblocks
Implemented using superpages (4 MB vs. 4 KB pages)
Allows Streamflow to allocate pageblocks in contiguous physical memory regions
Reduces TLB misses and minor page faults
Superpage headers are managed similarly to small objects
Pageblocks are allocated within a superpage using buddy allocation
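Buddy allocation inside a superpage can be illustrated with a small sketch. The 4 MB superpage comes from the slide. The 16 KB minimum pageblock size and the offset-based free lists are assumptions. Blocks are powers of two, and a block's buddy is found by flipping one address bit. Splitting and merging therefore need no search beyond the free list of a single order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPERPAGE_SHIFT 22 /* 4 MB superpage, as on the slide */
#define MIN_SHIFT       14 /* hypothetical 16 KB smallest pageblock */
#define NUM_ORDERS      (SUPERPAGE_SHIFT - MIN_SHIFT + 1)
#define MAX_BLOCKS      (1u << (SUPERPAGE_SHIFT - MIN_SHIFT))

/* Free lists of block offsets within the superpage, one list per order;
   order k means a block of 2^(MIN_SHIFT + k) bytes. */
struct buddy {
    uint32_t free[NUM_ORDERS][MAX_BLOCKS];
    unsigned nfree[NUM_ORDERS];
};

static void buddy_init(struct buddy *b)
{
    for (int k = 0; k < NUM_ORDERS; k++)
        b->nfree[k] = 0;
    b->free[NUM_ORDERS - 1][0] = 0; /* the whole superpage is one free block */
    b->nfree[NUM_ORDERS - 1] = 1;
}

/* Remove offset off from the order-k free list, if it is there. */
static bool take(struct buddy *b, int k, uint32_t off)
{
    for (unsigned i = 0; i < b->nfree[k]; i++)
        if (b->free[k][i] == off) {
            b->free[k][i] = b->free[k][--b->nfree[k]];
            return true;
        }
    return false;
}

/* Allocate an order-k block; returns its offset, or -1 if none fits. */
static long buddy_alloc(struct buddy *b, int k)
{
    int j = k;
    while (j < NUM_ORDERS && b->nfree[j] == 0)
        j++; /* smallest order that has a free block */
    if (j == NUM_ORDERS)
        return -1;
    uint32_t off = b->free[j][--b->nfree[j]];
    while (j > k) { /* split repeatedly, keeping the lower half */
        j--;
        b->free[j][b->nfree[j]++] = off + (1u << (MIN_SHIFT + j));
    }
    return (long)off;
}

/* Free an order-k block, merging with its buddy while the buddy is free. */
static void buddy_free(struct buddy *b, uint32_t off, int k)
{
    while (k < NUM_ORDERS - 1) {
        uint32_t bit = 1u << (MIN_SHIFT + k);
        if (!take(b, k, off ^ bit)) /* buddy differs in exactly one bit */
            break;
        off &= ~bit; /* merged block starts at the lower address */
        k++;
    }
    b->free[k][b->nfree[k]++] = off;
}
```

Allocating two 16 KB pageblocks from a fresh superpage yields offsets 0 and 16384. Freeing both coalesces everything back into a single 4 MB block.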
Evaluation: System
4-processor Dell PowerEdge 6650 with Hyper-Threaded Intel Xeon processors at 2.0 GHz, 2 GB RAM
SuSE Linux 9.1 with kernel 2.6.13.4 and glibc 2.3.3
Hoard version 3.3.0, Tcmalloc version 0.4, and a custom 32-bit implementation of Michael's allocator
Evaluation: Benchmarks
Sequential
  Parser: SPECINT2000 English parser
Multithreaded synthetic
  Recycle: stresses local allocations and frees
  Larson: server simulator; stresses remote frees
  Consume: producer-consumer
Applications
  MPCDM: multithreaded mesh generation
Evaluation: Sequential
[Figure: Parser, execution time (seconds), with allocators grouped as sequential, Streamflow, and multithreaded: glibc sequential, Vam, Hoard sequential, Streamflow headers, Streamflow w/o headers, Streamflow super, glibc MT, Hoard MT, Michael, Tcmalloc]
Evaluation: Multithreaded
[Figure: Recycle, execution time (sec.) vs. threads (1-8) for Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc]
Evaluation: Multithreaded
[Figure: Larson, throughput (Mops/sec) vs. threads (1-8) for Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc]
Evaluation: Multithreaded
[Figure: Consume, execution time (sec.) vs. threads (1-8) for Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc]
Evaluation: Multithreaded
[Figure: MPCDM, execution time (sec.) vs. threads (1-8) for Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc]
Conclusions
Presented a new memory allocator design that:
  Uses lock-free algorithms and data structures
  Is synchronization-free in the common case
  Promotes locality at multiple levels
Experimental evaluation shows that the design performs well in practice
http://www.cs.wm.edu/streamflow
Evaluation: Multithreaded
[Figure: Knary, execution time (sec.) vs. threads (1-8) for Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc]