Scalable Locality-Conscious Multithreaded Memory Allocation. Scott Schneider, Christos D. Antonopoulos, Dimitrios S. Nikolopoulos. The College of William and Mary. The 2006 International Symposium on Memory Management, June 10, 2006.

Scalable Locality-Conscious Multithreaded Memory Allocation

Scott Schneider

Christos D. Antonopoulos

Dimitrios S. Nikolopoulos

The College of William and Mary

The 2006 International Symposium on Memory Management

June 10, 2006

Outline

Introduction
Related Work
Streamflow design: data structures and operations
Experimental Evaluation
Conclusions

Introduction

Multithreading is becoming more common
Sophistication of system software trails hardware
Synchronization mechanisms used in system software can greatly affect performance

Related Work

Hoard: Emery Berger et al., ASPLOS 2000
  Lock-based, with per-processor and global heaps
Michael’s: Maged Michael, PLDI 2004
  Lock-free
Tcmalloc: Sanjay Ghemawat, part of Google’s perftools
  Lock-based

Streamflow

Promote scalability and reduce latency
  Lock-free algorithms and data structures
  Synchronization-free in the common case
  Decoupled remote object deallocation
Promote locality
  Favors locally recycled objects in private heaps
  Thread-local heaps reduce false sharing
  Removing object headers
  Custom page manager

Design: Data Structures

[Figure: the allocator's data structures, in two groups labeled "heaps" and "pageblocks". Heaps: each thread (Thread 1 ... Thread n) has a malloc/free table of object size classes (1-4, 5-8, 9-12, 13-16, ..., 2045-2048); each size class has Active Head and Active Tail entries that refer to pageblocks (Page blk 1, Page blk 2, ..., Page blk k). Pageblocks: each is labeled with the fields Freed, Unallocated, Next, Prev, Remotely Freed, and ID, plus an Object area.]
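The size-class labels in the figure (1-4, 5-8, ..., 2045-2048) suggest a simple mapping from request size to class. The following is a minimal sketch of that arithmetic, assuming the classes are uniformly 4 units wide all the way up to 2048; the elided middle of the figure may use a different spacing. The names `size_class_index`, `size_class_limit`, and the constants are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the figure labels: classes cover 1-4, 5-8, ...,
 * 2045-2048, i.e. a uniform granularity of 4 up to a maximum of 2048. */
enum {
    CLASS_GRANULARITY = 4,
    MAX_SMALL_SIZE    = 2048,
    NUM_CLASSES       = MAX_SMALL_SIZE / CLASS_GRANULARITY
};

/* Map a request size in [1, 2048] to a zero-based size class index. */
static size_t size_class_index(size_t size)
{
    assert(size >= 1 && size <= MAX_SMALL_SIZE);
    return (size - 1) / CLASS_GRANULARITY;
}

/* Largest request size served by the given class. */
static size_t size_class_limit(size_t index)
{
    assert(index < NUM_CLASSES);
    return (index + 1) * CLASS_GRANULARITY;
}
```

Under these assumptions a request of size 5 falls in class 1, whose objects are 8 units wide, and there are 512 classes in total.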

Design: Allocation

[Figure: the Data Structures diagram (thread heaps, size classes, pageblocks), stepping through an allocation.]

Design: Local Free

[Figure: the Data Structures diagram (thread heaps, size classes, pageblocks), stepping through a local free: the pageblock belongs to the current thread.]
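The slides do not spell out the allocation and local-free steps, so the following is only a sketch of one way the Freed and Unallocated fields could cooperate when the pageblock belongs to the calling thread. It assumes Freed is a LIFO list threaded through the free objects themselves and Unallocated is a bump pointer into never-used space. The struct layout, the `end` and `object_size` fields, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pageblock header. Only "freed" and "unallocated" mirror
 * labels in the figure; everything else is an assumption. */
typedef struct pageblock {
    void  *freed;        /* LIFO list of locally freed objects */
    char  *unallocated;  /* next never-used object slot */
    char  *end;          /* one past the last usable byte */
    size_t object_size;  /* size of every object in this block */
} pageblock_t;

static void pageblock_init(pageblock_t *pb, void *mem, size_t bytes,
                           size_t object_size)
{
    /* A free object must be large enough to hold the list link. */
    assert(object_size >= sizeof(void *));
    pb->freed = NULL;
    pb->unallocated = mem;
    pb->end = (char *)mem + bytes;
    pb->object_size = object_size;
}

/* Owner-only allocation: no atomics or locks are needed here. */
static void *pageblock_alloc(pageblock_t *pb)
{
    if (pb->freed != NULL) {              /* prefer recently freed objects */
        void *obj = pb->freed;
        pb->freed = *(void **)obj;
        return obj;
    }
    if ((size_t)(pb->end - pb->unallocated) >= pb->object_size) {
        void *obj = pb->unallocated;      /* otherwise carve a fresh slot */
        pb->unallocated += pb->object_size;
        return obj;
    }
    return NULL;                          /* pageblock exhausted */
}

/* Owner-only free: push the object onto the LIFO freed list. */
static void pageblock_local_free(pageblock_t *pb, void *obj)
{
    *(void **)obj = pb->freed;
    pb->freed = obj;
}
```

Reusing the most recently freed object first is one way to favor locally recycled objects, since that object is the most likely to still be in the owner's cache.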

Design: Remote Free

[Figure: the Data Structures diagram (thread heaps, size classes, pageblocks), stepping through a remote free: the pageblock does not belong to the current thread.]
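A remote free has to hand the object back to a pageblock owned by another thread without locking. The slides do not give the algorithm, so the sketch below uses a standard lock-free technique as an illustration: non-owners push onto a per-pageblock list with compare-and-swap, and the owner later detaches the whole list with a single atomic exchange. The type `remote_list_t` and every function name are hypothetical; only the ideas of an owner ID and a remotely freed list come from the figure labels.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A freed object is reused as a list node (assumed, not from the slides). */
typedef struct free_node {
    struct free_node *next;
} free_node_t;

/* Hypothetical slice of a pageblock header relevant to remote frees. */
typedef struct remote_list {
    int owner_id;                          /* thread that owns the pageblock */
    _Atomic(free_node_t *) remotely_freed; /* objects freed by other threads */
} remote_list_t;

/* A free is remote when the caller is not the pageblock's owner. */
static int free_is_remote(const remote_list_t *rl, int caller_id)
{
    return rl->owner_id != caller_id;
}

/* Called by a non-owner: lock-free push using compare-and-swap. */
static void remote_free(remote_list_t *rl, void *obj)
{
    free_node_t *node = obj;
    free_node_t *head =
        atomic_load_explicit(&rl->remotely_freed, memory_order_relaxed);
    do {
        node->next = head;  /* on CAS failure, head holds the fresh value */
    } while (!atomic_compare_exchange_weak_explicit(
                 &rl->remotely_freed, &head, node,
                 memory_order_release, memory_order_relaxed));
}

/* Called by the owner: detach every remotely freed object at once. */
static free_node_t *reclaim_remote(remote_list_t *rl)
{
    return atomic_exchange_explicit(&rl->remotely_freed,
                                    (free_node_t *)NULL,
                                    memory_order_acquire);
}
```

Because other threads only ever push and the owner only ever takes the entire list, this pattern avoids the ABA problem that affects lock-free stacks with single-element pops. The owner can then merge the detached batch into its local freed list at a time of its choosing, which is one reading of "decoupled remote object deallocation".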

Design: Page Manager

Manages pageblocks
Implemented using superpages (4 MB vs. 4 KB pages)
  Allows Streamflow to allocate pageblocks in contiguous physical memory regions
  Reduces TLB misses and minor page faults
Superpage headers are managed similarly to small objects
Pageblocks are allocated within a superpage using buddy allocation
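The address arithmetic behind buddy allocation inside a superpage can be sketched as follows. This is generic buddy-system arithmetic, not code from Streamflow: it assumes superpages are aligned to their own size, that block sizes are powers of two, and that the smallest block is one 4 KB page. All names and constants other than the 4 MB and 4 KB sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_SIZE  ((uintptr_t)4 << 20)  /* 4 MB, from the slide */
#define SMALL_PAGE_SIZE ((uintptr_t)4 << 10)  /* 4 KB, from the slide */

/* Base address of the superpage containing addr.
 * Assumes superpages are aligned to SUPERPAGE_SIZE. */
static uintptr_t superpage_base(uintptr_t addr)
{
    return addr & ~(SUPERPAGE_SIZE - 1);
}

/* Round a request up to the next power-of-two block size,
 * with one small page as the minimum (an assumption). */
static uintptr_t buddy_block_size(uintptr_t bytes)
{
    uintptr_t size = SMALL_PAGE_SIZE;
    assert(bytes >= 1 && bytes <= SUPERPAGE_SIZE);
    while (size < bytes)
        size <<= 1;
    return size;
}

/* Offset (within the superpage) of the buddy of the block that starts
 * at `offset` and has power-of-two `size`. Flipping the size bit moves
 * between the two halves of the parent block. */
static uintptr_t buddy_offset(uintptr_t offset, uintptr_t size)
{
    assert(size >= SMALL_PAGE_SIZE && size < SUPERPAGE_SIZE);
    assert((size & (size - 1)) == 0);      /* power of two */
    assert((offset & (size - 1)) == 0);    /* block is size-aligned */
    return offset ^ size;
}
```

With these sizes a superpage holds 1024 small pages. When a block is released, a buddy allocator checks whether the block at `buddy_offset` is also free and, if so, merges the two into one block of twice the size.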

Evaluation: System

4-processor Dell PowerEdge 6650
  Hyper-Threaded Intel Xeon processors at 2.0 GHz
  2 GB RAM
Suse Linux 9.1 with kernel 2.6.13.4 and glibc 2.3.3
Hoard version 3.3.0
Tcmalloc version 0.4
Custom 32-bit implementation of Michael’s allocator

Evaluation: Benchmarks

Sequential
  Parser: SPECINT2000 English parser
Multithreaded
  Synthetic
    Recycle: stresses local allocation and frees
    Larson: server simulator; stresses remote frees
    Consume: producer-consumer
  Applications
    MPCDM: multithreaded mesh generation

Evaluation: Sequential

[Figure: Parser. Bar chart of execution time (seconds) per allocator, in three groups. Sequential: glibc sequential, Vam, Hoard sequential. Streamflow: Streamflow headers, Streamflow w/o headers, Streamflow super. Multithreaded: glibc MT, Hoard MT, Michael, Tcmalloc.]

Evaluation: Multithreaded

[Figure: Recycle. Execution time (sec.) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]

Evaluation: Multithreaded

[Figure: Larson. Throughput (Mops/sec) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]

Evaluation: Multithreaded

[Figure: Consume. Execution time (sec.) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]

Evaluation: Multithreaded

[Figure: MPCDM. Execution time (sec.) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]

Conclusions

Presented a new memory allocator design
  Uses lock-free algorithms and data structures
  Synchronization-free in the common case
  Promotes locality at multiple levels
Experimental evaluation shows the design performs well in practice

http://www.cs.wm.edu/streamflow

Evaluation: Multithreaded

[Figure: Knary. Execution time (sec.) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]

Evaluation: Multithreaded

[Figure: Barnes. Execution time (sec.) versus number of threads (1 to 8). Series: Streamflow headers, Streamflow w/o headers, Streamflow super, Michael, Hoard, glibc, Tcmalloc.]