IBM Labs in Haifa © 2004 IBM Corporation

An Efficient Parallel Heap Compaction

Diab Abuaiadh, Yoav Ossia,

Erez Petrank, Uri Silberstein

IBM Haifa Research Lab

OOPSLA 2004: Efficient Parallel Heap Compaction

Garbage Collection Today

Garbage Collection (GC) is an important part of the memory management system of many modern programming languages, such as Java and C#.

Modern computing systems include multithreading and SMP platforms.

Modern GC must support such settings, especially as large server systems are being built on languages such as Java and C#.

Garbage Collection: General

The heap before GC

The heap after GC

Fragmentation

After a few GCs the heap becomes fragmented.

Fragmentation causes:

Slow allocation

Premature GC when allocating large objects

Bad locality of reference

Poor chances of allocating huge objects

Compaction

Compaction: a solution for the fragmentation problems.

Moving the objects to be close to each other.

Heap before compaction

Heap after Compaction

Previous Work

Several compaction algorithms are known (two-finger, threaded), but they are not appropriate for SMP machines.

First parallel compaction: Sun’s algorithm [Flood-Detlefs-Shavit-Zhang 2001]

Sun’s Collector

3 phases (assuming marking is done): forwarding pointers installation; pointer fix-up phase; moving phase.

Each phase is done in parallel (no contention).

Disadvantages:

Restricted maximal size of free chunks

3 passes

Space overhead: a forwarding pointer per object

Features of Our New Compaction Algorithm

Parallel

High scalability (almost perfect speedup)

(Almost) optimal quality: the heap is compacted to the lower addresses.

Small (and controllable) space overhead

Two phases, each done in parallel: the move phase (moving the objects) and the fix-up phase (updating the pointers).

Dividing the Heap

We divide the heap into n areas, where n is a parameter dependent on the heap size and the number of processors/threads.

For example: for a 640MB heap and 8 processors, we chose n ≈ 64.

[Figure: a 640MB heap divided into areas of 10MB each]
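
As a rough illustration of the split described above, the sketch below cuts a heap into n equal areas and reproduces the 640MB / 64 areas figure. The function name `area_bounds` and the choice to let the last area absorb any remainder are assumptions made for this example; the slides do not say how n is derived or how uneven splits are handled.

```python
from typing import List, Tuple

def area_bounds(heap_bytes: int, n_areas: int) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of n equal areas covering the heap."""
    size = heap_bytes // n_areas
    bounds = [(i * size, (i + 1) * size) for i in range(n_areas)]
    # Assumption: the last area absorbs any remainder so the heap is fully covered.
    bounds[-1] = (bounds[-1][0], heap_bytes)
    return bounds

MB = 1024 * 1024
areas = area_bounds(640 * MB, 64)
print(len(areas), (areas[0][1] - areas[0][0]) // MB)  # prints: 64 10
```

With 8 threads and 64 areas, each thread handles about 8 areas on average, which leaves room for load balancing when some areas hold more live data than others.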

Squeezing the Objects in Spite of Parallelism

The goal: move all objects to the lower addresses.

Each thread compacts one area at a time.

Start: an area is compacted into itself (areas on the left side of the heap).

After a while: vacant spaces appear in compacted areas.

Course of action: a thread compacts the objects of one area into the free space of a lower area.

First Phase: Moving the Objects

Each thread picks the next area to be compacted.

Each thread finds a lower area with empty space to compact into.

If no such area exists, it compacts to the bottom of the same area.

While moving the objects, we record information that will enable us to update the pointers (fix-up phase).
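
The policy above can be sketched with a small single-threaded model. This is only an illustration under stated assumptions: objects are represented by their sizes, areas are processed in address order by one thread, a target is kept until the next object no longer fits, and the names `move_phase`, `used`, and `done` are invented for the example. The real collector runs several threads that claim source and target areas concurrently and also records fix-up information, which this model omits.

```python
from typing import List

def move_phase(areas: List[List[int]], capacity: int) -> List[List[int]]:
    """Single-threaded model of the move phase.

    areas[k] lists the sizes of the live objects in area k, in address order.
    Returns the object sizes held by each area after the move.
    """
    after: List[List[int]] = [[] for _ in areas]
    used = [0] * len(areas)   # bytes occupied in each area after the move
    done: List[int] = []      # areas already compacted, lowest address first

    for src, objects in enumerate(areas):
        target = None
        for size in objects:
            # Pick a target at the start, or re-pick when a lower target is full.
            # Once we fall back to the source area itself, we stay there.
            if target is None or (target != src and used[target] + size > capacity):
                # Lowest compacted area with room, else the source area itself.
                target = next((t for t in done if used[t] + size <= capacity), src)
            after[target].append(size)
            used[target] += size
        done.append(src)
    return after

print(move_phase([[4, 3], [2, 5], [3], [1, 1]], capacity=10))
# prints: [[4, 3, 2, 1], [5, 3, 1], [], []]
```

In this toy run the two highest areas end up empty, while a one-unit hole remains in the second area because no later object was small enough to be placed there before the run ended.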

Moving the objects: Example

Two threads, 4 areas

(Thread #1, red area), (Thread #2, blue area)

(Thread #1, brown area), (Thread #2, blue area)

At the end

More areas

4 threads, 64 areas

In the end we may have some holes in the last areas.

For a reasonable number of areas, these holes are insignificant.

[Figure: the heap at the end, with the region labeled "Empty space"]

Properties of the Move Phase

Almost all objects are condensed to the bottom of the heap.

Order of objects is essentially preserved.

Good parallelism with almost no contention.

Small areas provide better load balancing. No hit on compaction quality.

Sensitivity of performance and compaction quality to area size is low.

Area Size Tradeoff

“Holes” in the Heap

Preserve allocation order

Load balancing

Oversized areas ☺

“Normal” size ☺ ☺ ☺

Areas too small ☺ ☺

Phase 2: Fix up

The second task is to update all pointers to reference the new locations.

We divide the heap into n areas (possibly not the same n as in the move phase).

Each thread fixes up pointers in one area at a time.

Remember: Information is recorded during the move phase to allow redirecting the pointers in the second phase.

Fix up main idea

We see the heap as a sequence of blocks (say, block = 256 bytes).

We record information per block rather than per object.

Objects in a block are moved together, and we do not allow objects of different blocks to be interleaved.

An object belongs to a block according to the starting address of the object.

The idea: instead of recording information per object, we record less information per block, but perform more computation during the fix-up of each reference.

Recorded Information

Blocks << areas (blocks are not divided between areas).

An object’s block is determined by its starting address.

For each block we record information on where the objects in the block were moved to:

Pointer to the new location of the first object in the block

Relative distance between the other objects and the first object, before and after the move

Recorded Information Details

Block table: for each block, a pointer to the new location of the first object in the block.

Two bitmaps:

One bit stands for 8 bytes in the heap (due to 8-byte alignment of objects).

The old bitmap represents objects before the move (created while marking live objects).

The new bitmap represents objects after the move (created while moving the objects).

Calculating a New Location

Given an old address of an object A:

Find A’s block.

Using the block table, obtain the new address (B) of the first object in the block.

Using the old bitmap, find the ordinal number (i) of the object in the block.

Using the new bitmap, find the relative new location (r) of the i-th object in the block.

Compute B + r to obtain the new location.

Example

Calculating the new location of object C:

Old bitmap: C is third in the block (i = 3).

New bitmap: relative address of C (to A), r = 0x18.

Block table: new address of A = 0x58296200.

A + r = new location = 0x58296218.

[Figure: objects A, B, C, D in a block, before and after the move]
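
The lookup can be sketched as follows. This is a minimal model, not the JVM's implementation: it assumes each bitmap has exactly one set bit per object start, stores the bitmaps as Python lists of 0/1, and uses an invented heap base and invented old offsets for A, B, C, D. Only the new address of A (0x58296200), i = 3, and r = 0x18 come from the slide; `new_address` and the other names are hypothetical.

```python
from typing import Dict, List

GRANULE = 8    # bytes covered by one bitmap bit (8-byte object alignment)
BLOCK = 256    # bytes per block

def new_address(old_addr: int, heap_base: int, block_table: Dict[int, int],
                old_bits: List[int], new_bits: List[int]) -> int:
    offset = old_addr - heap_base
    block = offset // BLOCK
    first_new = block_table[block]            # B: new address of the block's first object

    # i: 1-based ordinal of the object within its block, from the old bitmap.
    block_first_bit = block * (BLOCK // GRANULE)
    obj_bit = offset // GRANULE
    assert old_bits[obj_bit], "old_addr must be the start of a live object"
    i = sum(old_bits[block_first_bit:obj_bit + 1])

    # r: walk the new bitmap from B's bit until the i-th set bit is reached.
    start = (first_new - heap_base) // GRANULE
    bit, remaining = start, i
    while True:
        if new_bits[bit]:
            remaining -= 1
            if remaining == 0:
                break
        bit += 1
    r = (bit - start) * GRANULE
    return first_new + r

HEAP_BASE = 0x58296000                 # hypothetical heap base
old_bits = [0] * 256                   # bitmaps for a 2 KB toy heap
new_bits = [0] * 256
for off in (0x400, 0x410, 0x430, 0x448):   # assumed old offsets of A, B, C, D
    old_bits[off // GRANULE] = 1
for off in (0x200, 0x208, 0x218, 0x228):   # assumed new offsets of A, B, C, D
    new_bits[off // GRANULE] = 1
block_table = {4: 0x58296200}          # block 4 holds A..D; A's new address is from the slide

old_c = HEAP_BASE + 0x430
print(hex(new_address(old_c, HEAP_BASE, block_table, old_bits, new_bits)))
# prints: 0x58296218
```

The bit-by-bit loops keep the sketch readable; a real implementation would more likely count bits a word at a time, which is the extra per-reference computation the per-block scheme trades for space.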

Space overhead

For each block (say, 256 bytes):

A pointer: 4 bytes (or 8 on 64-bit platforms)

2 bitmaps: 4 + 4 bytes

Overall: 12 (or 16) bytes for each 256 bytes (4.7% to 6.25%)

We may reuse existing data structures, e.g., the mark bits map that the GC uses.

Other optimizations possible, e.g., depending on the minimum object size and object alignment, one might compress the old bitmap.

Increasing the size of the block: reduces the extra space but increases the computation cost.
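
The arithmetic behind these figures can be checked with a few lines. The function name and parameters below are chosen for this example only; the model counts one block-table pointer plus two bitmaps with one bit per 8 heap bytes, and ignores the data-structure reuse and bitmap compression mentioned above.

```python
def overhead_fraction(block_bytes: int = 256, pointer_bytes: int = 4,
                      granule_bytes: int = 8) -> float:
    """Per-block bookkeeping bytes divided by the block size."""
    bitmap_bytes = block_bytes // granule_bytes // 8   # one bit per granule
    per_block = pointer_bytes + 2 * bitmap_bytes       # pointer + old and new bitmaps
    return per_block / block_bytes

for block, ptr in [(256, 4), (256, 8), (512, 4)]:
    print(block, ptr, f"{overhead_fraction(block, ptr):.2%}")
# prints:
# 256 4 4.69%
# 256 8 6.25%
# 512 4 3.91%
```

Doubling the block to 512 bytes halves only the pointer's share, since the bitmaps scale with the heap; the cost is that more bitmap bits may need to be scanned per reference during fix-up.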

Measurements

We compared: the threaded algorithm; the parallel algorithm restricted to a single thread; the fully parallel algorithm.

Platform: AIX (on 8-way PPC, 64 bits) and NT (on 4-way Pentium, 32 bits).

Benchmarks: SPECjbb2000 and Trade3 on WebSphere.

The new algorithm was compared with the threaded algorithm previously implemented on IBM’s JVM.

Heap size: determined so that live objects take 60% of the heap: 600MB for SPECjbb2000 and 180MB for Trade3.

SPECjbb2000

Testing rules: average of 5 runs; 16 warehouses; after 16 warehouses the heap is 60% full.

Compaction runs when a warehouse is added; those (substantial) parts of the run are not considered in the measurements.

Thus, scores are not affected by the compaction times, but they are affected by bad compaction quality.

We measure compaction times.

Results: Throughput (Specjbb2000)

[Chart: Throughput. TPM (thousands) against number of warehouses (1 to 16), for the parallel-restricted and threaded algorithms.]

Results: Compaction Times (Specjbb2000)

[Chart: Compaction Time. Time (ms) against number of warehouses (1 to 16), for the threaded and parallel-restricted algorithms.]

Results: Speedup (Specjbb2000)

[Chart: Speedup. Speedup factor (0 to 8) against number of warehouses (1 to 16), for 2, 4, 6, and 8 compacting threads.]

Results: Trade3 (Websphere)

4-way NT machine

Heap size: 180MB

Additional test: we forced compaction every 20 GCs.

Compaction type       Compaction time             #Requests per second
                      default (≈ 90 gc)   20gc    default (≈ 90 gc)   20gc
Threaded              1698                1671    219.8               224.5
Parallel-restricted   1387                1251    221.7               226.1
Parallel              499                 440     222.4               229.1

Conclusion

We presented a new compaction algorithm which is:

Faster than the previously used threaded algorithm, even on a uniprocessor.

Efficient, with an excellent speedup: 11 times faster on an 8-way machine.

Excellent in compaction quality.

The algorithm is incorporated into the IBM production JVM.

With this efficient algorithm, compaction can be triggered more often to increase throughput!