Speculative Region-based Memory Management for Big Data Systems Khanh Nguyen, Lu Fang, Harry Xu,...

Speculative Region-based Memory Management for Big Data Systems. Khanh Nguyen, Lu Fang, Harry Xu, Brian Demsky. Donald Bren School of Information and Computer Sciences


Speculative Region-based Memory Management for Big Data Systems

Khanh Nguyen, Lu Fang, Harry Xu, Brian Demsky
Donald Bren School of Information and Computer Sciences


BIG DATA


Scalability: JVM crashes due to OutOfMemory errors at an early stage


A moderate-size application on Giraph with 1 GB of input data can easily run out of memory on a 12 GB heap [Bu et al., ISMM’13]


Management cost: GC time accounts for up to 50% of the execution time [Bu et al., ISMM’13]

High cost of the managed runtime is a fundamental problem!


Existing Work

• Facade [Nguyen et al., ASPLOS’15]
• Broom [Gog et al., HotOS’15]

Huge manual effort from developers

This work: Purely dynamic technique


Control Path vs. Data Path

Control path:
• Pipeline construction
• Job state management
• Perform optimization

Data path:
• Process the actual data
• Code size is small (36%)
• Create most of the runtime objects (95%)

[Bu et al., ISMM’13]


Execution Pattern

• Data-processing functions are iteration-based

• Each iteration processes a distinct data partition

• Iterations are well-defined


public interface GraphChiProgram<VertexDataType, EdgeDataType> {
    public void update(ChiVertex<VertexDataType, EdgeDataType> vertex, GraphChiContext context);

    public void beginIteration(GraphChiContext ctx);
    public void endIteration(GraphChiContext ctx);

    public void beginInterval(GraphChiContext ctx, VertexInterval interval);
    public void endInterval(GraphChiContext ctx, VertexInterval interval);

    public void beginSubInterval(GraphChiContext ctx, VertexInterval interval);
    public void endSubInterval(GraphChiContext ctx, VertexInterval interval);
}

GraphChi [Kyrola et al., OSDI’12]


Weak Iteration Hypothesis

• Data objects do not escape iteration boundaries
  – A GC run in the middle of an iteration is wasted
• Control objects do escape iteration boundaries

PageRank – Twitter graph

5% 181 MILLIONS OBJECTS
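The hypothesis can be illustrated with a small plain-Java sketch. It is not taken from the talk: the class `IterationDemo`, the `Stats` holder and the `0.85` scaling factor are all made up for illustration. The per-partition list stands in for data objects, and the running totals stand in for a control object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the weak iteration hypothesis.
final class IterationDemo {
    // Control object: created before the loop and still live after it.
    static final class Stats {
        long vertices;
        double rankSum;
    }

    static Stats process(List<double[]> partitions) {
        Stats stats = new Stats();                 // escapes every iteration
        for (double[] partition : partitions) {    // one iteration per partition
            // Data objects: allocated inside the iteration, dead at its end.
            List<Double> scaled = new ArrayList<>(partition.length);
            for (double rank : partition) {
                scaled.add(rank * 0.85);
            }
            for (double value : scaled) {
                stats.rankSum += value;
            }
            stats.vertices += scaled.size();
            // 'scaled' and its boxed Doubles are unreachable from here on.
        }
        return stats;
    }
}
```

A collector that runs while `scaled` is still being filled must trace objects that would all be dead a moment later, which is the wasted work the first bullet refers to.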


Region-based Memory Management

• Region definition
• Management:
  – Allocation
  – Deallocation
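As a rough sketch of what allocation and deallocation mean for a region, the class below uses a bump pointer over a fixed block of off-heap memory. The name `Region`, its methods and the fixed capacity are assumptions for illustration; the slides do not show an implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-capacity region: bump-pointer allocation, bulk release.
final class Region {
    private final ByteBuffer memory; // direct buffer, outside the Java heap
    private int top = 0;             // offset of the next free byte

    Region(int capacityBytes) {
        this.memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Allocation: reserve 'size' bytes and return their offset, or -1 if full.
    int allocate(int size) {
        if (size < 0 || size > memory.capacity() - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    void putLong(int offset, long value) { memory.putLong(offset, value); }

    long getLong(int offset) { return memory.getLong(offset); }

    // Deallocation: every object in the region is released at once.
    void reset() { top = 0; }

    int used() { return top; }
}
```

Allocation is a bounds check plus an addition, and deallocation is a single assignment no matter how many objects the region holds.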


Advantages

• Low overheads
• Improved data locality
• More flexible than stack allocation
• No GC burden


Challenges

• Escaping control objects
• Developers are responsible for semantic correctness

Precise object lifetimes required!

• Facade: annotation & refactoring
• Broom: specialized API
• Static analyses?


Proposed Solution

Speculative Region Allocation

Annotate iteration boundaries:
  – iteration_start
  – iteration_end

Algorithms guarantee the program’s correctness automatically


Observations

Iterations can be:
• nested
• executed by multiple threads

Identified by the pair <iteration_ID, thread_ID>


Region Semi-lattice

[Figure: region semi-lattice. The top element <T,*> is the heap. Below it sit the regions <1,t1>, <2,t1>, <3,t1> of thread t1 and <1,t2>, <2,t2>, <3,t2> of thread t2, combined by the JOIN operator. Beside the lattice, a main() method contains three for loops, each bracketed by iteration_start and iteration_end.]

GC never touches regions


Speculative Region Allocation

iteration_start          (Parent)
    iteration_start      (Child)
    iteration_end
iteration_end
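One way to picture the parent/child relationship is a per-thread stack of regions driven by the two annotations. The sketch below is an assumption, not the authors' OpenJDK implementation. `RegionStack`, `iterationStart`, `iterationEnd` and `allocate` are hypothetical names, and a Java list stands in for the region's memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical runtime hooks behind the iteration_start / iteration_end marks.
final class RegionStack {
    static final class Region {
        final Region parent;   // enclosing iteration's region, null = heap
        final int depth;       // nesting level, 1 = outermost iteration
        final List<Object> objects = new ArrayList<>();

        Region(Region parent) {
            this.parent = parent;
            this.depth = (parent == null) ? 1 : parent.depth + 1;
        }
    }

    // One stack per thread, since iterations run on multiple threads.
    private static final ThreadLocal<Deque<Region>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    static Region iterationStart() {
        Deque<Region> stack = STACK.get();
        Region child = new Region(stack.peek()); // peek() is null when empty
        stack.push(child);
        return child;
    }

    // Speculatively place the object in the innermost open region.
    static <T> T allocate(T object) {
        Region current = STACK.get().peek();
        if (current != null) {
            current.objects.add(object);
        }
        return object; // outside any iteration: ordinary heap object
    }

    static Region iterationEnd() {
        return STACK.get().pop(); // caller recycles the returned region
    }
}
```

The allocation is speculative in the sense that every object is assumed to die with its iteration. Objects that turn out to escape have to be moved before the region is recycled.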


Components of Our Approach

• Speculative region allocation
• Track inter-region references
  – Update boundary set
• Recycle regions
  – Boundary set promotion


Remember Inter-Region References: Case 1

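a.f = b

[Figure: objects a and b live in different regions of the same thread, <x,ti> and <y,ti>. After the write a.f = b, b is recorded in the boundary set.]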


Remember Inter-Region References: Case 2

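a = b.f

[Figure: the regions <x,ti> and <y,tj> belong to different threads, and b.f refers to object c. After the read a = b.f, c is recorded in the boundary set.]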
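The two cases can be read as a write barrier and a read barrier that add the referenced object to the boundary set of its own region. The slides do not spell out the exact conditions, so the sketch below is one plausible reading under that assumption. `Barriers`, `Region`, `Obj`, `write` and `read` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical barriers that remember inter-region references.
final class Barriers {
    static final class Region {
        final int iterationId;
        final String threadId;
        final Set<Obj> boundarySet = new LinkedHashSet<>();

        Region(int iterationId, String threadId) {
            this.iterationId = iterationId;
            this.threadId = threadId;
        }
    }

    static final class Obj {
        final Region region; // region the object was allocated in
        Obj f;               // the single reference field used on the slides

        Obj(Region region) { this.region = region; }
    }

    // Case 1, a.f = b: a reference into b's region from a different region.
    static void write(Obj a, Obj b) {
        if (b != null && a.region != b.region) {
            b.region.boundarySet.add(b);
        }
        a.f = b;
    }

    // Case 2, a = b.f: a thread loads c out of another thread's region.
    static Obj read(Obj b, String readingThread) {
        Obj c = b.f;
        if (c != null && !c.region.threadId.equals(readingThread)) {
            c.region.boundarySet.add(c);
        }
        return c;
    }
}
```

References between objects of the same region are not recorded, so the boundary set stays small when most objects behave as the weak iteration hypothesis predicts.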


Region Recycling Algorithm

[Figure: the region semi-lattice from <T,*> down to <3,t1> and <3,t2>, with the boundary set of region <3,t1>.]

JOIN(<1,t1>, <2,t1>) = <1,t1>


Region Recycling Algorithm

[Figure: the region semi-lattice from <T,*> down to <3,t1> and <3,t2>, with the boundary set of region <3,t1>.]

JOIN(<2,t1>, <2,t2>) = <T,*>
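The two JOIN results shown on the slides are consistent with a simple rule: regions of the same thread join to the outer one, and regions of different threads join to the heap. The sketch below encodes that rule. It assumes that the regions of one thread form a chain ordered by nesting depth. The `promotionTarget` helper, which folds JOIN over the regions that refer to an escaping object, is a guess at how the boundary set is used rather than something the slides state. All names are hypothetical.

```java
import java.util.List;

// Hypothetical region identifier <iteration_ID, thread_ID>.
final class RegionId {
    static final RegionId HEAP = new RegionId(0, "*"); // stands for <T,*>

    final int iterationId;  // nesting depth, 1 = outermost iteration
    final String threadId;

    RegionId(int iterationId, String threadId) {
        this.iterationId = iterationId;
        this.threadId = threadId;
    }

    // JOIN: the closest region that encloses both arguments.
    static RegionId join(RegionId x, RegionId y) {
        if (x == HEAP || y == HEAP || !x.threadId.equals(y.threadId)) {
            return HEAP; // regions of different threads only meet at the heap
        }
        return (x.iterationId <= y.iterationId) ? x : y; // outer iteration wins
    }

    // Assumed promotion rule: JOIN of the object's own region with the
    // region of every object that refers to it.
    static RegionId promotionTarget(RegionId home, List<RegionId> referrers) {
        RegionId target = home;
        for (RegionId referrer : referrers) {
            target = join(target, referrer);
        }
        return target;
    }
}
```

Under this rule an object in <3,t1> that is referenced only from <2,t1> would move one level up, while a reference from any region of thread t2 would send it to the heap.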



Handling of Intricacies

• Escape via the stack
• Data-race-free object relocation

Details are in the paper


Conclusions

• Goal: Reduce user’s effort
• Solution: Speculative region allocation
  – The cost of object promotion is considerable
    • Can be reduced by adaptively allocating objects: feedback-directed allocation policy
• Status: In the process of implementing & evaluating in OpenJDK


Thank you!