380C lecture 19 Where are we & where we are going –Managed languages Dynamic compilation Inlining...


380C lecture 19

• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
      – Opportunity to improve data locality on-the-fly
      – Other opportunities?
  – Why you need to care about workloads
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures

CS380C Lecture 19


Garbage Collection Advantage:

Improving Program Locality

Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT),

J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)


Today: Advanced Topics

• Generational Garbage Collection
• Copying objects is an opportunity
• Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM), “The Garbage Collection Advantage: Improving Program Locality,” OOPSLA 2004.


Motivation

• Memory gap problem
• OO programs become more popular
• OO programs exacerbate the memory gap problem
  – Automatic memory management
  – Pointer data structures
  – Many small methods

Goal: improve OO program locality


Allocation Mechanisms

Bump-Pointer
• Fast (increment & bounds check)
• Contemporaneous object locality
• Can't incrementally free & reuse: must free en masse

Free-List
• Slightly slower (consult list for fit)
• Mystery locality
• Can incrementally free & reuse cells
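
The contrast between the two allocators can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class names, the integer "addresses", and the fixed-size cells in the free list are hypothetical simplifications (with fixed-size cells the "consult list for fit" step is trivial), not the allocators used in any particular VM.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bump-pointer allocator over an abstract address range. */
final class BumpAllocator {
    private int cursor;        // next free address
    private final int limit;   // end of the region (exclusive)

    BumpAllocator(int start, int limit) {
        this.cursor = start;
        this.limit = limit;
    }

    /** Returns the address of the new cell, or -1 if the region is full. */
    int alloc(int size) {
        if (cursor + size > limit) {   // bounds check
            return -1;
        }
        int address = cursor;
        cursor += size;                // increment
        return address;
    }

    /** The only way to reclaim space: release the whole region at once. */
    void reset(int start) {
        cursor = start;
    }
}

/** Hypothetical free-list allocator with fixed-size cells. */
final class FreeListAllocator {
    private final Deque<Integer> freeCells = new ArrayDeque<>();

    FreeListAllocator(int start, int cellSize, int cellCount) {
        for (int i = 0; i < cellCount; i++) {
            freeCells.addLast(start + i * cellSize);
        }
    }

    /** Returns a free cell address, or -1 if none is left. */
    int alloc() {
        Integer cell = freeCells.pollFirst();
        return cell == null ? -1 : cell;
    }

    /** Individual cells can be returned and reused later. */
    void free(int address) {
        freeCells.addFirst(address);
    }
}
```

Objects allocated back to back by the bump pointer end up adjacent in memory, which is the "contemporaneous object locality" above; the free list hands back whichever cell happens to be free, so neighbors are unrelated.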


State-of-the-art throughput: Copying Generational GC

• Requirements
  – Write barrier to track inter-generation pointers
    • remsets, cards
  – Copy reserve
• Advantages:
  – Minimizes copying of older objects
  – Compaction of long-lived objects
• Problems:
  – Not very incremental
  – Very youngest objects always copied
  – What order should GC use to copy objects?

[Figure: heap divided into a ‘nursery’ and an ‘older generation’]
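
As a rough illustration of the write-barrier requirement, the sketch below records older-generation objects that are made to point into the nursery. The class `GenerationalHeapSketch`, its `Obj` type, and the single `field` are hypothetical; a real barrier is emitted by the compiler on reference stores and may use cards rather than an object-level remembered set.

```java
import java.util.HashSet;
import java.util.Set;

final class GenerationalHeapSketch {
    /** Hypothetical heap object: a generation flag and one reference field. */
    static final class Obj {
        final boolean inNursery;
        Obj field;

        Obj(boolean inNursery) {
            this.inNursery = inNursery;
        }
    }

    /** Remembered set: older-generation objects that may point into the nursery. */
    final Set<Obj> remset = new HashSet<>();

    /** Write barrier: in this sketch every reference store goes through here. */
    void writeField(Obj source, Obj target) {
        if (!source.inNursery && target != null && target.inNursery) {
            remset.add(source);   // record the old-to-young pointer
        }
        source.field = target;
    }
}
```

With such a set, a nursery collection can treat the remembered objects as extra roots instead of scanning the whole older generation.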


Opportunity

• Generational copying garbage collector reorders objects at runtime


Copying of Linked Objects

[Figure: a graph of linked objects numbered 1 to 7, copied in breadth-first order]


Copying of Linked Objects

[Figure: a graph of linked objects numbered 1 to 7, copied in breadth-first and in depth-first order]
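
To make the difference between the two static orders concrete, here is a small Java sketch that lists the order in which a copying traversal would visit objects. The graph in `GRAPH` is a hypothetical binary tree chosen for illustration, not a reconstruction of the slide's figure, and because it is a tree the sketch omits the forwarding-pointer check a real collector needs for shared or cyclic structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class CopyOrderSketch {
    /** Hypothetical object graph: object id -> ids of the objects it references. */
    static final Map<Integer, List<Integer>> GRAPH = Map.of(
        1, List.of(2, 3),
        2, List.of(4, 5),
        3, List.of(6, 7));

    static List<Integer> children(int id) {
        return GRAPH.getOrDefault(id, List.of());
    }

    /** Breadth-first: a FIFO work queue, as in a Cheney-style copying collector. */
    static List<Integer> breadthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.addLast(root);
        while (!queue.isEmpty()) {
            int id = queue.pollFirst();
            order.add(id);
            for (int child : children(id)) {
                queue.addLast(child);
            }
        }
        return order;
    }

    /** Depth-first: a LIFO stack; children are pushed in reverse so the first child is copied first. */
    static List<Integer> depthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int id = stack.pop();
            order.add(id);
            List<Integer> kids = children(id);
            for (int i = kids.size() - 1; i >= 0; i--) {
                stack.push(kids.get(i));
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(1)); // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(depthFirst(1));   // [1, 2, 4, 5, 3, 6, 7]
    }
}
```

On this tree, breadth-first places siblings next to each other, while depth-first places each parent next to its first child; which layout helps depends on how the program later walks the structure.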


Copying of Linked Objects

[Figure: a graph of linked objects numbered 1 to 7, copied in breadth-first, depth-first, and online-object-reordering order]


Outline

• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion


Cache Performance Matters

[Figure: _213_javac, total cycles (in billions) for three cache configurations: 8K DL1, 8K IL1, 128K L2; perfect L2; perfect IL1, perfect DL1]


Online Object Reordering

• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?


Where Are The Cache Misses?

• Heap structure: VM Objects, Stack, Older Generation, Nursery

[Figure: heap layout, not to scale]


Where Are The Cache Misses?

[Figure: _209_db, total accesses (in millions) to VM Objects, Stack, Older Gen, and Nursery, split into L2 hits and L2 misses]


Where Are The Cache Misses?

• Two opportunities to reorder objects in the older generation
  – Promote nursery objects
  – Full heap collection


How to Find Hot Fields?

• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis

Key: Low overhead estimation


Which Classes Need Reordering?

Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses

Step 2: JIT adaptive sampling identifies hot methods
  – Mark field accesses in hot methods as hot

Key: Low overhead estimation
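
The two steps can be sketched as a small lookup structure. This is a minimal sketch under assumptions: `HotFieldSketch`, `recordAccesses`, and `methodBecameHot` are hypothetical names, fields are identified by plain strings such as "A.b" (borrowed from the deck's Foo example), and the real system stores this information inside the VM rather than in string maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HotFieldSketch {
    /** Step 1 output: per method, the field accesses found outside cold basic blocks. */
    private final Map<String, List<String>> accessesByMethod = new HashMap<>();

    /** Step 2 output: fields accessed by methods that sampling reported as hot. */
    private final Set<String> hotFields = new HashSet<>();

    /** Called by the (hypothetical) compiler after it analyzes a method. */
    void recordAccesses(String method, List<String> fields) {
        accessesByMethod.put(method, fields);
    }

    /** Called when (hypothetical) adaptive sampling reports a method as hot. */
    void methodBecameHot(String method) {
        hotFields.addAll(accessesByMethod.getOrDefault(method, List.of()));
    }

    boolean isHot(String field) {
        return hotFields.contains(field);
    }
}
```

The estimation stays cheap in this sketch because nothing is recorded per field read: work happens once per compiled method and once per hot-method event.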


Example: Compiler Analysis

Method Foo {
  Class A a;
  try {
    … = a.b;
    …
  } catch (Exception e) {
    … a.c
  }
}

Compiler:
• Hot BB: collect access info
• Cold BB: ignore

Access List:
1. A.b
2. ….
….


Example: Adaptive Sampling

Method Foo {
  Class A a;
  try {
    … = a.b;
    …
  } catch (Exception e) {
    … a.c
  }
}

Adaptive Sampling: Foo is hot

Foo Accesses:
1. A.b
2. ….
….

A.b is hot

[Figure: objects A and B, and A’s type information listing fields b and c]


Copying of Linked Objects

[Figure: online object reordering uses type information to copy the graph of linked objects into a hot space and a cold space]
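
One way to picture a copy order driven by hot-field advice is a traversal with two work queues, where referents of hot fields are always processed before referents of cold fields. The sketch below is a simplified illustration and not the collector described in the OOPSLA 2004 paper: `HotFirstCopySketch`, `hotRefs`, and `coldRefs` are hypothetical names, it emits a single order rather than filling separate hot and cold spaces, and it assumes a tree so it skips forwarding-pointer checks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class HotFirstCopySketch {
    /** Hypothetical object: referents reached through hot fields and through cold fields. */
    static final class Obj {
        final int id;
        final List<Obj> hotRefs = new ArrayList<>();
        final List<Obj> coldRefs = new ArrayList<>();

        Obj(int id) {
            this.id = id;
        }
    }

    /** Copies everything reachable through hot fields first, then the deferred cold referents. */
    static List<Integer> copyOrder(Obj root) {
        List<Integer> order = new ArrayList<>();
        Deque<Obj> hotQueue = new ArrayDeque<>();
        Deque<Obj> coldQueue = new ArrayDeque<>();
        hotQueue.addLast(root);
        while (!hotQueue.isEmpty() || !coldQueue.isEmpty()) {
            // Cold work is only taken when no hot work is pending.
            Obj obj = !hotQueue.isEmpty() ? hotQueue.pollFirst() : coldQueue.pollFirst();
            order.add(obj.id);
            hotQueue.addAll(obj.hotRefs);
            coldQueue.addAll(obj.coldRefs);
        }
        return order;
    }
}
```

For example, if object 1 reaches 4 through a hot field and 2 and 3 through cold fields, object 4 is copied right after object 1, so the pair that the program touches together ends up adjacent.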


OOR System Overview

[Figure: system diagram with the components Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Hot Methods, Optimizing Compiler, Access Info Database (Adds Entries, Look Up), Register Hot Field Accesses, Advice, and GC: Copies Objects, which Improves Locality. Legend: OOR addition, JikesRVM component, Input/Output.]


Outline

• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion


Methodology: Virtual Machine

• Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
• Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]


Methodology: Memory Management

• Memory Management Toolkit (MMTk):
  – Allocators and garbage collectors
  – Multi-space heap
    • Boot image
    • Large object space (LOS)
    • Immortal space
• Experiment setup
  – Generational copying GC with 4M bounded nursery

Overhead: OOR Analysis Only

Benchmark    Base Execution Time (sec)    w/ only OOR Analysis (sec)    Overhead
jess                   4.39                        4.43                   0.84%
jack                   5.79                        5.82                   0.57%
raytrace               4.63                        4.61                  -0.59%
mtrt                   4.95                        4.99                   0.70%
javac                 12.83                       12.70                  -1.05%
compress               8.56                        8.54                   0.20%
pseudojbb             13.39                       13.43                   0.36%
db                    18.88                       18.88                  -0.03%
antlr                  0.94                        0.91                  -2.90%
hsqldb               160.56                      158.46                  -1.30%
ipsixql               41.62                       42.43                   1.93%
jython                37.71                       37.16                  -1.44%
ps-fun               129.24                      128.04                  -1.03%
Mean                                                                     -0.19%

Detailed Experiments

• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
  – x86, AMD, PowerPC
• x86 performance counters:
  – DL1, trace cache, L2, DTLB, ITLB


Performance javac


Performance db


Performance jython

Any static ordering leaves you vulnerable to pathological cases.


Phase Changes


Related Work

• Evaluate static orderings [Wilson et al.]
  – Large performance variation
• Static profiling [Chilimbi et al., and others]
  – Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
  – Too expensive


Conclusion

• Static traversal orders have up to 25% variation

• OOR improves or matches best static ordering

• OOR has very low overhead
• Past predicts future


380C

• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
  – Why you need to care about workloads & methodology
    • Read: Blackburn et al., Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century, Communications of the ACM, 51(8): 83-89, August 2008.
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures