380C lecture 19
• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
      – Opportunity to improve data locality on-the-fly
      – Other opportunities?
  – Why you need to care about workloads
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures
CS380C Lecture 19
Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Today: Advanced Topics
• Generational Garbage Collection
• Copying objects is an opportunity
• Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM), “The Garbage Collection Advantage: Improving Program Locality,” OOPSLA 2004.
Motivation
• Memory gap problem
• OO programs are becoming more popular
• OO programs exacerbate the memory gap problem
  – Automatic memory management
  – Pointer data structures
  – Many small methods
Goal: improve OO program locality
Allocation Mechanisms
Bump-Pointer
• Fast (increment & bounds check)
• Contemporaneous object locality
• Can't incrementally free & reuse: must free en masse
Free-List
• Slightly slower (consult list for fit)
• Mystery locality
• Can incrementally free & reuse cells
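The contrast between the two allocators can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Jikes RVM or MMTk implement them: the class names, the first-fit policy, and the 64-byte region are all hypothetical, and addresses are plain integer offsets.

```python
class BumpPointerAllocator:
    """Allocate by advancing a cursor; space is only reclaimed en masse."""

    def __init__(self, size):
        self.size = size
        self.cursor = 0

    def alloc(self, nbytes):
        # Fast path: one bounds check and one increment.
        if self.cursor + nbytes > self.size:
            return None  # region exhausted; a real VM would trigger a GC here
        addr = self.cursor
        self.cursor += nbytes
        return addr

    def free_all(self):
        # No per-object free: the whole region is released at once.
        self.cursor = 0


class FreeListAllocator:
    """First-fit allocation from a list of (address, size) free cells."""

    def __init__(self, size):
        self.free_cells = [(0, size)]

    def alloc(self, nbytes):
        # Slower path: consult the list for a cell that fits.
        for i, (addr, size) in enumerate(self.free_cells):
            if size >= nbytes:
                if size == nbytes:
                    del self.free_cells[i]
                else:
                    self.free_cells[i] = (addr + nbytes, size - nbytes)
                return addr
        return None

    def free(self, addr, nbytes):
        # Individual cells can be returned and reused (no coalescing here).
        self.free_cells.insert(0, (addr, nbytes))


bump = BumpPointerAllocator(64)
bump_addrs = [bump.alloc(16) for _ in range(3)]  # consecutive addresses

fl = FreeListAllocator(64)
a, b, c = fl.alloc(16), fl.alloc(16), fl.alloc(16)
fl.free(a, 16)
reused = fl.alloc(16)  # lands in the freed cell, not next to c
```

Objects allocated one after another by the bump pointer sit next to each other, which is the contemporaneous locality the slide mentions. With the free list, where a new object lands depends on which cells happen to be free, hence "mystery locality".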
State-of-the-art throughput: Copying Generational GC
• Requirements
  – write-barrier to track inter-generation pointers
    • remsets, cards
  – copy reserve
• Advantages:
  – Minimizes copying of older objects
  – Compaction of long-lived objects
• Problems:
  – Not very incremental
  – Very youngest objects always copied
  – What order should GC use to copy objects?
[Figure: heap regions labeled ‘nursery’, ‘older generation’, etc.]
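A write barrier with a remembered set can be sketched as follows. This is an illustrative model only: the `Obj` and `Heap` classes, the string generation tags, and the `nursery_roots` helper are assumptions made for the example, and card marking is not shown.

```python
class Obj:
    def __init__(self, name, generation):
        self.name = name
        self.generation = generation  # "nursery" or "old"
        self.fields = {}


class Heap:
    def __init__(self):
        # Remembered set: (source object, field name) slots in the older
        # generation that were written with a pointer into the nursery.
        self.remset = set()

    def write_field(self, src, field, target):
        # Write barrier: runs on every pointer store.
        if (src.generation == "old" and target is not None
                and target.generation == "nursery"):
            self.remset.add((src, field))
        src.fields[field] = target

    def nursery_roots(self, stack_roots):
        # A nursery collection starts from the stack plus the remembered
        # slots, so it never has to scan the whole older generation.
        roots = [o for o in stack_roots if o.generation == "nursery"]
        for src, field in self.remset:
            ref = src.fields[field]
            if ref is not None and ref.generation == "nursery":
                roots.append(ref)
        return roots


heap = Heap()
old = Obj("old", "old")
young = Obj("young", "nursery")
heap.write_field(old, "next", young)   # old -> nursery: remembered
heap.write_field(young, "back", old)   # nursery -> old: not remembered
roots = heap.nursery_roots([])
```

The barrier is the cost the "Requirements" bullet refers to: every pointer store pays for the check so that nursery collections stay cheap.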
Opportunity
• Generational copying garbage collector reorders objects at runtime
Copying of Linked Objects
[Figure: a linked structure of objects numbered 1 to 7, and the order in which Breadth First, Depth First, and Online Object Reordering copy them]
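The difference between the two static traversal orders can be made concrete with a small sketch. The object graph below is a hypothetical binary tree chosen for illustration, not a reconstruction of the slide's figure, and the function names are assumptions.

```python
from collections import deque

# Hypothetical object graph: object 1 points to 2 and 3, object 2 to 4 and 5,
# and object 3 to 6 and 7.
GRAPH = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}


def breadth_first_copy(graph, root):
    """Cheney-style order: copied objects are scanned first-in, first-out."""
    order, seen, queue = [root], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        for child in graph[obj]:
            if child not in seen:
                seen.add(child)
                order.append(child)  # "copy" child into the next free slot
                queue.append(child)
    return order


def depth_first_copy(graph, root):
    """Follow each pointer chain to its end before moving to a sibling."""
    order, seen, stack = [], set(), [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        order.append(obj)
        # Push children reversed so the first field is followed first.
        stack.extend(reversed(graph[obj]))
    return order


bfs_order = breadth_first_copy(GRAPH, 1)  # siblings end up adjacent
dfs_order = depth_first_copy(GRAPH, 1)    # parent and first child end up adjacent
```

Breadth-first places siblings together, depth-first places a parent next to its first child. Which layout helps depends on how the program walks the structure, which is why neither static order is best for every program.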
Outline
• Motivation
• Online Object Reordering (OOR)
• Methodology
• Experimental Results
• Conclusion
Cache Performance Matters
[Figure: _213_javac, total cycles (in billions) for three cache configurations: 8K DL1, 8K IL1, 128K L2; perfect L2; perfect IL1, perfect DL1]
Online Object Reordering
• Where are the cache misses?
• How to identify hot field accesses at runtime?
• How to reorder the objects?
Where Are The Cache Misses?
• Heap structure:
[Figure: heap regions labeled VM Objects, Stack, Older Generation, and Nursery (not to scale)]
Where Are The Cache Misses?
[Figure: _209_db, total accesses (in millions) to VM Objects, Stack, Older Gen, and Nursery, split into L2 hits and L2 misses]
Where Are The Cache Misses?
• Two opportunities to reorder objects in the older generation
  – Promote nursery objects
  – Full heap collection
How to Find Hot Fields?
• Runtime info (intercept every read)?
• Compiler analysis?
• Runtime information + compiler analysis
Key: Low overhead estimation
Which Classes Need Reordering?
Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
  – Marks field accesses in hot methods as hot
Key: Low overhead estimation
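The two steps can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Jikes RVM implementation: the basic-block dictionaries, the `access_db` mapping, the sample counts, and the threshold are all hypothetical. The `Foo` data mirrors the example method used in the slides, with the catch block treated as cold.

```python
def analyze_method(basic_blocks):
    """Step 1 (compile time): list field accesses, skipping cold blocks."""
    accesses = []
    for block in basic_blocks:
        if block["cold"]:
            continue  # cold basic blocks are excluded from the analysis
        for field in block["accesses"]:
            if field not in accesses:
                accesses.append(field)
    return accesses


def hot_fields(access_db, sample_counts, threshold):
    """Step 2 (run time): fields accessed in methods the sampler finds hot."""
    hot = set()
    for method, samples in sample_counts.items():
        if samples >= threshold:
            hot.update(access_db.get(method, []))
    return hot


# Foo reads a.b in its try body and a.c only in its catch block.
foo_blocks = [
    {"cold": False, "accesses": ["A.b"]},
    {"cold": True, "accesses": ["A.c"]},
]
access_db = {"Foo": analyze_method(foo_blocks)}

# Hypothetical timer samples: Foo was seen 12 times, Bar once.
result = hot_fields(access_db, {"Foo": 12, "Bar": 1}, threshold=5)
```

The estimate stays cheap because no individual read is intercepted: the access list is built once when a method is compiled, and the only runtime input is the sampling the adaptive system already performs.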
Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Compiler:
• Hot BB: collect access info
• Cold BB: ignore
Access List:
1. A.b
2. ….
….
Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Adaptive Sampling: Foo is hot
Foo Accesses:
1. A.b
2. ….
….
A.b is hot
[Figure: A's type information, showing fields b and c of class A, and class B]
Copying of Linked Objects
[Figure: Online Object Reordering copying linked objects numbered 1 to 7, with labels Type Information, Hot space, and Cold space]
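One way to picture how hot-field advice changes the copy order is the sketch below. It is a loose model, not the collector from the paper: the `objects` encoding, the `oor_copy_order` function, and the linked-list example are assumptions, and it produces a single copy order rather than modeling separate hot and cold spaces.

```python
from collections import deque


def oor_copy_order(objects, root, hot_fields):
    """Return the copy order when hot-field referents are copied first.

    objects maps an object id to (class name, {field: referent id or None}).
    hot_fields is a set of (class name, field) pairs: the advice given to GC.
    """
    order, copied = [], set()
    scan = deque()      # copied objects whose fields still need scanning
    deferred = deque()  # referents of cold fields, not yet copied

    def copy(obj):
        if obj is not None and obj not in copied:
            copied.add(obj)
            order.append(obj)
            scan.append(obj)

    copy(root)
    while scan or deferred:
        if not scan:
            copy(deferred.popleft())  # hot work is done; take a cold referent
            continue
        obj = scan.popleft()
        cls, fields = objects[obj]
        for field, ref in fields.items():
            if (cls, field) in hot_fields:
                copy(ref)             # hot: copy right away, near its parent
            elif ref is not None:
                deferred.append(ref)  # cold: copy later
    return order


# Hypothetical linked list: each Node has a payload and a next pointer.
objects = {
    1: ("Node", {"payload": 10, "next": 2}),
    2: ("Node", {"payload": 20, "next": 3}),
    3: ("Node", {"payload": 30, "next": None}),
    10: ("Data", {}), 20: ("Data", {}), 30: ("Data", {}),
}
with_advice = oor_copy_order(objects, 1, {("Node", "next")})
without_advice = oor_copy_order(objects, 1, set())
```

With `next` marked hot, the three nodes are copied back to back, so a traversal that follows `next` touches adjacent memory. Without advice, payloads are interleaved between the nodes.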
OOR System Overview
[Figure: system diagram with components Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Optimizing Compiler, Hot Methods, Access Info Database, and GC: Copies Objects; edge labels Adds Entries, Register Hot Field Accesses, Look Up, Advice, Affects Locality, and Improves Locality; legend: OOR addition, Jikes RVM component, Input/Output]
Outline
• Motivation
• Online Object Reordering
• Methodology
• Experimental Results
• Conclusion
Methodology: Virtual Machine
• Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
• Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]
Methodology: Memory Management
• Memory Management Toolkit (MMTk):
  – Allocators and garbage collectors
  – Multi-space heap
    • Boot image
    • Large object space (LOS)
    • Immortal space
• Experiment setup
  – Generational copying GC with 4M bounded nursery
Overhead: OOR Analysis Only
Benchmark    Base Execution Time (sec)    w/ only OOR Analysis (sec)    Overhead
jess                              4.39                          4.43       0.84%
jack                              5.79                          5.82       0.57%
raytrace                          4.63                          4.61      -0.59%
mtrt                              4.95                          4.99       0.70%
javac                            12.83                         12.70      -1.05%
compress                          8.56                          8.54       0.20%
pseudojbb                        13.39                         13.43       0.36%
db                               18.88                         18.88      -0.03%
antlr                             0.94                          0.91      -2.90%
hsqldb                          160.56                        158.46      -1.30%
ipsixql                          41.62                         42.43       1.93%
jython                           37.71                         37.16      -1.44%
ps-fun                          129.24                        128.04      -1.03%
Mean                                                                      -0.19%
Detailed Experiments
• Separate application and GC time
• Vary thresholds for method heat
• Vary thresholds for cold basic blocks
• Three architectures
  – x86, AMD, PowerPC
• x86 performance counters:
  – DL1, trace cache, L2, DTLB, ITLB
Performance javac
Performance db
Performance jython
Any static ordering leaves you vulnerable to pathological cases.
Phase Changes
Related Work
• Evaluate static orderings [Wilson et al.]
  – Large performance variation
• Static profiling [Chilimbi et al., and others]
  – Lack of flexibility
• Instance-based object reordering [Chilimbi et al.]
  – Too expensive
Conclusion
• Static traversal orders have up to 25% variation
• OOR improves on or matches the best static ordering
• OOR has very low overhead
• Past predicts future
380C
• Where are we & where we are going
  – Managed languages
    • Dynamic compilation
    • Inlining
    • Garbage collection
  – Why you need to care about workloads & methodology
    • Read: Blackburn et al., Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century, CACM, 51(8): 83-89, August 2008.
  – Alias analysis
  – Dependence analysis
  – Loop transformations
  – EDGE architectures