Taking Off The Gloves With Reference Counting Immix
Rifat Shahriyar, Xi Yang, Stephen M. Blackburn
Australian National University
Kathryn S. McKinley
Microsoft Research
Why Reference Counting?
Advantages
✔ Reclaim as-you-go
✔ Object-local
✔ Basic RC is easy
Disadvantages
✘ Cycles
✘ Performance
[Figure: total time v production for reference counting. Data labels: 40% and 9%; x-axis labels: <2013 and 2013. Annotations: Backup tracing, Our Goal.]
Looking a Little Deeper…
[Figure: mutator time, instructions retired, and L1D cache misses versus production for RC, MS, SS, and Immix. Data labels as extracted: 9%, 9%, 32%, 6%, 4%, 28%, -2%, -3%, -2%, -3%, -3%, 1%.]
Allocators: Free List (RC, MS), Bump Pointer (SS, Immix)
How RC works: Fundamental optimizations
• Backup tracing [Weizenbaum 1969]
– Reclaim cyclic garbage
• Deferral [Deutsch and Bobrow 1976]
– Note changes to stacks & registers occasionally
• Coalescing [Levanoni and Petrank 2001]
– Note only initial and final state of references
Deferral [Deutsch and Bobrow 1976, Bacon et al. 2001]
[Diagram: stacks and registers referencing heap objects A to F, each annotated with its reference count; root changes such as A++, F++, A--, F-- are buffered.]
Steps shown:
• mutator activity
• GC: scan roots
• GC: apply increments
• GC: apply decrements
• GC: collect
• GC: move deferred decs
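The collection steps above can be sketched in a few lines. This is a minimal illustration, assuming a toy heap where each object is just a name with a count; `DeferredRC`, `heap_write`, and `collect` are hypothetical names, not MMTk's API, and recursive decrements to the children of dead objects are not modelled.

```python
class DeferredRC:
    """Toy deferred RC: stack/register references are only counted at GC time."""

    def __init__(self):
        self.rc = {}             # object name -> reference count
        self.roots = set()       # objects currently referenced from stacks/registers
        self.inc_buffer = []     # heap increments logged by the mutator
        self.dec_buffer = []     # heap decrements logged by the mutator
        self.deferred_decs = []  # root decrements postponed to the next GC

    def allocate(self, name):
        self.rc[name] = 0

    def heap_write(self, old, new):
        """Mutator activity: a heap field changes from `old` to `new` (None = null)."""
        if new is not None:
            self.inc_buffer.append(new)
        if old is not None:
            self.dec_buffer.append(old)

    def collect(self):
        # GC: scan roots
        root_snapshot = sorted(self.roots)
        # GC: apply increments (heap increments plus one per root)
        for obj in self.inc_buffer + root_snapshot:
            self.rc[obj] += 1
        # GC: apply decrements (heap decrements plus those deferred last GC)
        for obj in self.dec_buffer + self.deferred_decs:
            self.rc[obj] -= 1
        # GC: collect objects whose count reached zero
        dead = sorted(o for o, c in self.rc.items() if c == 0)
        for o in dead:
            del self.rc[o]
        # GC: move deferred decs (undo this GC's root increments next time)
        self.deferred_decs = root_snapshot
        self.inc_buffer, self.dec_buffer = [], []
        return dead
```

An object held only by a root survives one collection through the temporary root increment and is reclaimed at the first collection after the root drops it.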
Coalescing [Levanoni and Petrank 2001]
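• Remember A on its first mutation
• Ignore intermediate mutations (C++ C--, D++ D--, E++ E--)
• At GC, compare A with A_old and apply only B--, F++
[Diagram: a reference field in A initially points to B and is overwritten to point to C, D, E, and finally F.]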
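The A-to-F example above can be sketched as a write barrier that logs an object once and reconciles at collection time. This is a minimal sketch assuming one reference field per object; `CoalescingBarrier` and its methods are hypothetical names for illustration.

```python
class CoalescingBarrier:
    """Toy coalescing: log the old referent once, ignore intermediate writes."""

    def __init__(self):
        self.fields = {}  # object -> current referent (one field per object)
        self.log = {}     # object -> referent before its first write since last GC

    def write(self, obj, new_ref):
        if obj not in self.log:
            # First mutation since the last GC: remember the old state (A_old)
            self.log[obj] = self.fields.get(obj)
        self.fields[obj] = new_ref  # intermediate values leave no trace

    def collect(self):
        """Compare each logged object with its old state; return (incs, decs)."""
        incs, decs = [], []
        for obj, old in self.log.items():
            new = self.fields.get(obj)
            if old == new:
                continue  # no net change, so no RC work
            if new is not None:
                incs.append(new)
            if old is not None:
                decs.append(old)
        self.log = {}
        return incs, decs
```

Four writes to A produce one increment and one decrement, rather than four of each.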
How RC works: Recent Optimizations
• Limited bit count [Shahriyar et al. 2012]
– Use just a few bits; fix overflow with backup tracing
• Elision of new object counts [Shahriyar et al. 2012]
– Only do RC work if object survives to first GC
• Allocate as dead [Shahriyar et al. 2012]
– Avoid free-list work for short-lived objects
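The limited bit count idea can be illustrated with a saturating counter. This is a sketch only: the 3-bit width, the names `inc`, `dec`, and `backup_trace_fix`, and the way the backup trace recomputes a count are assumptions for illustration, not details taken from the slides.

```python
RC_BITS = 3                 # assumed width, for illustration only
STUCK = (1 << RC_BITS) - 1  # saturated value: the true count is no longer known


def inc(count):
    """Increment, saturating at STUCK on overflow."""
    return count if count == STUCK else count + 1


def dec(count):
    """Decrement, except that a stuck count is never touched."""
    return count if count == STUCK else count - 1


def backup_trace_fix(references_seen):
    """Assumed repair step: a backup trace recounts the references it finds."""
    return min(references_seen, STUCK)
```

With 3 bits the counter is exact up to 6; once it reaches 7 it stays there until a backup trace recomputes it.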
How Immix works
• Contiguous allocation into regions
– 256B lines and 32KB blocks
– Objects span lines but not blocks
• Simple mark phase
– Mark objects and containing regions
• Free unmarked regions
• Recycled allocation and defragmentation
[Diagram: a block divided into lines, showing object marks, line marks, and recyclable lines.]
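The line and block geometry above can be made concrete with a little arithmetic. This is a simplified sketch: it marks exactly the lines an object covers inside a single block, and the function names are hypothetical rather than taken from any Immix implementation.

```python
LINE_BYTES = 256
BLOCK_BYTES = 32 * 1024
LINES_PER_BLOCK = BLOCK_BYTES // LINE_BYTES  # 128 lines per block


def lines_covered(offset, size):
    """Line indices touched by an object at byte `offset` of `size` bytes."""
    first = offset // LINE_BYTES
    last = (offset + size - 1) // LINE_BYTES
    return range(first, last + 1)


def mark_block(live_objects):
    """live_objects: iterable of (offset, size) pairs. Returns per-line marks."""
    marks = [False] * LINES_PER_BLOCK
    for offset, size in live_objects:
        # Objects may span lines but not blocks
        assert offset + size <= BLOCK_BYTES
        for line in lines_covered(offset, size):
            marks[line] = True
    return marks


def recyclable_lines(marks):
    """Unmarked lines can be reused for allocation."""
    return [i for i, marked in enumerate(marks) if not marked]
```

A 300-byte object at offset 500 covers lines 1 to 3, so a block holding it and a 100-byte object at offset 0 has 124 recyclable lines.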
Goal & Challenges
• Goal
– Object-local pay-as-you-go collection
– Excellent mutator locality
– Copying to eliminate fragmentation
• Immix provides opportunistic copying
✔ Same mutator locality as contiguous allocator
• However, RC is inherently local
– References to an object generally unknown…
– …but copying must redirect all references
Contributions
✔ Identify heap layout as bottleneck for RC
✔ Introduce copying RC (RC Immix)
✔ Exploit Immix’s opportunistic copy
✔ Observe new objects can be copied by first GC
✔ Observe old objects can be copied by backup GC
✔ Line/block reclamation, header bits
✔ Deliver great performance
Reference Counting in RC Immix
• Reference count for object
• Live object count for line
– Lines ‘born dead’ (zero live object count)
– Inc when any object gets first RC increment
– Dec when any object is dead
• Collect lines with zero live object count
[Diagram: lines annotated with live object counts, objects annotated with reference counts.]
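The per-line bookkeeping above can be sketched as follows. This is a minimal illustration that assumes each object lives in exactly one line; `Line`, `Obj`, `rc_inc`, `rc_dec`, and `free_lines` are hypothetical names.

```python
class Line:
    def __init__(self):
        self.live_objects = 0  # lines are 'born dead'


class Obj:
    def __init__(self, line):
        self.rc = 0
        self.line = line  # assumption: an object occupies a single line


def rc_inc(obj):
    if obj.rc == 0:
        obj.line.live_objects += 1  # first RC increment: object becomes live
    obj.rc += 1


def rc_dec(obj):
    obj.rc -= 1
    if obj.rc == 0:
        obj.line.live_objects -= 1  # object is dead


def free_lines(lines):
    """Lines with zero live object count can be collected."""
    return [i for i, line in enumerate(lines) if line.live_objects == 0]
```

A line is reclaimed as a unit once its last live object dies, without visiting the dead objects individually.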
Cycle Collection in RC Immix
• Live object counts zeroed
• Trace marks live objects and lines
– Corrects incorrect counts (due to cycles)
• Sweep
– Collects unmarked lines
– Sweeps dead lines, not dead objects
[Diagram: lines with live object counts before and after the backup trace.]
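The backup trace described above can be sketched as a reachability pass that rebuilds the per-line counts. This is a toy model: objects are names, `edges` and `line_of` are hypothetical dictionaries, and each object is assumed to sit in one line.

```python
def backup_trace(roots, edges, line_of, num_lines):
    """Rebuild line live counts from reachability.

    roots: iterable of object names
    edges: dict mapping an object to the objects it references
    line_of: dict mapping an object to its line index
    """
    live_count = [0] * num_lines  # live object counts zeroed
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)                # trace marks live objects...
        live_count[line_of[obj]] += 1  # ...and their lines
        stack.extend(edges.get(obj, []))
    # Sweep works on lines, not on individual dead objects
    dead_lines = [i for i, n in enumerate(live_count) if n == 0]
    return marked, live_count, dead_lines
```

An unreachable cycle keeps its reference counts above zero, but its line never gets marked, so the sweep reclaims it.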
Defragmentation in RC Immix
• RC is object-local, inhibiting copying
• But RC Immix seizes two opportunities
1. All references to new objects known at first GC
2. Backup tracing performs a global trace
• Use opportunistic copying in both cases
– Mix copying with in-place RC and marking
– Stop copying when available space exhausted
Proactive Defragmentation
• Copy surviving new objects (with bounded reserve)
• Optimization, not for correctness
– Reserve sized for performance unlike semi-space
• Use past survival rate to predict the future
[Diagram: new objects copied into a bounded reserve at their first GC.]
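One way to picture the bounded reserve is to size it from past survival rates and fall back to in-place collection when it runs out. The predictor below (a plain average) and all function names are assumptions for illustration; the slides do not state the actual heuristic.

```python
def predict_survival(history, default=0.1):
    """Assumed predictor: mean of past survival rates (default is arbitrary)."""
    return sum(history) / len(history) if history else default


def copy_reserve(bytes_allocated, history):
    """Reserve sized from the predicted survival rate, not from the worst case."""
    return int(bytes_allocated * predict_survival(history))


def evacuate(survivor_sizes, reserve):
    """Copy survivors while the reserve lasts; the rest stay in place."""
    copied, in_place = [], []
    for size in survivor_sizes:
        if size <= reserve:
            reserve -= size
            copied.append(size)
        else:
            in_place.append(size)  # optimization only: not copying is still correct
    return copied, in_place
```

Because copying is an optimization rather than a correctness requirement, an under-sized reserve costs some fragmentation but never fails the collection.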
Reactive Defragmentation
• Backup tracing performs a global trace
• Piggyback on this, copy live objects
• Use available memory threshold
– If below threshold, do defrag at next cycle GC
Hardware, Software & Benchmarks
• 21 benchmarks
– DaCapo, SPECjvm98 and pjbb2005
• 20 invocations for each benchmark
• Jikes RVM and MMTk
– All garbage collectors are parallel
• Intel Core i7 2600K, 4GB
• Ubuntu 10.04.1 LTS
Bottom Line: Geomean of all benchmarks, versus production
heap size = 2x the minimum heap size
3% improvement over production on geomean
[Figure: bar chart of RC and RC Immix relative to production for Total Time, Mutator Time, and GC Time; y-axis from -35% to 15%.]
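The geomean summary above aggregates per-benchmark time ratios against the production collector. A minimal sketch of that arithmetic follows; the ratios in the usage note are made up for illustration and are not the measured results.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios (e.g. time / production time)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def percent_change(ratio):
    """Express a ratio as a percentage change; negative means faster."""
    return (ratio - 1.0) * 100.0
```

For example, hypothetical ratios of 0.5 and 2.0 give a geomean of 1.0 (no net change), whereas an arithmetic mean would report 1.25 and overstate the slowdown.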
Total Time By Benchmark
heap size = 2x the minimum heap size
+5% worst case, -25% best case
[Figure: total time of RC and RC Immix relative to production, per benchmark; y-axis from -30% (faster) to 40% (slower). Benchmark labels: jess, db, javac, mtrt, jack, avrora, bloat, chart, eclipse, fop, hsqldb, jython, luindex, lusearchfix, pmd, sunflow, xalan, pjbb2005, compress.]
Mutator Time By Benchmark
heap size = 2x the minimum heap size
+4% worst case, -10% best case
[Figure: mutator time of RC and RC Immix relative to production, per benchmark; y-axis from -15% (faster) to 30% (slower).]
GC Time By Benchmark
heap size = 2x the minimum heap size
+5% worst case, -25% best case
[Figure: GC time of RC and RC Immix relative to production, per benchmark; y-axis from -30% (faster) to 40% (slower).]
Total Time v Heap Size
RC Immix matches GenImmix at 1.3x the minimum heap size and outperforms it from 1.4x
[Figure: Time / Best (y-axis, 1 to 1.5) against Heap Size / Minimum Heap (x-axis, 1 to 6) for GenImmix, RC, RC Immix, and RC Immix (No PC).]