©2011 Azul Systems, Inc.
Speeding up Enterprise Search
with In-Memory Indexing
Gil Tene, CTO & co-Founder, Azul Systems
This Talk’s Purpose / Goals
Describe how in-memory indexing can help enterprise search
Explain why in-memory and large memory use has been delayed
Show how in-memory indexing is now (finally) practical for interactive search applications
About me: Gil Tene
co-founder, CTO @Azul Systems
Have been working on “think different” GC approaches since 2002
Created Pauseless & C4 core GC algorithms (Tene, Wolf)
A long history building Virtual & Physical Machines, Operating Systems, Enterprise apps, etc...
[Photo caption: working on real-world trash compaction issues, circa 2004]
About Azul
We make scalable Virtual Machines
Have built “whatever it takes to get job done” since 2002
3 generations of custom SMP Multi-core HW (Vega)
Now Pure software for commodity x86 (Zing)
“Industry firsts” in Garbage collection, elastic memory, Java virtualization, memory scale
©2012 Azul Systems, Inc.
About Azul: Supporting mission-critical deployments around the world
High level agenda
How can in-memory help things?
Why isn’t everyone just using it?
The “Application Memory Wall” problem
Some Garbage Collection banter
What an actual solution looks like...
Memory use
How many of you use heap sizes of:
more than ½ GB?
more than 1 GB?
more than 2 GB?
more than 4 GB?
more than 10 GB?
more than 20 GB?
more than 50 GB?
Why is In-Memory indexing interesting?
Interesting things happen when an entire problem fits in a single heap
No more out-of-process access for “hot” stuff
~1 microsecond instead of ~1 millisecond access
10s of GB/sec instead of 100s of MB/sec rates
Work elimination: no more reading/parsing/deserializing per access
No network or I/O load
These can amount to dramatic speed benefits
Many business problems can now comfortably fit in server memory
Product Catalogs are often in the 2-10GB size range
Overnight reports are often done on 10-20GB data sets
All of English Language Wikipedia text is ~29GB
Indexing “bloats” things, but not by that much...
A full index of English Language Wikipedia (including stored fields and term vectors for highlighting) fits in ~78GB.
You’ll need even more than that in heap size...
[Chart: “Heap size vs. GC CPU %” — GC CPU% (up to 100%) plotted against heap size relative to live set]
Case in point: The impact of an in-memory index in Lucene
Lucene provides multiple index representation options:
SimpleFSDirectory, NIOFSDirectory (on-disk, file-based)
MMapDirectory (mapped files, better leverage of OS cache memory)
RAMDirectory (In-process, in-heap)
Comparing throughput (e.g. on English Lang. Wikipedia):
RAMDirectory ~2-3x throughput vs. MMapDirectory
Similar impact on average response time.
Even higher impact with promising things in Lucene 4.0
But average is not all that matters
There is a good reason people aren’t doing this (yet...)
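As a concrete sketch of the directory options above, assuming the Lucene 3.x-era API: an existing on-disk index can be copied wholesale into a RAMDirectory, so subsequent searches are served entirely from the Java heap. This is illustrative only, not a drop-in implementation.

```java
// Sketch, assuming Lucene 3.x classes (MMapDirectory, RAMDirectory,
// IndexReader, IndexSearcher). Requires the Lucene library on the classpath.
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.RAMDirectory;

public class InHeapIndex {
    // Open an on-disk index via MMapDirectory (served from the OS page cache),
    // then copy it into a RAMDirectory so the whole index lives in the heap.
    public static IndexSearcher openInHeap(File indexDir) throws Exception {
        Directory onDisk = new MMapDirectory(indexDir);
        Directory inHeap = new RAMDirectory(onDisk); // full in-heap copy
        return new IndexSearcher(IndexReader.open(inHeap));
    }
}
```

Note the heap-size implication from the earlier slide: a ~78GB in-heap Wikipedia index needs a heap comfortably larger than that, to leave headroom for normal allocation.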
Why isn’t all search done with in-memory indexes already?
The “Application Memory Wall”
Memory use
How many of you use heap sizes of:
more than ½ GB?
more than 1 GB?
more than 2 GB?
more than 4 GB?
more than 10 GB?
more than 20 GB?
more than 50 GB?
Reality check: servers in 2012
Retail prices, major web server store (US $, Oct 2012)
Cheap (< $1/GB/Month), and roughly linear to ~1TB
10s to 100s of GB/sec of memory bandwidth
24 vCore, 128GB server ≈ $5K
24 vCore, 256GB server ≈ $8K
32 vCore, 384GB server ≈ $14K
48 vCore, 512GB server ≈ $19K
64 vCore, 1TB server ≈ $36K
The Application Memory Wall
A simple observation:
Application instances appear to be unable to make effective use of modern server memory capacities
The size of application instances as a % of a server’s capacity is rapidly dropping
How much memory do applications need?
“640KB ought to be enough for anybody”
WRONG!
So what’s the right number? 6,400K? 64,000K? 640,000K? 6,400,000K? 64,000,000K?
There is no right number
Target moves at 50x-100x per decade
“I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time …” - Bill Gates, 1996
“Tiny” application history
100KB apps on a ¼ to ½ MB Server
10MB apps on a 32 – 64 MB server
1GB apps on a 2 – 4 GB server
??? GB apps on a 256 GB server
Assuming Moore’s Law means:
“transistor counts grow at ≈2x every ≈18 months”
It also means memory size grows ≈100x every 10 years
“Tiny”: would be “silly” to distribute
Application Memory Wall
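The arithmetic above can be checked directly: 2x every ~18 months, compounded over a 120-month decade, comes to roughly 100x. A minimal sketch:

```java
public class MooresLawCheck {
    // Growth factor over a decade (120 months) at 2x per 18 months:
    // 2^(120/18) = 2^6.67, roughly 100x every 10 years.
    public static double decadeGrowthFactor() {
        return Math.pow(2.0, 120.0 / 18.0);
    }

    public static void main(String[] args) {
        System.out.printf("Growth per decade: ~%.0fx%n", decadeGrowthFactor());
    }
}
```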
What is causing the Application Memory Wall?
Garbage Collection is a clear and dominant cause
There seem to be practical heap size limits for applications with responsiveness requirements
[Virtually] All current commercial JVMs will exhibit a multi-second pause on a normally utilized 2-6GB heap.
It’s a question of “When” and “How often”, not “If”.
GC tuning only moves the “when” and the “how often” around
Root cause: The link between scale and responsiveness
What quality of GC is responsible for the Application Memory Wall?
It is NOT about overhead or efficiency: CPU utilization, bottlenecks, memory consumption and utilization
It is NOT about speed: Average speeds, 90%, even 95% speeds, are all perfectly fine
It is NOT about minor GC events (right now): GC events in the 10s of msec are usually tolerable for most apps
It is NOT about the frequency of very large pauses
It is ALL about the worst observable pause behavior
People avoid building/deploying visibly broken systems
GC Problems
Framing the discussion: Garbage Collection at modern server scales
Modern Servers have 100s of GB of memory
Each modern x86 core (when actually used) produces garbage at a rate of ¼ - ½ GB/sec +
We want to make use of that memory and throughput in responsive applications
Monolithic stop-the-world operations are the cause of the current Application Memory Wall
Even if they are done “only a few times a day”
One way to deal with Monolithic-STW GC
[Charts (shown twice): “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — hiccup duration (msec) vs. elapsed time (sec) and vs. percentile; y-axis 0–18000 msec, Max=16023.552]
Another way to cope: “Creative Language”
“Guarantee a worst case of X msec, 99% of the time”
“Mostly” Concurrent, “Mostly” Incremental
Translation: “Will at times exhibit long monolithic stop-the-world pauses”
“Fairly Consistent”
Translation: “Will sometimes show results well outside this range”
“Typical pauses in the tens of milliseconds”
Translation: “Some pauses are much longer than tens of milliseconds”
Actually measuring things
Discontinuities in Java platform execution
[Charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — hiccup duration (msec); y-axis 0–1800 msec over ~1800 sec elapsed, Max=1665.024]
But reality is simple
People deal with GC problems primarily by keeping heap sizes artificially small.
Keeping heaps small has translated to avoiding the use of in-process, in-memory contents
That needs to change...
Solving the problems behind the Application Memory Wall
We needed to solve the right problems
Focus on the causes of the Application Memory Wall
Scale is artificially limited by responsiveness
Responsiveness must be unlinked from scale:
Data size, Throughput, Queries per second, Concurrent Users, ...
Heap size, Live Set size, Allocation rate, Mutation rate, ...
Responsiveness must be continually sustainable
Can’t ignore “rare” events
Eliminate all Stop-The-World Fallbacks
At modern server scales, any Stop-The-World fallback is a failure
The GC problems that needed solving
Robust Concurrent Marking
In the presence of high throughput (mutation and allocation rates)
Compaction that is not stop-the-world
E.g. stay responsive while compacting 100+GB heaps
Must be robust: not just a tactic to delay STW compaction
Young-Gen that is not stop-the-world
Stay responsive while promoting multi-GB data spikes
Surprisingly little work done in this specific area
Azul’s “C4” Collector: Continuously Concurrent Compacting Collector
Concurrent, guaranteed-single-pass marker
Oblivious to mutation rate
Concurrent ref (weak, soft, final) processing
Concurrent Compactor
Objects moved without stopping mutator
References remapped without stopping mutator
Can relocate entire generation (New, Old) in every GC cycle
Concurrent, compacting old generation
Concurrent, compacting new generation
No stop-the-world fallback
Always compacts, and always does so concurrently
Benefits
Sample responsiveness behavior
SpecJBB + slow-churning 2GB LRU Cache
Live set is ~2.5GB across all measurements
Allocation rate is ~1.2GB/sec across all measurements
A JVM for Linux/x86 servers
ELIMINATES Garbage Collection as a concern for enterprise applications
Decouples scale metrics from response time concerns
Transaction rate, data set size, concurrent users, heap size, allocation rate, mutation rate, etc.
Leverages elastic memory for resilient operation
Sustainable Throughput: The throughput achieved while safely maintaining service levels
[Chart label: Unsustainable Throughput]
Instance capacity test: “Fat Portal”
HotSpot CMS: Peaks at ~3GB / 45 concurrent users
* LifeRay portal on JBoss @ 99.9% SLA of 5 second response times
Instance capacity test: “Fat Portal”
C4: still smooth @ 800 concurrent users
Lucene: English Language Wikipedia
RAMDirectory based index
Fun with jHiccup
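jHiccup (linked at the end of the deck) measures platform-induced stalls. Its core idea can be sketched in a few lines; this is an illustrative toy under that idea, not jHiccup’s actual code: a thread sleeps for a fixed short interval and records how much longer than requested each sleep actually took. Any excess is a “hiccup” the platform (GC, OS, hypervisor) imposed on the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class HiccupMeterSketch {
    // Sleep `samples` times for `sleepMillis` each, recording the excess
    // (actual elapsed time minus requested sleep) as the observed hiccup.
    public static List<Long> measureHiccupsMillis(long sleepMillis, int samples)
            throws InterruptedException {
        List<Long> hiccups = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            hiccups.add(Math.max(0L, elapsedMillis - sleepMillis));
        }
        return hiccups;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> hiccups = measureHiccupsMillis(1, 100);
        long max = hiccups.stream().max(Long::compare).orElse(0L);
        System.out.println("max observed hiccup: " + max + " ms");
    }
}
```

The real jHiccup runs this style of measurement as a javaagent alongside the application and records the results as percentile histograms, which is what the charts below plot.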
Oracle HotSpot CMS, 1GB in an 8GB heap
[Charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — y-axis 0–14000 msec over ~3500 sec elapsed, Max=13156.352]
Zing 5, 1GB in an 8GB heap
[Charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — y-axis 0–25 msec over ~3500 sec elapsed, Max=20.384]
Oracle HotSpot CMS, 1GB in an 8GB heap
[Charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — y-axis 0–14000 msec over ~3500 sec elapsed, Max=13156.352]
Zing 5, 1GB in an 8GB heap
[Charts: “Hiccups by Time Interval” and “Hiccups by Percentile Distribution” — Zing plotted on the same 0–14000 msec axis as the CMS charts above, Max=20.384]
Drawn to scale
Summary: In-memory indexing and search
Interesting content can now fit in memory
In-memory indexing makes things go fast
In-memory indexing makes new things possible
Change Batch to Interactive. Change business processes.
It is finally practical to use in-memory indexing for interactive search applications
Zing makes this work...
Q & A
Lucene in-RAM English Lang. Wikipedia test: http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
jHiccup: http://www.azulsystems.com/dev_resources/jhiccup
GC: G. Tene, B. Iyengar, and M. Wolf. C4: The Continuously Concurrent Compacting Collector. In Proceedings of ISMM’11, ACM, pages 79-88.