Large Scale Solr at FullStory: Presented by Scott Blum, FullStory
Transcript of Large Scale Solr at FullStory: Presented by Scott Blum, FullStory
October 11-14, 2016 • Boston, MA
Large Scale Solr at FullStory
Scott Blum, Staff Software Engineer, FullStory
A little about me
• A year ago I barely knew what Solr was :)
• Solr search @FullStory
• Committer, Apache Solr and Apache Curator
• Contributions to SolrCloud/ZK
Topics
• A little about FullStory
• Challenges & solutions scaling Solr
• Ongoing challenges
• Advice
• Q & A
A little about FullStory
• Customer experience platform
• “A DVR for your website”
• “See what your users see”
• Search and analytics built on Solr
Tiny FullStory demo
FullStory’s Solr Cluster
• 32 Solr nodes / JVMs / machines
• 4000 collections
• 6000 cores
• 13 billion documents
• 10 terabytes of data
• Custom Solr 5.5.3 (numerous backports)
• Multi-tenant
• No replicas
Challenge #1: GC death spiral
• Symptom: Solr nodes get slower and slower, eventually becoming unresponsive. Java OutOfMemoryError in logs.
• Rebooting nodes provided a temporary fix.
Challenge #1: GC death spiral
• Initial investigation: we turned on GC logging and immediately discovered huge stop-the-world GC pauses.
• Some pauses were long enough (>15s) to disrupt a node’s ZooKeeper connection!
• But why…?
Challenge #1: GC death spiral
Rendered GC graphs were illuminating.
[GC graph: heap usage climbing over time, y-axis 0–5 GB]
Challenge #1: GC death spiral
• Root cause: Solr caches are per-core
• In a multi-tenant environment, caches will pin more and more memory over time
• No global (JVM-level) cache coordination
Challenge #1: GC death spiral
• “Solution”: every few hours, a cron job forces Solr to reload each core
• Clears core-level caches, unpinning heap
• Caveat: problematic before Solr 5.4!
• Reloaded cores can get stuck in a permanent ClosedByInterruptException state
• Long term: Solr needs JVM-level cache management
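As a rough sketch of the workaround (the host and core names below are hypothetical, and this is not FullStory’s actual cron job), a periodic task could hit the CoreAdmin RELOAD endpoint for every core on a node:

```python
from urllib.parse import urlencode

def core_reload_url(solr_base, core_name):
    """Build a CoreAdmin RELOAD URL; reloading a core drops its
    core-level caches, unpinning that portion of the heap."""
    params = urlencode({"action": "RELOAD", "core": core_name})
    return f"{solr_base}/admin/cores?{params}"

# A cron job would fetch each URL for every core hosted on the node:
for core in ("tenant_a_shard1_replica1", "tenant_b_shard1_replica1"):
    print(core_reload_url("http://localhost:8983/solr", core))
```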
Challenge #2: Solr overseer scaling
• Premise: resist running multiple Solr clusters
• Why build an additional tooling layer on top of SolrCloud?
• Undermines the shared resource story in a multi-tenant environment
• More hardware in a single Solr cluster lets us apply more “instantaneous” resources to a single query
• Better to tackle Solr scaling issues head on!
Challenge #2: Solr overseer scaling
• Prior to Solr 5, global cluster state resided in a single ZK node (state format v1)
• /solr/clusterstate.json
• Solr 5 added state format v2, which separates each collection’s state into its own ZK node
• But there was no way to migrate old collections
Challenge #2: Solr overseer scaling
• Solution: added a MIGRATESTATEFORMAT collection admin command (5.4)
• Moves collection state from the shared ZK node to independent ZK nodes
clusterstate.json → c1/state.json, c2/state.json, c3/state.json
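Conceptually, the migration pulls each collection’s entry out of the shared node and into its own path. A minimal sketch of that transformation (the ZK path layout here is an assumption for illustration, not taken from the talk):

```python
def split_clusterstate(clusterstate):
    """Map a shared v1 clusterstate.json dict (collection name -> state)
    to per-collection v2 ZK paths, as MIGRATESTATEFORMAT does."""
    return {f"/collections/{name}/state.json": {name: state}
            for name, state in clusterstate.items()}

v1 = {"c1": {"shards": {}}, "c2": {"shards": {}}}
v2 = split_clusterstate(v1)
# Each collection now lives in its own ZK node, so a state change in
# one collection no longer forces every watcher to re-read all of them.
```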
Challenge #2: Solr overseer scaling
• ZkStateReader rewrite (5.4)
• DistributedQueue rewrite (5.4)
• LeaderElection bug fixes (5.5.1)
• Increased Overseer concurrency (5.5.1)
• Numerous bugfixes (5.5.2)
• Avoid unnecessary ZK state refreshes (6.1)
• Overseer task queue fixes (6.2)
Challenge #3: scaling shard splits
• Individual shards get too big and must be split
• Larger shards are slower
• Queries that were fast at 1M docs might be super slow at 20M docs
• Large shards = more RAM, both heap and OS cache
• Bottleneck on the indexing side
Challenge #3: scaling shard splits
• TIL: Solr queries are single-threaded per core!
• Weird implication: even on a single (multiprocessor) machine, shard splitting increases query performance.
• 8 cores with 1M docs each can use 8 CPUs to serve a query, but a single core with 8M docs only gets one CPU for the same query
[Diagram: one Solr node with 4 CPUs hosting shard1–shard4 — “We each get our own CPU!”]
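The arithmetic behind the slide can be captured in a toy latency model (the scan rate and the “waves of parallel work” framing are illustrative assumptions, not measurements from the talk):

```python
import math

def query_latency_s(num_cores, docs_per_core, scan_rate, cpus):
    """Toy model: one thread per core; cores are scanned in parallel,
    falling back to multiple waves when cores outnumber CPUs."""
    waves = math.ceil(num_cores / cpus)
    return waves * docs_per_core / scan_rate

# One 8M-doc core on an 8-CPU box: a single thread does all the work.
print(query_latency_s(1, 8_000_000, 1_000_000, 8))  # 8.0 seconds
# Eight 1M-doc cores: all eight CPUs scan in parallel.
print(query_latency_s(8, 1_000_000, 1_000_000, 8))  # 1.0 second
```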
Challenge #3: scaling shard splits
Of course, splitting and then moving shards to new nodes is the ultimate goal.
[Diagram: four Solr nodes with 4 CPUs each, hosting shard1–shard16 — 16 CPUs!]
Challenge #3: scaling shard splits
• Problem: shard splits are slow (>10 minutes)
• Overseer would not run other operations during a shard split
• We had to be able to run multiple shard splits at once
• The shard split apocalypse was coming!
[Diagram: split shard 1, split shard 2, split shard 3 queued one after another — not good]
Challenge #3: scaling shard splits
• Solution: get help!
• We hired Lucidworks
• Noble Paul built and contributed fine-grained locking for Overseer operations
Challenge #3: scaling shard splits
• New Overseer concurrency model
• Multi-level locks: Cluster (global), Collection, Shard, Replica

Cluster
├─ Coll 1
│  ├─ Shard 1: r1, r2
│  └─ Shard 2: r1, r2
└─ Coll 2
   ├─ Shard 1: r1, r2
   └─ Shard 2: r1, r2
Challenge #3: scaling shard splits
• Fine grain: each Overseer collection action specifies exactly what it needs to lock
• Higher-level locks encompass lower-level locks
[Diagram: the same Cluster → Collection → Shard → Replica tree, with locks held at various levels]
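One way to read the hierarchy: model each lock as a path in the tree, and say two operations conflict exactly when one path is a prefix of the other. A minimal sketch of that rule (an illustration of the idea, not the actual Overseer implementation):

```python
def conflicts(lock_a, lock_b):
    """Two lock paths conflict iff one is a prefix of the other: a
    collection lock covers all of that collection's shard and replica
    locks, but sibling shards can be worked on in parallel."""
    n = min(len(lock_a), len(lock_b))
    return lock_a[:n] == lock_b[:n]

cluster = ()                      # global lock
coll1   = ("coll1",)              # collection-level lock
c1s1    = ("coll1", "shard1")     # shard-level lock
c1s2    = ("coll1", "shard2")

assert conflicts(cluster, c1s1)   # a global lock blocks everything
assert conflicts(coll1, c1s1)     # a collection lock covers its shards
assert not conflicts(c1s1, c1s2)  # sibling shard splits run concurrently
```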
Challenge #3: scaling shard splits
• Result: massively increased Overseer concurrency and throughput!
• Seamless parallel shard splits solved our scaling problem
• Shipped in 6.1 (but you really need the bugfixes in 6.2)
• Thank you Noble Paul and Lucidworks!
[Diagram: split shard 1, 2, and 3 running in parallel; split shard 4 already done!]
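With fine-grained locking in place, independent splits can simply be issued back to back; the Collections API’s async parameter makes each call return a request ID immediately instead of blocking. A sketch (host and collection names hypothetical):

```python
from urllib.parse import urlencode

def split_shard_url(solr_base, collection, shard, async_id):
    """Build a Collections API SPLITSHARD call; with 'async' set, the
    call returns immediately and the split runs in the background."""
    params = urlencode({"action": "SPLITSHARD", "collection": collection,
                        "shard": shard, "async": async_id})
    return f"{solr_base}/admin/collections?{params}"

# Issue several splits in a row; the Overseer executes them in parallel.
urls = [split_shard_url("http://localhost:8983/solr", "events", s, f"split-{s}")
        for s in ("shard1", "shard2", "shard3")]
```

Progress of each background request can then be polled with the REQUESTSTATUS action using the same request ID.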
Challenge #4: global cluster balancing
• Individual Solr nodes get wildly out of balance with each other

node 1: 10M docs, 3GB | node 2: 4M docs, 1GB | node 3: 10M docs, 3GB | node 4: 0 docs, 0GB
Challenge #4: global cluster balancing
• Individual collections are not evenly distributed across machines

node 1: 10M docs, 3GB | node 2: 4M docs, 1GB | node 3: 10M docs, 3GB | node 4: 0 docs, 0GB
Challenge #4: global cluster balancing
• For a long time, we manually managed moves and splits
• This is annoying when you have 100 cores
• Impossible when you have >1000 cores!
Solrman
https://github.com/fullstorydev/gosolr
Automatically balances collections, cores, docs, disk across a Solr cluster
Solrman
Before: node 1: 10M docs, 3GB | node 2: 4M docs, 1GB | node 3: 10M docs, 3GB | node 4: 0 docs, 0GB
After: node 1: 6M docs, 2GB | node 2: 6M docs, 2GB | node 3: 5M docs, 1.5GB | node 4: 6M docs, 2GB
Solrman
• Runs continuously, computing an “optimal” set of operations to balance your cluster
• Just point it at your Solr cluster’s ZK server
• Uses the Collections API; Solr does the work
• Moves (ADDREPLICA, DELETEREPLICA)
• Splits (SPLITSHARD, DELETESHARD)
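A solrman-style “move” is just the two Collections API steps above in sequence; a sketch of the plan such a move might emit (the parameter names are simplified relative to the real API):

```python
def move_core(collection, shard, replica, from_node, to_node):
    """Express a core 'move' as two Collections API steps: first add a
    replica of the shard on the target node, then delete the original
    replica once the new one is in place."""
    return [
        ("ADDREPLICA", {"collection": collection, "shard": shard,
                        "node": to_node}),
        ("DELETEREPLICA", {"collection": collection, "shard": shard,
                           "replica": replica}),
    ]

plan = move_core("events", "shard1", "core_node3", "node1", "node4")
```

Ordering matters: the new replica must be live before the old one is deleted, or the shard briefly has no serving copy.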
Solrman
• Brute force algorithm with a scoring model to determine how cores should be moved around
• A bit computationally expensive!
• Automatic management is defensive; it halts if the cluster state seems bad
• Does not support multiple replicas today (in fact, it eliminates them as dupes!)
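As a guess at the shape of such a scoring model (this is not solrman’s actual algorithm): score the cluster by the variance of per-node doc counts, then brute-force every (core, target node) pair for the best single move. The nested loop over cores × nodes is also why this style of search is “a bit computationally expensive”:

```python
def score(nodes):
    """Lower is better: variance of per-node doc counts."""
    counts = list(nodes.values())
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts)

def best_move(nodes, cores):
    """Try every (core, target node) pair and return the single move
    that lowers the score most. nodes: node -> doc count;
    cores: core -> (current node, doc count)."""
    current = score(nodes)
    best = None
    for core, (src, docs) in cores.items():
        for dst in nodes:
            if dst == src:
                continue
            trial = dict(nodes)
            trial[src] -= docs
            trial[dst] += docs
            s = score(trial)
            if s < current:
                current, best = s, (core, src, dst)
    return best
```

Running the best move, rescoring, and repeating yields a greedy rebalancing loop.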
Solrman
• https://github.com/fullstorydev/gosolr
• Just open sourced, but we’ve been running it in production for several months
• Our 16 -> 32 node expansion was far easier than our 8 -> 16 node expansion
• We just spun up new nodes and solrman did the rest
• Contributions welcome!
Solrman
• Alternatively: the Rebalance API for SolrCloud, if you’re on the bleeding edge
• https://issues.apache.org/jira/browse/SOLR-9241
• Solrman works today
Ongoing Challenges
• Solr needs JVM-level cache management
• Periodic core reloads are a terrible hack :(
Ongoing Challenges
• Too many cluster state updates on startup
• On Solr 5, >20 state update operations per core = thousands of operations per Solr node
• Starting up the whole cluster at once is now impossible
• Should be better in Solr 6, but more work still needs to be done
Advice
• Upgrade Solr! Expect continuing scaling improvements
• Contribute! Just identifying bottlenecks & problems is helpful
• Periodic core reloads if GC is a problem (>5.4)
• MIGRATESTATEFORMAT on any old collections
• Try solrman (fullstorydev/gosolr) node balancing
• If all else fails, get help :)

Questions?