Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

OCTOBER 11-14, 2016 • BOSTON, MA

Transcript of Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

Page 1: Large Scale Solr at FullStory

OCTOBER 11-14, 2016 • BOSTON, MA

Page 2: Large Scale Solr at FullStory

Scott Blum
Staff Software Engineer, FullStory

Page 3: A little about me

• A year ago I barely knew what Solr was :)
• Solr search @FullStory
• Committer, Apache Solr and Apache Curator
• Contributions to SolrCloud/ZK

Page 4: Topics

• A little about FullStory
• Challenges & solutions scaling Solr
• Ongoing challenges
• Advice
• Q & A

Page 5: A little about FullStory

• Customer experience platform
• “A DVR for your website”
• “See what your users see”
• Search and analytics built on Solr

Page 6: Tiny FullStory demo

Page 7: FullStory’s Solr Cluster

• 32 Solr nodes / JVMs / machines
• 4000 collections
• 6000 cores
• 13 billion documents
• 10 terabytes of data
• Custom Solr 5.5.3 (numerous backports)
• Multi-tenant
• No replicas

Page 8: Challenge #1: GC death spiral

• Symptom: Solr nodes get slower and slower, eventually becoming unresponsive. Java OutOfMemoryError in logs.
• Rebooting nodes provided a temporary fix.

Page 9: Challenge #1: GC death spiral

• Initial investigation: we turned on GC logging and immediately discovered huge stop-the-world GC pauses.
• Some pauses were long enough (>15s) to disrupt a node’s ZooKeeper connection!
• But why…?

Page 10: Challenge #1: GC death spiral

Rendered GC graphs were illuminating.

[Figure: GC pause-time graph omitted]

Page 11: Challenge #1: GC death spiral

• Root cause: Solr caches are per-core
• In a multi-tenant environment, caches will pin more and more memory over time
• No global (JVM-level) cache coordination

Page 12: Challenge #1: GC death spiral

• “Solution”: every few hours, a cron job forces Solr to reload each core (see the sketch after this list)
• Clears core-level caches, unpinning heap
• Caveat: problematic before Solr 5.4!
  • Reloaded cores can be stuck in a permanent ClosedByInterruptException state
• Long term: Solr needs JVM-level cache management
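
For concreteness, here is a minimal Go sketch of what such a cron-driven reload pass can look like, using only the stock Core Admin API (STATUS to enumerate a node's cores, RELOAD to clear each one). The Solr URL is a placeholder and the error handling is illustrative; this is not FullStory's actual tooling.

```go
// reload_cores.go: a minimal sketch of the periodic core-reload hack, run from cron.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func main() {
	base := "http://localhost:8983/solr" // placeholder node address

	// Ask the Core Admin API which cores this node hosts.
	resp, err := http.Get(base + "/admin/cores?action=STATUS&wt=json")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The STATUS response keys its "status" object by core name.
	var body struct {
		Status map[string]json.RawMessage `json:"status"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		log.Fatal(err)
	}

	// RELOAD each core to drop its caches and unpin heap.
	for core := range body.Status {
		u := fmt.Sprintf("%s/admin/cores?action=RELOAD&core=%s", base, url.QueryEscape(core))
		if r, err := http.Get(u); err != nil {
			log.Printf("reload %s failed: %v", core, err)
		} else {
			r.Body.Close()
			fmt.Println("reloaded", core)
		}
	}
}
```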

Page 13: Challenge #2: Solr overseer scaling

• Premise: resist running multiple Solr clusters
• Why build an additional tooling layer on top of SolrCloud?
• Undermines the shared-resource story in a multi-tenant environment
• More hardware in a single Solr cluster allows us to apply more “instantaneous” resources to a single query
• Better to tackle Solr scaling issues head on!

Page 14: Challenge #2: Solr overseer scaling

• Prior to Solr 4, global cluster state resided in a single ZK node (state format v1): /solr/clusterstate.json
• Solr 5 added a state format v2, which separates each collection’s state into a different ZK node (see the sketch below)
• But no way to migrate old collections
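
To make the two layouts concrete, here is a small Go sketch that reads both locations straight out of ZooKeeper, using the github.com/samuel/go-zookeeper client. The ZK address, the /solr chroot, and the collection name c1 are assumptions for illustration.

```go
// where_state_lives.go: read cluster state from both the shared v1 node and a
// per-collection v2 node.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/samuel/go-zookeeper/zk"
)

func main() {
	conn, _, err := zk.Connect([]string{"localhost:2181"}, 5*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// v1: every collection crammed into one shared node.
	if data, _, err := conn.Get("/solr/clusterstate.json"); err == nil {
		fmt.Printf("v1 shared state: %d bytes\n", len(data))
	}

	// v2: each collection gets its own state node.
	if data, _, err := conn.Get("/solr/collections/c1/state.json"); err == nil {
		fmt.Printf("v2 state for c1: %d bytes\n", len(data))
	}
}
```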

Page 15: Challenge #2: Solr overseer scaling

• Solution: added a MIGRATESTATEFORMAT collection admin command (5.4), shown below
• Moves collection states from the shared ZK node to independent ZK nodes

[Diagram: clusterstate.json -> c1/state.json, c2/state.json, c3/state.json]
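
A minimal sketch of invoking the command; MIGRATESTATEFORMAT is issued per collection through the normal Collections API endpoint. The node address and collection name c1 are placeholders.

```go
// migrate_state.go: move one collection from state format v1 to v2.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	u := "http://localhost:8983/solr/admin/collections" +
		"?action=MIGRATESTATEFORMAT&collection=c1&wt=json"
	resp, err := http.Get(u)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // print Solr's response verbatim
	fmt.Println()
}
```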

Page 16: Challenge #2: Solr overseer scaling

• ZkStateReader rewrite (5.4)
• DistributedQueue rewrite (5.4)
• LeaderElection bug fixes (5.5.1)
• Increase Overseer concurrency (5.5.1)
• Numerous bugfixes (5.5.2)
• Avoid unnecessary ZK state refreshes (6.1)
• Overseer task queue fixes (6.2)

Page 17: Challenge #3: scaling shard splits

• Individual shards get too big and must be split
• Larger shards are slower
• Queries that were fast at 1M docs might be super slow at 20M docs
• Large shards = more RAM, both heap and OS cache
• Bottleneck on the indexing side

Page 18: Challenge #3: scaling shard splits

• TIL: Solr queries are single-threaded per core!
• Weird implication: even on a single (multiprocessor) machine, shard splitting increases query performance.
• 8 cores with 1M docs each can use 8 CPUs to serve a query, but a single core with 8M docs only gets one CPU for the same query

[Diagram: one Solr node with 4 CPUs hosting shard1 through shard4: “We each get our own CPU!”]

Page 19: Challenge #3: scaling shard splits

Of course, splitting and then moving shards to new nodes is the ultimate goal.

[Diagram: four Solr nodes with 4 CPUs each, hosting shard1 through shard16: “16 CPUs!”]

Page 20: Challenge #3: scaling shard splits

• Problem: shard splits are slow (>10 minutes)
• The Overseer would not run other operations during a shard split
• We had to be able to run multiple shard splits at once
• The shard split apocalypse was coming!

[Diagram: “split shard 1”, “split shard 2”, “split shard 3” queued one behind another (not good)]

Page 21: Challenge #3: scaling shard splits

• Solution: get help!
• We hired Lucidworks
• Noble Paul built and contributed fine-grained locking for Overseer operations

Page 22: Challenge #3: scaling shard splits

• New Overseer concurrency model
• Multi-level locks: Cluster (global), Collection, Shard, Replica

[Diagram: lock hierarchy]
Cluster
├── Coll 1
│   ├── Shard 1: r1, r2
│   └── Shard 2: r1, r2
└── Coll 2
    ├── Shard 1: r1, r2
    └── Shard 2: r1, r2

Page 23: Challenge #3: scaling shard splits

• Fine grain: each Overseer collection action specifies exactly what it needs to lock (illustrated below)
• Higher level locks encompass lower level locks

[Diagram: the same Cluster/Collection/Shard/Replica tree as the previous slide, with the locks each operation holds highlighted]
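
To illustrate the idea (a toy model, not the actual Overseer code): represent each lock as a path like coll1/shard1, and let a lock conflict with any held lock on an ancestor or descendant path. Two shard-level operations in the same collection can then run concurrently, while a collection-level operation excludes both.

```go
// hier_locks.go: an illustrative model of multi-level locking.
package main

import (
	"fmt"
	"strings"
)

type LockTable struct{ held map[string]bool }

// covers reports whether a equals b or is an ancestor path of b.
func covers(a, b string) bool {
	return a == b || strings.HasPrefix(b, a+"/")
}

// TryLock succeeds only if no held lock overlaps the requested path.
func (t *LockTable) TryLock(path string) bool {
	for h := range t.held {
		if covers(h, path) || covers(path, h) {
			return false
		}
	}
	t.held[path] = true
	return true
}

func (t *LockTable) Unlock(path string) { delete(t.held, path) }

func main() {
	t := &LockTable{held: map[string]bool{}}
	fmt.Println(t.TryLock("coll1/shard1")) // true: split of shard1 starts
	fmt.Println(t.TryLock("coll1/shard2")) // true: another split runs in parallel
	fmt.Println(t.TryLock("coll1"))        // false: collection-wide op must wait
	fmt.Println(t.TryLock("coll2"))        // true: other collections unaffected
}
```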

Page 24: Challenge #3: scaling shard splits

• Result: massively increased Overseer concurrency and throughput!
• Seamless parallel shard splits solved our scaling problem (example below)
• Shipped in 6.1 (but you really need the bugfixes in 6.2)
• Thank you Noble Paul and Lucidworks!

[Diagram: “split shard 1” through “split shard 4” all running in parallel (already done!)]
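
A minimal sketch of what parallel splits look like from the client side, using the Collections API's async mode plus REQUESTSTATUS polling. The collection name, shard names, and request ids are placeholders.

```go
// parallel_splits.go: kick off several shard splits at once, then check on them.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

const base = "http://localhost:8983/solr/admin/collections"

func call(u string) map[string]json.RawMessage {
	resp, err := http.Get(u)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	var body map[string]json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		log.Fatal(err)
	}
	return body
}

func main() {
	// async=<id> makes Solr queue each split and return immediately; with the
	// fine-grained Overseer locks in 6.1+, splits of different shards proceed
	// in parallel instead of serializing behind one another.
	for i, shard := range []string{"shard1", "shard2", "shard3", "shard4"} {
		call(fmt.Sprintf("%s?action=SPLITSHARD&collection=c1&shard=%s&async=split-%d&wt=json",
			base, shard, i))
	}

	// Later (e.g., from a polling loop), ask how each async task is doing.
	for i := 0; i < 4; i++ {
		status := call(fmt.Sprintf("%s?action=REQUESTSTATUS&requestid=split-%d&wt=json", base, i))
		fmt.Printf("split-%d: %s\n", i, status["status"])
	}
}
```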

Page 25: Challenge #4: global cluster balancing

• Individual Solr nodes get wildly out of balance with each other

node 1: 10M docs, 3GB
node 2: 4M docs, 1GB
node 3: 10M docs, 3GB
node 4: 0 docs, 0GB

Page 26: Challenge #4: global cluster balancing

• Individual collections are not evenly distributed across machines

node 1: 10M docs, 3GB
node 2: 4M docs, 1GB
node 3: 10M docs, 3GB
node 4: 0 docs, 0GB

Page 27: Challenge #4: global cluster balancing

• For a long time, we manually managed moves and splits
• This is annoying when you have 100 cores
• Impossible when you have >1000 cores!

Page 28: Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

Solrman

https://github.com/fullstorydev/gosolr

Automatically balances collections, cores, docs, disk across a Solr cluster

Page 29: Solrman

Before…

node 1: 10M docs, 3GB
node 2: 4M docs, 1GB
node 3: 10M docs, 3GB
node 4: 0 docs, 0GB

Page 30: Solrman

After…

node 1: 6M docs, 2GB
node 2: 6M docs, 2GB
node 3: 5M docs, 1.5GB
node 4: 6M docs, 2GB

Page 31: Solrman

• Runs continuously, computing an “optimal” set of operations to balance your cluster
• Just point it at your Solr cluster’s ZK server
• Uses the Collections API; Solr does the work
  • Moves (ADDREPLICA + DELETEREPLICA), as sketched below
  • Splits (SPLITSHARD + DELETESHARD)
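
A minimal sketch of a replica-less “move” built from those stock operations, in the spirit of what solrman issues: ADDREPLICA on the destination node, then DELETEREPLICA on the source once the new copy is live. All names below are placeholders, and real code would poll cluster state between the two steps.

```go
// move_shard.go: move a single-replica shard to another node.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

const base = "http://localhost:8983/solr/admin/collections"

func call(u string) {
	resp, err := http.Get(u)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body)
	fmt.Println()
}

func main() {
	// 1. Create a new copy of shard1 on the destination node.
	call(base + "?action=ADDREPLICA&collection=c1&shard=shard1" +
		"&node=newhost:8983_solr&wt=json")

	// 2. (In real code: poll cluster state until the new replica is "active".)

	// 3. Drop the old copy, identified by its core_node name.
	call(base + "?action=DELETEREPLICA&collection=c1&shard=shard1" +
		"&replica=core_node1&wt=json")
}
```

The add-then-delete order matters when there are no replicas: the old copy is only dropped after the new one is serving, so the shard never goes dark.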

Page 32: Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

Solrman• Brute force algorithm with a scoring model to

determine how cores should be moved around • A bit computationally expensive!

• Automatic management is defensive, halts if the cluster state seems bad

• Does not support multiple replicas today (in fact, it eliminates them as dupes!)
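
To give the flavor of such a scoring model (purely illustrative; this is not solrman's actual algorithm), here is a toy Go version: score cluster “badness” as the spread between the most and least loaded nodes, then let a brute-force search pick moves that shrink it.

```go
// balance_score.go: a toy scoring model for node balancing.
package main

import (
	"fmt"
	"math"
)

// Node tracks the load we balance on: docs and disk (solrman also considers
// collections and cores).
type Node struct {
	Name string
	Docs float64 // millions of docs
	Disk float64 // GB
}

// score returns a badness value: the spread between the most and least loaded
// nodes, summed over both dimensions. Zero means perfectly balanced.
func score(nodes []Node) float64 {
	minD, maxD := math.Inf(1), math.Inf(-1)
	minG, maxG := math.Inf(1), math.Inf(-1)
	for _, n := range nodes {
		minD = math.Min(minD, n.Docs)
		maxD = math.Max(maxD, n.Docs)
		minG = math.Min(minG, n.Disk)
		maxG = math.Max(maxG, n.Disk)
	}
	return (maxD - minD) + (maxG - minG)
}

func main() {
	nodes := []Node{{"node1", 10, 3}, {"node2", 4, 1}, {"node3", 10, 3}, {"node4", 0, 0}}
	fmt.Printf("before: badness=%.1f\n", score(nodes)) // 10M-doc + 3GB spread

	// Brute force: a real balancer would enumerate every (core, destination)
	// pair, re-score the hypothetical cluster, and emit the best move. Here we
	// hand-apply one move: 6M docs / 2GB from node1 to node4.
	nodes[0].Docs, nodes[0].Disk = 4, 1
	nodes[3].Docs, nodes[3].Disk = 6, 2
	fmt.Printf("after:  badness=%.1f\n", score(nodes)) // spread shrinks
}
```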

Page 33: Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

Solrman• https://github.com/fullstorydev/gosolr • Just open sourced, but we’ve been running in

production for several months • Our 16 -> 32 node expansion was far easier than

our 8 -> 16 node expansion • Just spun up new nodes and solrman did the rest

• Contributions welcome!

Page 34: Large Scale Solr at FullStory: Presented by Scott Blum, FullStory

Solrman• Alternatively: Rebalance API for Solrcloud • If you’re on the bleeding edge • https://issues.apache.org/jira/browse/

SOLR-9241 • Solrman should work today

Page 35: Ongoing Challenges

• Solr needs JVM-level cache management
• Periodic core reloads are a terrible hack :(

Page 36: Ongoing Challenges

• Too many cluster state updates on startup
• On Solr 5, >20 state update operations per core = thousands of operations per Solr node
• Starting up the whole cluster at once is now impossible
• Should be better in Solr 6, but more work still needs to be done

Page 37: Advice

• Upgrade Solr! Expect continuing scaling improvements
• Contribute! Just identifying bottlenecks & problems is helpful
• Periodic core reloads if GC is a problem (5.4 or later)
• MIGRATESTATEFORMAT on any old collections
• Try solrman (fullstorydev/gosolr) node balancing
• If all else fails, get help :)

• Questions?