Running Neo4j in Production: Tips, Tricks and Optimizations
Uploaded by nick-manning
Running Neo4j in Production
Tips, Tricks and Optimizations
This Talk...
● How we scaled our prod graph
● Challenges faced doing this
● Various lessons we learned and techniques we used
● Some stuff I’m looking forward to in Neo4j
SNAP Interactive
● Presented by David Fox (Big Data Engineer)
● Social dating app AYI (Are You Interested?)
● Friends and interests
How We Use Neo4j
● Model the friend data of our millions of users
● Indicate connections everywhere on app
● 1.1+ billion nodes
● 8.5+ billion relationships
● 450+ GB store
● 3-instance cluster
Importing lots of data
● Find the right tool
  o First try normal Cypher
  o No good? Bring out the big guns: Java Batch Inserter
● Java Batch Inserter
  o Sort relationships (GNU sort)
  o Try to keep index lookups to in-memory lookups only
Giant HashMap!
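The "giant HashMap" idea can be sketched roughly as follows. This is a minimal illustration, not the importer used in the talk: it assumes external user IDs are longs, that node IDs are handed out sequentially, and that relationships are sorted by start node and then end node (done in-process here, standing in for the GNU sort step). `ImportIndex` and its methods are hypothetical names, not part of the Neo4j Batch Inserter API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps external user IDs to node IDs entirely in memory. */
final class ImportIndex {
    private final Map<Long, Long> nodeIdByUserId = new HashMap<>();
    private long nextNodeId = 0; // assumption: node IDs are assigned sequentially

    /** Returns the node ID for a user, assigning a new one on first sight. */
    long nodeIdFor(long userId) {
        Long existing = nodeIdByUserId.get(userId);
        if (existing != null) {
            return existing;
        }
        long id = nextNodeId++;
        nodeIdByUserId.put(userId, id);
        return id;
    }

    /** Resolves (userA, userB) pairs to node-ID pairs, sorted by start node, then end node. */
    List<long[]> resolveAndSort(List<long[]> friendPairs) {
        List<long[]> resolved = new ArrayList<>();
        for (long[] pair : friendPairs) {
            resolved.add(new long[] { nodeIdFor(pair[0]), nodeIdFor(pair[1]) });
        }
        resolved.sort(Comparator.<long[]>comparingLong(p -> p[0])
                .thenComparingLong(p -> p[1]));
        return resolved;
    }
}
```

Every lookup is a plain map access, so no index on disk is consulted while relationships are being written. The trade-off is heap: the map has to hold one entry per imported node.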
But wait!!!
● Cypher CSV import
  o 2.1 M01
  o Supposed to be good for importing large data sets
  o Anyone tried it?
Read Querying
● Always try Cypher first
  o Performance is being improved
● How can you tell if performance is where you need it to be?
  o Time queries (cold vs. warm cache)
  o Load testing!
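Timing a query cold versus warm can be sketched like this. The query is passed in as a plain `Supplier`, so nothing here depends on a particular Neo4j API; `QueryTimer` is a hypothetical name. Note the assumption baked in: "cold" only means the first execution in this process. A truly cold measurement would need a freshly restarted database (and possibly a cleared OS page cache).

```java
import java.util.function.Supplier;

/** Hypothetical helper for comparing first-run and repeated-run latency. */
final class QueryTimer {
    /** Runs the query once and returns elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Supplier<?> query) {
        long start = System.nanoTime();
        query.get();
        return System.nanoTime() - start;
    }

    /** Returns {coldNanos, warmNanos}: the first run, then the best of warmRuns repeats. */
    static long[] coldVsWarm(Supplier<?> query, int warmRuns) {
        if (warmRuns < 1) {
            throw new IllegalArgumentException("warmRuns must be >= 1");
        }
        long cold = timeNanos(query);
        long warm = Long.MAX_VALUE;
        for (int i = 0; i < warmRuns; i++) {
            warm = Math.min(warm, timeNanos(query));
        }
        return new long[] { cold, warm };
    }
}
```

A large gap between the two numbers suggests the query depends heavily on cached data, which is where cache warming and cache sharding pay off.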
Read Querying cont.
● Dark querying
  o Great for benchmarking a system where Neo4j functionality is being injected
  o Mitigates risk
  o Provides results that are very close to real-world patterns
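One way to picture dark querying: real production requests trigger the new Neo4j-backed query in the background, its result is thrown away, and only its timing and failures are recorded. The sketch below assumes that reading of the term; `DarkQuery`, the pool size of 4, and the counters are all illustrative choices, not details from the talk.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical wrapper: serves the live result, shadows the new query in the background. */
final class DarkQuery {
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // assumed pool size
    final AtomicLong completed = new AtomicLong();
    final AtomicLong failed = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    /** Returns the live result; the shadow query runs asynchronously and is only measured. */
    <T> T serve(Supplier<T> liveQuery, Supplier<?> shadowQuery) {
        pool.execute(() -> {
            long start = System.nanoTime();
            try {
                shadowQuery.get(); // result is intentionally discarded
                completed.incrementAndGet();
            } catch (RuntimeException e) {
                failed.incrementAndGet(); // a failing shadow query never reaches the user
            } finally {
                totalNanos.addAndGet(System.nanoTime() - start);
            }
        });
        return liveQuery.get();
    }

    /** Stops accepting work and waits for in-flight shadow queries to finish. */
    boolean shutdown(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the shadow query sees the same request mix as production, its latency numbers reflect real usage patterns while users keep getting answers from the existing code path.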
Read Querying cont.
● Reads too slow? Try these things.
  o Write high-throughput, business-critical queries in a Java unmanaged extension (faster, hard limits)
  o Cache shard by country, age, gender, etc., so you hit warm cache more often
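Cache sharding can be sketched as a routing function: requests for the same user segment always go to the same instance, so that instance's cache stays warm for that segment. The code below is an assumption about how such routing might look, not the scheme used in the talk; `CacheShardRouter` and the `ageBucket` parameter are hypothetical.

```java
import java.util.Objects;

/** Hypothetical router: sends each user segment to a fixed instance in the cluster. */
final class CacheShardRouter {
    private final int instanceCount;

    CacheShardRouter(int instanceCount) {
        if (instanceCount < 1) {
            throw new IllegalArgumentException("instanceCount must be >= 1");
        }
        this.instanceCount = instanceCount;
    }

    /** Picks an instance index in [0, instanceCount) from the user's segment. */
    int instanceFor(String country, int ageBucket, String gender) {
        int hash = Objects.hash(country, ageBucket, gender);
        return Math.floorMod(hash, instanceCount); // floorMod keeps negative hashes in range
    }
}
```

With a 3-instance cluster, `new CacheShardRouter(3).instanceFor("US", 2, "F")` returns the same index on every call, so queries about that segment keep landing on the same warm cache.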
Read Querying cont.
● Warm the cache!
  o Touch all the nodes
  o Touch all the relationships
Writing
● Decide which writes need to be synchronous and which can be asynchronous
● Queue up asynchronous writes (routine updates, non-vital to immediate user experience)
  o Try to evenly distribute them
  o How do we do this? Baserunner!
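The split between synchronous and asynchronous writes can be sketched with a simple in-process queue. This is only an illustration under assumptions: `WriteDispatcher` is a hypothetical name, and a production setup would more likely use a durable queue outside the application process.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical dispatcher: vital writes run now, routine writes wait in a queue. */
final class WriteDispatcher {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Writes the user is waiting on run immediately on the caller's thread. */
    void writeSync(Runnable write) {
        write.run();
    }

    /** Routine updates are queued and applied later. */
    void writeAsync(Runnable write) {
        pending.add(write);
    }

    /** Runs up to maxWrites queued writes and returns how many actually ran. */
    int drain(int maxWrites) {
        int ran = 0;
        Runnable next;
        while (ran < maxWrites && (next = pending.poll()) != null) {
            next.run();
            ran++;
        }
        return ran;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Calling `drain` with a small cap on a fixed schedule is one way to spread queued writes evenly over time instead of applying them in bursts.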
Baserunner
● Written by a SNAP developer
● Walks userbase randomly instead of sequentially
  o This avoids pockets of heavily increased write queries
  o Allows us to do high-velocity updating of our data
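The random-walk idea can be illustrated as follows. This is not Baserunner itself, whose internals the slides do not describe; it is a minimal sketch that assumes the set of user IDs fits in memory and simply shuffles it before visiting. `RandomWalker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.LongConsumer;

/** Hypothetical walker: visits users in shuffled order rather than by ascending ID. */
final class RandomWalker {
    /** Visits every user ID exactly once in shuffled order and returns that order. */
    static List<Long> walk(List<Long> userIds, long seed, LongConsumer update) {
        List<Long> order = new ArrayList<>(userIds);
        Collections.shuffle(order, new Random(seed));
        for (long id : order) {
            update.accept(id); // e.g. enqueue an asynchronous refresh for this user
        }
        return order;
    }
}
```

Users with neighbouring IDs often signed up around the same time and may share friends, so a sequential walk can concentrate writes on one region of the graph. Shuffling spreads those writes out.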
Tuning the JVM
● For a really high-throughput environment, G1 GC has been very helpful
  o Good at adapting itself
  o We experienced fewer system-stopping pauses than with CMS
  o Try CMS first but remember G1 as an option
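On HotSpot JVMs of that era, the collector is chosen with `-XX:+UseG1GC` or `-XX:+UseConcMarkSweepGC`. When comparing the two, it helps to confirm which collector is actually active and how much time it has spent collecting. The sketch below uses the standard `java.lang.management` API for that; `GcReport` is a hypothetical name, and the exact collector names printed depend on the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports the active garbage collectors of the running JVM. */
final class GcReport {
    /** Returns one line per collector: name, collection count, total collection time in ms. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

These are cumulative totals, not individual pause lengths. For pause-by-pause detail, the JVM's GC logging is the better source.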
Hardware is Important
● Lots of memory
● Working set too big for memory?
  o SSDs are helpful
  o Optimization techniques discussed become much more important
Not Everything is Your Fault!
● Like any software, Neo4j has bugs
● Developers are receptive
● File reports on GitHub when you find issues
Some stuff to look forward to...
● Relationship grouping (2.1 M01)
  o Helps mitigate the super node/dense node problem
● Ronja (rewrite of the Cypher query language, 2.1?)
● More flexible label index searching (after 2.1)
Questions?