Running Neo4j in Production

Tips, Tricks and Optimizations

This Talk...

● How we scaled our prod graph

● Challenges faced doing this

● Various lessons we learned and techniques we used

● Some stuff I’m looking forward to in Neo4j

SNAP Interactive

● Presented by David Fox (Big Data Engineer)

● Social dating app AYI (Are You Interested?)

● Friends and interests

How We Use Neo4j

● Model the friend data of our millions of users

● Indicate connections everywhere on app

● 1.1+ billion nodes

● 8.5+ billion relationships

● 450 GB+ store

● 3-instance cluster

Importing lots of data

● Find the right tool

o First try normal Cypher

o No good? Bring out the big guns: the Java Batch Inserter

● Java Batch Inserter

o Sort relationships (GNU sort)

o Try to keep index lookups to in-memory lookups only

Giant HashMap!
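
The two Batch Inserter tips above can be sketched without touching any Neo4j API. This is a minimal Python stand-in for what would be Java in practice: external user ids are resolved through one in-memory map, and relationship rows are sorted before they are written. Every name here (`build_node_index`, `resolve_relationships`, the sample ids) is hypothetical, and the node ids are simulated with a counter rather than returned by a real inserter.

```python
# Minimal sketch, not the Batch Inserter API: all names are hypothetical.
from itertools import count

def build_node_index(external_ids):
    """Assign a node id to every external user id and keep the mapping
    in memory (the "giant HashMap")."""
    next_id = count()
    # A real import would create the node here and store the id it gets back.
    return {ext_id: next(next_id) for ext_id in external_ids}

def resolve_relationships(rows, node_index):
    """Translate (from_user, to_user) rows into node-id pairs using only
    in-memory lookups, sorted by start node and then end node."""
    pairs = [(node_index[a], node_index[b]) for a, b in rows]
    # For files too large for memory, pre-sort on disk (e.g. GNU sort) instead.
    pairs.sort()
    return pairs

users = ["u42", "u7", "u19"]
friendships = [("u19", "u42"), ("u42", "u7"), ("u7", "u19"), ("u42", "u19")]

index = build_node_index(users)
sorted_pairs = resolve_relationships(friendships, index)
```

The point of the map is that no lookup during the relationship pass ever goes to disk; the point of the sort is that writes arrive grouped by start node.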

But wait!!!

● Cypher CSV import

o 2.1 M01

o Supposed to be good for importing large data sets

o Anyone tried it?

Read Querying

● Always try Cypher first

o Performance is being improved

● How can you tell if performance is where you need it to be?

o Time queries (cold vs. warm cache)

o Load testing!
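
Timing a query cold and then warm can be as simple as the sketch below. It assumes the query is wrapped in a zero-argument callable; `fake_query` is a stand-in that simulates a slow first read, not a real Neo4j call. Against a real database, the first run is only truly cold right after a restart.

```python
import time

def time_cold_and_warm(run_query, warm_runs=5):
    """Return (cold_seconds, best_warm_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    run_query()                      # first call: caches are still cold
    cold = time.perf_counter() - start

    warm_times = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()
        warm_times.append(time.perf_counter() - start)
    return cold, min(warm_times)

# Stand-in for a real query: slow on the first call, fast once "cached".
_cache = {}
def fake_query():
    if "friends" not in _cache:
        time.sleep(0.05)             # simulate reading from disk
        _cache["friends"] = ["u7", "u19"]
    return _cache["friends"]

cold, warm = time_cold_and_warm(fake_query)
```

Reporting both numbers matters because a query that looks fine warm can still be unacceptable on the first hit after a deploy.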

Read Querying cont.

● Dark querying

o Great for benchmarking a system where Neo4j functionality is being injected

o Mitigates risk

o Provides results that are very close to real-world patterns
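
One way to picture dark querying: the existing code path still serves the user, while the new graph-backed query runs on the same input and only its timing and outcome are recorded. The sketch below assumes two hypothetical lookup functions and runs the shadow call synchronously for brevity; in production it would more likely be fired asynchronously so it adds no latency.

```python
import time

def with_dark_query(primary, shadow, record):
    """Wrap `primary` so that every call also runs `shadow` on the same
    arguments. Only the primary result is returned to the caller."""
    def handler(*args):
        result = primary(*args)
        start = time.perf_counter()
        try:
            shadow_result = shadow(*args)
            record({"ok": True, "matches": shadow_result == result,
                    "seconds": time.perf_counter() - start})
        except Exception as exc:     # a shadow failure must never reach users
            record({"ok": False, "error": repr(exc),
                    "seconds": time.perf_counter() - start})
        return result
    return handler

# Hypothetical existing lookup and the new graph-backed one under test.
def friends_from_existing_store(user_id):
    return ["u7", "u19"]

def friends_from_graph(user_id):
    if user_id == "broken":
        raise RuntimeError("graph unavailable")
    return ["u7", "u19"]

metrics = []
get_friends = with_dark_query(friends_from_existing_store,
                              friends_from_graph, metrics.append)
first = get_friends("u42")
second = get_friends("broken")       # shadow fails, user still gets a result
```

Because the shadow path sees real traffic, the recorded timings reflect real access patterns, and a failure in it costs nothing user-facing.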

Read Querying cont.

● Reads too slow? Try these things.

o Write high-throughput business-critical queries in Java

unmanaged extension

faster

hard limits

o Cache sharding

shard by country, age, gender, etc.

you hit warm cache more often
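
Cache sharding comes down to sending the same kind of request to the same instance every time, so that instance keeps the relevant part of the graph in memory. A minimal routing sketch, assuming three hypothetical instance names and a segment made of country, age band and gender:

```python
import hashlib

INSTANCES = ["neo4j-a", "neo4j-b", "neo4j-c"]   # hypothetical instance names

def pick_instance(country, age_band, gender, instances=INSTANCES):
    """Map a user segment to one instance, deterministically."""
    key = f"{country}|{age_band}|{gender}".encode("utf-8")
    # hashlib gives the same value in every process, unlike built-in hash().
    digest = hashlib.sha256(key).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]

a = pick_instance("US", "25-34", "f")
b = pick_instance("US", "25-34", "f")   # same segment, same instance
```

A real router would also need a fallback for when the chosen instance is down; that is left out here.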

Read Querying cont.

● Warm the cache!

o Touch all the nodes

o Touch all the relationships
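
A warm-up pass can be a pair of Cypher statements that visit every node and every relationship. The sketch below keeps them as strings and hands them to a hypothetical `run_cypher` callable; `prop` is a placeholder property name, and how much of the store such statements actually pull into cache depends on the Neo4j version.

```python
# `prop` is a placeholder property name; `run_cypher` is a hypothetical callable.
WARM_UP_STATEMENTS = [
    "MATCH (n) RETURN count(n.prop)",           # touch all the nodes
    "MATCH ()-[r]->() RETURN count(r.prop)",    # touch all the relationships
]

def warm_cache(run_cypher):
    """Run each warm-up statement once and return how many were run."""
    for statement in WARM_UP_STATEMENTS:
        run_cypher(statement)
    return len(WARM_UP_STATEMENTS)

executed = []
ran = warm_cache(executed.append)   # a list stands in for a real client
```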

Writing

● Decide which writes need to be synchronous and which can be asynchronous

● Queue up asynchronous writes (routine updates, non-vital to immediate user experience)

o Try to evenly distribute them

o How do we do this? Baserunner!
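
The synchronous/asynchronous split can be sketched with a plain in-process queue. This is an illustration only: the dictionary `graph` stands in for the database, and a production setup would presumably use a durable queue rather than `queue.Queue`.

```python
import queue
import threading

graph = {}                          # stand-in for the database
pending = queue.Queue()

def apply_write(user_id, friends):
    graph[user_id] = friends

def write(user_id, friends, urgent):
    if urgent:
        apply_write(user_id, friends)        # caller waits for the result
    else:
        pending.put((user_id, friends))      # picked up later by the worker

def worker():
    while True:
        item = pending.get()
        if item is None:                     # sentinel: stop the worker
            pending.task_done()
            break
        apply_write(*item)
        pending.task_done()

thread = threading.Thread(target=worker, daemon=True)
thread.start()

write("u42", ["u7"], urgent=True)
visible_immediately = "u42" in graph
write("u7", ["u42", "u19"], urgent=False)
pending.join()                               # wait until the queue is drained
pending.put(None)
thread.join()
```

Only the urgent write blocks the caller; the routine update is applied whenever the worker gets to it.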

Baserunner

● Written by a SNAP developer

● Walks userbase randomly instead of sequentially

o This avoids pockets of heavily increased write queries

o Allows us to do high-velocity updating of our data
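
Baserunner itself is internal to SNAP, so the following is not its implementation, only a sketch of the idea of walking a userbase in random rather than sequential order. The function name and batch size are assumptions.

```python
import random

def random_walk_batches(user_ids, batch_size, seed=None):
    """Yield every user id exactly once, in shuffled order, in batches."""
    order = list(user_ids)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

user_ids = range(1, 11)
batches = list(random_walk_batches(user_ids, batch_size=4, seed=1))
```

Each batch mixes users from across the id range, so no single stretch of the walk is dominated by one cluster of heavy users.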

Tuning the JVM

● For a really high-throughput environment, G1 GC has been very helpful

o Good at adapting itself

o We experienced fewer system-stopping pauses than with CMS

o Try CMS first but remember G1 as an option

Hardware is Important

● Lots of memory

● Working set too big for memory?

o SSDs are helpful

o Optimization techniques discussed become much more important

Not Everything is Your Fault!

● Like any software, Neo4j has bugs

● Developers are receptive

● File reports on GitHub when you find issues

Some stuff to look forward to...

● Relationship grouping (2.1 M01)

o Helps mitigate the super node/dense node problem

● Ronja (rewrite of the Cypher query language, 2.1?)

● More flexible label index searching (after 2.1)

Questions?