Migrating from MongoDB to Neo4j - Lessons Learned

Meetup Feb 17th, 2014

This month we will learn how to use the (somewhat) new 2.0 branch of Michael Hunger's Batch Importer. We'll also have a concise presentation on our experiences at Shindig Labs when we migrated production MongoDB data into Neo4j. Specifically:

1. key considerations made when choosing to move to a graph database
2. estimating the effort involved in changing our code
3. our data modeling approach
4. how we exported our data
5. how we used the Batch Importer to import our data (step by step)

Others will be sharing their experiences as well, and an open discussion will follow.

source: http://neo4j.rubyforge.org/guides/why_graph_db.html

Agenda

• Intros
  – name, what you do, interest in Neo4j?
• Case Study, Moving from MongoDB
  – considerations, why and how
  – steps taken, findings
  – using the Batch Importer
• Group Discussion
  – experiences from others?

Case Study, Moving from MongoDB

Our Startup

– A mobile drink discovery platform: explore new drinks, post photos, learn new facts, follow other drink aficionados (whisky, beer, wine, cocktail experts)

Using MongoDB

– Pluses for us:
  • flexible (by far the most substantial benefit)
  • good documentation
  • easy to host and integrate with our code
– Downsides for us:
  • lots of collections needed (e.g. for mapping data, many-to-many relationships)
  • queries with multiple joins

Relying on Redis

– Needed to cache a lot in Redis
– We cached
  • user profile
  • news feed
– Too much complexity
  • another denormalized data model to manage
  • more difficult to test
  • increase in bugs and edge cases
– Still awesome, but we just relied on it too much

Evaluating Neo4j

– Our goals
  • simplify data model (less denormalization)
  • speed up highly relational queries
  • keep our flexibility (schemaless data model)
– Considerations
  • how will we host?
  • will it make our codebase more complex?
  • support?
  • easy to troubleshoot production issues?

How We Evaluated

1. We set up an instance on Amazon EC2 (though Heroku was still an option as well)
2. Imported realistic production data with the Batch Importer
3. Took our most popular, slowest query and tested it
4. Wrote more example queries for standard use cases (creating nodes, relationships, etc.); easy to use?
5. Ran a branch of our code with Neo4j for a month

How We Evaluated

1. Made sure we could get good support for the product
2. Determined effort involved in hosting it on Amazon EC2 (though Heroku was also an option)
3. Determined effort needed to import bulk data and change our data model
4. Audited each line of code and made a list of the types of queries we’d need. Estimated effort involved in updating our codebase.
5. Imported production data, took our most popular, slowest query, and tested performance.
6. Wrote other, more common queries and tested performance further (using ApacheBench)
7. Was the driver (in this case Ruby) well supported and well written? Would it be maintained years from now?
8. Tested it out as a code branch for at least a month
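
Before reaching for a load-testing tool, the performance checks in steps 5 and 6 can be roughed out from Ruby itself. The following is a minimal sketch using the standard library's Benchmark module. `time_query` is a hypothetical helper (not part of any driver), and the block stands in for whatever call runs the query under test.

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block `runs` times and reports
# wall-clock timings in milliseconds. The block stands in for a query call.
def time_query(runs: 20)
  raise ArgumentError, "runs must be positive" unless runs > 0

  timings = Array.new(runs) { Benchmark.realtime { yield } * 1000.0 }
  sorted = timings.sort
  {
    runs: runs,
    min_ms: sorted.first,
    median_ms: sorted[runs / 2], # upper middle value when runs is even
    max_ms: sorted.last
  }
end

# Usage: replace the sleep with the real query call.
stats = time_query(runs: 5) { sleep 0.001 }
puts format("median %.2f ms over %d runs", stats[:median_ms], stats[:runs])
```

Running the same block against both databases with identical data gives a like-for-like comparison. The first run usually includes cache warm-up, so the median is a safer number to compare than the maximum.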

Our Findings

1. So far so good (we’ve been testing for a few weeks now)
2. Set up an instance on Amazon EC2. Wasn’t that bad.
3. Complex queries were a lot faster
4. The Ruby driver (Neography) does the job, though it’s not perfect.
5. Planning to use Neo4j’s official Ruby library once they finish version 3.0 (which seems not to require JRuby)

Our Findings

6. We needed to create an abstraction layer in the code to simplify reads and writes with the database. Wasn’t that bad though.
7. Our data model got a lot more intuitive. No more map collections (yay)
8. We can now implement recommendations a lot more easily when we want to
9. No longer need to rely heavily on Redis and caching
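
The abstraction layer from finding 6 is not shown in the deck, so the following is only a rough sketch of what such a layer might look like. `GraphStore` and `RecordingClient` are hypothetical names. The client is assumed to be any object responding to `execute_query(cypher, params)`; that method name is an assumption here, so check it against your driver. The Cypher uses the 2.0-era `{param}` placeholder syntax and the labels and relationship types from the import examples in this deck.

```ruby
# Hypothetical abstraction layer: the rest of the app calls these methods
# and never builds Cypher itself. `client` is any object that responds to
# execute_query(cypher, params); that method name is an assumption.
class GraphStore
  def initialize(client)
    @client = client
  end

  # Write: record that one user follows another (MERGE avoids duplicates).
  def follow(follower, followee)
    @client.execute_query(
      "MATCH (a:User {username: {follower}}), (b:User {username: {followee}}) " \
      "MERGE (a)-[:FOLLOWS]->(b)",
      { "follower" => follower, "followee" => followee }
    )
  end

  # Read: names of the drinks a user has liked.
  def liked_drinks(username)
    @client.execute_query(
      "MATCH (:User {username: {username}})-[:LIKED]->(d:Drink) RETURN d.name",
      { "username" => username }
    )
  end
end

# Stand-in client that only records what it was asked to run.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def execute_query(cypher, params = {})
    @calls << [cypher, params]
    []
  end
end

client = RecordingClient.new
store = GraphStore.new(client)
store.follow("alice", "bob")
puts client.calls.last.inspect
```

Keeping every query behind one class like this also makes the layer easy to test with a stand-in client, and limits the changes needed if the driver is swapped later.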

Our Findings

10. We think about our data differently now
11. Managing the data model is actually fun

Tutorial on Batch Importer

1. Our example involves real data
2. We will be using Ruby to generate .CSV files representing nodes and relationships
3. Beware, existing documentation is “not good”, to put it lightly
4. Using the 2.0 version! (Precompiled binary) https://github.com/jexp/batch-import/tree/20

Steps

1. Install Neo4j
2. Download a binary version of the Batch Importer
3. Batch Importer requires .CSV files. One type of file will import nodes, another will import relationships
4. Decide on fields that make nodes unique
   1. ex: a user has a username, a drink has a name
   2. makes the process of mapping node relationships later a lot easier too
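
Step 4 only works if the chosen field really is unique in the exported data, so it is worth checking before generating any files. This is a minimal sketch; `duplicate_keys` and `SampleDrink` are hypothetical names, and the sample records are made up to stand in for something like `Drink.all`.

```ruby
# Returns the values of `field` that occur more than once in `records`.
# `records` is any enumerable of objects that respond to `field`.
def duplicate_keys(records, field)
  records.map { |record| record.public_send(field) }
         .group_by { |value| value }
         .select { |_, values| values.size > 1 }
         .keys
end

# Made-up sample data standing in for the real model records.
SampleDrink = Struct.new(:name)
drinks = [
  SampleDrink.new("Negroni"),
  SampleDrink.new("Old Fashioned"),
  SampleDrink.new("Negroni")
]

puts duplicate_keys(drinks, :name).inspect
```

Any value this reports needs to be cleaned up, or a different key chosen, before the field is used to link relationships.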

.CSV Format for Nodes

• Tab separated columns
• Importing Nodes
  – node property names in first row
  – format is <field name>:<field type> (defaults to String)
  – all rows after that are corresponding property values
• Importing Relationships
  – separate .CSV file: source node’s unique field in first col, target node’s unique field in second col, and a column headed “type” as the 3rd column, holding each relationship’s type
  – since we’re already using a unique index on nodes, it’s easy to relate them!
  – can import multiple relations between two types of nodes in the same .CSV file
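
To make the format concrete, here is a small sketch that builds both kinds of file in memory with Ruby's standard CSV library. The header names follow the examples later in this deck, where a third segment such as `user_username_index` names the index for that field. The data rows are made-up samples.

```ruby
require 'csv'

# Node file: the header row names each property as <field>:<type>; a third
# segment (as in the deck's later examples) names the index for that field.
nodes_tsv = CSV.generate(col_sep: "\t") do |csv|
  csv << ["username:string:user_username_index", "type:label", "first_name"]
  csv << ["alice", "User", "Alice"] # made-up sample row
  csv << ["bob", "User", "Bob"]     # made-up sample row
end

# Relationship file: source key, target key, then the relationship type.
rels_tsv = CSV.generate(col_sep: "\t") do |csv|
  csv << ["username:string:user_username_index",
          "username:string:user_username_index",
          "type"]
  csv << ["alice", "bob", "FOLLOWS"] # made-up sample row
end

puts nodes_tsv
puts rels_tsv
```

Note that the relationship file repeats the indexed header of the node file for both endpoints, which is how the two files are tied together.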

Creating Drink Nodes

• Example output (tab delimited)

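Creating Drink Nodes

    require 'csv'

    namespace :export do
      # Writes one tab-separated row per drink for the Batch Importer.
      task :generate_drink_nodes => :environment do
        # Options are passed as keywords so this also runs on Ruby 3.
        CSV.open("drink_nodes.csv", "wb", :col_sep => "\t") do |csv|
          # Header: indexed unique key, node label, plain "name" property
          csv << ["name:string:drink_name_index", "type:label", "name"]
          Drink.all.each do |drink|
            csv << [drink.name, "Drink", drink.name]
          end
        end
      end
    end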

Running the Script

• Make sure all nodes and relationships are deleted from Neo4j
  – MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n,r
• Stop your Neo4j server before importing
• Run the import command (per the binary Batch Importer we downloaded earlier):
  – ./import.sh ~/neo4j-community-2.0/data/graph.db user_nodes.csv

Creating User Nodes

• Example output (tab delimited):

Creating User Nodes

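    # Options are passed as keywords so this also runs on Ruby 3.
    CSV.open("user_nodes.csv", "wb", :col_sep => "\t") do |csv|
      # Header: indexed unique key, node label, then plain properties
      csv << ["username:string:user_username_index",
              "type:label",
              "first_name",
              "last_name"]
      User.all.each do |user|
        csv << [user.username, "User", user.first_name, user.last_name]
      end
    end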

User to User Relationships

• NOTE: it’s easy to relate users to users since we already have an index set up.
• Example output (tab delimited):

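User to User Relationships

    require 'set'

    CSV.open("user_rels.csv", "wb", :col_sep => "\t") do |csv|
      csv << ["username:string:user_username_index",
              "username:string:user_username_index",
              "type"]
      # The same FOLLOWS edge can show up in one user's `following` and the
      # other user's `followers`, so each pair is written only once.
      seen = Set.new
      User.all.each do |user|
        user.following.each do |other_user|
          pair = [user.username, other_user.username]
          csv << pair + ["FOLLOWS"] if seen.add?(pair)
        end
        user.followers.each do |other_user|
          pair = [other_user.username, user.username]
          csv << pair + ["FOLLOWS"] if seen.add?(pair)
        end
      end
    end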

User to Drink Relationships

• Example output:

User to Drink Relationships

    # Options are passed as keywords so this also runs on Ruby 3.
    CSV.open("user_drink_rels.csv", "wb", :col_sep => "\t") do |csv|
      # Header: user key, drink key, relationship type
      csv << ["username:string:user_username_index",
              "name:string:drink_name_index",
              "type"]
      User.all.each do |user|
        user.liked_drinks.each do |drink|
          csv << [user.username, drink.name, "LIKED"]
        end
        user.disliked_drinks.each do |drink|
          csv << [user.username, drink.name, "DISLIKED"]
        end
        user.drink_journal_entries.each do |entry|
          csv << [user.username, entry.drink.name, "JOURNALED"]
        end
      end
    end
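
Since relationships are matched to nodes purely by key, it can be worth checking that every key in a relationship file also appears in the matching node file before importing. The sketch below is one way to do that. `node_keys` and `dangling_rels` are hypothetical helpers, the sample strings are made up, and both helpers assume the unique key is the first column of the node file.

```ruby
require 'csv'
require 'set'

# Collects the first column (assumed to be the unique key) of a node file.
def node_keys(nodes_tsv)
  CSV.parse(nodes_tsv, col_sep: "\t").drop(1).map(&:first).to_set
end

# Returns relationship rows whose source or target key is not among the
# known node keys. Inputs are tab-separated strings with a header row.
def dangling_rels(rels_tsv, source_keys, target_keys)
  rows = CSV.parse(rels_tsv, col_sep: "\t")
  rows.drop(1).reject do |source, target, _type|
    source_keys.include?(source) && target_keys.include?(target)
  end
end

# Made-up sample content; in practice use File.read on the generated files.
users  = "username:string:user_username_index\ttype:label\nalice\tUser\n"
drinks = "name:string:drink_name_index\ttype:label\tname\nNegroni\tDrink\tNegroni\n"
rels   = "username:string:user_username_index\tname:string:drink_name_index\ttype\n" \
         "alice\tNegroni\tLIKED\n" \
         "alice\tMystery Punch\tDISLIKED\n"

p dangling_rels(rels, node_keys(users), node_keys(drinks))
```

An empty result means every relationship row points at nodes that exist in the node files.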

Test Your Data

• Test with some Cypher queries
  – cheat sheet: http://docs.neo4j.org/refcard/2.0
  – ex:
    MATCH (n:User)-[r:FOLLOWS]-(o) WHERE n.username='nickTribeca' RETURN n, r LIMIT 50
• Note: you must limit your results or else the Data Browser will become too slow to use
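
Beyond spot-checking individual users, a quick sanity check is to compare relationship counts per type between the generated file and the database. This is a minimal sketch: `rel_type_counts` is a hypothetical helper that assumes the last column holds the type, and the sample rows are made up. On the Neo4j side, a query such as `MATCH ()-[r]->() RETURN type(r), count(*)` gives the numbers to compare against.

```ruby
require 'csv'

# Counts rows per relationship type in a tab-separated relationship file
# (passed in as a string). Assumes the last column holds the type.
def rel_type_counts(rels_tsv)
  counts = Hash.new(0)
  CSV.parse(rels_tsv, col_sep: "\t").drop(1).each do |row|
    counts[row.last] += 1
  end
  counts
end

# Made-up sample content; in practice use File.read("user_drink_rels.csv").
sample = "username:string:user_username_index\tname:string:drink_name_index\ttype\n" \
         "alice\tNegroni\tLIKED\n" \
         "bob\tNegroni\tLIKED\n" \
         "bob\tMartini\tJOURNALED\n"

p rel_type_counts(sample)
```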

That’s the Tutorial

• You can always migrate data yourself without the Batch Importer
  – i.e. a script to query MongoDB data and insert it into Neo4j in real time using your API
• Using the Batch Importer is really fast though
• I have found it faster to write and less error prone than writing my own script

Group Q&A

• Thanks for coming
• @seenickcode
• [email protected] for questions
• Want to present? Let me know.