Wanderu – Lessons from Building a Travel Site with Neo4j - Eddy Wong @ GraphConnect NY 2013
Wanderu: Lessons Learned
Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j
Eddy Wong, CTO, Wanderu.com
@eddywongch
About Wanderu.com
Search engine for (intercity) buses and trains
Demo
From point A to point B
Nomenclature: Stations, Trips
Stations: A = NYC, B = DC, Philly
Trips (carrier, price, date):
• BOLT, $13, 11/07/2013
• MEG, $9, 11/07/2013
• MEG, $4, 11/07/2013
A shortest-path problem as a function of departure, arrival, price, duration, and date/times
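The search can be sketched as a cheapest-path search over trips. This is a minimal sketch, not Wanderu's implementation: the `Trip` record, the departure/arrival times, and the leg assignment (BOLT NYC→DC, MEG NYC→Philly, MEG Philly→DC) are all assumptions, since the slide does not say which trip serves which leg. The search state is the last trip taken, because that fixes both the current station and the arrival time; a connection is feasible only if it departs at or after the previous leg arrives. Price is minimized first and ties go to the earlier arrival, which is just one way to combine the criteria the slide lists.

```python
import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:  # hypothetical record shape
    carrier: str
    origin: str
    destination: str
    depart: datetime
    arrive: datetime
    price: float


def best_itinerary(trips, origin, destination, not_before):
    """Cheapest feasible itinerary; ties broken by earliest arrival.

    Search states are trips (the last leg taken): a trip fixes both the current
    station and the arrival time, so it determines which connections remain.
    Returns (total_price, legs) or None if nothing feasible exists.
    """
    counter = itertools.count()  # unique tie-breaker so Trip objects are never compared
    heap = [
        (t.price, t.arrive, next(counter), t, (t,))
        for t in trips
        if t.origin == origin and t.depart >= not_before
    ]
    heapq.heapify(heap)
    settled = set()
    while heap:
        price, _, _, trip, legs = heapq.heappop(heap)
        if trip in settled:
            continue  # this trip was already reached with a smaller (price, arrival) key
        settled.add(trip)
        if trip.destination == destination:
            return price, legs
        for nxt in trips:
            # Feasible connection: departs from where we are, after we arrive.
            if nxt.origin == trip.destination and nxt.depart >= trip.arrive and nxt not in settled:
                heapq.heappush(heap, (price + nxt.price, nxt.arrive, next(counter), nxt, legs + (nxt,)))
    return None


# Hypothetical leg assignment and times for the 11/07/2013 example.
day = datetime(2013, 11, 7)
trips = [
    Trip("BOLT", "NYC", "DC", day.replace(hour=7), day.replace(hour=11, minute=30), 13),
    Trip("MEG", "NYC", "PHL", day.replace(hour=6, minute=30), day.replace(hour=8, minute=30), 9),
    Trip("MEG", "PHL", "DC", day.replace(hour=9), day.replace(hour=12, minute=15), 4),
]
price, legs = best_itinerary(trips, "NYC", "DC", day)
print(price, [t.carrier for t in legs])  # 13 ['BOLT']
```

Under these assumed times both options cost $13, so the arrival-time tie-break picks the direct trip; weighting duration or departure time differently would be an equally valid design choice.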
Lessons: Learned / Unlearned
Idea
• Architectural
• Modeling
• Geo
Our Story
• 2-year-old startup; tech work started a little over a year ago
• Beta in Mar 2013, Launch in Aug 2013
• Knew nothing about Neo4j when we started (Jun 2012)
• Did not like the relational model: wanted schema-less storage and no self-joins
• Wanted a graph model
Workflow
Bus websites (non-uniform data) → Scraping → JSON (uniform data) → Store → Server
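The scraping step turns each site's non-uniform data into one uniform JSON shape. A minimal sketch of that normalization: both raw formats, the `STATION_CODES` lookup, and the output field names are hypothetical, since the slides do not show Wanderu's scraper output or schema.

```python
import json
from datetime import datetime

# Hypothetical lookup from free-text city names to station keys.
STATION_CODES = {"New York, NY": "NYC", "Washington, DC": "DC", "Philadelphia, PA": "PHL"}


def normalize(source, raw):
    """Map one scraped record (hypothetical per-site format) to a uniform dict."""
    if source == "format_a":
        # e.g. {"carrier": "BOLT", "from": "New York, NY", "to": "Washington, DC",
        #       "fare": "$13.00", "date": "11/07/2013", "time": "7:00 AM"}
        depart = STATION_CODES[raw["from"]]
        arrive = STATION_CODES[raw["to"]]
        when = datetime.strptime(f'{raw["date"]} {raw["time"]}', "%m/%d/%Y %I:%M %p")
        price = float(raw["fare"].lstrip("$"))
    elif source == "format_b":
        # e.g. {"carrier": "MEG", "origin_code": "NYC", "dest_code": "PHL",
        #       "price_cents": 900, "departs": "2013-11-07T06:30"}
        depart, arrive = raw["origin_code"], raw["dest_code"]
        when = datetime.fromisoformat(raw["departs"])
        price = raw["price_cents"] / 100
    else:
        raise ValueError(f"unknown source format: {source}")
    return {
        "carrier": raw["carrier"],
        "depart": depart,
        "arrive": arrive,
        "datetime": when.isoformat(),
        "price": price,
    }


print(json.dumps(normalize("format_b", {
    "carrier": "MEG", "origin_code": "NYC", "dest_code": "PHL",
    "price_cents": 900, "departs": "2013-11-07T06:30",
})))
```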
Architectural Lessons
Art: MC Escher
Our Situation
• Data is written only in one direction
• Users search for paths, then segments
• Searches are done by date
• Needed online capability
• Trip info (price/availability) could change for some trips
Solution
Bus websites (non-uniform data) → Scraping → JSON (uniform data) → MongoDB
MongoDB → MongoConnector (replica mechanism) → Neo4j (nodes & edges)
MongoConnector
• MongoDB Lab project, open source, unsupported
• Uses MongoDB's replication mechanism: the oplog
• Eventually consistent (not real time)
• Written in Python
• Main methods: upsert and delete, each passed the document
• We implemented Neo4jDocManager, a DocManager backed by py2neo
• We can add new properties easily on the fly
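A connector-side doc manager could look roughly like this. It is a minimal sketch under stated assumptions, not Wanderu's Neo4jDocManager: the document shapes (`type`, `depart`, `arrive` fields, human-readable `_id`s) and the `Station`/`TRIP` graph schema are hypothetical, and the method names follow the slide's upsert/delete rather than any particular mongo-connector version. Instead of calling py2neo directly, it builds parameterized Cypher and hands it to an injected `run(statement, params)` callable, so it runs without a database. The `$param` syntax is modern Cypher (2013-era Neo4j used `{param}`). `SET ... += $props` copies whatever fields arrive, which is how new properties can appear on the fly.

```python
class Neo4jDocManager:
    """Hypothetical doc manager: turns MongoDB documents into parameterized Cypher."""

    def __init__(self, run):
        # run(statement, params) executes Cypher; injected so no driver is needed here.
        self.run = run

    @staticmethod
    def _props(doc, skip):
        # Copy every field except the ones used for identity/structure.
        return {k: v for k, v in doc.items() if k not in skip}

    def upsert(self, doc):
        if doc["type"] == "station":
            self.run(
                "MERGE (s:Station {key: $key}) SET s += $props",
                {"key": doc["_id"], "props": self._props(doc, {"_id", "type"})},
            )
        elif doc["type"] == "trip":
            self.run(
                "MERGE (a:Station {key: $depart}) "
                "MERGE (b:Station {key: $arrive}) "
                "MERGE (a)-[r:TRIP {key: $key}]->(b) SET r += $props",
                {
                    "key": doc["_id"],
                    "depart": doc["depart"],
                    "arrive": doc["arrive"],
                    "props": self._props(doc, {"_id", "type", "depart", "arrive"}),
                },
            )

    def delete(self, doc):
        if doc["type"] == "station":
            self.run("MATCH (s:Station {key: $key}) DETACH DELETE s", {"key": doc["_id"]})
        elif doc["type"] == "trip":
            self.run("MATCH ()-[r:TRIP {key: $key}]->() DELETE r", {"key": doc["_id"]})


# Record statements instead of executing them.
statements = []
mgr = Neo4jDocManager(lambda stmt, params: statements.append((stmt, params)))
mgr.upsert({"_id": "NYC", "type": "station", "name": "New York"})
mgr.upsert({"_id": "NYC-PHL", "type": "trip", "depart": "NYC", "arrive": "PHL"})
mgr.delete({"_id": "NYC-PHL", "type": "trip"})
```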
Polyglot Architecture
Bus websites (non-uniform data) → Scraping → JSON → MongoDB
MongoDB → MongoConnector (replica mechanism) → Neo4j (nodes & edges)
MongoDB + Neo4j → REST Server
(BOS, NYC), (BOS, PHL), (NYC, DC), (NYC, PHL)
Modeling Lessons
Art: MC Escher
Our Story
• We tried to “dump” all data into Neo4j
• Edges had dates -> too many edges -> the “supernode” problem
• Query performance was terrible (over a minute) and got worse as the number of edges grew
• Tried Gremlin -> no improvement
• Needed range queries on edges
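A quick count shows why dated edges create supernodes: with one edge per route per day, a hub station's degree grows with the length of the schedule, while one edge per station pair keeps it at the number of distinct connections. A sketch with a hypothetical schedule of three routes out of NYC, one trip per day for 90 days; more trips per day would multiply the dated count further.

```python
from collections import Counter
from datetime import date, timedelta


def degrees(edges):
    """Total (in + out) degree per station for (depart, arrive, ...) edge tuples."""
    deg = Counter()
    for depart, arrive, *_ in edges:
        deg[depart] += 1
        deg[arrive] += 1
    return deg


# Hypothetical schedule: three routes out of NYC, one trip per day for 90 days.
routes = [("NYC", "BOS"), ("NYC", "PHL"), ("NYC", "DC")]
start = date(2013, 11, 7)
dated_edges = [(d, a, start + timedelta(days=i)) for d, a in routes for i in range(90)]
# Dehydrated model: one edge per station pair; the dates live elsewhere.
pair_edges = sorted({(d, a) for d, a, _ in dated_edges})

print(degrees(dated_edges)["NYC"], degrees(pair_edges)["NYC"])  # 270 3
```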
“Dehydrate”
• Don’t store everything in Neo4j, only metadata
• Use Neo4j as a “connection index”
• Don’t store entities in Nodes, only keys
• Don’t store heavy properties in Edges
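Used as a connection index, the graph only needs station keys and which pairs connect. The sketch below is a minimal in-memory stand-in for what the Neo4j path query would return, using the route keys from the "Joins across DBs" table (BOS-NYC, BOS-DC, NYC-DC); the `max_hops` limit and function names are hypothetical. It enumerates simple paths as lists of route keys, which are what later get joined against MongoDB.

```python
def build_index(route_keys):
    """Connection index: station key -> set of station keys reachable in one leg."""
    adj = {}
    for route in route_keys:
        depart, arrive = route.split("-")
        adj.setdefault(depart, set()).add(arrive)
    return adj


def find_paths(adj, origin, destination, max_hops=2):
    """Simple paths from origin to destination as lists of route keys, up to max_hops legs."""
    found = []

    def walk(station, visited, legs):
        if station == destination:
            found.append(legs)
            return
        if len(legs) == max_hops:
            return
        for nxt in sorted(adj.get(station, ())):
            if nxt not in visited:  # no revisiting stations: simple paths only
                walk(nxt, visited | {nxt}, legs + [f"{station}-{nxt}"])

    walk(origin, {origin}, [])
    return found


index = build_index(["BOS-NYC", "BOS-DC", "NYC-DC"])
print(find_paths(index, "BOS", "DC"))  # [['BOS-DC'], ['BOS-NYC', 'NYC-DC']]
```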
Neo4j Model
source: Wes Freeman, Tobias Lindaaker
Our Solution
• Serve paths from Neo4j
• Segments from MongoDB (with date constraints)
• Back to “Joins”
• “Join” across Neo4j + MongoDB, but the ids each database generates don't match:
1 (Neo4j node id) != 525d9031e6c9236072114387 (MongoDB ObjectId)
Joins across DBs
MongoDB: Stations | Neo4j: Nodes
BOS               | BOS
NYC               | NYC
DC                | DC
...               | ...
MongoDB: Trips    | Neo4j: Edges
BOS-NYC           | BOS-NYC
BOS-DC            | BOS-DC
NYC-DC            | NYC-DC
...               | ...
• Forget the sequential ids generated by the databases
• Use a human-created “UUID” string for id
• Convert pair into id: depart-arrive
• For example: BOS-NYC
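With the same human-readable route id in both stores, the cross-database join reduces to a dictionary lookup. A minimal sketch, assuming hypothetical segment documents with `route`, `carrier`, `date`, and `price` fields (the slides don't show the schema) and made-up prices; the in-memory list stands in for the MongoDB segments collection, where the date constraint would be part of the query. Paths are dropped when any leg has no service on the requested day.

```python
from collections import defaultdict


def join_paths_with_segments(paths, segments, day):
    """Attach each path leg's segments for `day`, matched on 'DEPART-ARRIVE' route ids."""
    by_route = defaultdict(list)
    for seg in segments:
        if seg["date"] == day:  # date constraint (would run on the MongoDB side)
            by_route[seg["route"]].append(seg)
    results = []
    for path in paths:
        legs = [by_route[route] for route in path]
        if all(legs):  # keep only paths where every leg runs that day
            results.append((path, legs))
    return results


paths = [["BOS-DC"], ["BOS-NYC", "NYC-DC"]]  # e.g. as served from the graph
segments = [  # hypothetical segment documents
    {"route": "BOS-DC", "carrier": "MEG", "date": "2013-11-08", "price": 30},
    {"route": "BOS-NYC", "carrier": "MEG", "date": "2013-11-07", "price": 15},
    {"route": "NYC-DC", "carrier": "BOLT", "date": "2013-11-07", "price": 13},
]
for path, legs in join_paths_with_segments(paths, segments, "2013-11-07"):
    print(path, [[s["carrier"] for s in leg] for leg in legs])
```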
Geo Lessons
Art: MC Escher
Hybrid Solution
• Google Autocomplete
• Google Maps
• MongoDB station geo lookup
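The station-lookup part of the hybrid boils down to "which stations are closest to this point?". MongoDB can answer that with its geospatial queries (for example `$near` against a `2dsphere` index); the sketch below does the same job in memory with the haversine formula so it runs standalone. The coordinates are approximate city centres, not actual station locations, and the input point stands in for whatever the Google Autocomplete/Maps step geocodes.

```python
import math

# Approximate city-centre coordinates (lat, lon); illustrative, not real station locations.
STATIONS = {
    "BOS": (42.3601, -71.0589),
    "NYC": (40.7128, -74.0060),
    "PHL": (39.9526, -75.1652),
    "DC": (38.9072, -77.0369),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_stations(point, k=1):
    """Station keys ordered by distance from `point`, closest first."""
    return sorted(STATIONS, key=lambda code: haversine_km(point, STATIONS[code]))[:k]


print(nearest_stations((40.7357, -74.1724)))  # a point in Newark, NJ
```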
Lessons of Lessons
• Really understand the Neo4j runtime model
• Pick universal, human-generated ids
• Joining across DBs beats the RDBMS approach: 10s of paths × 100s of segments vs. 500k × 500k
• Glad to have picked Neo4j: now doing content generation and more geo features
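For a sense of scale, read "10s" and "100s" in the join-size comparison as roughly 10 and 100 (an order-of-magnitude assumption) and compare the two sides:

```latex
\underbrace{10}_{\text{paths}} \times \underbrace{100}_{\text{segments}} = 10^{3}
\qquad \text{vs.} \qquad
\left(5 \times 10^{5}\right) \times \left(5 \times 10^{5}\right) = 2.5 \times 10^{11}
```

That is a gap of about eight orders of magnitude (a factor of 2.5 × 10^8).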
Useful Links
• Neo4j Internals
slideshare.net/thobe/an-overview-of-neo4j-internals
• Aseem’s Lessons Learned with Neo4j
http://aseemk.com/talks/neo4j-lessons-learned#/14
• Wes Freeman, Neo4j Internals
http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf
• MongoConnector
blog.mongodb.org/post/29127828146/introducing-mongo-connector