Wanderu – Lessons from Building a Travel Site with Neo4j - Eddy Wong @ GraphConnect NY 2013

description

Wanderu is a consumer-focused search engine for buses and trains. Eddy will recount the architectural, modeling and other technical “lessons learned” and “lessons unlearned” in implementing our geospatial and search features using Neo4j in the context of a NoSQL polyglot solution.

Page 1

Wanderu: Lessons Learned

Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j

Eddy Wong, CTO, Wanderu.com

@eddywongch

Page 2

About Wanderu.com

Search Engine for (Intercity) Buses and Trains

Page 3

Demo

Page 4

From pt A to pt B

Nomenclature: Stations, Trips

A: NYC, B: DC

Philly

BOLT, $13, 11/07/2013

MEG, $9, 11/07/2013

MEG, $4, 11/07/2013

A Shortest Path Problem as a function of depart, arrive, price, duration, date times
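
The shortest-path framing above can be sketched in a few lines of Python. This is only an illustration, not Wanderu's implementation: it weights edges by price alone and ignores dates and durations. The slide does not say which fare belongs to which station pair, so the assignment in `TRIPS` is an assumption, and `cheapest_route` is a hypothetical helper name.

```python
import heapq

# Trips taken from the slide. ASSUMPTION: BOLT runs NYC -> DC directly and the
# two MEG fares are the NYC -> Philly and Philly -> DC legs.
TRIPS = [
    ("NYC", "DC", "BOLT", 13),
    ("NYC", "Philly", "MEG", 9),
    ("Philly", "DC", "MEG", 4),
]


def cheapest_route(trips, origin, destination):
    """Dijkstra over directed trip edges, weighted by price only."""
    graph = {}
    for depart, arrive, carrier, price in trips:
        graph.setdefault(depart, []).append((arrive, carrier, price))

    # Heap entries: (total price so far, current station, legs taken so far)
    heap = [(0, origin, [])]
    settled = set()
    while heap:
        cost, station, legs = heapq.heappop(heap)
        if station == destination:
            return cost, legs
        if station in settled:
            continue
        settled.add(station)
        for arrive, carrier, price in graph.get(station, []):
            leg = (station, arrive, carrier, price)
            heapq.heappush(heap, (cost + price, arrive, legs + [leg]))
    return None  # no route between the two stations


result = cheapest_route(TRIPS, "NYC", "DC")
```

Under the assumed fares the direct trip and the two-leg trip via Philly cost the same, which hints at why price alone is not enough and why the slide lists departure, arrival, duration and date as further inputs.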

Page 5

Lessons

Learned

Unlearned

Idea

• Architectural

• Modeling

• Geo

Page 6

Our Story

• 2 yr startup, Tech started about 1+ yr ago

• Beta in Mar 2013, Launch in Aug 2013

• Knew nothing about Neo4j when we started (Jun 2012)

• Did not like the relational model: wanted schema-less and no self-joins

• Wanted a graph model

Page 7

Workflow

Diagram labels: Bus Websites (non-uniform data); Scraping; JSON (uniform data); Store; Server

Page 8

Architectural Lessons

Art: MC Escher

Page 9

Our Situation

• Data is written only in one direction

• Users search for paths, then segments

• Searches are done by date

• Needed online capability

• Trip info (price/avail) could change on some

Page 10

Solution

Diagram labels: Bus Websites (non-uniform data); Scraping; JSON (uniform data); MongoDB; MongoConn; Neo4j; Nodes & Edges; Replica Mechanism

Page 11

MongoConnector

• MongoDB Labs project, open source, unsupported

• Uses Replica Mechanism: Oplog

• Eventually Consistent (not real time)

• Written in Python

• Main methods: Upserts and Deletes, passes doc

• Implement DocMgr->Neo4jDocMgr->py2neo

• We can add new properties easily on the fly
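
The upsert/delete flow described above can be sketched as follows. Everything here is a hypothetical stand-in: `InMemoryGraph`, `GraphDocManager`, `replay` and the document fields are illustrative names, not the actual mongo-connector or py2neo API. A real doc manager would receive documents from the oplog and write them to Neo4j instead of to Python dicts.

```python
class InMemoryGraph:
    """Stand-in for the graph store (ASSUMPTION: a real one would be Neo4j)."""

    def __init__(self):
        self.nodes = {}  # station key -> properties
        self.edges = {}  # trip key -> (depart, arrive)


class GraphDocManager:
    """Hypothetical doc manager: mirrors replicated documents as nodes and edges."""

    def __init__(self, graph):
        self.graph = graph

    def upsert(self, doc):
        # Insert or overwrite, so replaying the same operation is harmless.
        if doc["type"] == "station":
            self.graph.nodes[doc["_id"]] = {"code": doc["_id"]}
        else:
            self.graph.edges[doc["_id"]] = (doc["depart"], doc["arrive"])

    def remove(self, doc):
        if doc["type"] == "station":
            self.graph.nodes.pop(doc["_id"], None)
        else:
            self.graph.edges.pop(doc["_id"], None)


def replay(oplog, manager):
    """Apply operations in log order; the graph lags the source until this runs."""
    for op, doc in oplog:
        if op == "upsert":
            manager.upsert(doc)
        elif op == "remove":
            manager.remove(doc)


# Hypothetical oplog entries using station codes that appear in the deck.
OPLOG = [
    ("upsert", {"_id": "BOS", "type": "station"}),
    ("upsert", {"_id": "NYC", "type": "station"}),
    ("upsert", {"_id": "BOS-NYC", "type": "trip", "depart": "BOS", "arrive": "NYC"}),
    ("upsert", {"_id": "BOS-DC", "type": "trip", "depart": "BOS", "arrive": "DC"}),
    ("remove", {"_id": "BOS-DC", "type": "trip"}),
]

graph = InMemoryGraph()
replay(OPLOG, GraphDocManager(graph))
```

Because data flows one way, from the document store to the graph, the graph side never needs to write back. That is what makes a replication-based bridge workable despite being only eventually consistent.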

Page 12

Polyglot Arch

Diagram labels: Bus Websites (non-uniform data); Scraping; JSON; MongoDB; MongoConn; Neo4j; Nodes & Edges; Replica Mechanism; REST Server

Station pairs shown: BOS, NYC; BOS, PHL; NYC, DC; NYC, PHL

Page 13

Modeling Lessons

Art: MC Escher

Page 14

Our Story

• We tried to “dump” all data into Neo4j

• Edges had dates -> too many Edges -> “Super Node Problem”

• Query perf was terrible (1+ mins) and got worse as # edges increased

• Tried Gremlin -> No improvements

• Needed range queries on Edges

Page 15

“Dehydrate”

• Don’t store everything in Neo4j, only metadata

• Use Neo4j as a “connection index”

• Don’t store entities in Nodes, only keys

• Don’t store heavy properties in Edges
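
A minimal sketch of the "dehydrate" idea, under stated assumptions: the field names and the `dehydrate` helper are hypothetical, and the extra dates are made up for illustration (only the 11/07/2013 fares come from the deck). The point is that many dated trip documents collapse into one light edge per station pair, while the heavy, date-specific fields stay in the document store.

```python
# Full trip documents as they might sit in the document store.
# ASSUMPTION: field names are illustrative; rows marked "hypothetical" are not from the deck.
TRIP_DOCS = [
    {"depart": "NYC", "arrive": "DC", "carrier": "BOLT", "price": 13, "date": "2013-11-07"},
    {"depart": "NYC", "arrive": "DC", "carrier": "BOLT", "price": 13, "date": "2013-11-08"},  # hypothetical
    {"depart": "NYC", "arrive": "DC", "carrier": "BOLT", "price": 13, "date": "2013-11-09"},  # hypothetical
    {"depart": "NYC", "arrive": "Philly", "carrier": "MEG", "price": 9, "date": "2013-11-07"},
]


def dehydrate(trip_docs):
    """Reduce dated trip documents to one key-only edge per station pair."""
    edges = {}
    for doc in trip_docs:
        key = f"{doc['depart']}-{doc['arrive']}"
        # Only the endpoints are kept; price, carrier and date are dropped.
        edges[key] = (doc["depart"], doc["arrive"])
    return edges


edges = dehydrate(TRIP_DOCS)
```

With this shape the number of edges grows with the number of connected station pairs rather than with the number of dated departures.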

Page 16

Neo4j Model

source: Wes Freeman, Tobias Lindaaker

Page 17

Our Solution

• Serve paths from Neo4j

• Segments from MongoDB (with date constraints)

• Back to “Joins”

• “Join” across Neo4j + MongoDB:

Neo4j node id 1 != MongoDB ObjectId 525d9031e6c9236072114387

Page 18

Joins across DBs

MongoDB: Stations    Neo4j: Nodes
BOS                  BOS
NYC                  NYC
DC                   DC
...                  ...

MongoDB: Trips       Neo4j: Edges
BOS-NYC              BOS-NYC
BOS-DC               BOS-DC
NYC-DC               NYC-DC
...                  ...

• Forget seq id generated by dbs

• Use a human-created “UUID” string for id

• Convert pair into id: depart-arrive

• For example: BOS-NYC
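
The cross-database "join" on a shared `depart-arrive` key might look like the sketch below. Both stores are replaced by in-memory stand-ins, and `find_paths`, `segments_for`, `search`, the carrier names and the prices are all hypothetical. In the architecture described here, the path lookup would go to Neo4j and the dated segment lookup to MongoDB.

```python
from datetime import date

# Graph side (stand-in for Neo4j): key-only edges, as in the table above.
EDGES = {"BOS-NYC": ("BOS", "NYC"), "BOS-DC": ("BOS", "DC"), "NYC-DC": ("NYC", "DC")}

# Document side (stand-in for MongoDB). ASSUMPTION: carriers, prices, dates are made up.
TRIPS = [
    {"pair": "BOS-NYC", "date": date(2013, 11, 7), "carrier": "X", "price": 15},
    {"pair": "NYC-DC", "date": date(2013, 11, 7), "carrier": "Y", "price": 13},
    {"pair": "BOS-DC", "date": date(2013, 11, 8), "carrier": "Z", "price": 30},
]


def find_paths(edges, origin, destination, max_hops=3):
    """Enumerate simple paths as lists of edge keys (depth-first, bounded)."""
    adjacency = {}
    for key, (depart, arrive) in edges.items():
        adjacency.setdefault(depart, []).append((key, arrive))
    paths = []

    def walk(station, keys, seen):
        if station == destination:
            paths.append(keys)
            return
        if len(keys) == max_hops:
            return
        for key, nxt in adjacency.get(station, []):
            if nxt not in seen:
                walk(nxt, keys + [key], seen | {nxt})

    walk(origin, [], {origin})
    return paths


def segments_for(trips, pair, travel_date):
    """Date-constrained lookup of bookable segments for one edge key."""
    return [t for t in trips if t["pair"] == pair and t["date"] == travel_date]


def search(origin, destination, travel_date):
    """Join: paths from the graph, then segments per leg from the document store."""
    results = []
    for path in find_paths(EDGES, origin, destination):
        legs = {key: segments_for(TRIPS, key, travel_date) for key in path}
        if all(legs.values()):  # keep only paths where every leg runs that day
            results.append((path, legs))
    return results


results = search("BOS", "DC", date(2013, 11, 7))
```

The string key is what lets the two lookups line up: neither database's generated id means anything to the other, but `BOS-NYC` means the same thing in both.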

Page 19

Geo Lessons

Art: MC Escher

Page 20

Hybrid Solution

• Google Autocomplete

• Google Maps

• MongoDB station geo lookup
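
The deck does not show how the station geo lookup is done. As a rough idea of what such a lookup computes, here is a pure-Python nearest-station sketch using the haversine formula. It is an assumption-laden stand-in, not a MongoDB geospatial query: the coordinates are approximate city centers rather than real station locations, and `nearest_station` is a hypothetical helper.

```python
import math

# ASSUMPTION: approximate city-center coordinates (lat, lon), not Wanderu station data.
STATIONS = {
    "BOS": (42.36, -71.06),
    "NYC": (40.71, -74.01),
    "DC": (38.91, -77.04),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_station(point, stations=STATIONS):
    """Return the code of the station closest to the given (lat, lon) point."""
    return min(stations, key=lambda code: haversine_km(point, stations[code]))


# A point near Philadelphia, e.g. as resolved by an autocomplete or maps service.
closest = nearest_station((39.95, -75.17))
```

A linear scan like this is fine for a handful of stations; a database-side geospatial index serves the same purpose at larger scale.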

Page 21

Lessons of Lessons

• Really understand the Neo4j Runtime Model

• Pick universal human-generated ids

• Join across dbs better than RDBMS: 10s paths x 100s segments vs. 500k x 500k

• Glad to have picked Neo4j: doing content gen and more geo features now

Page 22

Useful Links

• Neo4j Internals

slideshare.net/thobe/an-overview-of-neo4j-internals

• Aseem’s Lessons Learned with Neo4j

http://aseemk.com/talks/neo4j-lessons-learned#/14

• Wes Freeman, Neo4j Internals

http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf

• MongoConnector

blog.mongodb.org/post/29127828146/introducing-mongo-connector