MongoDB, our Swiss Army Knife Database


Description

Experience feedback from 10 months of happy MongoDB usage at fotopedia. You may also check out: http://www.slideshare.net/octplane/mongodb-vs-mysql-a-devops-point-of-view

Transcript of MongoDB, our Swiss Army Knife Database

Page 1: MongoDB, our Swiss Army Knife Database

MongoDB at fotopedia

Timeline storage

Our Swiss Army Knife Database

Page 2

MongoDB at fotopedia

• Context

• Wikipedia data storage

• Metacache

Page 3

Fotopedia

• Fotonauts, an American/French company

• Photo — Encyclopedia

• Heavily interconnected system: Flickr, Facebook, Wikipedia, Picasa, Twitter…

• MongoDB in production since last October

• main store lives in MySQL… for now

Page 4

First contact

• Wikipedia imported data

Pages 5–9 (image-only slides)

Wikipedia queries

• wikilinks from one article

• links to one article

• geo coordinates

• redirect

• why not use the Wikipedia API?

Page 10

Download ~5.7 GB gzipped XML

Geo / Redirect / Backlink / Related

~12GB tabular data

Page 11

Problem

Load ~12GB into a K/V store

Page 12

CouchDB 0.9 attempt

• CouchDB had no dedicated import tool

• need to go through the HTTP / REST API

Page 13

“DATA LOADING”

LOADING!

(obviously hijacked from xkcd.com)

Page 14

Problem, rephrased

Load ~12GB into any K/V store

in hours, not days

Page 15

Hadoop HBase?

• as we were already using Hadoop Map/Reduce for preparation

• bulk loading was just emerging at the time, and required coding against HBase's private APIs, generating the data in an ad-hoc binary format, ...

Page 17

Problem, re-rephrased

Load ~12GB into any K/V store

in hours, not days

without wasting a week on development

and another week on setup

and several months on tuning

please?

Page 18

MongoDB attempt

• Transforming the tabular data into JSON form: about half an hour of code, 45 minutes of Hadoop parallel processing

• set up a mongo server: 15 minutes

• mongoimport: 3 minutes to start it, 90 minutes to run

• plug the RoR app onto mongo: minutes

• prototype was done in a day
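The transformation step is small enough to sketch. Here is a minimal Ruby version of the idea, with hypothetical field names (`_id`, `backlinks`; the actual fotopedia schema is not in the slides), producing the one-JSON-document-per-line output that mongoimport consumes:

```ruby
require "json"

# Sketch of the Hadoop output step: turn tab-separated backlink rows
# ("source<TAB>target") into one JSON document per target article,
# ready for mongoimport. Field names are guesses, not the real schema.
def rows_to_documents(rows)
  backlinks = Hash.new { |h, k| h[k] = [] }
  rows.each do |row|
    source, target = row.chomp.split("\t")
    backlinks[target] << source
  end
  # mongoimport expects one JSON document per line
  backlinks.map { |article, sources| { _id: article, backlinks: sources }.to_json }
end

puts rows_to_documents(["Louvre\tParis", "Eiffel_Tower\tParis"])
```

Loading the result is then a single command along the lines of `mongoimport --db wikipedia --collection backlinks --file backlinks.json` (database and collection names made up for the example).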

Page 19

Download ~5.7 GB gzip

Geo / Redirect / Backlink / Related

~12GB, 12M docs

Batch Synchronous

Ruby on Rails

Page 20

Hot swap?

• Indexing was locking everything.

• Just run two instances of MongoDB.

• One instance is servicing the web app

• One instance is asleep or loading data

• A third instance knows the status of the other two.
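The swap itself is trivial, which is the point. A minimal Ruby sketch (ports and the shape of the status document are made up; in production the status would live in the third MongoDB instance):

```ruby
# Minimal sketch of the two-instance hot swap. A plain hash stands in
# for the status store; in production the third MongoDB instance holds it.
class HotSwap
  def initialize(status)
    @status = status  # e.g. { active: 27017, loading: 27018 }
  end

  # The web app always talks to the active instance.
  def active_port
    @status[:active]
  end

  # Once the sleeping instance has finished loading and indexing,
  # swap the roles; the app follows the status on its next lookup.
  def swap!
    @status[:active], @status[:loading] = @status[:loading], @status[:active]
  end
end

pair = HotSwap.new({ active: 27017, loading: 27018 })
pair.swap!
puts pair.active_port  # => 27018
```

Because indexing locks the loading instance, the web app never sees the lock: it only ever queries whichever instance the status currently marks as active.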

Page 21

We loved:

• JSON import format

• efficiency of mongoimport

• simple and flexible installation

• just one cumbersome dependency

• easy to start (we use runit)

• easy to have several instances on one box

Page 22

Second contact

• it's just all about graphs, anyway.

• wikilinks

• people following people

• related community albums

• and soon, interlanguage links

Page 23
Page 24

all about graphs...

• ... and it's also all about caching.

• The application needs to “feel” faster, so let's cache more.

• The application needs to “feel” right, so let's cache less.

• or — big sigh — invalidate.

Page 25
Page 27

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

Haiku?

Page 28

Naming things

• REST has been a strong design principle at fotopedia since the early days, and the effort is paying off.

Page 29

/en/2nd_arrondissement_of_Paris

/en/Paris/fragment/left_col

/en/Paris/fragment/related

/users/john/fragment/contrib

Page 30

Invalidating

• REST allows us to invalidate by URL prefix.

• When the Paris album changes, we have to invalidate /en/Paris.*

Page 31

Varnish invalidation

• Varnish's built-in regexp-based invalidation is not designed for intensive, fine-grained invalidation.

• We need to invalidate URLs individually.
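The lookup that makes this possible can be sketched in a few lines of Ruby: the metacache records every URL Varnish has served, and a purge-by-prefix becomes a query selecting the individual URLs to invalidate. A plain array stands in here for the MongoDB collection; the real metacache would run an anchored-regex query against mongo and purge each hit in Varnish one by one:

```ruby
# Minimal sketch of the metacache prefix lookup. An in-memory array
# stands in for the MongoDB collection of cached URLs.
def urls_to_purge(cached_urls, prefix_pattern)
  re = /\A#{prefix_pattern}/   # e.g. "/en/Paris.*", anchored at the start
  cached_urls.select { |url| url =~ re }
end

cache = [
  "/en/Paris",
  "/en/Paris/fragment/left_col",
  "/en/Paris/photos.json?skip=0&number=20",
  "/en/London",
]

urls_to_purge(cache, "/en/Paris.*")
# => every /en/Paris URL, but not /en/London
```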

Page 32

/en/Paris.*

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27

Page 33

Metacache workflow

RoR application

Varnish HTTP cache

Nginx SSI

metacache feeder

varnish log

invalidation worker

/en/Paris
/en/Paris/fragment/left_col
/en/Paris/photos.json?skip=0&number=20
/en/Paris/photos.json?skip=13&number=27

/en/Paris/fragment/left_col

/en/Paris.*

Page 34

Wow.

• This time we are actually using MongoDB as a BTree. Impressive.

• The metacache has been running fine for several months, and we want to go further.

Page 35

Invalidate less

• We need to be more specific as to what we invalidate.

• Today, if somebody votes on a photo in the Paris album, we invalidate the whole /en/Paris prefix, even though most of it is unchanged.

• We will move towards a more clever metacache.

Page 36

Metacache reloaded

• Pub/Sub metacache

• Have the backend send a specific header, to be caught by the metacache feeder, containing a “subscribe” message.

• This header will be a JSON document, to be pushed to the metacache.

• The purge commands will be mongo search queries.
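The matching step this describes can be sketched as follows, in Ruby, with a Struct standing in for the metacache documents (the url/observe shape follows the example documents in the deck; in the real design the select below would be a MongoDB search query):

```ruby
# Sketch of the pub/sub metacache. Each cached URL subscribes to a set
# of "observe" tags; a purge command carries a URL pattern plus one tag,
# and only subscribers of that tag under that pattern are invalidated.
Subscription = Struct.new(:url, :observe)

def purge(subscriptions, url_pattern, tag)
  re = /\A#{url_pattern}/
  subscriptions.select { |s| s.url =~ re && s.observe.include?(tag) }
               .map(&:url)
end

subs = [
  Subscription.new("/en/Paris", %w[summary links]),
  Subscription.new("/en/Paris/fragment/left_col", %w[cover]),
  Subscription.new("/en/Paris/photos.json?skip=0&number=20", %w[photos]),
]

# when somebody votes, only the photo listings get purged
purge(subs, "/en/Paris.*", "photos")
# => ["/en/Paris/photos.json?skip=0&number=20"]
```

This is what makes the invalidation more specific: a vote no longer touches /en/Paris/fragment/left_col, because that URL only observes the cover.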

Page 37

{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}

{url:/en/Paris/photos.json?skip=13&number=27, observe:[photos]}

/en/Paris

/en/Paris/fragment/left_col

/en/Paris/photos.json?skip=0&number=20

/en/Paris/photos.json?skip=13&number=27

Page 38

{url:/en/Paris, observe:[summary,links]}

{url:/en/Paris/fragment/left_col, observe: [cover]}

{url:/en/Paris/photos.json?skip=0&number=20, observe:[photos]}

{url:/en/Paris/photos.json?skip=13&number=27, observe:[photos]}

when somebody votes: { url:/en/Paris.*, observe:photos }

when the summary changes: { url:/en/Paris.*, observe:summary }

when a new link is created: { url:/en/Paris.*, observe:links }

Page 39

Other use cases

• Timeline activities storage: just one more BTree usage.

• Moderation workflow data: tiny dataset, but more complex queries, map/reduce.

• Suspended experimentation around log collection and analysis

Page 40

Current situation

• MySQL: main data store

• CouchDB: old timelines (+ chef)

• MongoDB: metacache, wikipedia, moderation, new timelines

• Redis: raw data cache for counters, recent activity (+ resque)

Page 41

What about the main store ?

• albums are a good fit for documents

• votes and score may be more tricky

• recent introduction of resque

Page 42

In short

• Simple, fast.

• Hackable: in a language most can read.

• Clear roadmap.

• Very helpful and efficient team.

• Designed with application developer needs in mind.