I’ve outgrown my basic stack. Now what?

Hivereader.com

Thoughts and feelings about growing with Django and NoSQL

Our common stack is built on:

PostgreSQL: super awesome and fast (once you learn which knobs to turn). Lots of cool features and tools: pgtune, pg_top, PgBouncer.

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.

memcached: in-memory key-value store for small chunks of data. Super simple and awesome.

People are signing up and using your app/game/site/bread maker/whatever

hooray

But

Things start to get slow

or worse

Single server: app, DB, other

Find your bottlenecks:

Unless your app is doing something crazy, you’re mostly abusing the DB

Add more caching or, better yet, smarter caching

PgBouncer
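
For a Django app, putting a connection pooler in front of Postgres mostly means pointing the database settings at the pooler instead of at Postgres. A minimal settings sketch, assuming the pooler runs on the same host on its default port 6432; the database name, user, and password are placeholders:

```python
# settings.py fragment (sketch). Name, user, password, and host are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",  # where the pooler listens, not Postgres itself
        "PORT": "6432",       # PgBouncer's default listen port
        "CONN_MAX_AGE": 0,    # close per request; let the pooler reuse connections
    }
}
```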

Welcome to the Postgres config file

pgtune is here to help*

*kind of

Still Growing?

[Diagram: multiple servers, with boxes labeled app, DB, and other]

Need a solution for lots of data that is growing quickly.

The solution needs to be targeted for my problem.

NoSQL to the rescue?

But wait, can Postgres handle this?

most likely

partitioning and sharding

Can mean lots of app code sometimes. What about when you outgrow your shard key?
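
To make the "lots of app code" and "outgrow your shard key" points concrete, here is a small sketch of hash-based shard routing in plain Python. The shard aliases and the choice of `user_id` as the shard key are assumptions for illustration only.

```python
import hashlib

# Hypothetical database aliases; in a real app these would be connections.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(user_id, shards=SHARDS):
    """Map a shard key (here: user_id) to a shard using a stable hash."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def moved_keys(user_ids, old_shards, new_shards):
    """Return the keys that land on a different shard after resharding."""
    return [
        user_id
        for user_id in user_ids
        if shard_for(user_id, old_shards) != shard_for(user_id, new_shards)
    ]
```

Every query now has to go through something like `shard_for`, and calling `moved_keys` with one extra shard shows why naive modulo hashing hurts when you grow: most keys change shards and their rows have to be migrated.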

“Don’t shard until you have to” - every single talk I’ve seen

Slony? “Master to multiple slaves” replication
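
On the Django side, one common way to use that kind of replication is a database router that sends reads to replicas and writes to the master. A sketch, assuming aliases named `default`, `replica1`, and `replica2` exist in `settings.DATABASES` (those names are made up here):

```python
import random


class PrimaryReplicaRouter:
    """Sketch of a Django database router: reads to replicas, writes to the master."""

    # Aliases are assumptions; they must match keys in settings.DATABASES.
    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread reads across the replicas.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the single master.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations between them are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only migrate the master; replication carries the schema to replicas.
        return db == self.primary
```

Django picks the router up through the `DATABASE_ROUTERS` setting. Keep replication lag in mind: a read issued right after a write may hit a replica that has not caught up yet.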

Lots of options

+ tons more

Nearly all of them are based on 2 papers

Built on the CAP theorem

The theorem began as a conjecture made by University of California, Berkeley computer scientist Eric Brewer at the 2000 Symposium on Principles of Distributed Computing.

In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof of Brewer's conjecture, rendering it a theorem.

Which one?

Redis

Super fast
Advanced key-value store
Think of it as super memcached. With union math.
All data must fit in RAM
It is often referred to as a data structure server
Keys can contain strings, hashes, lists, sets and sorted sets
Not a DB solution but more of a helper.

This is now a part of our basic stack for most apps.

MongoDB

Document store
JSON-like documents with dynamic schemas
Ad-hoc queries
Indexing
Load balancing: MongoDB scales horizontally using sharding

MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation. While a write lock is held, no other read or write operation may share it.

Global write lock*
Uncompressed field names
Safe off by default
Just google “mongo problems” or “moving off mongodb”

Hadoop / HBase

Uses pre-defined column family format
MapReduce
Used by all people with ‘big data’ problems
Amazing workhorse for data

You need a sizable cluster
Cluster setup can be difficult

I need to personally spend more time with this

CouchDB

JSON to store data, JavaScript for MapReduce and HTTP for an API.
Views: embedded map/reduce
Multi-master replication

BigCouch, Couchbase, Membase?
Kind of in a dev rut...
...but just pushed a new huge upgrade
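
If "views as embedded map/reduce" is new to you, the idea is: a map function emits key/value pairs per document, and a reduce function collapses the values for each key. A real view would be written in JavaScript and run by the database; the plain-Python sketch below only mimics the shape of that computation, and the documents and function names are made up.

```python
from collections import defaultdict

# Made-up documents for illustration.
DOCS = [
    {"_id": "1", "type": "post", "author": "ann"},
    {"_id": "2", "type": "post", "author": "bob"},
    {"_id": "3", "type": "comment", "author": "ann"},
    {"_id": "4", "type": "post", "author": "ann"},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for one document."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def reduce_values(values):
    """Reduce step: collapse all values emitted for one key."""
    return sum(values)


def run_view(docs):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}
```

Here `run_view(DOCS)` counts posts per author. The point of doing this inside the database is that the result can be stored as an index and updated incrementally instead of being recomputed on every query.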

Riak

Based hardcore on Amazon's Dynamo paper
Key-value store
Super good about failure, “no downtime”
MapReduce / secondary indexes
Built-in full-text search
Link walking

2 types of MapReduce
JavaScript: can be slow as hell
Erlang: super fast
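
Dynamo-style stores trade consistency against availability with three numbers: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. A tiny sketch of the usual rule of thumb, ignoring sloppy quorums and hinted handoff; the function names are made up for illustration:

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def write_failures_tolerated(n, w):
    """How many of the n replicas can be down while a write can still reach w nodes."""
    if not 1 <= w <= n:
        raise ValueError("w must be between 1 and n")
    return n - w
```

With N=3, choosing R=2 and W=2 makes reads overlap the latest write, while R=1 and W=1 is faster and more available but only eventually consistent.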

Cassandra

Key-value + row-oriented = column family
Linear scalability and fault tolerance on commodity hardware or cloud infrastructure
Built at Facebook for Inbox Search
Has CQL3 - think SQL, kind of
Baked auto cluster AMI
Super fast writes
Compresses data that’s not accessed a lot
Can tie in to Hadoop for big MapReduce

So now what?

Things to think about:

Is eventual consistency OK for you?
Do you know the queries you need right now?

Is your data complicated or simple?
How fast does it grow?

How long do you want that data to hang around?
Really think about trade-offs.

Every system has its good and bad.
There is no “winner”, so stop searching “which is best”.

Think about which fits your use case

http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
http://docs.basho.com/riak/1.2.1/references/appendices/comparisons/

Tons of links out there, just make sure they are relatively new
