NoSQL and SQL Anti Patterns

NoSQL SQL anti patterns and NoSQL alternatives Gleicon Moraes http://zenmachine.wordpress.com http://github.com/gleicon


Updated presentation based on NoSQL Intro for noSQLbr event (http://nosqlbr.com) 05/10/2010

Transcript of NoSQL and SQL Anti Patterns

Page 1: NoSQL and SQL Anti Patterns

NoSQL

SQL anti patterns and NoSQL alternatives

Gleicon Moraes

http://zenmachine.wordpress.com
http://github.com/gleicon

Page 2: NoSQL and SQL Anti Patterns

Doing it wrong, Junior !

Page 3: NoSQL and SQL Anti Patterns

SQL Anti patterns and related stuff

- The eternal tree (rows refer to the table itself, think threaded discussion)
- Dynamic table creation (and dynamic query building)
- Table as cache (let's save it in another table)
- Table as queue (wtf)
- Table as log file (table cleaning slave required)
- Stoned Procedures (living la vida business)
- Row Alignment (the careful gentleman)
- Extreme JOINs (app requires a warmed up cache)
- Your schema must be printed on an A3 sheet
- Your ORM issues full queries for dataset iterations

Page 4: NoSQL and SQL Anti Patterns

The eternal tree

Problem: Most threaded discussion examples use a single table which contains all threads and answers, related to each other by an id. Usually the developer will come up with their own binary-tree version to manage this mess.

id - parent_id - author - text
1 - 0 - gleicon - hello world
2 - 1 - elvis - shout !

NoSQL alternative: Document storage:

{
  thread_id: 1,
  title: 'the meeting',
  author: 'gleicon',
  replies: [
    { author: 'elvis', text: 'shout', replies: [{...}] }
  ]
}
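To make the difference concrete, here is a minimal Python sketch that folds the id/parent_id rows from the table above into the nested document shape. It assumes that parent_id 0 marks a top-level post, which the slide does not state. The `build_tree` helper is a hypothetical name, and plain dicts stand in for a real document store.

```python
# Rows as in the slide: (id, parent_id, author, text).
rows = [
    (1, 0, "gleicon", "hello world"),
    (2, 1, "elvis", "shout !"),
]

def build_tree(rows, root_parent=0):
    """Fold adjacency-list rows into nested documents (assumes root_parent marks top-level posts)."""
    nodes = {
        row_id: {"id": row_id, "author": author, "text": text, "replies": []}
        for row_id, _parent, author, text in rows
    }
    roots = []
    for row_id, parent_id, _author, _text in rows:
        if parent_id == root_parent:
            roots.append(nodes[row_id])
        else:
            # Attach the reply to its parent; the whole thread ends up as one document.
            nodes[parent_id]["replies"].append(nodes[row_id])
    return roots

tree = build_tree(rows)
print(tree)
```

With the table, this walk has to happen on every read. With the document, the thread is stored already folded.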

Page 5: NoSQL and SQL Anti Patterns

Dynamic table creation

Problem: To avoid huge tables, one must come up with a "dynamic schema". For example, let's think about a document management company which is adding new facilities across the country. For each storage facility, a new table is created:

item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout

Now you have to come up with "dynamic queries", which will probably query a "central storage" table and issue a huge join to check if you have enough cat food over the country.

NoSQL alternative:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
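A rough sketch of the "facility as a document" idea, using a plain Python dict as a stand-in for the store. The facility keys and the second facility's contents are made up for illustration, and `count_stock` is a hypothetical helper. The point is that the cat food question becomes one pass over documents, with no per-facility table and no dynamic query.

```python
# One document per facility (keys and the second facility are hypothetical).
facilities = {
    "facility:1": [
        {"item_id": 1, "row": 10, "column": 20, "stuff": "cat food"},
        {"item_id": 2, "row": 12, "column": 32, "stuff": "trout"},
    ],
    "facility:2": [
        {"item_id": 1, "row": 3, "column": 7, "stuff": "cat food"},
    ],
}

def count_stock(facilities, stuff):
    """Count matching items across every facility document."""
    return sum(
        1
        for items in facilities.values()
        for item in items
        if item["stuff"] == stuff
    )

print(count_stock(facilities, "cat food"))
```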

Page 6: NoSQL and SQL Anti Patterns

Table as cache

Problem: Complex queries demand that a result be stored in a separate table, so it can be queried quickly. Worse than views.

NoSQL alternative:
- Really ?
- Memcached
- Redis + AOF + EXPIRE
- Denormalization
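What a cache with expiration buys you, sketched in Python. This is an in-memory stand-in and not the Memcached or Redis API: `ExpiringCache` is a hypothetical class, and the injectable clock is only there so the behaviour can be checked without sleeping.

```python
import time

class ExpiringCache:
    """Tiny key/value cache where every entry carries its own time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries clean themselves up; no purge job, no extra table.
            del self._data[key]
            return None
        return value

# Usage with a fake clock (an assumption for the demo, so no real waiting is needed).
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("report:monthly", [("cat food", 2)], ttl_seconds=10)
fresh = cache.get("report:monthly")
now[0] = 11.0
stale = cache.get("report:monthly")
```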

Page 7: NoSQL and SQL Anti Patterns

Table as queue

Problem: A table which holds messages to be completed. Worse, they must be ordered.

NoSQL alternative:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
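The LPUSH + RPOP pairing gives first-in, first-out order: producers push on the left, consumers pop from the right. A minimal sketch of that behaviour with `collections.deque` standing in for a Redis list (the `lpush` and `rpop` functions here are local helpers, not a Redis client):

```python
from collections import deque

queue = deque()

def lpush(q, message):
    """Producer side: push on the left end, like LPUSH."""
    q.appendleft(message)

def rpop(q):
    """Consumer side: pop from the right end, like RPOP; None when the list is empty."""
    return q.pop() if q else None

for message in ["first", "second", "third"]:
    lpush(queue, message)

consumed = [rpop(queue), rpop(queue), rpop(queue), rpop(queue)]
print(consumed)
```

Ordering comes from the data structure itself, so there is no ORDER BY on a timestamp column and no row locking to pick the next message.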

Page 8: NoSQL and SQL Anti Patterns

Table as log file

Problem: A table to which data gets written as if it were a log file. From time to time it needs to be purged. Truncating this table once a day is usually the first task assigned to new DBAs.

NoSQL alternative:
- MongoDB capped collection
- Redis, and an RRD pattern
- Riak
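A capped collection keeps a fixed amount of data and drops the oldest entries as new ones arrive, so nobody has to truncate anything. The same idea can be sketched with a bounded `deque`. This is only an analogy for the behaviour, not MongoDB's API, and the size of 3 is arbitrary.

```python
from collections import deque

# Keep only the 3 most recent entries; older ones fall off automatically.
log = deque(maxlen=3)

for n in range(1, 6):
    log.append(f"event {n}")

recent = list(log)
print(recent)
```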

Page 9: NoSQL and SQL Anti Patterns

Stoned procedures

Problem: Stored procedures hold most of your application's logic. Also, some triggers are used to - well - trigger important data events.

Stored procedures and triggers have the magic property of vanishing from our memories and being impossible to keep versioned.

NoSQL alternative:
- Now be careful so you don't use map/reduce as stoned procedures.
- Use your preferred language for business stuff, and leave event handling to pub/sub or message queues.
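A sketch of what "leave event handling to pub/sub" can look like when the logic lives in application code. `PubSub` is a hypothetical in-process stand-in for a real broker, and the channel name and handler are invented for the example.

```python
from collections import defaultdict

class PubSub:
    """Minimal in-process publish/subscribe, standing in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self._subscribers[channel].append(handler)

    def publish(self, channel, message):
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)  # how many handlers received the message

bus = PubSub()
audit_trail = []

# The "trigger" is ordinary, versionable application code.
bus.subscribe("order.created", lambda order: audit_trail.append(("audit", order["id"])))

delivered = bus.publish("order.created", {"id": 42})
ignored = bus.publish("order.deleted", {"id": 42})
```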

Page 10: NoSQL and SQL Anti Patterns

Row Alignment

Problem: Extra columns are created but not used, just in case. Usually they are named a1, a2, a3, a4 and called padding.

There's good will behind that, especially when version 1 of the software needed an extra column in a 150M-row database and it took 2 days to run an ALTER TABLE.

NoSQL alternative:
- Document based databases such as MongoDB and CouchDB, where new attributes are local to the document. Also, having no schema helps.
- Column based databases may not be the best choice if column creation needs restarts or migrations.
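"New attributes are local to the document" in a few lines of Python, with dicts standing in for stored documents. The `nickname` attribute is invented for the example. Only the document that needs it carries it, and readers supply a default for the rest.

```python
users = [
    {"id": 1, "name": "gleicon"},
    # Version 2 of the software adds an attribute to new documents only.
    {"id": 2, "name": "elvis", "nickname": "the king"},
]

# No ALTER TABLE and no padding columns: old documents simply lack the key.
nicknames = [user.get("nickname", "") for user in users]
print(nicknames)
```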

Page 11: NoSQL and SQL Anti Patterns

Extreme JOINs

Problem: Business stuff modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs.

NoSQL alternative:
- Document storage, such as MongoDB
- Denormalization
- Serialized objects
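A small illustration of the trade, using SQLite from the Python standard library. The three-table schema is hypothetical (the slide names no tables) and far smaller than the real thing. It shows the JOIN needed to assemble one user plan next to the denormalized document that stores the same answer in one piece.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE plans (id INTEGER PRIMARY KEY, user_id INTEGER, product_id INTEGER);
    INSERT INTO users VALUES (1, 'gleicon');
    INSERT INTO products VALUES (10, 'Product_A');
    INSERT INTO plans VALUES (100, 1, 10);
""")

# Normalized read: one JOIN per table that holds a piece of the answer.
row = conn.execute(
    """
    SELECT users.name, plans.id, products.description
    FROM users
    JOIN plans ON plans.user_id = users.id
    JOIN products ON products.id = plans.product_id
    WHERE users.id = ?
    """,
    (1,),
).fetchone()

# Denormalized read: the same data kept together as a single document.
user_plan_doc = {
    "user": row[0],
    "plan_id": row[1],
    "product": {"description": row[2]},
}
print(user_plan_doc)
```

The cost moves to write time: whoever updates a product description now has to update every document that embeds it.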

Page 12: NoSQL and SQL Anti Patterns

Your schema fits on an A3 sheet

Problem: Huge data schemas are difficult to manage. Extreme specialization creates tables which converge to a key/value model. The normal form gets priority over common sense.

Product_A      Product_B
id - desc      id - desc

NoSQL alternative:
- Denormalization
- Another schema ?
- Document store for flattening the model
- Key/Value

Page 13: NoSQL and SQL Anti Patterns

Your ORM ...

Problem: Your ORM issues full queries for dataset iterations, your ORM maps and creates tables which mimic your classes, even the inheritance, and the performance is bad because the queries are huge, etc.

NoSQL alternative: Apart from denormalization and good old common sense, ORMs are trying to bridge two things with distinct impedance.

Nothing in the relational model maps cleanly to classes and objects, not even its basic unit, which is the domain (set) of each column. Black magic ?

Page 14: NoSQL and SQL Anti Patterns

No silver bullet

- Consider alternatives

- Think outside the norm

- Denormalize

- Simplify

Page 15: NoSQL and SQL Anti Patterns

Cycle of changes - Product A

1. There was the database model.
2. Then, the cache was needed. Performance was no good.
3. Cache key: query, value: resultset.
4. High or nonexistent expiration time [w00t]

(Now there's a turning point. Data didn't need to change often. Denormalization was a given with cache)

5. The cache needs to be warmed or the app won't work.
6. Key/Value storage was a natural choice. No data on MySQL anymore.
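Step 3 of this cycle (cache key: query, value: resultset) is roughly the following read-through pattern. The sketch uses SQLite and a plain dict so it runs anywhere. The table, the `cached_query` helper and the hit counter are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'Product_A')")

cache = {}               # key: (query text, params), value: resultset
stats = {"db_hits": 0}   # counts how often the database was actually queried

def cached_query(sql, params=()):
    """Return the cached resultset for this query, hitting the database only on a miss."""
    key = (sql, params)
    if key not in cache:
        stats["db_hits"] += 1
        cache[key] = conn.execute(sql, params).fetchall()
    return cache[key]

query = "SELECT description FROM products WHERE id = ?"
first = cached_query(query, (1,))
second = cached_query(query, (1,))
print(first, stats)
```

With no expiration (step 4) the cached resultset is effectively the data, which is why moving it into a key/value store was the natural next step.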

Page 16: NoSQL and SQL Anti Patterns

Cycle of changes - Product B

1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter update.
5. Data model was simplified, the entire crawled doc was stored.

Page 17: NoSQL and SQL Anti Patterns

Stuff to think about

Check whether the data you use isn't already denormalized (cached).

Most of the anti-patterns contain signs that the NoSQL route (or at least a partial NoSQL route) may simplify things.

Are you dependent on cache ? Does your application fail when there is no cache ? Does it just slow down ?

Are you ready to think more about your data ?

Think about the way to put and to get back your data from the database (be it SQL or NoSQL).

Page 18: NoSQL and SQL Anti Patterns

Extra - MongoDB and Redis

The next two slides are here to show what it is like to use MongoDB and Redis for the same task.

There is more to managing your data than stuffing it inside a database. You gotta plan ahead for searches and migrations.

This example is about storing books and searching among them. MongoDB makes it simpler: you just use its query language. Redis requires that you keep track of tags and ids, and use SET operations to retrieve the books you want.

Check http://rediscookbook.org and http://cookbook.mongodb.org/ for recipes on data handling.

Page 19: NoSQL and SQL Anti Patterns

MongoDB/Redis recap - Books

MongoDB

{ 'id': 1, 'title' : 'Diving into Python', 'author': 'Mark Pilgrim', 'tags': ['python','programming', 'computing'] }

{ 'id': 2, 'title' : 'Programming Erlang', 'author': 'Joe Armstrong', 'tags': ['erlang','programming', 'computing', 'distributedcomputing', 'FP'] }

{ 'id': 3, 'title' : 'Programming in Haskell', 'author': 'Graham Hutton', 'tags': ['haskell','programming', 'computing', 'FP'] }

Redis

SET book:1 '{"title": "Diving into Python", "author": "Mark Pilgrim"}'
SET book:2 '{"title": "Programming Erlang", "author": "Joe Armstrong"}'
SET book:3 '{"title": "Programming in Haskell", "author": "Graham Hutton"}'

SADD tag:python 1
SADD tag:erlang 2
SADD tag:haskell 3
SADD tag:programming 1 2 3
SADD tag:computing 1 2 3
SADD tag:distributedcomputing 2
SADD tag:FP 2 3
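The bookkeeping Redis asks of you (one set of book ids per tag) can be derived from the book documents. A minimal Python sketch, with a dict of sets standing in for the Redis keyspace and `index_tags` as a hypothetical helper. In a real application this step would run on every insert and update.

```python
from collections import defaultdict

books = [
    {"id": 1, "title": "Diving into Python", "tags": ["python", "programming", "computing"]},
    {"id": 2, "title": "Programming Erlang",
     "tags": ["erlang", "programming", "computing", "distributedcomputing", "FP"]},
    {"id": 3, "title": "Programming in Haskell", "tags": ["haskell", "programming", "computing", "FP"]},
]

def index_tags(books):
    """Build the tag -> set of book ids mapping that the SADD commands create by hand."""
    tag_index = defaultdict(set)
    for book in books:
        for tag in book["tags"]:
            tag_index[f"tag:{tag}"].add(book["id"])
    return dict(tag_index)

tag_index = index_tags(books)
print(sorted(tag_index))
```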

Page 20: NoSQL and SQL Anti Patterns

MongoDB/Redis recap - Books

MongoDB

Search tags for erlang or haskell:

db.books.find({"tags": { $in: ['erlang', 'haskell'] }})

Search tags for erlang AND haskell (no results)

db.books.find({"tags": { $all: ['erlang', 'haskell'] }})

This search yields 3 results:

db.books.find({"tags": { $all: ['programming', 'computing'] }})

Redis

SINTER 'tag:erlang' 'tag:haskell'
0 results

SINTER 'tag:programming' 'tag:computing'
3 results: 1, 2, 3

SUNION 'tag:erlang' 'tag:haskell'
2 results: 2 and 3

SDIFF 'tag:programming' 'tag:haskell'
2 results: 1 and 2 (haskell is excluded)
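The searches above can be checked with plain Python sets, which behave like the Redis set commands for this purpose. The `match_in` and `match_all` helpers are hypothetical approximations of how `$in` and `$all` match against an array field. Nothing here talks to a real Redis or MongoDB server.

```python
# Tag sets as created by the SADD commands on the previous slide.
tags = {
    "tag:python": {1},
    "tag:erlang": {2},
    "tag:haskell": {3},
    "tag:programming": {1, 2, 3},
    "tag:computing": {1, 2, 3},
    "tag:distributedcomputing": {2},
    "tag:FP": {2, 3},
}

sinter_langs = tags["tag:erlang"] & tags["tag:haskell"]          # like SINTER
sinter_common = tags["tag:programming"] & tags["tag:computing"]  # like SINTER
sunion_langs = tags["tag:erlang"] | tags["tag:haskell"]          # like SUNION
sdiff_no_haskell = tags["tag:programming"] - tags["tag:haskell"] # like SDIFF

# The same books as documents, reduced to id and tags.
books = [
    {"id": 1, "tags": ["python", "programming", "computing"]},
    {"id": 2, "tags": ["erlang", "programming", "computing", "distributedcomputing", "FP"]},
    {"id": 3, "tags": ["haskell", "programming", "computing", "FP"]},
]

def match_in(books, wanted):
    """Ids of books with at least one of the wanted tags (approximates $in)."""
    return [b["id"] for b in books if set(b["tags"]) & set(wanted)]

def match_all(books, wanted):
    """Ids of books carrying every wanted tag (approximates $all)."""
    return [b["id"] for b in books if set(wanted) <= set(b["tags"])]

print(sinter_langs, sinter_common, sunion_langs, sdiff_no_haskell)
```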