SDEC2011 NoSQL Data modelling

NoSQL Data Modeling Concepts and Cases Shashank Tiwari blog: shanky.org | twitter: @tshanky st@treasuryofideas.com

Transcript of SDEC2011 NoSQL Data modelling

Page 1: SDEC2011 NoSQL Data modelling

NoSQL Data Modeling: Concepts and Cases

Shashank Tiwari

blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com

Page 2: SDEC2011 NoSQL Data modelling

NoSQL?

Page 3: SDEC2011 NoSQL Data modelling

NoSQL : Various Shapes and Sizes

• Document Databases

• Column-family Oriented Stores

• Key/value Data stores

• XML Databases

• Object Databases

• Graph Databases

Page 4: SDEC2011 NoSQL Data modelling

Key Questions

• How do I model data for my application?

• How do I determine which one is right for me?

• Can I easily shift from one database to the other?

• Is there a standard way of storing, accessing, and querying data?

Page 5: SDEC2011 NoSQL Data modelling

Agenda for this session

• Explore some of the main NoSQL products

• Understand how they are similar and different

• How best to use these products in the stack

Page 6: SDEC2011 NoSQL Data modelling

Document Databases

• also GenieDB, SimpleDB

Page 7: SDEC2011 NoSQL Data modelling

What is a document db?

• One that stores documents

• Popular options:

• MongoDB -- C++

• CouchDB -- Erlang

• Also Amazon’s SimpleDB

• ...what exactly is a document?

Page 8: SDEC2011 NoSQL Data modelling

In the real world

• (Source: http://guide.couchdb.org/draft/why.html)

Page 9: SDEC2011 NoSQL Data modelling

In terms of JSON

• { "name": "John Doe", "zip": 10001 }

Page 10: SDEC2011 NoSQL Data modelling

What about db schema?

• Schema-less

• Different documents could be stored in a single collection
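As a minimal sketch of what schema-less storage means in practice, the plain JavaScript below models a collection as an array and keeps two documents with different fields in it. The `collection` array, the second document, and the `fieldsOf` helper are hypothetical and exist only for illustration; they are not part of any database API.

```javascript
// A "collection" modeled as a plain array (illustration only, not a database API).
const collection = [];

// Two documents with different shapes stored side by side.
collection.push({ name: "John Doe", zip: 10001 });
collection.push({ name: "Jane Doe", email: "jane@example.com", tags: ["admin"] });

// Hypothetical helper: list the field names of each document.
function fieldsOf(docs) {
  return docs.map((doc) => Object.keys(doc));
}

console.log(fieldsOf(collection));
```

Nothing forces the two documents to agree on their fields, so any validation the application needs has to live in application code.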

Page 11: SDEC2011 NoSQL Data modelling

Data types: MongoDB

• Essential JSON types:

• string

• integer

• boolean

• double

Page 12: SDEC2011 NoSQL Data modelling

Data types: MongoDB (...cont)

• Additional JSON types

• null, array and object

• BSON types -- BSON is a binary-encoded serialization of JSON-like documents

• date, binary data, object id, regular expression and code

• (Reference: bsonspec.org)

Page 13: SDEC2011 NoSQL Data modelling

A BSON example: object id
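The slide image is not part of this transcript. As a rough illustration of one thing an object id encodes, the sketch below reads the creation time out of an ObjectId hex string, relying on the fact that the first four bytes hold seconds since the Unix epoch. The id used is the one shown in the read example later in this deck; `objectIdTimestamp` is a hypothetical helper name, not a MongoDB function.

```javascript
// Hypothetical helper: read the creation time encoded in an ObjectId hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, i.e. 24 hex characters");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  return parseInt(hex.slice(0, 8), 16);
}

const seconds = objectIdTimestamp("4c97053abe67000000003857");
console.log(new Date(seconds * 1000).toISOString());
```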

Page 14: SDEC2011 NoSQL Data modelling

Data types: CouchDB

• Everything JSON

• Large objects: attachments

Page 15: SDEC2011 NoSQL Data modelling

CRUD operations for documents

• Create

• Read

• Update

• Delete

Page 16: SDEC2011 NoSQL Data modelling

MongoDB: Create Document

• use mydb

• w = {name: "John Doe", zip: 10001};

• db.location.save(w);

Page 17: SDEC2011 NoSQL Data modelling

Create db and collection

• Lazily created

• Implicitly created

• use mydb

• db.collection.save(w)

Page 18: SDEC2011 NoSQL Data modelling

MongoDB: Read Document

• db.location.find({zip: 10001});

• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }

Page 19: SDEC2011 NoSQL Data modelling

MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});

• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }

Page 20: SDEC2011 NoSQL Data modelling

MongoDB: Update Document

• Atomic operations on single documents

• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
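To make the semantics of the shell calls on these slides concrete, here is a toy in-memory stand-in written in plain JavaScript. It assumes query-by-example matching on top-level fields and a `$set` update applied to the first match only, which mirrors the shell's default single-document update. `makeCollection` and its methods are hypothetical; this is a sketch, not MongoDB's implementation. The toy also includes a `remove` method to cover the Delete step from the CRUD slide.

```javascript
// Toy stand-in for a collection; hypothetical, not MongoDB's implementation.
function makeCollection() {
  const docs = [];
  // True when every field in the query equals the same field in the document.
  const matches = (doc, query) =>
    Object.keys(query).every((key) => doc[key] === query[key]);
  return {
    save(doc) { docs.push({ ...doc }); },
    find(query) { return docs.filter((doc) => matches(doc, query)); },
    // Apply a $set modifier to the first matching document only.
    update(query, modifier) {
      const target = docs.find((doc) => matches(doc, query));
      if (target) Object.assign(target, modifier.$set);
    },
    // Delete every matching document and report how many were removed.
    remove(query) {
      const before = docs.length;
      for (let i = docs.length - 1; i >= 0; i--) {
        if (matches(docs[i], query)) docs.splice(i, 1);
      }
      return before - docs.length;
    },
  };
}

const locations = makeCollection();
locations.save({ name: "John Doe", zip: 10001 });
locations.update({ name: "John Doe" }, { $set: { name: "Jane Doe" } });
console.log(locations.find({ zip: 10001 }));
```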

Page 21: SDEC2011 NoSQL Data modelling

CouchDB: RESTful

• Supports REST verbs: GET, HEAD, PUT, POST, DELETE

• Supports Replication

• Supports the notion of attachments

• Could work in offline modes and supports small footprint profiles
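As a hedged sketch of how document operations map onto those REST verbs, the helper below builds a description of the HTTP request for each operation against a `/{db}/{docid}` resource. It only constructs plain objects and sends nothing over the network. `couchRequest`, the database name `mydb`, the document id `john`, and the revision string `1-abc` are all hypothetical.

```javascript
// Hypothetical helper: describe the HTTP request for each document operation.
// It only builds request descriptions; nothing is sent over the network.
function couchRequest(op, db, id, doc = {}, rev) {
  const path = `/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  switch (op) {
    case "create":
      return { method: "PUT", path, body: doc };
    case "read":
      return { method: "GET", path };
    case "update":
      // An update must name the revision it replaces.
      return { method: "PUT", path, body: { ...doc, _rev: rev } };
    case "delete":
      return { method: "DELETE", path: `${path}?rev=${encodeURIComponent(rev)}` };
    default:
      throw new Error(`unknown operation: ${op}`);
  }
}

console.log(couchRequest("create", "mydb", "john", { name: "John Doe", zip: 10001 }));
console.log(couchRequest("delete", "mydb", "john", {}, "1-abc"));
```

Because updates and deletes carry a revision, a client working from an outdated copy is rejected by the server rather than silently overwriting newer data.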

Page 22: SDEC2011 NoSQL Data modelling

Sorted Ordered Column-family Datastores

• Sorted

• Ordered

• Distributed

• Map

Page 23: SDEC2011 NoSQL Data modelling

Essential schema

Page 24: SDEC2011 NoSQL Data modelling

Multi-dimensional View

Page 25: SDEC2011 NoSQL Data modelling

A Map/Hash View

• {

•   "row_key_1" : {

•     "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },

•     "location" : { "zip" : "94301" }

•   }

• }
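The map view can be tried out with ordinary nested objects: row key, then column family, then column, then value. The sketch below also shows why keeping rows sorted by key matters, since a range of keys then maps to a contiguous run of rows. `scan` is a hypothetical helper, and `row_key_2` with its values is made up for this illustration.

```javascript
// Row key -> column family -> column -> value.
// "row_key_2" and its values are made up for this illustration.
const table = {
  row_key_2: { name: { first_name: "Happy" }, location: { zip: "10001" } },
  row_key_1: {
    name: { first_name: "Jolly", last_name: "Goodfellow" },
    location: { zip: "94301" },
  },
};

// Hypothetical helper: return the rows with startKey <= key < stopKey, in key order.
function scan(rows, startKey, stopKey) {
  return Object.keys(rows)
    .sort()
    .filter((key) => key >= startKey && key < stopKey)
    .map((key) => [key, rows[key]]);
}

console.log(table.row_key_1.name.first_name);
console.log(scan(table, "row_key_1", "row_key_3").map(([key]) => key));
```

Note that the two rows do not carry the same columns: within a column family, each row may hold a different set of columns.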

Page 26: SDEC2011 NoSQL Data modelling

Architectural View (HBase)

Page 27: SDEC2011 NoSQL Data modelling

The Persistence Mechanism

Page 28: SDEC2011 NoSQL Data modelling

Model Wrappers (The GAE Way)

• Python

• Model, Expando, PolyModel

• Java

• JDO, JPA

Page 29: SDEC2011 NoSQL Data modelling

HBase Data Access

• Thrift + Avro

• Java API -- HTable, HBaseAdmin

• Hive (SQL like)

• MapReduce -- sink and/or source

Page 30: SDEC2011 NoSQL Data modelling

Transactions

• Atomic row level

• GAE Entity Groups

Page 31: SDEC2011 NoSQL Data modelling

Indexes

• Row ordered

• Secondary indexes

• GAE style multiple indexes

• thinking from output to query

Page 32: SDEC2011 NoSQL Data modelling

Use cases

• Many of Google’s products

• Facebook Messaging

• StumbleUpon

• Open TSDB

• Mahalo, Ning, Meetup, Twitter, Yahoo!

• Lily -- open source CMS built on HBase & Solr

Page 34: SDEC2011 NoSQL Data modelling

Distributed Systems & Consistency (case: success)

Page 35: SDEC2011 NoSQL Data modelling

Distributed Systems & Consistency (case: failure)

Page 36: SDEC2011 NoSQL Data modelling

Binding by Transactions

Page 37: SDEC2011 NoSQL Data modelling

Consistency Spectrum

Page 38: SDEC2011 NoSQL Data modelling

Inconsistency Window

Page 39: SDEC2011 NoSQL Data modelling

RWN Math

• R – Number of nodes that are read from.

• W – Number of nodes that are written to.

• N – Total number of nodes in the cluster.

• In general: R < N and W < N for higher availability

Page 40: SDEC2011 NoSQL Data modelling

R + W > N

• Easy to determine consistent state

• R + W = 2N

• absolutely consistent, can provide ACID guarantees

• In all cases when R + W > N there is some overlap between read and write nodes.
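The overlap claim follows from counting: a read set of size R and a write set of size W drawn from N nodes must share at least R + W - N nodes. A small sketch of that arithmetic, with the hypothetical helper names `minOverlap` and `readsSeeLatestWrite`:

```javascript
// Hypothetical helper: smallest possible number of nodes shared by any read set of
// size r and any write set of size w in a cluster of n nodes.
function minOverlap(r, w, n) {
  if (r < 1 || w < 1 || r > n || w > n) {
    throw new RangeError("expected 1 <= r <= n and 1 <= w <= n");
  }
  return Math.max(0, r + w - n);
}

// A read is guaranteed to touch at least one node holding the latest write
// exactly when the minimum overlap is positive, i.e. r + w > n.
function readsSeeLatestWrite(r, w, n) {
  return minOverlap(r, w, n) > 0;
}

console.log(readsSeeLatestWrite(2, 2, 3));
console.log(readsSeeLatestWrite(1, 1, 3));
```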

Page 41: SDEC2011 NoSQL Data modelling

R = 1, W = N

• more reads than writes

• W = N

• 1 node failure = writes unavailable across the entire system

Page 42: SDEC2011 NoSQL Data modelling

R = N, W = 1

• W = 1

• Chance of data inconsistency quite high

• R = N

• Read only possible when all nodes in the cluster are available
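The two extreme configurations can be compared with a short availability check. The sketch assumes a simplified model in which a read needs R live nodes and a write needs W live nodes; `availability` is a hypothetical helper. Under that model, with one node down, R = 1, W = N can no longer complete writes, while R = N, W = 1 can no longer complete reads.

```javascript
// Hypothetical helper: which operations can still complete when some nodes are down?
// Assumes a read needs r live nodes and a write needs w live nodes.
function availability(r, w, n, failedNodes) {
  const alive = n - failedNodes;
  return { canRead: alive >= r, canWrite: alive >= w };
}

const clusterSize = 3;
console.log(availability(1, clusterSize, clusterSize, 1)); // R = 1, W = N, one node down
console.log(availability(clusterSize, 1, clusterSize, 1)); // R = N, W = 1, one node down
```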

Page 43: SDEC2011 NoSQL Data modelling

R = W = ceiling ((N + 1)/2)

Effective quorum for eventual consistency
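A quick check of the quorum formula for a few cluster sizes, as a sketch. `quorumSize` is a hypothetical helper name; the formula itself is the one on the slide.

```javascript
// Quorum size from the slide: R = W = ceiling((N + 1) / 2).
function quorumSize(n) {
  return Math.ceil((n + 1) / 2);
}

for (const n of [1, 2, 3, 4, 5, 7]) {
  const q = quorumSize(n);
  // With R = W = q, the read and write sets always overlap: 2q > n.
  console.log(`N=${n} quorum=${q} overlap=${2 * q > n}`);
}
```

With this choice R + W always exceeds N, yet for N of 3 or more both R and W stay below N, so one failed node does not block reads or writes.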

Page 44: SDEC2011 NoSQL Data modelling

Eventual consistency variants

• Causal consistency -- if A writes a value and informs B, then B always sees the updated value

• Read-your-writes consistency -- after A writes a new value, A never sees the old one

• Session consistency -- read-your-writes consistency within a client session

• Monotonic read consistency -- once a process has seen a new value, it never sees an earlier one

• Monotonic write consistency -- writes by the same process are serialized
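One way to picture the session-level guarantees is a client that remembers the newest version it has observed. The sketch below is a simplified illustration under the assumption that every value carries a single increasing version number; `makeSession`, `read`, and `noteWrite` are hypothetical names, and real systems use richer version information.

```javascript
// Hypothetical client session that enforces monotonic reads: it remembers the
// highest version it has seen and refuses answers from replicas that lag behind it.
function makeSession() {
  let lastSeenVersion = 0;
  return {
    // replica is assumed to look like { version: number, value: any }.
    read(replica) {
      if (replica.version < lastSeenVersion) {
        return { ok: false, reason: "stale replica" };
      }
      lastSeenVersion = replica.version;
      return { ok: true, value: replica.value };
    },
    // Recording the version of our own write gives read-your-writes in this session.
    noteWrite(version) {
      lastSeenVersion = Math.max(lastSeenVersion, version);
    },
  };
}

const session = makeSession();
const fresh = { version: 2, value: "new" };
const lagging = { version: 1, value: "old" };
console.log(session.read(fresh));
console.log(session.read(lagging));
```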

Page 45: SDEC2011 NoSQL Data modelling

Dynamo Techniques

• Consistent Hashing (Incremental scalability)

• Vector clocks (high availability for writes)

• Sloppy quorum and hinted handoff (recover from temporary failure)

• Gossip-based membership protocol (periodic, pairwise, inter-process interactions; low reliability; random peer selection)

• Anti-entropy using Merkle trees

• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
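Of the techniques listed, vector clocks are the easiest to show in a few lines. The sketch below represents a clock as a map from node id to counter and classifies two versions as ordered or concurrent; concurrent versions are the ones that need reconciliation. `compareClocks`, `increment`, and the node names are hypothetical, and this is a simplified illustration rather than Dynamo's implementation.

```javascript
// Compare two vector clocks, each a map of node id -> counter.
// Returns "before", "after", "equal", or "concurrent".
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// Hypothetical helper: the clock after `node` handles one more write.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

const v1 = increment({}, "A"); // written via node A
const v2 = increment(v1, "B"); // descends from v1, written via node B
const v3 = increment(v1, "C"); // also descends from v1, written via node C
console.log(compareClocks(v1, v2));
console.log(compareClocks(v2, v3));
```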

Page 46: SDEC2011 NoSQL Data modelling

Consistent Hashing
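The slide image is not part of this transcript. As a rough sketch of the idea, the code below places each node at several points on a hash ring and assigns a key to the first point at or after the key's hash. The FNV-1a hash, the 50 points per node, and the names `hash32`, `buildRing`, and `nodeFor` are assumptions made for this illustration, not details taken from the talk or from Dynamo.

```javascript
// FNV-1a, a simple 32-bit string hash; any well-distributed hash would do here.
function hash32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Build a ring: each node is placed at several points (virtual nodes).
function buildRing(nodes, pointsPerNode = 50) {
  const ring = [];
  for (const node of nodes) {
    for (let i = 0; i < pointsPerNode; i++) {
      ring.push({ point: hash32(`${node}#${i}`), node });
    }
  }
  return ring.sort((a, b) => a.point - b.point);
}

// A key belongs to the first ring point at or after its hash, wrapping around.
function nodeFor(ring, key) {
  const h = hash32(key);
  const entry = ring.find((e) => e.point >= h) || ring[0];
  return entry.node;
}

const keys = Array.from({ length: 1000 }, (_, i) => `key-${i}`);
const ringBefore = buildRing(["n1", "n2", "n3"]);
const ringAfter = buildRing(["n1", "n2", "n3", "n4"]);
const moved = keys.filter((k) => nodeFor(ringBefore, k) !== nodeFor(ringAfter, k));
// Every key that moved should now live on the newly added node.
console.log(moved.length, moved.every((k) => nodeFor(ringAfter, k) === "n4"));
```

Adding a node only takes keys away from existing nodes and hands them to the new one; keys never shuffle between the old nodes, which is what makes scaling incremental.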

Page 47: SDEC2011 NoSQL Data modelling

CouchDB MVCC Style

• (Source: http://guide.couchdb.org/draft/consistency.html)
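The figure is not part of this transcript. The core of the MVCC approach can be sketched as a revision check on every write: a writer must name the revision its change was based on, and a write based on an outdated revision is rejected instead of overwriting newer data. The toy store below is hypothetical, uses simplified revision ids, and is not CouchDB's implementation.

```javascript
// Toy in-memory store that mimics revision-checked updates.
// Hypothetical; it is not CouchDB's implementation and uses simplified revision ids.
function makeStore() {
  const docs = new Map();
  return {
    get(id) {
      return docs.get(id);
    },
    // A write must carry the revision it was based on; a stale one is rejected.
    put(id, body, baseRev) {
      const current = docs.get(id);
      const currentRev = current ? current.rev : undefined;
      if (baseRev !== currentRev) {
        return { ok: false, error: "conflict" };
      }
      const generation = current ? current.generation + 1 : 1;
      const rev = `${generation}-rev`;
      docs.set(id, { rev, generation, body });
      return { ok: true, rev };
    },
  };
}

const store = makeStore();
const first = store.put("john", { zip: 10001 });             // new document
const second = store.put("john", { zip: 10002 }, first.rev); // based on the latest revision
const stale = store.put("john", { zip: 10003 }, first.rev);  // based on an old revision
console.log(first, second, stale);
```

Readers are never blocked in this scheme: they keep seeing the last committed revision while a writer prepares the next one.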

Page 48: SDEC2011 NoSQL Data modelling

Key/value Stores

• Memcached

• Membase

• Redis

• Tokyo Cabinet

• Kyoto Cabinet

• Berkeley DB

Page 49: SDEC2011 NoSQL Data modelling

Questions?

• blog: shanky.org | twitter: @tshanky

st@treasuryofideas.com