NoSQL Data Stores

Luca Rossi [email protected]

NoSQL Systems > Timeline

2003 Memcached
2006 Google BigTable
2007 Amazon Dynamo
2007 HBase
2008 Cassandra, CouchDB
2009 P.Voldemort, Redis, Riak, MongoDB

30/05/2011 Sistemi NoSQL 2

NoSQL Systems > Memcached

• What is memcached
– Caching system intended to alleviate database load.
– In‐memory key‐value store for small chunks of data.

• Extremely successful
– Facebook, Yahoo, Wikipedia, eBay, Digg, …

Memcached > How does it work

Super simple!

// Try the cache first
v = memcachedClient.get(key);
if (v == NULL) {
    // Cache miss: run the slow query, then store the result for next time
    v = db.query( SOME SLOW QUERY );
    memcachedClient.set(key, v);
}

Key‐value cache
1. Keys are hashed
2. The hash table spans an arbitrary number of servers
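
The get-then-set pattern and the idea of hashing keys across servers can be tried out without a real memcached installation. The following is a minimal sketch under stated assumptions: each "server" is a local dict, servers are picked with a simple hash-modulo rule (real clients may use a different scheme), and `ShardedCache`, `slow_query` and `cached_lookup` are hypothetical names, not part of any memcached client library.

```python
import hashlib


class ShardedCache:
    """Toy stand-in for a memcached client: each 'server' is a local dict."""

    def __init__(self, server_count):
        self.servers = [{} for _ in range(server_count)]

    def _server_for(self, key):
        # 1. Hash the key. 2. Map the hash onto one of the servers
        # (modulo placement is an assumption made for this sketch).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def get(self, key):
        return self._server_for(key).get(key)

    def set(self, key, value):
        self._server_for(key)[key] = value


def slow_query(key, calls):
    # Hypothetical stand-in for db.query(SOME SLOW QUERY); records each call.
    calls.append(key)
    return "row for " + key


def cached_lookup(cache, key, calls):
    value = cache.get(key)
    if value is None:  # cache miss: go to the database, then fill the cache
        value = slow_query(key, calls)
        cache.set(key, value)
    return value


cache = ShardedCache(server_count=3)
db_calls = []
first = cached_lookup(cache, "user:42", db_calls)
second = cached_lookup(cache, "user:42", db_calls)  # served from the cache
```

The second lookup never reaches `slow_query`, which is the whole point of putting the cache in front of the database.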

NoSQL Systems > Google BigTable


• BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.

• Petabytes of data across thousands of commodity servers.

• Built on top of Google File System

Google BigTable > Data Model

Row Id | ColumnFamily1                                                    | ColumnFamily2                          | … | ColumnFamilyN
rowid1 | qualifier1 = “abc”, qualifier2 = “def”, qualifier3 = “123”, …    | qualifier1 = “xyz”, qualifier5 = “fgh” | … | …
rowid2 | …

• Column Families are (the only things) defined in the schema
• Qualifiers are added dynamically.

• Simple queries
– Get a row by key
– Get a range of rows by (start key, end key)
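
To make the data model concrete, here is a small in-memory sketch of a table with schema-defined column families, dynamically added qualifiers, and the two simple queries (get by key, range scan). It is an illustration only: `ToyTable` and its methods are hypothetical names, and treating the end key of a scan as exclusive is an assumption of this sketch.

```python
from bisect import bisect_left


class ToyTable:
    def __init__(self, column_families):
        self.column_families = set(column_families)  # fixed by the schema
        self.rows = {}         # row key -> {family -> {qualifier -> value}}
        self.sorted_keys = []  # row keys kept in sorted order for range scans

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError("unknown column family: " + family)
        if row_key not in self.rows:
            self.rows[row_key] = {}
            position = bisect_left(self.sorted_keys, row_key)
            self.sorted_keys.insert(position, row_key)
        # Qualifiers need no declaration: they appear on first write.
        self.rows[row_key].setdefault(family, {})[qualifier] = value

    def get(self, row_key):
        return self.rows.get(row_key)

    def scan(self, start_key, end_key):
        # Start key inclusive, end key exclusive (assumption of this sketch).
        lo = bisect_left(self.sorted_keys, start_key)
        hi = bisect_left(self.sorted_keys, end_key)
        return [(key, self.rows[key]) for key in self.sorted_keys[lo:hi]]


table = ToyTable(["ColumnFamily1", "ColumnFamily2"])
table.put("rowid1", "ColumnFamily1", "qualifier1", "abc")
table.put("rowid1", "ColumnFamily2", "qualifier5", "fgh")
table.put("rowid3", "ColumnFamily1", "qualifier1", "zzz")
table.put("rowid2", "ColumnFamily1", "qualifier1", "xyz")
scanned_keys = [key for key, _ in table.scan("rowid1", "rowid3")]
```

Keeping the row keys sorted is what makes the range query cheap: both ends of the range are found by binary search.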

Google BigTable > Data Model > Example

• Student – Course
– 1 student > many courses
– 1 course > many students

Students (id PK, name, email, birthdate)
Course (id PK, title, description, teacher_id)
Student2Course (student_id, course_id)

Google BigTable > Data Model > Example


De‐normalized data

Single key‐space
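
The original figure for this slide is not preserved, so the following is only one possible reading of "de-normalized data in a single key-space", not the layout the slide showed. In this sketch student rows and course rows live in the same dict, told apart by a key prefix, and each side carries a copy of the data it needs from the other. The prefixes, the family names (`info`, `courses`, `students`) and the `enroll` helper are all assumptions.

```python
# Single key-space: one dict holds both kinds of rows, told apart by key prefix.
rows = {}


def enroll(student, course):
    s_key = "student:" + student["id"]
    c_key = "course:" + course["id"]
    s_row = rows.setdefault(s_key, {"info": {"name": student["name"]}, "courses": {}})
    c_row = rows.setdefault(c_key, {"info": {"title": course["title"]}, "students": {}})
    # De-normalization: copy the course title into the student row and the
    # student name into the course row, so a read needs no join table.
    s_row["courses"][course["id"]] = course["title"]
    c_row["students"][student["id"]] = student["name"]


alice = {"id": "s1", "name": "Alice"}
bob = {"id": "s2", "name": "Bob"}
databases = {"id": "c1", "title": "Databases"}
algorithms = {"id": "c2", "title": "Algorithms"}

enroll(alice, databases)
enroll(bob, databases)
enroll(alice, algorithms)
```

The price of avoiding the Student2Course join is duplicated data: renaming a course would mean updating every student row that mentions it.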

Google BigTable > Infrastructure

• Partition model: sharding on the row key:
– Data is divided into tablets
– Each tablet is defined by the range of row keys it is responsible for (start key – end key)
– Each tablet is served by one tablet server at a time
– Each tablet server may serve (has the lock for) many tablets.

• Distributed locking service called Chubby
– Manages tablet servers lifecycle
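
Range-based sharding on the row key can be sketched as a binary search over the tablets' start keys. The tablet boundaries and server names below are made up for illustration, and the sketch ignores locking and server lifecycle entirely.

```python
from bisect import bisect_right

# Each tablet covers the keys from its start key up to the next tablet's
# start key; the last tablet runs to the end of the key space.
TABLET_STARTS = ["", "g", "n", "t"]
# One server per tablet; "server-a" serves two tablets at once.
TABLET_SERVERS = ["server-a", "server-b", "server-a", "server-c"]


def tablet_index(row_key):
    # Index of the last tablet whose start key is <= row_key.
    return bisect_right(TABLET_STARTS, row_key) - 1


def server_for(row_key):
    return TABLET_SERVERS[tablet_index(row_key)]
```

For instance, `server_for("mango")` falls in the tablet starting at "g", so it resolves to the single server currently holding that tablet.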

Google BigTable > Infrastructure

• Three‐level hierarchy to store tablet location– Analogous to a B+ Tree

[Figure: Master Server and Tablet Servers]

Google BigTable > Infrastructure

[Figure: Client, Master Server and Tablet Servers exchanging requests and responses]

• Strong consistency
– Only one tablet server is responsible for a given piece of data.
– Replication is handled on the GFS layer

• Trade‐off with availability
– If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned

NoSQL Systems > Amazon Dynamo

“An extra tenth of a second in response times will cost us 1% in sales” ‐ Amazon

• Dynamo: Highly available key‐value store

• Challenge: reliability at massive scale
– Tens of millions of customers.
– Tens of thousands of servers.

Amazon Dynamo > Data Model

• Binary objects (i.e. blobs) identified by unique keys

• Query model:
– Simple read and write operations on data retrieved by primary key
– No operations span multiple data items

Amazon Dynamo > Infrastructure

• Partitioning similar to P2P (Chord, Pastry, etc.)
– Keys are hashed.
– The range of the hash function is treated as a circular space (ring).
– Each node is responsible for a region of the ring.
– Distributed Hash Table (DHT)
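
The ring can be sketched in a few lines. This is a simplified illustration, not Dynamo's implementation: it assumes MD5 as the hash function, places each node at the hash of its name, and assigns a key to the first node found clockwise from the key's position. `Ring`, `ring_position` and the node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def ring_position(name):
    # Position on the ring: the MD5 digest read as one large integer.
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        self.points = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.points]

    def node_for(self, key):
        # First node at or after the key's position; wrap around at the end.
        index = bisect_left(self.positions, ring_position(key))
        return self.points[index % len(self.points)][1]


ring = Ring(["A", "B", "C"])
bigger_ring = Ring(["A", "B", "C", "D"])
keys = ["key-%d" % i for i in range(200)]
```

A useful property of this layout is that adding node D only takes keys over from D's clockwise neighbour; every other key stays on the node it already had.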

[Figure: hash ring with node A and regions labelled N=1, N=2, N=2, N=3]

NoSQL Systems > Amazon Dynamo

[Figure: hashed key “AE107FB…” on the ring]

• Each node is responsible for the region between it and its N predecessors.

• N is tuned on a per‐node basis

NoSQL Systems > Amazon Dynamo

• Replication
– Each data item is replicated at many hosts

• Eventual consistency
– Updates are propagated to replicas asynchronously
– The system eventually reaches a consistent state

• Tradeoff between consistency and availability
– Number of replicas is crucial
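
Asynchronous propagation can be simulated with a queue of pending updates. This is a deliberately naive sketch: writes always land on the first replica, the `propagate` step stands in for background replication, and conflict handling between concurrent writers is left out. All class and method names are hypothetical.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to other replicas

    def put(self, key, value):
        # The write is accepted as soon as one replica has it ...
        self.replicas[0].data[key] = value
        # ... and is queued for the others instead of being applied now.
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def get(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Stand-in for background replication: drain the queue in order.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore(replica_count=3)
store.put("cart", "book")
fresh_read = store.get("cart", 0)
stale_read = store.get("cart", 2)  # this replica has not seen the write yet
store.propagate()
reads_after = [store.get("cart", i) for i in range(3)]
```

Between `put` and `propagate` a reader may see either the new value or nothing, depending on the replica it hits; more replicas improve availability but widen that window.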

Case Study > Facebook Messages


• Real‐time, reliable messaging system that combines chat, messages and emails.

• 135+ billion messages per month

• Two main usage patterns
– A short set of temporal data that tends to be volatile
– An ever growing set of data that rarely gets accessed

• Candidate systems:
– MySQL
– Apache Cassandra
– Apache HBase

Facebook Messages > MySQL

• Attractive choice:
+ Facebook core infrastructure is MySQL‐based
  • It is indeed a giant LAMP application
+ Facebook team has extensive knowledge in running and managing MySQL

• But…
– MySQL clustering is hard to maintain (and scale)
– MySQL performance suffers with large indexes and data sets

Facebook Messages > Apache HBase

• BigTable’s open‐source clone
– Extensible record store
– Strong consistency
  • Availability trade‐off

• Part of the Hadoop ecosystem
– Built on top of HDFS
– Integrates with Hive, ZooKeeper, etc.

Facebook Messages > Apache Cassandra

• Marriage between BigTable and Dynamo
– Data model: Extensible record store (BigTable)
– Infrastructure: Distributed Hash Table (Dynamo)

• Eventual consistency
• High availability

• Developed by Facebook itself
– To serve the (old) inbox system

Facebook Messages > Evaluation results

• MySQL soon discarded
• HBase vs Cassandra

          | Data model              | Consistency model    | Availability
HBase     | Extensible record store | Strong consistency   | Replicas managed by HDFS; region servers are single points of failure
Cassandra | Extensible record store | Eventual consistency | No single point of failure


• HBase won
– Strong consistency is a better match for real‐time systems

NoSQL Systems > Overview

• We have seen:
– Extensible record stores
  • BigTable, HBase, Cassandra
– Key‐value stores
  • Dynamo

• There’s more to it!
– Document stores

NoSQL Systems > Document stores

• Systems that store collections of documents

• What is a document?
– Generally, an object with a number of fields, whose values can be scalars, lists, or nested documents as well
  • e.g.: XML, JSON
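
The recursive definition of a document translates directly into a recursive check. The sketch below uses JSON as parsed by Python's standard `json` module; `is_document` and `is_document_value` are hypothetical helper names, and the sample document is invented.

```python
import json


def is_document_value(value):
    # Scalars are allowed as field values.
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    # Lists are allowed if every element is itself an allowed value.
    if isinstance(value, list):
        return all(is_document_value(item) for item in value)
    # Nested documents are allowed, checked recursively.
    if isinstance(value, dict):
        return is_document(value)
    return False


def is_document(obj):
    # A document is an object whose named fields all hold allowed values.
    return isinstance(obj, dict) and all(
        isinstance(name, str) and is_document_value(value)
        for name, value in obj.items()
    )


sample = json.loads("""
{
  "title": "NoSQL Data Stores",
  "year": 2011,
  "topics": ["key-value", "document"],
  "author": {"name": "Example Author", "affiliations": ["Example University"]}
}
""")
```

Here `title` and `year` are scalars, `topics` is a list, and `author` is a nested document that itself contains a list.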

Case Study > Guardian.co.uk


Guardian.co.uk > 2005‐09

Modern Java application
– Strong model in Java
– Oracle RDBMS
– Database abstracted with ORM

Problems: increasing complexity
– Complex Hibernate binding (10,000+ lines of XML config)
– Lots of optimisations
– Complex caching strategy
– Load becoming an issue
– …

Guardian.co.uk > 2009‐10

• Introduce yet more caching
– Memcached

• Decouple applications from db by building APIs
– Power APIs using scalable technologies (Apache Solr)
– JSON results

[Figure: DB load]

Guardian.co.uk > 2009‐10

Three models now (a headache):
– RDBMS Tables
– Java objects
– JSON API

JSON model is very simple:
– Multiple domain objects expressed in a single doc
– Can be designed in a forwardly extensible way
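
A short sketch of what "multiple domain objects in a single doc" and "forwardly extensible" can look like in practice. The field names (`headline`, `tags`, `commentCount`) are invented for illustration and are not taken from the Guardian's actual API.

```python
import json

# One document carries the article together with its tags, where a
# relational model would need separate tables and a join.
ARTICLE_JSON = """
{
  "id": "article-1",
  "headline": "Example headline",
  "tags": [
    {"id": "technology", "title": "Technology"},
    {"id": "databases", "title": "Databases"}
  ],
  "commentCount": 12
}
"""


def summarize(article):
    # Read only the fields this consumer knows about. Fields added later
    # (such as "commentCount" here) are simply ignored, and a missing
    # "tags" list falls back to empty, so older and newer documents both work.
    return {
        "headline": article["headline"],
        "tags": [tag["title"] for tag in article.get("tags", [])],
    }


article = json.loads(ARTICLE_JSON)
summary = summarize(article)
older_summary = summarize({"headline": "Older document without tags"})
```

Consumers written this way keep working when the producer adds fields, which is what lets the JSON model evolve without coordinated releases.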

Guardian.co.uk > 2009‐10

[Figure: Article and Tags]


What if the JSON API was the primary model?
• CouchDB
• MongoDB

NoSQL Systems > MongoDB vs CouchDB

                  | CouchDB                  | MongoDB
Data Model        | Collections of JSON docs | Collections of BSON docs
Queries           | Low‐level query language | Rich, declarative query language
Consistency Model | Eventual Consistency     | Strong Consistency (tunable though)
Replication       | Master‐Master            | Master‐Slave
Scalability       | Through replication      | Sharding


• MongoDB was chosen:
– Can easily express complex queries
– Good if you come from RDBMS
– No need for extreme scalability (where CouchDB shines)
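
To give a feel for what a declarative, document-shaped query looks like, here is a toy matcher written in plain Python. It is not MongoDB's implementation and does not use any MongoDB driver: it only mimics the shape of a filter document for a tiny subset of operators (`$gt`, `$lt`, `$in`, plus equality, where an equality test against a list field matches if the list contains the value). The sample collection is invented.

```python
OPERATORS = {
    "$gt": lambda field, arg: field is not None and field > arg,
    "$lt": lambda field, arg: field is not None and field < arg,
    "$in": lambda field, arg: field in arg,
}


def matches(doc, query):
    for name, condition in query.items():
        value = doc.get(name)
        if isinstance(condition, dict):
            # Operator form, e.g. {"year": {"$gt": 2010}}.
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif isinstance(value, list):
            # Equality against a list field: match if the list contains it.
            if condition not in value:
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


articles = [
    {"title": "A", "year": 2009, "tags": ["nosql"]},
    {"title": "B", "year": 2011, "tags": ["nosql", "mongodb"]},
    {"title": "C", "year": 2011, "tags": ["sql"]},
]
recent_nosql = find(articles, {"tags": "nosql", "year": {"$gt": 2010}})
from_2009 = find(articles, {"year": {"$in": [2009]}})
```

The query says what to match rather than how to find it, which is close to how someone coming from an RDBMS already thinks about a WHERE clause.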

NoSQL Systems > Links and References

• Rick Cattell – Scalable SQL and NoSQL Datastores
• R. Cattell, M. Stonebraker – Ten Rules for Scalable Performance in “Simple Operation” Datastores
• M. Stonebraker – SQL vs NoSQL Databases

• A. Popescu – MyNoSQL Blog

• Chang et al. – Google BigTable
• DeCandia et al. – Amazon Dynamo

• We have encountered:
– Cassandra – cassandra.apache.org
– HBase – hbase.apache.org
– CouchDB – couchdb.apache.org
– MongoDB – http://www.mongodb.org
– Memcached – memcached.org