NoSQL Databases

Anatomy of NoSQL Databases. Date: 11/10/13. Author: Amit Kumar

Page 1: NoSQL Databases

Anatomy of NoSQL Databases

Date: 11/10/13

Amit Kumar

Page 2: NoSQL Databases

Agenda

+Background

+What are NoSQL Databases

+Relational vs NoSQL Databases

+HBase

+Cassandra

+Design Strategies behind NoSQL Databases

Page 3: NoSQL Databases

Background

+Traditional Applications

Limited Data

Top priority on consistency

Focus on average latency

Ideally fit with RDBMS

Utilized the DB intrinsic features well

Good part of logic resided in DB

+Next Gen Applications

Web Scale (~infinite)

ALWAYS available

High performance in ALL cases

Data in the form of key/value pair

Logic part of Application Layer

Page 4: NoSQL Databases

RDBMS with Next Gen Apps – Failure

+Scale

Limit to maximum data supported

Sharding is an option, but then RDBMS features are lost

+Economy

Requires large arrays of fast, expensive disks

Very expensive

+Availability still an issue

Page 5: NoSQL Databases

NoSQL Databases

+Name is confusing

Not an RDBMS at all

NoREL Databases would be a better name

+Key Value Store

+Extremely scalable

+High performance

+Always available

+Weak Consistency (CAP Theorem)

+Distributed

Use commodity hardware - Cheap

+Might not hold ACID properties

+Only for specific use cases – not a good fit for everything

Page 6: NoSQL Databases

RDBMS vs NoSQL Databases

+Go for RDBMS when

Small instances of simple, straightforward systems

Joins, secondary indexing, referential integrity, group by/order by

+Go for NoSQL when

Data scale

Read/write scale

Data model is flexible or semi-structured

Page 7: NoSQL Databases

NoSQL Current Limitations

+Maturity

+Support

+Analytics & Business Intelligence

+Administration

+Ease of Use

Page 8: NoSQL Databases

Some famous NoSQL Databases

+Open-source

HBase

Cassandra

Voldemort

Dynomite

Hypertable

CouchDB

VPork

MongoDB

Riak

+Closed-source

BigTable

Dynamo

PNUTS

Page 9: NoSQL Databases

HBase

+Based on Google BigTable

+Sparse distributed persistent multi-dimensional sorted map

+On top of Hadoop HDFS

+Master Slave Model

Single Master (SPOF)

+Especially good when

Objects are huge

Data production/consumption is distributed and is tunneled through map/reduce jobs

+Loose Data Model

Column Families

+Timestamp based versioning

+Not supported on Windows

+Major Users – Adobe, Twitter, Yahoo, Veoh, Streamy, Trend Micro
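The "sparse distributed persistent multi-dimensional sorted map" with timestamp-based versioning can be pictured as nested sorted maps. The sketch below is a toy in-memory model only: the class name SortedMapModel and its methods are hypothetical, values are plain strings rather than byte[], and nothing here reflects how HBase actually lays data out on HDFS.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (hypothetical, for illustration): row key -> "family:column" -> timestamp -> value.
class SortedMapModel {
    // Rows and columns are kept in sorted order; versions are sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Newest version of a cell, or null when the cell was never written (the map is sparse).
    String getLatest(String row, String column) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Newest version written at or before the given timestamp, or null if there is none.
    String getAsOf(String row, String column, long timestamp) {
        NavigableMap<Long, String> versions = versionsOf(row, column);
        if (versions == null) return null;
        // With the reversed ordering, ceilingEntry finds the largest timestamp <= the argument.
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    private NavigableMap<Long, String> versionsOf(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        SortedMapModel table = new SortedMapModel();
        table.put("row1", "info:name", 1L, "alice");
        table.put("row1", "info:name", 2L, "alicia");
        System.out.println(table.getLatest("row1", "info:name"));
        System.out.println(table.getAsOf("row1", "info:name", 1L));
        System.out.println(table.getLatest("row1", "info:email"));
    }
}
```

Absent cells cost nothing in this model, which is the sense in which the map is sparse, and older versions stay readable until something removes them.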

Page 10: NoSQL Databases

HBase Architecture & Table Structure

+Loosely based on Consistent Hashing

+Table made up of regions

Region specified by startkey and endkey

A region may live on a different node.

+Tables sorted by Rows

+Schema defines column families only

Each family consists of any number of columns

Each column consists of any number of versions

Columns within a family are sorted & stored together

+Everything except the table name is a byte[]
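Because rows are sorted and each region is bounded by a start key and an end key, finding the region for a row amounts to a floor lookup on the start keys. The following is a minimal sketch under that assumption; RegionLookup, its methods, and the node names are made up for illustration and are not part of the HBase API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of locating a region by start key.
class RegionLookup {
    // Start key -> node hosting that region. A region implicitly ends where the next one starts.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String node) {
        regionsByStartKey.put(startKey, node);
    }

    // The owning region is the one with the greatest start key <= the row key.
    String nodeFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("No region covers row key: " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        table.addRegion("", "node-a");   // empty start key: covers everything before "g"
        table.addRegion("g", "node-b");  // covers ["g", "p")
        table.addRegion("p", "node-c");  // covers ["p", end of table)
        System.out.println(table.nodeFor("apple"));
        System.out.println(table.nodeFor("grape"));
        System.out.println(table.nodeFor("zebra"));
    }
}
```

Since adjacent row keys fall in the same region, a scan over a key range touches only the few nodes that host those regions.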

Page 11: NoSQL Databases

Connecting to HBase

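+Java Client API

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Open the table
HBaseConfiguration config = new HBaseConfiguration();
HTable table = new HTable(config, "table_name");

// Write one cell: row key first, then column family, column name, value
Put p = new Put(Bytes.toBytes("key"));
p.add(Bytes.toBytes("columnfamily"), Bytes.toBytes("column"), Bytes.toBytes("value"));
table.put(p);

// Read the row back
Get g = new Get(Bytes.toBytes("key"));
Result r = table.get(g);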

+HBase Shell

$ ${HBASE_HOME}/bin/hbase shell
hbase> describe "table_name"
hbase> put "table_name", "key", "columnfamily:columnname", "value"
hbase> get "table_name", "key"
hbase> scan "table_name"

+Thrift Gateway

+REST Gateway

+Many other non-java clients

Page 12: NoSQL Databases

Cassandra

+Based on Amazon Dynamo

+Open sourced by Facebook in 2008

+Peer to Peer Model

No Master Node

+Works on Windows as well

+Distributed Key/Value Store

+Configurable parameters for Consistency/Availability

+Especially suited if

Number of Objects is huge

Objects are small (<1 MB)

+Major Users: Facebook, Digg, Twitter etc.
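The configurable consistency/availability trade-off is usually explained with the Dynamo-style quorum rule: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below only checks that arithmetic; the class and method names are hypothetical and this is not Cassandra's API (Cassandra exposes the choice through consistency levels such as ONE, QUORUM, and ALL).

```java
// Hypothetical helper illustrating the R + W > N quorum rule.
class QuorumCheck {
    // True when every read set of size r must share at least one replica with every write set of size w.
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("Require 1 <= r, w <= n");
        }
        return r + w > n;
    }

    // Smallest majority of n replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        int n = 3;
        // Quorum reads and quorum writes overlap: favors consistency.
        System.out.println(readsSeeLatestWrite(n, quorum(n), quorum(n)));
        // Single-replica reads and writes may miss each other: favors availability and latency.
        System.out.println(readsSeeLatestWrite(n, 1, 1));
    }
}
```

Lower R and W keep the store responsive when nodes are down, at the cost of reads that may return stale data until replicas converge.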

Page 13: NoSQL Databases

NoSQL Databases – Assumptions

+Data size is huge

System must partition its data across multiple nodes

+Reliable

Data must be safe even when disks and nodes fail

System must replicate data

+Performance

Needs to perform well on cheap hardware and maintain low latency ALWAYS
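One common way to meet the partitioning and replication assumptions together is a consistent-hash ring: each key is stored on the first few distinct nodes found clockwise from its hash. The sketch below is a simplified illustration, not any particular database's implementation; HashRing, the virtual-node count, the MD5-based hash, and the node names are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical consistent-hash ring with virtual nodes and replica selection.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Each physical node is placed on the ring at several positions to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // Walk clockwise from the key's position and collect the first `replicas` distinct nodes.
    List<String> nodesFor(String key, int replicas) {
        List<String> owners = new ArrayList<>();
        long position = hash(key);
        collect(ring.tailMap(position, true).values(), owners, replicas);
        collect(ring.headMap(position, false).values(), owners, replicas); // wrap around
        return owners;
    }

    private static void collect(Iterable<String> candidates, List<String> owners, int replicas) {
        for (String node : candidates) {
            if (owners.size() >= replicas) return;
            if (!owners.contains(node)) owners.add(node);
        }
    }

    // First 8 bytes of an MD5 digest as a ring position (an arbitrary choice for this sketch).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xff);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(16);
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("user:42 -> " + ring.nodesFor("user:42", 2));
        ring.removeNode("node-c");
        System.out.println("user:42 -> " + ring.nodesFor("user:42", 2));
    }
}
```

When a node leaves, only the keys it owned move to a neighbor; keys owned by other nodes keep their primary, which is what makes this scheme attractive on clusters of cheap, failure-prone hardware.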

Page 14: NoSQL Databases

NoSQL Databases – Design Strategies

+Complex Distributed System

+Partitioning

Consistent Hashing

+Consistency

Eventual Consistency

Vector Clocks

+Data Models

Primary Key -> Value

Value can be semi-structured

Multi-version Storage

+Storage Layouts

Column storage with Locality groups

Log-Structured Merge Trees

+Cluster Management

Peer to Peer vs Master/Slave approach

Gossip
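Of the strategies listed above, vector clocks are the easiest to show in a few lines: each replica keeps a counter per node, and comparing two clocks tells whether one version descends from the other or whether they conflict. This is a minimal sketch with hypothetical names (VectorClock, tick, compare, merge), not the code of any specific store.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical immutable-style vector clock: node id -> number of updates seen from that node.
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    private final Map<String, Integer> counters = new HashMap<>();

    // Returns a new clock recording one more update made at the given node.
    VectorClock tick(String node) {
        VectorClock next = new VectorClock();
        next.counters.putAll(counters);
        next.counters.merge(node, 1, Integer::sum);
        return next;
    }

    // BEFORE/AFTER mean one version descends from the other; CONCURRENT means a conflict.
    Order compare(VectorClock other) {
        boolean less = false;
        boolean greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String node : nodes) {
            int mine = counters.getOrDefault(node, 0);
            int theirs = other.counters.getOrDefault(node, 0);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    // Element-wise maximum: the clock of a version that has reconciled both inputs.
    VectorClock merge(VectorClock other) {
        VectorClock merged = new VectorClock();
        merged.counters.putAll(counters);
        other.counters.forEach((node, count) -> merged.counters.merge(node, count, Math::max));
        return merged;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().tick("node-a");
        VectorClock left = base.tick("node-b");   // updated on node-b
        VectorClock right = base.tick("node-c");  // updated independently on node-c
        System.out.println(base.compare(left));
        System.out.println(left.compare(right));
        System.out.println(left.merge(right).compare(left));
    }
}
```

Under eventual consistency, a CONCURRENT result is the signal that two replicas accepted independent writes and the application, or a merge rule, has to reconcile them.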

Page 15: NoSQL Databases

References

+Bigtable: A Distributed Storage System for Structured Data

http://labs.google.com/papers/bigtable-osdi06.pdf

+Dynamo: Amazon's Highly Available Key-value Store

http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

+NOSQL debrief, June 2009

http://static.last.fm/johan/nosql-20090611/intro_nosql.pdf

http://static.last.fm/johan/nosql-20090611/hbase_nosql.pdf

http://static.last.fm/johan/nosql-20090611/cassandra_nosql.ppt

+NoSQL Databases Official Site

http://nosql-database.org

+HBase – Hadoop Wiki

http://wiki.apache.org/hadoop/Hbase

+Apache Cassandra – Wikipedia

http://en.wikipedia.org/wiki/Apache_Cassandra

Page 16: NoSQL Databases

Questions + Answers

Page 17: NoSQL Databases

Thank You