NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS...


NoSQL

Or Peles

What is NoSQL

• A collection of various technologies meant to work around RDBMS limitations (mostly performance)

• Not much of a definition...

RDBMS Limitations

• Hard to scale horizontally (for updates)
  – Distributed ACID requires two phase commit.

• Schema can be a bitch
  – Hard to change.
  – Data normalization can slow down queries.
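
To make the cost of two phase commit concrete, here is a minimal in-process sketch. The names `Participant` and `two_phase_commit` are hypothetical, and a real coordinator would also need durable logging, timeouts and crash recovery, none of which is modeled here.

```python
class Participant:
    """Toy participant; `can_commit` simulates whether its local work succeeds."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real node would persist its changes and hold locks here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise roll everyone back."""
    votes = [p.prepare() for p in participants]  # phase 1: every node must answer
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: abort everywhere
        p.rollback()
    return False
```

Every update has to wait for the slowest participant to vote before anyone can commit, which is why this approach gets harder as machines are added.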

Web Scale

• Some numbers:
  – YouTube serves over 100 million videos a day.
  – eBay adds over 10TB of storage every week.
  – Facebook holds over 80 billion photos, and serves hundreds of thousands of requests/second.

Ideal System

• Available – can always read and write.
• Consistent – reads always pick up the latest write.
• Partition tolerant – the system can be split across multiple machines and datacenters.

Starbucks doesn’t use two phase commit

• A great example presented here.
• Asynchronous execution
• Correlation
• Exception handling:
  – Write off
  – Retry
  – Compensation

• 2 phase commit would create a choke point.
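
The coffee shop flow can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the in-memory queue and the `demo_make_drink` helper are all hypothetical, and the "write off" option (simply discarding the failed order) is not shown.

```python
import itertools
from collections import deque

_order_ids = itertools.count(1)


def take_order(queue, drink):
    """Cashier: enqueue the order and return immediately with a correlation id."""
    order_id = next(_order_ids)
    queue.append({"id": order_id, "drink": drink, "attempts": 0})
    return order_id


def demo_make_drink(drink):
    """Hypothetical drink maker that fails for anything not on the menu."""
    if drink not in {"latte", "espresso"}:
        raise RuntimeError("cannot make " + drink)
    return drink + " ready"


def run_barista(queue, make_drink, max_attempts=2):
    """Barista: work through the queue independently of the cashier.

    Returns {order_id: outcome}, so results can be matched back to customers.
    """
    outcomes = {}
    while queue:
        order = queue.popleft()
        order["attempts"] += 1
        try:
            outcomes[order["id"]] = make_drink(order["drink"])
        except RuntimeError:
            if order["attempts"] < max_attempts:
                queue.append(order)                 # retry: put it back in line
            else:
                outcomes[order["id"]] = "refunded"  # compensation: undo the payment
    return outcomes
```

The cashier never waits for the barista, so taking orders and making drinks can proceed at different rates; the correlation id is what ties the finished drink back to the customer.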

CAP Theorem

• CAP (Eric Brewer, 2000): simply put, of the following 3 properties:
  • Consistency
  • Availability
  • Partition tolerance
  only two can hold in any one system at the same time.

CAP in practice

• CA: Two phase commit. Works best within a single data center. Scaling issues.
• CP: Sharding. Data may become unavailable if a shard fails.
• AP: May return inaccurate data. DNS is a prime example.

Consistency Types

• Strict
• Eventual
  – Causal
  – Read your writes
  – Session
  – Monotonic read
  – Monotonic write
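
The difference between a plain eventual read and a "read your writes" read can be shown with a toy primary/replica pair. This is a sketch under assumptions: `ReplicatedStore` and its version counters are hypothetical and do not mirror any particular product.

```python
class ReplicatedStore:
    """Toy primary/replica pair with asynchronous replication (illustrative only)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []          # writes not yet applied to the replica
        self.version = 0           # count of writes accepted by the primary
        self.replica_version = 0   # count of writes applied to the replica

    def write(self, key, value):
        self.version += 1
        self.primary[key] = value
        self.pending.append((key, value))
        return self.version        # the client keeps this as its session token

    def replicate(self):
        """Apply all pending writes to the replica (the 'eventual' part)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.replica_version = self.version
        self.pending.clear()

    def read_eventual(self, key):
        """Read from the replica; may return stale data."""
        return self.replica.get(key)

    def read_your_writes(self, key, session_version):
        """Use the replica only if it has caught up with this session's last write."""
        if self.replica_version >= session_version:
            return self.replica.get(key)
        return self.primary.get(key)
```

Until `replicate()` runs, `read_eventual` misses the new value while `read_your_writes` falls back to the primary, so the writer always sees its own update.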

Concepts

• In memory vs. disk based.
• Shared everything vs. shared nothing.
• Master slave vs. server symmetry.
• Elastic scalability.
• MapReduce.

Sharding

• Split data across machines (database instances).
  – Feature based sharding.
  – Key based sharding.
  – Lookup table.
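
The three sharding styles can be sketched in a few lines. All names here (`key_based_shard`, `LookupTableSharding`, `FEATURE_SHARDS` and the database names) are hypothetical and chosen only for illustration.

```python
import hashlib


def key_based_shard(key, num_shards):
    """Key based sharding: hash the key and take it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class LookupTableSharding:
    """Lookup table sharding: an explicit key -> shard map, with a default shard."""

    def __init__(self, default_shard=0):
        self.table = {}
        self.default_shard = default_shard

    def assign(self, key, shard):
        self.table[key] = shard

    def shard_for(self, key):
        return self.table.get(key, self.default_shard)


# Feature based sharding: each feature (hypothetical names) gets its own database.
FEATURE_SHARDS = {"users": "db-users", "photos": "db-photos"}
```

A stable hash is used instead of Python's built-in `hash()` because string hashing is randomized per process, and every client must route the same key to the same shard.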

NoSQL Categories

• Key-Value stores
• Document stores
• Tabular

Lean & Mean: The Key-Value In-Memory DBs

• In memory DBs are simpler and faster than their on-disk counterparts.

• Key value stores offer a simple interface with no schema.

• Major limitation – data size is limited to RAM size.

• Often used as caches for on-disk DB systems.

Open Source In-Memory DBs

• Memcached/MemcacheDB
• Redis
  – Both are key-value stores that rely on hash partitioning.
  – Memcached is an LRU based cache.
  – Redis is more of a data structure server.
  – Both use a shared-nothing architecture.

Memcached

• Really a giant, distributed hash table.
• Advantages:
  – Relatively simple.
  – Practically no server to server talk.
  – Linear scalability.
• Disadvantages:
  – Doesn’t understand data – no server side operations. The key and value are always strings.
  – It’s really meant to only be a cache – no more, no less.
  – No recovery, limited elasticity.
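
The "giant distributed hash table" idea can be sketched as a client that hashes each key to one of several independent LRU caches. This is not Memcached's actual implementation; `LRUNode` and `CacheClient` are hypothetical names, and the nodes here are plain in-process objects rather than servers.

```python
import hashlib
from collections import OrderedDict


class LRUNode:
    """One cache server: a bounded dict that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class CacheClient:
    """The client picks the node by hashing the key; nodes never talk to each other."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _node_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key).set(key, value)
```

With plain modulo hashing, changing the number of nodes remaps most keys to a different node, which is one way to read the "limited elasticity" point above.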

Redis

• Like Memcached, it’s a distributed hash in memory.
• Offers support for lists and sets, as well as strings.
• Offers limited server side operations.
• Supports master-slave architecture and data replicas for scalability and high availability.
• Also supports a persistent mode that writes to disk.

Document Stores

• As the name implies, these databases store documents.

• Usually schema-free. The same database can store multiple documents.

• Allow indexing based on document content.
• Prominent examples: CouchDB, MongoDB.
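
Indexing on document content can be sketched with nested JSON objects and a dotted field path. This is a minimal illustration, not how CouchDB or MongoDB build their indexes; `get_path` and `build_index` are hypothetical helpers.

```python
import json
from collections import defaultdict

raw = '''{ "username" : "bob",
           "address" : { "street" : "123 Main Street",
                         "city" : "Springfield",
                         "state" : "NY" } }'''
doc = json.loads(raw)


def get_path(document, path):
    """Follow a dotted path such as 'address.city' through nested documents."""
    value = document
    for part in path.split("."):
        value = value[part]
    return value


def build_index(documents, path):
    """Map each value found at `path` to the positions of documents holding it."""
    index = defaultdict(list)
    for position, document in enumerate(documents):
        try:
            index[get_path(document, path)].append(position)
        except (KeyError, TypeError):
            pass  # schema-free: documents without the field are simply not indexed
    return index
```

Because there is no schema, the index has to tolerate documents that lack the indexed field, which is what the `except` branch handles.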

Documents

• A document is just a collection of values, usually serialized in JSON.

• Many implementations offer nesting of documents.

• Example:
  {
    "username" : "bob",
    "address" : {
      "street" : "123 Main Street",
      "city" : "Springfield",
      "state" : "NY"
    }
  }

CouchDB

• Written in Erlang.
• Offers ACID guarantees based on multi-version concurrency control (MVCC).
• Supports replication, but isn’t a real distributed database.

MongoDB

• Written in C++.
• Atomic operations on single documents only.
• Excellent scalability based on sharding.
• Support for server side JavaScript and MapReduce.

Tabular stores

• The original: Google’s BigTable
  – Proprietary, not open source.
• The open source elephant alternative – Hadoop with HBase.
• A top level Apache Project.
• Large number of users.
• Contains a distributed file system, MapReduce, a database server (HBase), and more.
• Rack aware.

Hadoop components

Hadoop basic components

• At its core, Hadoop is a framework for running MapReduce operations on large data sets.

• The data sets are placed as text files on the distributed file system.
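
The MapReduce model itself fits in a few lines of single-process Python. This word count is only a sketch of the programming model, not the Hadoop API; the function names are hypothetical, and Hadoop would run the map and reduce steps as distributed tasks over files in its file system.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Each map call looks at one line only, and each reduce call looks at one key only, which is what lets a framework spread the work across many machines.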

Hadoop MapReduce Flow

HBase

• A database engine built on top of the Hadoop Distributed File System (HDFS).

• Scales up to billions of rows with millions of columns.

• Has a Java interface for queries.

The Tradeoff – SQL vs. NoSQL

• RDBMS:
  – Mature.
  – Standard SQL (but not for DDL, extensions).
  – Robust tools.
• NoSQL:
  – Scale.
  – Schemaless.

References

• Eventual Consistency (Werner Vogels, CTO, Amazon)
• Starbucks doesn’t use two phase commit.
• Hadoop: The Definitive Guide (O’Reilly)
• MongoDB: The Definitive Guide (O’Reilly)
• Many wiki pages

Questions