NoSQL Databases
![Page 1: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/1.jpg)
Eduard Tudenhöfner
![Page 2: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/2.jpg)
Overview
● Why NoSQL?
● Classification
● CAP Theorem
● BASE vs ACID
● Cassandra in Action
● Summary
![Page 4: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/4.jpg)
Why NoSQL?
● original intention: modern web-scale DBs
  ○ amount of data drastically increased
  ○ data on the web is less structured
● higher requirements regarding performance
● some problems are easier to solve without the relational approach
● scaling out & running on commodity HW is much cheaper than scaling up
![Page 5: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/5.jpg)
Typical Characteristics
● non-relational
● horizontally scalable
● flexible schema
● easy replication support
● simple API
● eventually consistent -> BASE principle
![Page 6: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/6.jpg)
![Page 8: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/8.jpg)
Classification
source: http://blog.octo.com/wp-content/uploads/2012/07/QuadrantNoSQL.png
![Page 9: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/9.jpg)
Classification
source: http://www.sics.se/~amir/files/download/dic/NoSQL%20Databases.pdf
![Page 10: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/10.jpg)
Key/Value Stores
● data model: collection of key/value pairs
● keys and values can be complex compounds
● many are modeled on Amazon’s Dynamo paper
● designed to handle massive load
![Page 11: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/11.jpg)
Key/Value Stores
● no complex query filters
● all joins must be in the code
● easy to distribute across cluster
● very predictable performance: lookups by key are typically O(1)
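The point that joins move into application code can be illustrated with a minimal sketch. The `KeyValueStore` class, the `customer:`/`order:` key scheme, and the sample records are assumptions made for illustration, not the API of any real product.

```python
from typing import Any, Dict, List


class KeyValueStore:
    """Toy in-memory key/value store: the only operations are by key."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # A dict lookup is O(1) on average, which is where the
        # predictable performance comes from.
        return self._data.get(key, default)


def orders_with_customer(store: KeyValueStore, order_keys: List[str]) -> List[dict]:
    """Application-side 'join': one extra lookup per order."""
    joined = []
    for key in order_keys:
        order = store.get(key)
        if order is None:
            continue
        customer = store.get(f"customer:{order['customer_id']}")
        name = customer["name"] if customer else None
        joined.append({**order, "customer_name": name})
    return joined


store = KeyValueStore()
store.put("customer:1", {"name": "Alice"})
store.put("order:100", {"customer_id": 1, "total": 25})
store.put("order:101", {"customer_id": 1, "total": 40})
joined = orders_with_customer(store, ["order:100", "order:101"])
```

The store never sees the relationship between orders and customers; the application has to know the key scheme and perform the second lookup itself.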
![Page 12: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/12.jpg)
Wide Column Stores
● Tables are similar to RDBMS, but semi-structured
● based on Google’s BigTable
● Rows can have arbitrary columns
![Page 13: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/13.jpg)
Wide Column Stores -> BigTable
● <RowKey, ColumnKey, Timestamp> triple as key for lookups, inserts, deletes
● ColumnKey uses syntax family:qualifier
● arbitrary columns on a row-by-row basis
● does not support a relational model
  ○ no table-wide integrity constraints
  ○ no multi-row transactions
source: http://research.google.com/archive/bigtable.html
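The <RowKey, ColumnKey, Timestamp> data model can be sketched as a map keyed by that triple. The class name `TinyBigtable`, its methods, and the sample rows are hypothetical; real BigTable is a distributed, persistent, sorted map, which this toy is not.

```python
from typing import Dict, List, Optional, Tuple

# (row key, "family:qualifier" column key, timestamp)
CellKey = Tuple[str, str, int]


class TinyBigtable:
    """Toy map from (row, column, timestamp) to an uninterpreted value."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        if ":" not in column:
            raise ValueError("column key must look like family:qualifier")
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the version with the highest timestamp, if any."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def columns(self, row: str) -> List[str]:
        """Each row may carry a different set of columns."""
        return sorted({c for (r, c, _ts) in self._cells if r == row})


table = TinyBigtable()
table.put("com.cnn.www", "contents:html", 1, b"<html>v1")
table.put("com.cnn.www", "contents:html", 2, b"<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, b"CNN")
table.put("org.example", "contents:html", 1, b"<html>")
```

Note that the two rows end up with different column sets, and that nothing ties a write to one row to a write to another.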
![Page 14: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/14.jpg)
Document Stores
● inspired by Lotus Notes
● central concept of a Document
● Documents encapsulate/encode data in some format/encoding
● Encodings:
  ○ XML, YAML, JSON, BSON, PDF
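A small sketch of the document idea, assuming JSON as the encoding: documents in the same collection need not share fields, and queries reach into the document structure. The sample documents and the `find_by_path` helper are illustrative assumptions, not the query API of a specific document store.

```python
import json
from typing import Any, List

raw_documents = [
    '{"_id": 1, "name": "Alice", "email": "alice@example.com"}',
    '{"_id": 2, "name": "Bob", "address": {"city": "Berlin"}, "tags": ["admin"]}',
]
# Decode each JSON document into a Python dict.
collection = [json.loads(raw) for raw in raw_documents]


def find_by_path(docs: List[dict], path: str, value: Any) -> List[dict]:
    """Return documents whose field at a dotted path equals value."""
    matches = []
    for doc in docs:
        current: Any = doc
        found = True
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                found = False
                break
            current = current[part]
        if found and current == value:
            matches.append(doc)
    return matches


berliners = find_by_path(collection, "address.city", "Berlin")
```

Documents that lack the `address` field are simply skipped; no schema change is needed to add it to some documents and not others.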
![Page 15: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/15.jpg)
Document Stores
source: http://www.mongodb.org/
![Page 16: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/16.jpg)
Document Stores
source: http://www.mongodb.org/
![Page 17: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/17.jpg)
Graph Databases
● based on Graph Theory -> G = (V, E)
● designed for data that is well represented in a graph
  ○ social networks, public transport links, network topologies, road maps
● nodes, edges, properties are used to represent and store data
● graph relationships are queryable
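As a minimal sketch of G = (V, E) with properties on nodes and edges, the following uses plain Python structures and a breadth-first traversal to answer a relationship query. The names, the `FRIEND` edge type, and the helper functions are assumptions for illustration; a real graph database would offer its own query language.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# V: nodes with properties
nodes: Dict[str, dict] = {
    "alice": {"age": 30},
    "bob": {"age": 25},
    "carol": {"age": 41},
    "dave": {"age": 35},
}
# E: undirected edges with properties
edges: List[Tuple[str, str, dict]] = [
    ("alice", "bob", {"type": "FRIEND"}),
    ("bob", "carol", {"type": "FRIEND"}),
    ("carol", "dave", {"type": "FRIEND"}),
]


def neighbors(node: str, edge_type: str) -> Set[str]:
    """Nodes directly connected to node by an edge of the given type."""
    result = set()
    for src, dst, props in edges:
        if props["type"] != edge_type:
            continue
        if src == node:
            result.add(dst)
        elif dst == node:
            result.add(src)
    return result


def within_hops(start: str, edge_type: str, max_hops: int) -> Set[str]:
    """Breadth-first search: everything reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors(node, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen - {start}


friends_of_friends = within_hops("alice", "FRIEND", 2)
```

The same query in a relational schema would need one self-join per hop, which is the kind of problem the deck describes as easier to solve without the relational approach.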
![Page 18: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/18.jpg)
Graph Databases
source: http://www.neo4j.org/
![Page 19: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/19.jpg)
Graph Databases
source: http://en.wikipedia.org/wiki/Graph_database
![Page 21: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/21.jpg)
CAP Theorem
source: http://blog.nahurst.com/visual-guide-to-nosql-systems
![Page 23: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/23.jpg)
ACID
● Atomicity
  ○ all-or-nothing approach
● Consistency
  ○ DB will be in a consistent state before & after a transaction
● Isolation
  ○ transaction will behave as if it’s the only operation being performed upon the DB
● Durability
  ○ once a transaction is committed, it is durably preserved
● CA-Systems are ACID-Systems
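Atomicity and consistency can be seen in a short sketch using Python's built-in `sqlite3` module. The `account` table, the `CHECK` constraint, and the `transfer` helper are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; either both updates apply or neither."""
    try:
        # The connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "alice", "bob", 500)   # would make alice negative
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

In the failed transfer the credit to bob has already been executed when the debit violates the constraint; the rollback undoes it, so no partial result is visible.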
![Page 24: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/24.jpg)
BASE
● an application that works basically all the time, does not have to be consistent all the time, but will be in some known state eventually
● Basically Available
  ○ achieved by using a highly distributed approach
● Soft State
  ○ state of the system is always “soft” due to eventual consistency
● Eventual Consistency
  ○ at some point in the future, the data will be consistent
  ○ no guarantees are made about when this will occur
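Eventual consistency can be sketched with three toy replicas: a write lands on one replica, a read from another is stale, and a later synchronisation round makes all replicas agree. The `Replica` class, the last-write-wins rule, and the `anti_entropy` function are simplifying assumptions, not a description of a particular system.

```python
from typing import List, Optional


class Replica:
    """Toy replica holding a single value with a last-write-wins timestamp."""

    def __init__(self) -> None:
        self.value: Optional[str] = None
        self.timestamp: int = 0

    def apply(self, value: Optional[str], timestamp: int) -> None:
        # Only newer writes replace the current value.
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


def anti_entropy(replicas: List[Replica]) -> None:
    """Propagate the newest known write to every replica."""
    newest = max(replicas, key=lambda r: r.timestamp)
    for replica in replicas:
        replica.apply(newest.value, newest.timestamp)


replicas = [Replica() for _ in range(3)]
replicas[0].apply("v1", 1)          # write accepted by one replica only
stale_read = replicas[2].value      # soft state: this replica has not seen it yet
anti_entropy(replicas)              # happens "at some point in the future"
converged = [r.value for r in replicas]
```

The system stayed available for the write and for the stale read; what it gave up is the guarantee that every read sees the latest write.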
![Page 25: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/25.jpg)
BASE vs ACID
source: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
![Page 27: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/27.jpg)
Cassandra
● initially created by Facebook for Inbox Search
● distributed, horizontally scalable database
● high availability
● very flexible data model
  ○ data might be structured, semi-structured, unstructured
● commercial support through DataStax
![Page 28: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/28.jpg)
Cassandra - Design
● all nodes are equally important
● no Single-Point-of-Failure
● no central controller
● no master/slave relationships
● every node knows how to route requests and where the data lives
source: http://cassandra.apache.org/
![Page 29: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/29.jpg)
Scales Linearly
source: http://www.datastax.com
![Page 30: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/30.jpg)
Uses Consistent Hashing
Murmur3Partitioner generates hash
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
![Page 31: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/31.jpg)
Uses Consistent Hashing
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeHashing_c.html
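A minimal sketch of consistent hashing follows. Murmur3 is not in the Python standard library, so MD5 from `hashlib` stands in as the hash function; that substitution, the `HashRing` class, the node names, and the number of virtual tokens per node are all assumptions for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def token(key: str) -> int:
    """Map a key to a 64-bit token (MD5 here as a stand-in for Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Each node owns the ranges that end at one of its tokens."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 16) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First token >= the key's token; wrap around at the end of the ring.
        index = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

The property worth noticing is that every key in `moved` now belongs to `node4`: adding a node takes over parts of existing ranges and never shuffles keys between the old nodes.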
![Page 32: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/32.jpg)
Writes are very fast
● All writes are sequential
● no reading & seeking before a write
● Each of the N nodes performs the following upon receiving the RowMutation message:
  ○ Append write to the commit log
  ○ Update in-memory Memtable data structure
  ○ Write is done!
● If Memtable gets full, it’s flushed to disk (SSTable)
source: http://www.roman10.net/how-apache-cassandra-write-works/
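The write path above can be sketched in a few lines. The `StorageNode` class, the entry-count threshold, and the in-memory lists standing in for on-disk files are simplifying assumptions; a real node sizes the Memtable in bytes and writes actual files.

```python
from typing import Dict, List, Optional, Tuple


class StorageNode:
    """Toy write path: commit log append, Memtable update, flush to SSTable."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []   # append-only, for recovery
        self.memtable: Dict[str, str] = {}            # in-memory, mutable
        self.sstables: List[Tuple[Tuple[str, str], ...]] = []  # immutable, sorted
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential append, no read first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the Memtable out as a sorted, immutable table."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest table first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = StorageNode(memtable_limit=3)
for k, v in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "4")]:
    node.write(k, v)
```

A write never touches existing SSTables, which is why it needs no seek; the cost is shifted to reads, which may have to consult the Memtable and several tables.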
![Page 33: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/33.jpg)
Write Requests
● Client requests can go to any node in the cluster because all nodes are peers
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
write consistency level is configurable
![Page 34: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/34.jpg)
Write Requests
● Cassandra chooses one Coordinator per remote data center to handle requests to replicas
● coordinator only needs to forward the write request to one node in each remote data center
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsWrite.html
![Page 35: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/35.jpg)
Read Requests
● Two different types of Read Requests
  ○ direct read request (RR)
  ○ background read repair request (RRR)
● number of replicas contacted by an RR is determined by the Consistency Level
● RRRs are sent to any additional nodes that did not get a direct RR
● RRRs ensure that all replicas eventually hold consistent data
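The interplay of direct reads and read repair can be sketched as follows. Replicas are modeled as dicts of `(timestamp, value)` pairs and the repair runs synchronously; both are assumptions for illustration, since in a real cluster the repair happens in the background.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # (timestamp, value)


def read_with_repair(
    replicas: List[Dict[str, Versioned]], key: str, consistency_level: int
) -> Optional[str]:
    """Answer from the first consistency_level replicas, then repair all."""
    # Direct read: only as many replicas as the consistency level demands.
    contacted = replicas[:consistency_level]
    answers = [r[key] for r in contacted if key in r]
    result = max(answers)[1] if answers else None  # newest timestamp wins

    # Read repair: compare all replicas and push the newest version everywhere.
    all_versions = [r[key] for r in replicas if key in r]
    if all_versions:
        newest = max(all_versions)
        for replica in replicas:
            replica[key] = newest
    return result


replicas = [
    {"k": (1, "old")},
    {"k": (2, "new")},
    {"k": (1, "old")},
]
first = read_with_repair(replicas, "k", consistency_level=1)
second = read_with_repair(replicas, "k", consistency_level=1)
```

With a consistency level of one, the first read may return a stale value, but the repair it triggers means a later read from the same replica sees the newest one.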
![Page 36: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/36.jpg)
Read Requests
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
![Page 37: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/37.jpg)
Read Requests
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
2 of the 3 replicas for the given row must respond to fulfill the read request
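The "2 of the 3 replicas" figure is the quorum for a replication factor of 3. A short sketch of the arithmetic, with hypothetical function names: a quorum is a majority of the replicas, and a read is guaranteed to overlap with the latest write when the number of replicas written plus the number read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, written: int, read: int) -> bool:
    """True when every read set must intersect every write set (W + R > RF)."""
    return written + read > replication_factor


rf = 3
q = quorum(rf)
quorum_both = read_sees_latest_write(rf, q, q)
one_and_one = read_sees_latest_write(rf, 1, 1)
```

Quorum writes combined with quorum reads overlap in at least one replica; writing and reading a single replica each does not, which is the eventually consistent configuration.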
![Page 38: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/38.jpg)
Read Requests
source: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureClientRequestsRead_c.html
![Page 39: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/39.jpg)
CQL
● very similar to SQL
● does not support JOINs / subqueries
● no referential integrity
● no cascading operations
We denormalize the data because joins are not performant in a distributed system
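Denormalization can be sketched independently of CQL: instead of joining at read time, the same record is written once per query pattern. The two dicts below stand in for two tables; their names, the `add_user` helper, and the sample users are assumptions for illustration.

```python
from typing import Dict, List

# One "table" per query, each keyed by what that query looks up.
users_by_id: Dict[int, dict] = {}
users_by_city: Dict[str, List[dict]] = {}


def add_user(user_id: int, name: str, city: str) -> None:
    """Write the same data twice so that each read is a single key lookup."""
    row = {"id": user_id, "name": name, "city": city}
    users_by_id[user_id] = row
    users_by_city.setdefault(city, []).append(dict(row))  # duplicated copy


add_user(1, "Alice", "Berlin")
add_user(2, "Bob", "Berlin")
add_user(3, "Carol", "Paris")

berlin_names = [u["name"] for u in users_by_city["Berlin"]]
```

The price is paid on the write side: every update has to touch all copies, and nothing in the store enforces that they stay in sync.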
![Page 40: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/40.jpg)
CQL
![Page 41: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/41.jpg)
CQL
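no index, no service :) (a query that filters on a column that is neither part of the primary key nor indexed is rejected by default)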
![Page 42: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/42.jpg)
CQL - Collections
● CQL introduced collection types for columns
  ○ list
  ○ map
  ○ set
● Add new collections to the previous example
![Page 43: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/43.jpg)
CQL - Collections
![Page 44: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/44.jpg)
Cassandra vs MySQL (50GB)
● MySQL
  ○ writes avg: ~300ms
  ○ reads avg: ~350ms
● Cassandra
  ○ writes avg: ~0.12ms
  ○ reads avg: ~15ms
source: http://www.odbms.org/wp-content/uploads/2013/11/cassandra.pdf
![Page 46: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/46.jpg)
Summary
● elastic scaling (scaling out instead of up)
● huge amounts of data can be handled while maintaining high throughput rates
● requires fewer DBAs and management resources
  ○ automatic repairs/data distribution
  ○ simpler data models
● better economics
  ○ cost per GB is much lower than for RDBMS due to clusters of commodity HW
  ○ we handle more data with less money
● flexible data models
  ○ very relaxed or even non-existent data model restrictions
  ○ changes to data model are much cheaper
![Page 47: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/47.jpg)
Summary
● might not be mature enough for enterprises
● compatibility issues regarding standards
  ○ each DB has its own API
  ○ not easy to switch to another NoSQL DB
● search support is not the same as in RDBMS
● easier to find experienced RDBMS experts than NoSQL experts
![Page 48: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/48.jpg)
Which DB for which purpose?
● NoSQL is an alternative
  ○ addresses certain limitations of the relational DB world
● depends on characteristics of data
  ○ if data is well structured -> relational DB might be better
  ○ if data is very complex -> might be difficult to map it to the relational model
● depends on volatility of the data model
  ○ what if schema changes daily?
● relational DBs still have their pluses
  ○ relational model / transactions / query language
  ○ should be used when multi-row transactions and strict consistency are required
![Page 49: NoSQL Databases](https://reader036.fdocuments.in/reader036/viewer/2022062319/554fb40db4c905ad218b5411/html5/thumbnails/49.jpg)
Thank you! - Questions?