NoSQL
By Zenyk Matchyshyn, Staff Engineer, Lohika
Agenda
• History
• Architecture vs Technology
• Classification
• Pros and Cons of usage
• Trends
• Q/A
HISTORY
History
• NoSQL Technologies are not new
• Many ideas originate from distributed computing, grid computing and parallel computing
• Main drivers:
• Scalability
• Parallelization
• Costs
Google
• In the beginning… there was Google!
• Google shared scientific papers:
• “The Google File System”, October 2003
• “MapReduce: Simplified Data Processing on Large Clusters”, December 2004
• “Bigtable: A Distributed Storage System for Structured Data”, November 2006
• “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006
Amazon
• … and Amazon!
• “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007
New technologies!
• Creators of Lucene wanted to create a full search solution
• Ended up with Hadoop and Hadoop Distributed File System (HDFS)
• Success helped adoption and new solutions emerged
ARCHITECTURE VS TECHNOLOGY
Architecture vs Technology
• SQL is not bad, it’s just different
• You can use a SQL database in a NoSQL way, e.g. MySQL as a key-value store
• You can run SQL-like queries on Hadoop data
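As a minimal sketch of using a SQL database in a NoSQL way, the example below wraps a two-column table in a key-value interface. It uses SQLite from the Python standard library as a stand-in for MySQL so that it runs without a server; the class name, table name and column names are illustrative assumptions, not something from the slides.

```python
import json
import sqlite3


class SqlKeyValueStore:
    """Hypothetical key-value wrapper around a single SQL table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        # One table, two columns: the key and an opaque serialized value.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Values are stored as JSON text; the database never inspects them.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else default

    def delete(self, key):
        with self._db:
            self._db.execute("DELETE FROM kv WHERE k = ?", (key,))


store = SqlKeyValueStore()
store.put("user:1", {"name": "Ann", "langs": ["uk", "en"]})
user = store.get("user:1")
```

The SQL engine is only asked to look rows up by primary key, which is the access pattern a key-value store offers.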
Architecture
• The way you store data
• The way you query data
• Technology environment
CLASSIFICATION
Terms
• ACID – Atomicity, Consistency, Isolation, Durability
• CAP Theorem – Consistency, Availability, Partition tolerance
• Eventual consistency
• Hashing
• Schema
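The slide lists hashing only as a term. One common use in distributed stores is deciding which node owns a key, and consistent hashing is a well-known way to do that. The sketch below assumes that reading; the node names, the number of virtual nodes and the choice of MD5 are all illustrative.

```python
import bisect
import hashlib


def stable_hash(value):
    """Hash a string to an integer that is the same on every run."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical helper, not a library API)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring several times ("virtual nodes")
        # so that keys spread more evenly across nodes.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # Owner is the first ring position after the key's hash, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owners = {key: ring.node_for(key) for key in ("user:1", "user:2", "user:3")}
```

The point of the ring is that removing a node only reassigns the keys that node owned; keys owned by the other nodes stay where they are.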
Classification
• Column oriented stores
• Key/Value stores
• Key/Value stores with configurable consistency
• Document stores
• Graph stores
Chart
[Chart: memcached, key/value stores, column-oriented stores, document stores and RDBMS positioned on two axes, Scalability & Performance and Depth of Functionality]
Column oriented
• Based on Google Bigtable
• Column oriented is the reverse of row oriented: values are grouped by column rather than by row
• Assumes that datacenters are spread across continents and connected over the standard Internet
• C and P from CAP Theorem
• Data is consistent and partitioned, but availability can suffer
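To illustrate what "the reverse of row oriented" means, here is a minimal sketch that pivots the same records from a row layout into a column layout. It is a simplification of the idea only, not the Bigtable data model; the sample records and the helper name are made up, and it assumes every record has the same fields.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ann", "age": 31},
    {"id": 2, "name": "Bob", "age": 27},
    {"id": 3, "name": "Cid", "age": 45},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented layout: all values of one field sit next to each other.
columns = to_columns(rows)

# An aggregate over one field touches a single column, not every whole record.
average_age = sum(columns["age"]) / len(columns["age"])
```

Reading one field across many records is where the column layout pays off; reading one full record is where the row layout does.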
HBase
• Spin-off from the Hadoop project - http://hbase.apache.org/
• Written in Java
• A lot of interfaces – Thrift, REST, JRuby, etc.
• SQL-like access through Hive - http://hive.apache.org/
• HBase ORM – Surus - https://github.com/mushkevych/surus
• Used by Facebook, Hulu, Yahoo!, Ning, etc.
Hypertable
• Developed by Zvents, open sourced
• Written in C++
• Runs on top of a distributed file system
• Used by Baidu
Key/Value
• Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet
• Can store strings, arrays, hashes
Oracle NoSQL
• Sign of things to come!
• http://www.oracle.com/technetwork/database/nosqldb/overview/index.html
• Written in Java
• Configurable consistency
• Berkeley DB as a backend
• No single point of failure
• Transactions
Redis
• http://redis.io/
• Lots of bindings
• Written in C
• In-memory, with optional durability
• Also a document store
Key/Value – eventual consistency
• Key/value stores that favor availability over consistency
• Inspired by Amazon Dynamo
• Dynamo assumes high-speed network links between datacenters that are close to each other
• A and P from CAP Theorem
• Achieve eventual consistency through replication and verification
• Consistency is eventual
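The idea of reaching consistency through replication and later verification can be sketched with two replicas that accept writes independently and reconcile afterwards. This is a deliberately simplified model: it resolves conflicts with last-write-wins on a caller-supplied logical timestamp, which is an assumption made for brevity and not how Dynamo itself tracks versions. All names are illustrative.

```python
class Replica:
    """Toy replica that stores a (timestamp, value) pair per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the incoming version only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def synchronize(replicas):
    """Compare replicas and push the newest version of each key to all of them."""
    newest = {}
    for replica in replicas:
        for key, (timestamp, value) in replica.data.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    for replica in replicas:
        for key, (timestamp, value) in newest.items():
            replica.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], timestamp=1)         # accepted by replica a only
b.write("cart", ["book", "pen"], timestamp=2)  # accepted by replica b only

stale_read = a.read("cart")   # replicas disagree until they synchronize
synchronize([a, b])
fresh_read = a.read("cart")   # both replicas now hold the newest version
```

Both writes succeed immediately, which is the availability side of the trade-off; the price is that a read between the write and the synchronization may return old data.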
Cassandra
• http://cassandra.apache.org/
• Multidimensional map indexed by key
• No single point of failure
• Decentralized
• Tunable consistency
• Used by Facebook, Cisco, IBM, Rackspace
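Tunable consistency is usually explained with replica counts: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap when R + W > N. The helper names below are hypothetical and the sketch only does the arithmetic; it does not call any Cassandra API.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n


n = 3
fast_but_eventual = reads_see_latest_write(n, r=1, w=1)              # False
quorum_both_ways = reads_see_latest_write(n, r=quorum(n), w=quorum(n))  # True
```

Lower values of R and W give faster, more available operations at the cost of possibly stale reads; raising them trades some availability for stronger consistency.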
Voldemort
• http://project-voldemort.com/
• Developed by LinkedIn
• Written in Java
• Developer oriented – many modules are pluggable
• Strictly key/value
Document stores
• Document Databases
• Document-oriented stores hold semi-structured data
• Mostly JSON oriented
• Documents are also called schema-free rows
• Can query by field
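A minimal sketch of what semi-structured documents and querying by field look like, using plain Python dictionaries in place of a real document database. The sample documents and the `find` helper are illustrative assumptions, not the API of any particular product.

```python
# Documents in one collection need not share the same set of fields.
documents = [
    {"_id": 1, "name": "Ann", "city": "Lviv", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "city": "Kyiv"},
    {"_id": 3, "name": "Cid", "city": "Lviv", "email": "cid@example.com"},
]


def find(docs, **criteria):
    """Return documents whose fields equal all of the given criteria."""
    return [
        doc
        for doc in docs
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]


in_lviv = find(documents, city="Lviv")
with_email = find(documents, email="cid@example.com")
```

A document that lacks the queried field simply does not match, so no schema change is needed when some documents gain new fields.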
MongoDB
• http://www.mongodb.org/
• Schema-free, document-oriented
• Written in C++
• Lots of interfaces
• JSON documents
• Query language, supports indexing
• Map/Reduce
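The Map/Reduce model mentioned above can be sketched in a few lines of plain Python: a map function emits key/value pairs per document, the pairs are grouped by key, and a reduce function folds each group. This is a single-process illustration under made-up data, not MongoDB's own interface.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted pairs, reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ann", "total": 10},
    {"customer": "bob", "total": 5},
    {"customer": "ann", "total": 7},
]


def map_order(doc):
    # Emit one (customer, amount) pair per order document.
    yield doc["customer"], doc["total"]


def sum_totals(key, values):
    return sum(values)


totals = map_reduce(orders, map_order, sum_totals)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the model attractive for large data sets.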
CouchDB
• http://couchdb.apache.org/
• RESTful API
• JSON documents
• Written in Erlang
• Supports ACID
• Map/Reduce
• Eventual consistency
Graph
• Provide ways to store graphs
• Provide graph traversal
• Graph oriented functionality
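Storing a graph and traversing it can be sketched with an adjacency list and a breadth-first walk. The graph contents and function name are illustrative; a graph database would offer this kind of operation natively and at much larger scale.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it points at ("follows").
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def traverse(graph, start):
    """Breadth-first traversal returning nodes in the order they are reached."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


reachable = traverse(graph, "alice")
```

Questions such as "who can alice reach, and in how many hops" map directly onto this kind of traversal, whereas in a relational model they typically require repeated self-joins.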
Neo4j
• http://neo4j.org/
• Written in Java
• Stores and navigates graphs
• Stable and proven
• Commercial and free licenses
PROS AND CONS OF USAGE
Pros and Cons
• Scalability
• Transactional Integrity and Consistency
• Data Modeling
• Query Support
• Access and Interface Availability
Typical Usage
• Large amount of data
• Read/Write balanced?
• Read Heavy
• Write Heavy
• Scan
• Geospatial
• Map/Reduce
• Social data
Is it for you?
• Technology is still developing
• Be ready to patch
• SQL is easier
• Not all startups will end up being Facebooks
• Some problems can only be solved with NoSQL
TRENDS
Trends
• Oracle released Oracle NoSQL!
• Adoption of Hadoop soars
• SQL-like access to NoSQL stores is taking shape – UnQL - http://www.unqlspec.org/display/UnQL/Home
• You can participate!
Opportunities
• Spring Data - http://www.springsource.org/spring-data
• Cloud Foundry PaaS - http://www.cloudfoundry.com/
• ORM/Simplification
Q/A?