Lviv EDGE 2 - NoSQL

NoSQL

By Zenyk Matchyshyn, Staff Engineer, Lohika

Agenda

• History

• Architecture vs Technology

• Classification

• Pros and Cons of usage

• Trends

• Q/A

HISTORY

History

• NoSQL Technologies are not new

• Many ideas originate from distributed computing, grid computing and parallel computing

• Main drivers:

• Scalability

• Parallelization

• Costs

Google

• In the beginning… there was Google!

• Google shared scientific papers:

• “The Google File System”, October 2003

• “MapReduce: Simplified Data Processing on Large Clusters”, December 2004

• “Bigtable: A Distributed Storage System for Structured Data”, November 2006

• “The Chubby Lock Service for Loosely-Coupled Distributed Systems”, November 2006

Amazon

• … and Amazon!

• “Dynamo: Amazon’s Highly Available Key-value Store”, October 2007

New technologies!

• Creators of Lucene wanted to create a full search solution

• Ended up with Hadoop and Hadoop Distributed File System (HDFS)

• Success helped adoption and new solutions emerged

ARCHITECTURE VS TECHNOLOGY

9

Architecture vs Technology

• SQL is not bad, it’s just different

• You can use a SQL database in a NoSQL way, e.g. MySQL as a key-value store

• You can run SQL queries on Hadoop data
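
To illustrate the point that a SQL database can be used in a NoSQL way, here is a minimal sketch of a key-value facade over a single two-column table. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the class name SqlKeyValueStore and the table name kv are hypothetical and chosen only for this example.

```python
import sqlite3


class SqlKeyValueStore:
    """Key-value facade over one two-column SQL table (hypothetical example)."""

    def __init__(self, path=":memory:"):
        # sqlite3 stands in for MySQL here; the table layout is an assumption.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # INSERT OR REPLACE overwrites any existing value for the key.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def delete(self, key):
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))


store = SqlKeyValueStore()
store.put("user:42", '{"name": "Ada"}')
print(store.get("user:42"))
```

The application only ever calls put, get and delete by key, so no joins or ad hoc queries are involved; the relational engine is used purely as a durable hash table.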

Architecture

• The way you store data

• The way you query data

• Technology environment

CLASSIFICATION

Terms

• ACID – Atomicity, Consistency, Isolation, Durability

• CAP Theorem – Consistency, Availability, Partition tolerance

• Eventual consistency

• Hashing

• Schema
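
In distributed stores, hashing is commonly used to decide which node owns a key. The sketch below shows one such scheme, consistent hashing, which Dynamo-style systems use so that adding a node moves only a fraction of the keys. The class name HashRing, the vnodes parameter and the node names are assumptions made for this illustration, not taken from any particular product.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # The owner is the first ring position clockwise from the key's hash.
        index = bisect.bisect_right(self._positions, self._hash(key))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

With this layout, adding a fourth node only takes over keys that land just before its ring positions; keys owned by the other nodes stay where they were.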

Classification

• Column oriented stores

• Key/Value stores

• Key/Value stores with configurable consistency

• Document stores

• Graph stores

Chart

[Figure: memcached, key/value, column oriented, document store and RDBMS positioned along two axes: Depth of Functionality, and Scalability & Performance]

Column oriented

• Based on Google Bigtable

• Column oriented is the reverse of row oriented

• Assumes data centers are spread across continents and connected over the public Internet

• C and P from CAP Theorem

• Data is consistent and partition tolerant, at the cost of availability
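
The difference between row-oriented and column-oriented layouts can be sketched in a few lines. This is a simplified illustration of the layout idea only: the sample records and the helper name to_columns are made up for this example, every record is assumed to have the same fields, and Bigtable-style column families are not modeled.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ada", "city": "Lviv"},
    {"id": 2, "name": "Bob", "city": "Kyiv"},
    {"id": 3, "name": "Eve", "city": "Lviv"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# Reading one attribute for every record touches a single list.
cities = columns["city"]
print(cities)
```

In the row layout the same read would have to visit every record and skip the fields it does not need, which is why column layouts suit scans over a few attributes of many records.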

HBase

• Spin-off from the Hadoop project - http://hbase.apache.org/

• Written in Java

• A lot of interfaces – Thrift, REST, JRuby, etc.

• SQL-like access through Hive - http://hive.apache.org/

• HBase ORM – Surus - https://github.com/mushkevych/surus

• Used by Facebook, Hulu, Yahoo!, Ning, etc.



Hypertable

• Developed by Zvents, open sourced

• Written in C++

• Runs on top of a distributed file system

• Used by Baidu

Key/Value

• Key/Value Store – Oracle Berkeley DB (Oracle NoSQL), Redis, Kyoto Cabinet

• Can store strings, arrays, hashes

Oracle NoSQL

• Sign of things to come!

• http://www.oracle.com/technetwork/database/nosqldb/overview/index.html

• Written in Java

• Configurable consistency

• Berkeley DB as a backend

• No single point of failure

• Transactions

Redis

• http://redis.io/

• Lots of bindings

• Written in C

• In-memory, with optional durability

• Also a document store

Key/Value – eventual consistency

• Key/value stores that favor availability over consistency

• Inspired by Amazon Dynamo

• Dynamo assumes high-speed network links between data centers that are close to each other

• A and P from CAP Theorem

• Achieve eventual consistency through replication and verification

• Consistency is eventual
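
A toy model can show what "eventual" means here. In the sketch below a write reaches only one replica, a read from another replica is stale, and a background verification pass brings the replicas back in line. The names Replica and anti_entropy are hypothetical, and plain integer versions are an assumption made for brevity; Dynamo itself uses vector clocks and Merkle trees for this job.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Accept the write only if it is newer than what this replica holds.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Verification pass: copy the newest version of each key to all replicas."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, value, version)


a, b, c = Replica("a"), Replica("b"), Replica("c")
for replica in (a, b, c):
    replica.write("cart", ["book"], version=1)

a.write("cart", ["book", "pen"], version=2)  # update reaches replica a only
stale = b.read("cart")                       # b still returns the old cart

anti_entropy([a, b, c])
fresh = b.read("cart")                       # b now returns the updated cart
```

The store stays available for reads and writes the whole time; the price is the window in which replica b serves the older value.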

Cassandra

• http://cassandra.apache.org/

• Multidimensional map indexed by key

• No single point of failure

• Decentralized

• Tunable consistency

• Used by Facebook, Cisco, IBM, Rackspace
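
Tunable consistency is usually explained with the quorum rule: with N replicas, a read that contacts R of them is guaranteed to see the latest write acknowledged by W of them whenever R + W > N. The helper below is a minimal sketch of that check; the function name is hypothetical and it is not part of any Cassandra API.

```python
def is_strongly_consistent(n, r, w):
    """Return True when every read quorum overlaps every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Three replicas, read one and write one: fast, but reads may be stale.
print(is_strongly_consistent(3, 1, 1))  # False

# Three replicas, read two and write two: quorums always overlap.
print(is_strongly_consistent(3, 2, 2))  # True
```

Choosing small R and W favors latency and availability, while raising them trades some of that for stronger read guarantees.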

Voldemort

• http://project-voldemort.com/

• Developed by LinkedIn

• Written in Java

• Developer oriented – many modules are pluggable

• Strictly key/value

Document stores

• Document Databases

• Document-oriented stores hold semi-structured data

• Mostly JSON oriented

• Documents are also called schema-free rows

• Can query by field
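
The properties above, schema-free JSON-like documents that can be queried by field, can be sketched with plain Python dictionaries. The sample documents and the helper name find are made up for this illustration; real document stores add indexes and a richer query language on top of the same idea.

```python
# Documents in one collection do not have to share the same fields.
people = [
    {"name": "Ada", "city": "Lviv", "languages": ["Java", "Python"]},
    {"name": "Bob", "city": "Kyiv"},
    {"name": "Eve", "city": "Lviv", "employer": "Lohika"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(
            field in doc and doc[field] == value
            for field, value in criteria.items()
        )
    ]


in_lviv = find(people, city="Lviv")
print([doc["name"] for doc in in_lviv])
```

A document that lacks the queried field simply does not match, so no schema migration is needed when new fields appear.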

MongoDB

• http://www.mongodb.org/

• Schema-free, document-oriented

• Written in C++

• Lots of interfaces

• JSON documents

• Query language, supports indexing

• Map/Reduce
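
The Map/Reduce model itself is small enough to sketch in a single process. The example below counts tags across a few documents using a map step, a grouping step and a reduce step. It is plain Python and does not use MongoDB's API; the function names and sample documents are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to every document and stream out (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def emit_tags(doc):
    # Mapper: emit (tag, 1) for each tag on a document.
    for tag in doc.get("tags", []):
        yield tag, 1


def sum_values(key, values):
    # Reducer: add up the counts emitted for one tag.
    return sum(values)


posts = [
    {"title": "Intro", "tags": ["nosql", "hadoop"]},
    {"title": "Stores", "tags": ["nosql"]},
    {"title": "Untagged"},
]

tag_counts = reduce_phase(shuffle(map_phase(posts, emit_tags)), sum_values)
print(tag_counts)
```

In a real cluster the map and reduce steps run in parallel on many machines and the grouping step moves data between them; the programming model stays the same.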

CouchDB

• http://couchdb.apache.org/

• RESTful API

• JSON documents

• Written in Erlang

• Supports ACID

• Map/Reduce

• Eventual consistency

Graph

• Provide ways to store graphs

• Provide traversal

• Graph oriented functionality
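
Traversal is the core operation a graph store offers. A minimal sketch, assuming the graph is held as an adjacency list in a Python dict: a breadth-first walk that reports how many hops each reachable node is from a starting node. The function name traverse and the sample friendship graph are hypothetical.

```python
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal returning {node: depth} up to max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


friends = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}

# Everyone within two hops of ann: friends and friends of friends.
within_two = traverse(friends, "ann", 2)
print(within_two)
```

The same "friends of friends" question in a relational schema needs a self-join per hop, which is the kind of query graph stores are built to avoid.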

Neo4j

• http://neo4j.org/

• Written in Java

• Stores and navigates graphs

• Stable and proven

• Commercial and free licenses

PROS AND CONS OF USAGE

Pros and Cons

• Scalability

• Transactional Integrity and Consistency

• Data Modeling

• Query Support

• Access and Interface Availability

Typical Usage

• Large amount of data

• Read/Write balanced?

• Read Heavy

• Write Heavy

• Scan

• Geospatial

• Map/Reduce

• Social data

Is it for you?

• Technology is still developing

• Be ready to patch

• SQL is easier

• Not all startups will end up being Facebooks

• Some problems can be solved only with NoSQL

TRENDS

Trends

• Oracle released Oracle NoSQL!

• Adoption of Hadoop soars

• SQL-like access to NoSQL stores is taking shape – UnQL - http://www.unqlspec.org/display/UnQL/Home

• You can participate!

Opportunities

• Spring Data - http://www.springsource.org/spring-data

• Cloud Foundry PaaS - http://www.cloudfoundry.com/

• ORM/Simplification

Q/A?