Big Data And No SQL

Page 1: Big Data And No SQL

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 1

DISCLAIMER

All views / opinions expressed in this presentation are based on my understanding of the information that I have gathered.

Page 2: Big Data And No SQL

No SQL

- Bansi Haudakari

Page 3: Big Data And No SQL

Agenda
■ My First Love with SQL
■ Romance with SQL
■ CAP Theorem
■ Dating NO SQL
■ NO SQL Types
  ▪ Key-Value
  ▪ Document
  ▪ Column Family

Page 4: Big Data And No SQL

3 Generations of Databases

RDBMS for transactions, data warehouse for analytics, NoSQL for what?

Page 5: Big Data And No SQL

RDBMS Recap

■ Data is stored in rows
■ The data model (i.e. the schema) is pre-defined before you add data
■ Joins are used to merge data from multiple tables
■ SQL is used to query the database
■ Pros
  ▪ Mature ACID transactions with fine-grained control
■ Cons
  ▪ Requires an up-front data model
  ▪ Doesn't scale out well across many servers
■ Key players: Oracle, MS SQL Server
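To make the recap concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for any RDBMS. The `customer`/`orders` tables and their rows are hypothetical. It shows the schema being defined before any data is written, a join merging two tables, and a write with an undeclared column being rejected.

```python
import sqlite3

# In-memory SQLite database: a stand-in for any RDBMS in this sketch.
conn = sqlite3.connect(":memory:")

# The schema must exist before any data is added ("schema before write").
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    total REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)")

# A join merges rows from the two tables; SQL aggregates per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 65.0), ('Bob', 15.0)]

# Writing a column the schema does not declare is rejected.
rejected = False
try:
    conn.execute("INSERT INTO customer (id, name, email) VALUES (3, 'Carol', 'c@example.com')")
except sqlite3.OperationalError as err:
    rejected = True
    print("rejected:", err)
```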

Page 6: Big Data And No SQL

My Love with SQL

■ Queries
  ▪ Filters using the WHERE clause
  ▪ Sorting using the ORDER BY clause
  ▪ Subsets using the CONTAINS / ANY clause
■ Transactions
  ▪ Start transaction
  ▪ SELECT statements followed by UPDATE statements
  ▪ Commit
■ Consistency
  ▪ Master + synchronous replication
■ Schema strictly enforced
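The transaction pattern above (start, SELECT then UPDATE, commit) can be sketched with `sqlite3` as well. The `account` table and the `transfer()` helper are hypothetical. The point is that if any step fails after a partial update, ROLLBACK undoes it, so either both updates commit or neither does.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode so we can issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")

def transfer(conn, src, dst, amount):
    """Move money atomically: both UPDATEs commit, or neither does."""
    conn.execute("BEGIN")  # start txn
    try:
        # SELECT statement(s) first ...
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        # ... followed by UPDATE statements
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        cur = conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        if cur.rowcount != 1:
            raise ValueError(f"unknown account {dst!r}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # undo the partial debit
        raise

transfer(conn, "A", "B", 30)       # commits
try:
    transfer(conn, "A", "Z", 10)   # debit succeeds, credit fails -> rolled back
except ValueError:
    pass

print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 70), ('B', 80)]
```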

Page 7: Big Data And No SQL

Impedance Mismatch

■ Key players
  ▪ Hibernate, Ruby on Rails
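The slide names the ORM tools but not the mismatch itself, so here is a hedged sketch of what tools like Hibernate automate: an in-memory object with a nested list has to be "shredded" into rows of two tables on save and re-assembled with further queries on load. The `Order`/`OrderLine` classes and the table layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical domain objects: an order owns a nested list of lines.
@dataclass
class OrderLine:
    sku: str
    qty: int

@dataclass
class Order:
    id: int
    customer: str
    lines: list[OrderLine] = field(default_factory=list)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_line (order_id INTEGER, sku TEXT, qty INTEGER);
""")

def save(order: Order) -> None:
    # "Shred" one object graph into rows of two tables.
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order.id, order.customer))
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?, ?)",
        [(order.id, line.sku, line.qty) for line in order.lines],
    )

def load(order_id: int) -> Order:
    # Re-assemble the object from rows, with one query per table.
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE id = ?", (order_id,)).fetchone()
    lines = [OrderLine(sku, qty) for sku, qty in conn.execute(
        "SELECT sku, qty FROM order_line WHERE order_id = ? ORDER BY rowid", (order_id,))]
    return Order(order_id, customer, lines)

original = Order(1, "Alice", [OrderLine("pen", 2), OrderLine("ink", 1)])
save(original)
print(load(1) == original)  # True
```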

Page 8: Big Data And No SQL

Web Services

Page 9: Big Data And No SQL

Big Data - Tree of Data Types

Page 10: Big Data And No SQL

RDBMS Crumbles

Page 11: Big Data And No SQL

Big Data : Storage

■ Big Data is impractical to store using an RDBMS
  ▪ Too big, and arriving too fast, for an RDBMS to ingest
  ▪ An RDBMS needs “schema before write”
  ▪ Unknown structures call for “schema during read”
■ So what is limiting the RDBMS?
  ▪ The ACID requirement drives “protection” mechanisms
  ▪ Redo and undo in Oracle provide ACID
  ▪ Easy to get “small bits”; hard to get “large pieces”
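A minimal sketch of the difference between “schema before write” and “schema during read”, under assumed data: the relational side needs a declared table before ingest, while the raw JSON log is stored untouched and each query projects out only the fields it needs. The record format and the `read_with_schema()` helper are hypothetical.

```python
import json
import sqlite3

# Schema on write: the table and its columns must exist before ingest.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (user TEXT, action TEXT)")
conn.execute("INSERT INTO event VALUES ('u1', 'login')")
# An event carrying a new attribute ('device') would need an ALTER TABLE first.

# Schema on read: keep raw records as-is and impose structure only when reading.
raw_log = [
    '{"user": "u1", "action": "login"}',
    '{"user": "u2", "action": "click", "device": "mobile"}',
    '{"user": "u3", "action": "login", "geo": {"country": "IN"}}',
]

def read_with_schema(lines, fields):
    """Project each raw JSON record onto the fields this query cares about."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}

view = list(read_with_schema(raw_log, ["user", "device"]))
print(view)
# [{'user': 'u1', 'device': None}, {'user': 'u2', 'device': 'mobile'}, {'user': 'u3', 'device': None}]
```

Records with unexpected attributes (`geo` above) are accepted at write time; deciding what they mean is deferred to each reader.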

Page 12: Big Data And No SQL

Big Data : Storage

■ Architectural limitations
  ▪ The scale-up, master-slave, non-distributed architecture of RDBMS was never designed for use cases involving data that is both big and fast, arriving from millions of users in many different locations.
■ Data model limitations
  ▪ The RDBMS data model is not flexible enough to handle modern online applications that mix structured, semi-structured and unstructured data.

Page 13: Big Data And No SQL

Big Data : RDBMS Storage Pitfalls
■ RDBMSs are essentially ACID
  ▪ “In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability”
  ▪ ACID is pessimistic; it enforces consistency at the end of a transaction
■ New concept! BASE (Basically Available, Soft state, Eventual consistency)
  ▪ BASE is optimistic; it accepts eventual consistency
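A toy model of BASE behaviour, assuming a single writer, no conflicting updates and in-order delivery of replication messages (real systems must also resolve conflicting writes, e.g. with version vectors). A write is acknowledged once one replica has it; other replicas may serve stale data until propagation runs, after which all replicas agree. The class and method names are hypothetical.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class EventuallyConsistentStore:
    """Toy model: a write is acknowledged after one replica applies it;
    the other replicas catch up later (asynchronous replication)."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica(f"r{i}") for i in range(n_replicas)]
        self.pending = deque()  # replication messages not yet delivered

    def put(self, key, value):
        self.replicas[0].data[key] = value       # apply locally, ack immediately
        for r in self.replicas[1:]:
            self.pending.append((r, key, value))  # propagate later

    def get(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)  # may be stale

    def propagate(self):
        """Deliver all pending replication messages (e.g. a background sync)."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value

store = EventuallyConsistentStore()
store.put("cart:42", ["book"])
print(store.get("cart:42", 0))  # ['book']
print(store.get("cart:42", 2))  # None (stale read: soft state)
store.propagate()
print(store.get("cart:42", 2))  # ['book'] (eventually consistent)
```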

Page 14: Big Data And No SQL

Big Data – Storage
■ Large amounts of data coming in at Amazon & Google
  ▪ Scale-up didn't work for them
  ▪ They developed their own data stores, different from relational ones
■ Google creates Bigtable (~2006)
■ Amazon creates Dynamo (~2007)
  ▪ Inspires a new set of alternative data storage projects, which led to the NoSQL movement

Page 15: Big Data And No SQL

What is NoSQL?
■ #nosql: Twitter hashtag used to organize a meetup about “open-source, distributed, non-relational databases”
■ NoSQL: loosely, “Not Only SQL”
■ NoSQL does NOT always mean Big Data
  ▪ But Big Data stores are almost always NoSQL-based
  ▪ That is, if you count Hadoop as a NoSQL datastore *

Page 16: Big Data And No SQL

No SQL Characteristics

■ ACID is “relaxed”
■ Non-relational
■ Schema-less
■ Cluster-friendly: ability to run on large clusters
■ Open source
■ Built for the 21st-century web
■ Use data models different from the relational model

Page 17: Big Data And No SQL

No SQL Characteristics

■ BASE: a “weaker” concurrency model than the ACID transactions in most SQL systems
  ▪ Basically Available: partial system failures are OK
  ▪ Soft state: inconsistency is OK
  ▪ Eventual consistency: stale data is OK
■ Scalability: the ability to horizontally scale simple-operation throughput over many servers
■ Distributed environment: efficient use of distributed indexes and RAM for data storage
■ Schema-less: the ability to dynamically define new attributes or data schema
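Horizontal scaling usually starts with partitioning keys across servers. Below is a minimal sketch of hash partitioning, with each "server" modelled as an in-process dict (an assumption for illustration): adding servers adds capacity because each one holds only about 1/N of the keys. One caveat of plain `hash mod N` is that changing N remaps most keys; consistent hashing, as used by Dynamo-style stores, reduces that movement.

```python
import hashlib

class ShardedKV:
    """Spread keys over N servers by hashing the key (hash partitioning)."""

    def __init__(self, n_servers):
        self.servers = [dict() for _ in range(n_servers)]

    def _shard(self, key: str) -> int:
        # A stable hash (not Python's per-process randomized hash()) so every
        # client maps a given key to the same server.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.servers)

    def put(self, key, value):
        self.servers[self._shard(key)][key] = value

    def get(self, key):
        return self.servers[self._shard(key)].get(key)

kv = ShardedKV(n_servers=4)
for i in range(1000):
    kv.put(f"user:{i}", {"id": i})
print([len(s) for s in kv.servers])  # roughly 250 keys per server
print(kv.get("user:7"))              # {'id': 7}
```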

Page 18: Big Data And No SQL

CAP Theorem

It is impossible for a distributed system to simultaneously provide all three of the following guarantees:
■ (C)onsistency
  ▪ All nodes see the same data at the same time, even with concurrent updates
■ High (A)vailability
  ▪ Every request receives a response about whether it succeeded or failed
■ (P)artition Tolerance
  ▪ The system continues to operate despite arbitrary message loss or failure of part of the system
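Because network partitions cannot be ruled out in a real cluster, the practical reading of the theorem is that during a partition a system must choose between consistency and availability. Below is a toy two-replica sketch of that choice; the `TwoNodeCluster` class and its "CP"/"AP" modes are hypothetical, and real systems make this trade-off in far more nuanced ways.

```python
class Node:
    def __init__(self):
        self.data = {}

class TwoNodeCluster:
    """Toy two-replica system illustrating the choice CAP forces during a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Node(), Node()
        self.partitioned = False

    def write(self, node, key, value):
        peer = self.b if node is self.a else self.a
        if not self.partitioned:
            node.data[key] = peer.data[key] = value  # synchronous replication
            return "ok"
        if self.mode == "CP":
            return "error: cannot reach replica"     # give up availability
        node.data[key] = value                       # give up consistency
        return "ok (unreplicated)"

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            return "error: cannot confirm latest value"
        return node.data.get(key)

for mode in ("CP", "AP"):
    c = TwoNodeCluster(mode)
    c.write(c.a, "x", 1)
    c.partitioned = True
    print(mode, c.write(c.a, "x", 2), c.read(c.b, "x"))
# CP error: cannot reach replica error: cannot confirm latest value
# AP ok (unreplicated) 1
```

In AP mode node `b` keeps answering but returns the stale value 1; in CP mode both operations are refused until the partition heals.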

Page 19: Big Data And No SQL

Page 20: Big Data And No SQL

No SQL Databases

Page 21: Big Data And No SQL

Key-Value : Locker Metaphor

Page 22: Big Data And No SQL

Key-Value Stores

Page 23: Big Data And No SQL

Key-Value Stores
■ A hash table of keys & values
■ A “blob” of data (the “value”) indexed and accessed via a “key”
■ The “value” part is also known as an aggregate and can contain any type of data (images, video, collections, objects)
■ Pros
  ▪ Blazingly fast & easy to scale (no joins)
  ▪ Simple API: put, get, delete
■ Cons
  ▪ No way to query based on the content of the value
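A minimal in-memory sketch of the put/get/delete API described above, assuming values are stored as opaque bytes. Note what is missing: there is no way to ask for "all users in Pune" because the store never looks inside a value. The class is hypothetical and is not any product's client API.

```python
from typing import Optional

class KeyValueStore:
    """Minimal in-memory key-value store: values are opaque blobs."""

    def __init__(self):
        self._table: dict[str, bytes] = {}  # a hash table of keys & values

    def put(self, key: str, value: bytes) -> None:
        self._table[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._table.get(key)

    def delete(self, key: str) -> None:
        self._table.pop(key, None)

store = KeyValueStore()
store.put("user:1", b'{"name": "Alice", "city": "Pune"}')
store.put("user:2", b'{"name": "Bob", "city": "Austin"}')
print(store.get("user:1"))  # b'{"name": "Alice", "city": "Pune"}'
store.delete("user:2")
print(store.get("user:2"))  # None
```

Any query by content would have to fetch and decode every value on the client side, which is why access by key is the only efficient path.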

Page 24: Big Data And No SQL

No subset queries in Key-Value Stores

Page 25: Big Data And No SQL

Types of Key-Value Stores
■ Eventually consistent K-V stores
■ Hierarchical K-V stores
■ K-V stores in RAM: Memcache, Redis
■ K-V stores on disk: DynamoDB
■ High-availability K-V stores: Riak
■ Ordered K-V stores

Page 26: Big Data And No SQL

Memcache
■ Open-source, in-memory key-value caching system
■ Used effectively as a cache in highly distributed systems
■ RAM-resident K-V store for small chunks of data
■ 30 ms read times; use in read-intensive systems
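Memcache is typically used in a cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with an expiry. The sketch below uses a small in-process `TTLCache` as a stand-in (it is not the Memcache client API) and an injectable clock so the expiry can be demonstrated; `TTLCache` and `get_profile()` are hypothetical names.

```python
import time

class TTLCache:
    """Toy stand-in for a Memcache-style cache: get/set with expiry."""

    def __init__(self, clock=time.monotonic):
        self._items = {}
        self._clock = clock

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

def get_profile(user_id, cache, db):
    """Cache-aside read path: try the cache, fall back to the database, then fill the cache."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = db[user_id]  # slow path (e.g. an RDBMS query)
        cache.set(key, profile, ttl=60)
    return profile

now = [0.0]                            # fake clock for the demo
cache = TTLCache(clock=lambda: now[0])
db = {7: {"name": "Alice"}}
get_profile(7, cache, db)              # miss: reads db, fills cache
db[7] = {"name": "Alice B."}           # database changes behind the cache
print(get_profile(7, cache, db))       # {'name': 'Alice'} (served from cache)
now[0] = 61.0                          # TTL elapsed
print(get_profile(7, cache, db))       # {'name': 'Alice B.'}
```

The demo also shows the trade-off: until the TTL expires, readers may see data that is stale relative to the database.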

Page 27: Big Data And No SQL

Riak
■ Open-source distributed key-value store
■ A Dynamo-inspired database
■ Supports replication and auto-sharding
■ Supports MapReduce, full-text search and secondary indexes
■ Focuses on providing HA, scalability & fault tolerance
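Dynamo-inspired stores such as Riak decide which nodes hold a key with consistent hashing. The sketch below is a simplified ring with virtual nodes and an N-replica preference list; it is not Riak's exact partitioning scheme, and `HashRing` is a hypothetical name. Adding a node moves only the keys that the new node takes over.

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

class HashRing:
    """Consistent hashing with virtual nodes, Dynamo style (simplified)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def preference_list(self, key: str, n: int = 3):
        """First n distinct nodes clockwise from the key's position: its replicas."""
        start = bisect.bisect(self._points, _hash(key))
        owners = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners

ring = HashRing(["n1", "n2", "n3", "n4", "n5"])
print(ring.preference_list("user:42"))  # three distinct nodes

# Add a sixth node and count how many keys change their primary owner.
keys = [f"k{i}" for i in range(10000)]
before = {k: ring.preference_list(k, 1)[0] for k in keys}
bigger = HashRing(["n1", "n2", "n3", "n4", "n5", "n6"])
moved = sum(before[k] != bigger.preference_list(k, 1)[0] for k in keys)
print(moved / len(keys))  # roughly 1/6 (plain hash mod N would move about 5/6)
```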

Page 28: Big Data And No SQL

Redis
■ Open-source in-memory key-value store
■ Stores simple lists, sets and hashes
■ Focuses on high-speed reads & writes of common data structures in RAM
■ Supports features like expiration, transactions, pub/sub and partitioning

Page 29: Big Data And No SQL

Amazon DynamoDB
■ Scalable key-value store
■ SSD-only database service
■ Focuses on throughput (rather than storage) & predictable read/write times

Page 30: Big Data And No SQL

Key-Value stores
■ CAP theorem: strong on Availability and Partition tolerance, but typically weaker on Consistency
■ NO atomicity or consistency when multiple transactions are executed simultaneously
■ As the volume of data increases, maintaining unique values as keys may become more difficult
■ Examples of K-V stores: Riak and Amazon Dynamo
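The missing atomicity shows up as the classic lost update when two clients read-modify-write the same key. Below is a hedged sketch with a hypothetical `VersionedKV` class: plain get/put loses an increment, while a compare-and-set (CAS) on a version number detects the conflict so the client can retry. Many K-V stores offer some conditional-write or versioning mechanism, but the API here is illustrative only.

```python
import threading

class VersionedKV:
    """Toy K-V store with an optional compare-and-set (CAS) on a version number."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # makes each single operation atomic, nothing more

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def put(self, key, value):
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            self._data[key] = (version + 1, value)

    def cas(self, key, expected_version, value):
        """Write only if nobody else wrote since we read; return success."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False
            self._data[key] = (version + 1, value)
            return True

def increment(kv, key):
    while True:  # optimistic retry loop
        version, value = kv.get(key)
        if kv.cas(key, version, value + 1):
            return

kv = VersionedKV()

# Lost update: two clients read, then both write back "their" increment.
kv.put("counter", 0)
_, a = kv.get("counter")
_, b = kv.get("counter")
kv.put("counter", a + 1)
kv.put("counter", b + 1)
print(kv.get("counter")[1])     # 1, not 2

# Same interleaving with CAS: the stale second write is rejected and retried.
kv.put("c2", 0)
va, a = kv.get("c2")
vb, b = kv.get("c2")
print(kv.cas("c2", va, a + 1))  # True
print(kv.cas("c2", vb, b + 1))  # False (version changed since the read)
increment(kv, "c2")             # retry: re-read, then CAS again
print(kv.get("c2")[1])          # 2
```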

Page 31: Big Data And No SQL

■ Typical use cases
  ▪ Shines when you need simple GET/PUT operations
  ▪ Session state; tokens: enables web scale
  ▪ User profiles and preferences: typically a caching layer
  ▪ Latency bridge: supports read-your-own-writes (RYOW) in some cases
■ Anti-patterns
  ▪ Ad-hoc query patterns (a key is needed to access data)
  ▪ Analytics-type workloads
  ▪ When multi-key/multi-operation consistency is required
  ▪ Set-based operations (i.e. related data)

Page 32: Big Data And No SQL

Document stores

Page 33: Big Data And No SQL

■ Use JSON or XML to store documents
■ Attributes of the document or sub-trees can be queried (e.g. with XQuery for XML documents)
■ There is NO object-relational mapping
■ Data is stored in nested hierarchies, i.e. nested documents
■ Pros:
  ▪ NO ORM layer, i.e. document stores need no translation
  ▪ Documents in the application
  ▪ Documents in the database
  ▪ NO object middle tier, “object shredding” or re-assembly
  ▪ Ideal for search
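A minimal sketch of the document model using plain Python dicts as JSON documents: an order and its nested lines are stored and returned as one unit, and attributes or sub-trees are reached by path. The `get_path()` and `find()` helpers are hypothetical; real document stores use their own query languages (XQuery for XML stores, for example).

```python
import json

# One order = one document: nested lines travel with it (no shredding into tables).
orders = [
    {"_id": 1, "customer": {"name": "Alice", "city": "Pune"},
     "lines": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
    {"_id": 2, "customer": {"name": "Bob", "city": "Austin"},
     "lines": [{"sku": "pad", "qty": 5}]},
]

def get_path(doc, path):
    """Follow a dotted path such as 'customer.city' into nested documents."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def find(collection, path, value):
    """Return documents whose attribute at `path` equals `value` (full scan)."""
    return [d for d in collection if get_path(d, path) == value]

stored = json.dumps(orders[0])          # what the store would persist
print(json.loads(stored) == orders[0])  # True: handed back unchanged
print([d["_id"] for d in find(orders, "customer.city", "Pune")])  # [1]
print(get_path(orders[1], "lines"))     # [{'sku': 'pad', 'qty': 5}]
```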

Page 34: Big Data And No SQL

• XML database designed to scale to petabytes
• Schema-free document store
• ACID compliant
• Supports “high-variability” data
• The most successful NoSQL database
• Heavily used by federal agencies and document publishers

Page 35: Big Data And No SQL

• Open-source JSON data store from 10gen
• Master-slave scale-out model
• Sharding built in
• Implemented in C++

Page 36: Big Data And No SQL

• Open-source JSON data store
• Built around the Memcache model
• Peer-to-peer scale-out model
• Implemented in C++
• Pros: scale-out, replication & HA

Page 37: Big Data And No SQL

• Open-source JSON data store from Apache
• Implemented in Erlang
• Distributed, robust, incremental replication with bi-directional conflict detection and management
• B-tree-based indexing
• RESTful JSON API
• Mobile version

Page 38: Big Data And No SQL

■ Typical use cases
  ▪ Of course, any collection of document-type models
  ▪ Easy-to-start NoSQL projects when moving from RDBMS
  ▪ Almost any NoSQL use case needing secondary-index access
  ▪ Content and metadata stores: typically multiple keys
  ▪ Queries using materialized views (Couchbase)
  ▪ Non-trivial sharding (MongoDB)
  ▪ Horizontally scaled or cached reads (MongoDB, Couchbase)
  ▪ Models requiring simple relationships (blogs, user modeling)
■ Anti-patterns
  ▪ Not a drop-in replacement for RDBMS
  ▪ Evolving relationships or query patterns
  ▪ Usually not good for write-heavy workloads
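Secondary-index access is a key difference between document stores and plain K-V stores. Below is a hedged sketch of the idea with a hypothetical `DocumentCollection` class: an index maps a field value to document ids, so `find_by()` can avoid a full scan. It handles inserts only; updates and deletes would also need to remove stale index entries.

```python
from collections import defaultdict

class DocumentCollection:
    """Documents by primary key, plus optional secondary indexes on top-level fields."""

    def __init__(self):
        self.docs = {}     # primary index: _id -> document
        self.indexes = {}  # field -> {value -> set of _ids}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])  # every index is updated on every write

    def find_by(self, field, value):
        if field in self.indexes:  # index lookup instead of a full scan
            ids = self.indexes[field].get(value, set())
            return sorted((self.docs[i] for i in ids), key=lambda d: d["_id"])
        return [d for d in self.docs.values() if d.get(field) == value]

posts = DocumentCollection()
posts.create_index("author")
posts.insert({"_id": 1, "author": "alice", "title": "CAP"})
posts.insert({"_id": 2, "author": "ann", "title": "BASE"})
posts.insert({"_id": 3, "author": "alice", "title": "Riak"})
print([d["title"] for d in posts.find_by("author", "alice")])  # ['CAP', 'Riak']
```

Because every index must be maintained on every write, heavy indexing sits poorly with write-heavy workloads.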

Page 39: Big Data And No SQL

Column Family stores

• Preserve the table structure of RDBMS. One row can have millions of columns, and the data can be very “sparse”
• Modeled after Bigtable systems, which use a combination of row and column information as the key
• Bigtable systems have keys that include the row, the column ID and other attributes

Page 40: Big Data And No SQL

Column Families

• Group columns into “column families”
• Group column families into “super columns”
• So when a table is created, its super columns/column families are created

Page 41: Big Data And No SQL

Key = Column Family + Timestamp

• So what is the key?
• Of course, it is column family, column name and timestamp
• Querying?
• Be able to query ALL columns within a family or super column
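Below is a toy sketch of the Bigtable-style data model, where a cell is addressed by row key, `family:column` and timestamp. Families are declared up front, columns are created on the fly, only existing cells are stored (so sparse rows are cheap), and a read returns the latest version. `ColumnFamilyTable` and its methods are hypothetical and are not HBase or Cassandra APIs.

```python
from collections import defaultdict

class ColumnFamilyTable:
    """Toy Bigtable-style map: (row key, family:column, timestamp) -> value.
    Only cells that exist are stored, so very wide, sparse rows cost nothing extra."""

    def __init__(self, families):
        self.families = set(families)  # declared when the table is created
        # row -> "family:column" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, column, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        self.rows[row][f"{family}:{column}"][ts] = value  # columns created on the fly

    def get(self, row, family, column):
        """Latest version of one cell, or None if the cell does not exist."""
        versions = self.rows.get(row, {}).get(f"{family}:{column}")
        return versions[max(versions)] if versions else None

    def scan_family(self, row, family):
        """All columns of one family in a row, latest version of each."""
        prefix = family + ":"
        return {name[len(prefix):]: cells[max(cells)]
                for name, cells in sorted(self.rows.get(row, {}).items())
                if name.startswith(prefix)}

t = ColumnFamilyTable(["profile", "visits"])
t.put("user#1", "profile", "name", "Alice", ts=1)
t.put("user#1", "profile", "name", "Alice B.", ts=5)  # newer version of the same cell
t.put("user#1", "visits", "2014-06-01", 3, ts=2)
t.put("user#2", "profile", "email", "bob@example.com", ts=1)  # a different, sparse set of columns
print(t.get("user#1", "profile", "name"))   # Alice B.
print(t.scan_family("user#1", "profile"))   # {'name': 'Alice B.'}
print(t.scan_family("user#2", "profile"))   # {'email': 'bob@example.com'}
```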

Page 42: Big Data And No SQL

■ Data is stored in cells grouped in columns rather than as rows of data
■ Column families can contain a virtually unlimited number of columns, which can be created at runtime or when the schema is defined

Page 43: Big Data And No SQL

■ An RDBMS stores a single row as a contiguous disk entry, with different rows stored in different places on disk, whereas columnar databases store all the cells of a column as a contiguous disk entry, which makes scanning a column faster
■ Reads and writes are done using columns rather than rows
■ The benefit of storing data in columns is fast search/access and data aggregation
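Below is a toy illustration of the two physical layouts, with Python lists and arrays standing in for disk order (an assumption; real columnar engines add compression, encoding and vectorized scans). Summing one column in the row layout has to stride over every record, whereas the columnar layout reads one contiguous array.

```python
from array import array

# The same table in two physical layouts (a toy model of on-disk order).
rows = [(1, "Pune", 120.0), (2, "Austin", 80.5), (3, "Pune", 42.0)]  # id, city, amount

# Row store: each record's fields are adjacent; summing one column walks every record.
row_layout = [field for record in rows for field in record]
total_from_rows = sum(row_layout[i] for i in range(2, len(row_layout), 3))

# Column store: each column is one contiguous array; an aggregate reads just that array.
columns = {
    "id": array("q", [r[0] for r in rows]),
    "city": [r[1] for r in rows],
    "amount": array("d", [r[2] for r in rows]),
}
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 242.5 242.5
```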

Page 44: Big Data And No SQL

■ Typical use cases
  ▪ Data is mostly organized by sets of columns (super columns)
  ▪ Key-value-based access
  ▪ “Value” consists of sets of columns (but is still unstructured)
  ▪ Lots of repeated sets of values (e.g. customer transactions)
  ▪ No joins (except via another keyed table, using MapReduce)
  ▪ Write-intensive patterns (Internet-of-Things type data)
  ▪ Rolling-expiry patterns such as time-series data
■ Anti-patterns
  ▪ IMHO, low-latency reads (in comparison to other NoSQL stores)
  ▪ Access needed via secondary or other keys

Page 45: Big Data And No SQL

• Open-source implementation of the MapReduce algorithm
• Created by Yahoo
• Column-oriented data store
• Java interface
• Designed specifically to work with Hadoop
• The query language is Pig
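The MapReduce model mentioned above can be sketched in a single process with the classic word count. `map_phase`, `shuffle` and `reduce_phase` are hypothetical names; in Hadoop the framework distributes the map and reduce tasks across the cluster and performs the shuffle between them.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reducer: combine all values for one key."""
    return word, sum(counts)

docs = {1: "big data and no sql", 2: "no sql not only sql"}
intermediate = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
result = dict(reduce_phase(w, c) for w, c in shuffle(intermediate).items())
print(result["sql"], result["no"], result["big"])  # 3 2 1
```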

Page 46: Big Data And No SQL

• Apache open-source column-family database
• Written in Java; works well with HDFS and MapReduce
• Peer-to-peer distribution model
• Linear scale-out, i.e. millions of writes/sec
• Database security

Page 47: Big Data And No SQL

Graph-based NoSQL

• Data is stored as a series of nodes, relationships & properties
• Queries are really graph traversals
• Ideal when the relationships between data items are key, e.g. social networks
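“Queries are graph traversals” can be illustrated with a breadth-first search over an adjacency list, here finding friends-of-friends. The data and the `within_hops()` helper are hypothetical; graph databases such as Neo4j provide their own traversal and query APIs.

```python
from collections import deque

# Nodes with properties, and FRIEND relationships stored as an adjacency list.
people = {"alice": {"city": "Pune"}, "bob": {"city": "Austin"},
          "carol": {"city": "Pune"}, "dave": {"city": "Paris"}, "erin": {"city": "Oslo"}}
friends = {"alice": ["bob", "carol"], "bob": ["alice", "dave"],
           "carol": ["alice", "dave"], "dave": ["bob", "carol", "erin"], "erin": ["dave"]}

def within_hops(start, max_hops):
    """Breadth-first traversal: every node reachable in <= max_hops, with its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in friends.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

fof = sorted(p for p, d in within_hops("alice", 2).items() if d == 2)
print([(p, people[p]["city"]) for p in fof])  # [('dave', 'Paris')]
```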

Page 48: Big Data And No SQL

■ Pros
  ▪ Fast network search; works with open linked data sets
■ Cons
  ▪ Poor scalability when the graph doesn't fit into RAM
■ Neo4j is a graph database for Java developers
■ Works as an embedded Java library in your application
■ Disk-based, not just RAM
■ ACID compliant

Page 49: Big Data And No SQL

Hybrid Architectures
■ Most real-world implementations use a combination of NoSQL solutions based on business-driven use cases, quality metrics and architecture patterns
■ Use document stores for data
■ Use S3 for image/binary/PDF storage
■ Use Apache Lucene for document index stores
■ Use MapReduce for real-time index & aggregate creation and maintenance
■ Use OLAP as the data warehouse
■ Use RDBMS for OLTP

Page 50: Big Data And No SQL

Conclusions

■ Big Data is the driver for No SQL’s rise, but not the only reason to use No SQL

■ Non-functional requirements like Scalability, Performance & Consistency are main justifications for NO SQL Usage

■ Schema-less; unstructured data