No sql bigdata and postgresql

NoSQL, BigData and PostgreSQL

Transcript of No sql bigdata and postgresql

Page 1: No sql bigdata and postgresql

NoSQL, BigData and PostgreSQL

Page 2: No sql bigdata and postgresql

Contents

● Typical RDBMS and Scaling

● Big Data

– Big Data VS Traditional Data

– Big Data Characteristics

– Big Data Technologies

– NoSQL & Hadoop

● NoSQL

● Benefits of NoSQL

● What does NoSQL not Provide

● NoSQL Database Usage

● BASE VS ACID

Page 3: No sql bigdata and postgresql

Contents ...

● NoSQL Challenges

● Breeds of NoSQL Solutions

– Key-Value Stores

– Column Family Store

– Document Database Store

● NoSQL with Relational DBMS (EDB)

● Postgres: Key-Value Store

● Postgres: Document Store

– JSON and SQL

– Bridging Between SQL and JSON

– JSON Data Types

Page 4: No sql bigdata and postgresql

Typical RDBMS

● Fixed table schemas

● Small but frequent reads/writes

● Large batch transactions

● Focus on ACID

– Atomicity

– Consistency

– Isolation

– Durability

Page 5: No sql bigdata and postgresql

How We Scale RDBMS Implementation

Page 6: No sql bigdata and postgresql

1st Step: Build a Relational Database

Database

Page 7: No sql bigdata and postgresql

2nd Step: Table Partition

Database
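As a rough illustration of this step, a large table can be split into partitions inside a single database. The sketch below assumes PostgreSQL 10 or later (declarative partitioning); the orders table, its columns, and the yearly ranges are hypothetical and do not come from the slides.

```sql
-- Hypothetical parent table, partitioned by order date (PostgreSQL 10+).
CREATE TABLE orders (
    order_id    bigint  NOT NULL,
    customer_id integer NOT NULL,
    order_date  date    NOT NULL
) PARTITION BY RANGE (order_date);

-- One partition per year; the upper bound of each range is exclusive.
CREATE TABLE orders_2014 PARTITION OF orders
    FOR VALUES FROM ('2014-01-01') TO ('2015-01-01');
CREATE TABLE orders_2015 PARTITION OF orders
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');

-- Rows are routed to the matching partition automatically.
INSERT INTO orders VALUES (1, 42, '2014-06-15'), (2, 43, '2015-02-01');

-- A filter on the partition key lets the planner skip partitions
-- that cannot contain matching rows.
SELECT * FROM orders WHERE order_date >= '2015-01-01';
```

Applications keep querying the parent table, so partitioning of this kind stays transparent to them.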

Page 8: No sql bigdata and postgresql

3rd Step: Database Partitioning

[Figure: three cloud instances, one per customer (Customer #1, #2, #3); each instance serves that customer's browser through its own web tier and business logic tier.]

Page 9: No sql bigdata and postgresql

Big Data

● Large volumes of structured and semi-structured data are collected and warehoused, and petabytes (PB) of transactions are performed every day, for example on:

– Web data

– Social networking data

– Users' personal identity data

– User transactions

● Because the volume of data grows day by day, traditional database management solutions fail to provide the performance and elastic scalability needed for a wider audience, e.g.:

– Google processes 20 PB+ a day (2008)

– Facebook has 2.5 PB of user data (2009)

– eBay has 6.5 PB of user data (2009)

Page 10: No sql bigdata and postgresql

Big Data VS Traditional Data

Big Data

● Photographs

● Audio & Video

● 3D models

● Simulations

● Location Data

● ..

Traditional Data

● Documents

● Finances

● Inventory records

● Personal files

● ..

Page 11: No sql bigdata and postgresql

Big Data Characteristics

● Volume (high volume of data)

● Velocity (data changes rapidly)

● Variety (a growing number of new data types)

Page 12: No sql bigdata and postgresql

Big Data Technologies

Page 13: No sql bigdata and postgresql

NoSQL & Hadoop

NoSQL

● Real-time read/write system

● Interactive

● Fast reads/writes

Hadoop

● Batch data used for analysis

● Large-scale processing

● Massive computing power

Both support

● Big volume of data

● Incremental, horizontal scaling

● Varying / changing data formats

Diagram labels: User Transactions, Sensor Data, Customer Profiles, Predictive Analytics, Fraud Detection, Recommendations

Page 14: No sql bigdata and postgresql

NoSQL

● Stands for No-SQL or Not only SQL

● Class of non-relational data storage systems

● Usually do not require a fixed table schema, nor do they use the concept of joins

● NoSQL systems are generally not ACID compliant.

Page 15: No sql bigdata and postgresql

Benefits of NoSQL

● Elastic scaling: An RDBMS might not scale out easily on commodity clusters, but the new breed of NoSQL databases is designed to expand transparently to take advantage of new nodes.

● Flexible data model: Enables you to work with new data types such as mobile interactions, machine data, social connections, etc.

● Enables you to work in new ways of incremental development and continuous release.

● Cheap, easy to implement (open source)

Page 16: No sql bigdata and postgresql

Benefits of NoSQL

● Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned. When data is written, the latest version is on at least one node and is then replicated to other nodes.

● No single point of failure

● Easy to distribute

● Don't require a schema

Page 17: No sql bigdata and postgresql

What does NoSQL Not Provide

● Joins

● Group by

● ACID transactions

● SQL

– Integration with applications that are based on SQL

Page 18: No sql bigdata and postgresql

NoSQL Database Usage

● NoSQL data storage systems make sense for applications that need to deal with very large semi-structured data

– Log Analysis

– Social Networking Feeds

● Scalable replication and distribution

– Potentially thousands of machines

– Potentially distributed around the world

● Queries need to be answered quickly

● Mostly data retrieval with few updates

● Schema-less with no relations

● ACID transaction properties not needed

● Open Source development

Page 19: No sql bigdata and postgresql

NoSQL Real-World Applications

● Emergency Management System

– High variability among data sources requires high schema flexibility

● Massive Open Online Course

– Massive read stability, content integration, low latency

● Patient Data and Prescription Records

– Efficient write stability

● Social Marketing Analytics

– Map reduce analytical approaches

Source: Gartner, A Tour of NoSQL in 8 Use Cases

Page 20: No sql bigdata and postgresql

Where No-SQL Is Used

● Google (BigTable, LevelDB)

● LinkedIn (Voldemort)

● Facebook (Cassandra)

● Twitter (Hadoop/HBase, FlockDB, Cassandra)

● Netflix (SimpleDB, Hadoop/HBase, Cassandra)

● CERN (CouchDB)

Page 21: No sql bigdata and postgresql

BASE Transactions

SQL (ACID)

● Atomicity

● Consistency

● Isolation

● Durability

No-SQL (BASE)

● Basically Available: highly available but not always consistent

● Soft State: background cleanup mechanism

● Eventually Consistent: copies become consistent at some later time if there are no more updates to that data item

Page 22: No sql bigdata and postgresql

No-SQL Challenges

● Lack of maturity: numerous solutions are still in their beta stages

● Lack of commercial support for enterprise users

● Lack of support for data analysis

● Maintenance efforts and skills are required. Experts are hard to find

Page 23: No sql bigdata and postgresql

Breeds of No-SQL Solutions

● Key-Value Stores

● Column Family Stores

● Document Databases

● Graph Databases

Page 24: No sql bigdata and postgresql

Key-Value Stores

● Dynamo, Voldemort, Rhino DHT …

● A key-value store is based on a hash table where there is a unique key and a pointer to a particular item of data.

● Mappings are usually accompanied by cache mechanisms to maximize performance.

Page 25: No sql bigdata and postgresql

Column Family Store

● BigTable, Cassandra, HBase, Hadoop etc.

● Store and process very large amounts of data distributed over many machines. "Petabytes of data across thousands of servers"

● Keys point to multiple columns.

Page 26: No sql bigdata and postgresql

Document Database Stores

● CouchDB, MongoDB, Lotus Notes, Redis …

● Documents are addressed in the database via a unique key that represents that document.

● Semi-structured documents can be XML or JSON formatted, for instance.

● In addition to the key, documents can be retrieved with queries.

Page 27: No sql bigdata and postgresql

Document Database Stores

{
  "FirstName": "Bart",
  "LastName": "Loews",
  "Children": [
    { "FirstName": "Tadd", "Age": 4 },
    { "FirstName": "Todd", "Age": 4 }
  ],
  "Age": 35,
  "Address": {
    "number": 1234,
    "street": "Fake road",
    "City": "Fake City",
    "state": "VA",
    "Country": "USA"
  }
}

Page 28: No sql bigdata and postgresql

Relational VS Document DS

Page 29: No sql bigdata and postgresql

Relational VS Graph DS

Relational Database Store

Graph Database Store

Page 30: No sql bigdata and postgresql

Relational VS NoSQL DBMS Comparison (Functionality, Scalability, Performance)

Page 31: No sql bigdata and postgresql

NoSQL with Relational DBMS (EDB)

In EDB, NoSQL is implemented through different data types

● HSTORE

– Key-value pair

– Simple, fast and easy

– Ideal for flat data structures

● JSON

– Hierarchical document model

– Introduced in PPAS 9.2/9.3

● JSONB

– Binary version of JSON

– Faster, more operators and even more robust

– Introduced in PPAS 9.4

Page 32: No sql bigdata and postgresql

Postgres: Key-Value Store

● The HStore contrib module enables storing key/value pairs within a single column.

● Allows you to create a schema-less, ACID-compliant data store within Postgres.

● Create a single HStore column and include, for each row, only those keys which pertain to the record.

● Add attributes to a table and query without advance planning.

● Combine flexibility with ACID compliance

Page 33: No sql bigdata and postgresql

HStore - Example

● Create a table with an HSTORE field

– Create table hstore_data (my_data HSTORE);

● Insert a record into hstore_data

– Insert into hstore_data (my_data) values ('"cost"=>"60000", "product"=>"iphone", "provider"=>"Apple"');

● Select my_data from hstore_data

– Select my_data from hstore_data;

=========================
"cost"=>"60000", "product"=>"iphone", "provider"=>"Apple"

(1 row)
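Building on the hstore_data table above, the following is a minimal sketch of how individual keys might be queried and how a new attribute could be added without changing the table definition. It assumes the hstore extension is available on the server; the "color" key is hypothetical and not part of the original example.

```sql
-- hstore ships as a contrib extension; it must be enabled in the database
-- before a table with an HSTORE column can be created.
CREATE EXTENSION IF NOT EXISTS hstore;

-- Read a single value by key (the result is text).
SELECT my_data -> 'product' AS product FROM hstore_data;

-- Keep only rows that contain a given key.
SELECT my_data FROM hstore_data WHERE my_data ? 'provider';

-- Keep only rows that contain a given key/value pair.
SELECT my_data FROM hstore_data WHERE my_data @> 'provider=>Apple'::hstore;

-- Add a new attribute to existing rows; no ALTER TABLE is needed.
-- The "color" key is hypothetical.
UPDATE hstore_data SET my_data = my_data || 'color=>black'::hstore;
```

All hstore keys and values are stored as text, so numeric comparisons need an explicit cast such as (my_data -> 'cost')::integer.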

Page 34: No sql bigdata and postgresql

Postgres: Document Store

● JSON is the most popular data-interchange format on the web.

● Derived from the ECMAScript programming language standard.

● Supported by virtually every programming language.

● JSON datatype implemented in PPAS 9.2/9.3

● JSONB datatype implemented in PPAS 9.4.

Page 35: No sql bigdata and postgresql

JSONB - Example

● Create a table with a JSONB field

– Create table jsonb_data (data JSONB);

● Insert a record into jsonb_data

– Insert into jsonb_data (data) values ('{"name": "Apple Phone", "type": "phone", "product": "iphone", "available": true, "warranty_years": 1}');

Page 36: No sql bigdata and postgresql

A Simple Query For JSON Data
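A minimal sketch of what a simple query could look like, assuming the jsonb_data table and the record from the JSONB example; the index name is hypothetical.

```sql
-- Extract fields as text with ->> and filter with the containment operator @>.
SELECT data ->> 'name'           AS name,
       data ->> 'warranty_years' AS warranty_years
FROM   jsonb_data
WHERE  data @> '{"type": "phone"}';

-- Optional: a GIN index can speed up @> lookups on larger tables.
CREATE INDEX jsonb_data_gin_idx ON jsonb_data USING GIN (data);
```

The ->> operator returns text; use -> instead when the result should stay JSONB, for example to drill into a nested object.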

Page 37: No sql bigdata and postgresql

A Query That Returns JSON Data

● Select data from JSON_data;

data

============================

{ "name": "Apple Phone", "type": "phone", "product": "iphone", "available": true, "warranty_years": 1 }

Note: This query returns JSON data in its original format

Page 38: No sql bigdata and postgresql

JSON and SQL

● JSON is naturally integrated with SQL in Postgres.

● JSON and SQL queries use the same language, the same planner, and the same ACID-compliant transaction framework.

● JSON and HSTORE are elegant and easy-to-use extensions of the underlying object-relational model.

Page 39: No sql bigdata and postgresql

JSON and SQL Example

No need for programming logic to combine SQL and NoSQL in the application – Postgres does it all
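One way this could look in practice is a single statement that joins a relational table to a JSONB column. This is a sketch only: the providers and devices tables, their columns, and the sample rows are hypothetical and not taken from the slides.

```sql
-- Hypothetical relational table.
CREATE TABLE providers (
    provider_name text PRIMARY KEY,
    country       text
);
INSERT INTO providers VALUES ('Apple', 'USA');

-- Hypothetical document table with a JSONB column.
CREATE TABLE devices (doc jsonb);
INSERT INTO devices VALUES
    ('{"name": "Apple Phone", "provider": "Apple", "warranty_years": 1}');

-- One statement joins a JSONB field to a relational column.
SELECT d.doc ->> 'name'                  AS device,
       p.country,
       (d.doc ->> 'warranty_years')::int AS warranty_years
FROM   devices d
JOIN   providers p ON p.provider_name = d.doc ->> 'provider';
```

Because both tables live in the same database, the join runs inside one transaction and goes through the same planner as any other SQL query.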

Page 40: No sql bigdata and postgresql

Bridging Between SQL and JSON

● Simple SQL table definition.

– Create table products (id integer, product_name text);

● Select query returning a data set

– Select * from products;

● Select query returning the same result as a JSON data set

– Select ROW_TO_JSON(products) from products;
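To make the bridge concrete, here is a small sketch that assumes the products table defined above; the inserted rows are made up for illustration.

```sql
-- Sample rows (hypothetical values).
INSERT INTO products (id, product_name) VALUES (1, 'iPhone'), (2, 'iPad');

-- One JSON object per row, with column names as keys,
-- e.g. {"id":1,"product_name":"iPhone"}
SELECT row_to_json(products) FROM products;

-- Aggregate all rows into a single JSON array.
SELECT json_agg(products) FROM products;
```

row_to_json produces one result row per table row, while json_agg collapses the whole result into a single JSON value, which can be convenient for web APIs.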

Page 41: No sql bigdata and postgresql

JSON Data Types

● Number

– Signed decimal number that may contain a fractional part.

– No distinction between integer and floating point.

● String

– A sequence of zero or more Unicode characters.

– Strings are delimited with double quotation marks.

– Supports backslash escaping.

● Boolean

– Either of the values true or false.

● Array

– An ordered list of zero or more values.

– Each value may be of any type.

Page 42: No sql bigdata and postgresql

JSON Data Types

● Array

– Arrays use square bracket notation with comma-separated elements.

● Object

– An unordered associative array (name/value pairs).

– Objects are delimited with curly brackets { }

– A comma separates each pair.

– Within each pair, the colon ':' character separates the key or name from its value.

– All keys must be strings and should be distinct from each other within the object.

● Null

– An empty value, using the word null

Page 43: No sql bigdata and postgresql

JSON Data Types - Example
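A short sketch of the six JSON types described on the previous slides, using the jsonb_typeof function (available from PostgreSQL 9.4); the literal values are arbitrary examples.

```sql
-- jsonb_typeof reports the JSON type of the top-level value in each row.
SELECT doc, jsonb_typeof(doc) AS json_type
FROM (VALUES
    ('-12.5'::jsonb),              -- number
    ('"hello \"world\""'::jsonb),  -- string with backslash-escaped quotes
    ('true'::jsonb),               -- boolean
    ('[1, "two", null]'::jsonb),   -- array mixing value types
    ('{"key": "value"}'::jsonb),   -- object
    ('null'::jsonb)                -- null
) AS t(doc);
```

The array row shows that elements of one array may have different types, as noted under "Array" above.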

Page 44: No sql bigdata and postgresql

JSON, JSONB or HSTORE?

● JSON/JSONB is more versatile than HSTORE.

● HSTORE provides more structure, but it only deals with text and you cannot nest objects.

● JSON or JSONB?

– If you need any of the following, then use JSON:

● Storage of validated JSON, without processing or indexing it.

● Preservation of white space in JSON text.

● Preservation of object key order.

● Preservation of duplicate object keys.

● Maximum input/output speed.

– For any other case use JSONB.
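The preservation points above can be seen by casting the same text to both types. A minimal sketch; the sample document is arbitrary.

```sql
-- The same input text cast to JSON and to JSONB.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1,   "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json  : {"b": 1,   "a": 2, "a": 3}   (input text kept verbatim)
-- as_jsonb : {"a": 3, "b": 1}             (last duplicate wins, keys reordered)
```

JSON stores the text exactly as entered after validating it, whereas JSONB parses it into a binary form, which is why white space, key order, and duplicate keys are not retained.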

Page 45: No sql bigdata and postgresql

Structured or Unstructured? “No SQL Only” or “Not Only SQL”?

● Structure and standards emerge.

● Data has references

● When the database has duplicate data entries, the application has to manage updates in multiple places. What happens when there is no ACID transactional model?

Page 46: No sql bigdata and postgresql

Say yes to “Not Only SQL”

● Postgres overcomes many of the standard objections of the form “It can't be done with a conventional database system.”

● Postgres

– Combines both structured and unstructured data.

– Is faster (for many workloads) than the leading No-SQL-only solutions.

– Integrates easily with web 2.0 application development environments.

– Can be deployed on client premises or in the cloud (public/private).

● Do more with Postgres – The enterprise NoSQL Solution.