Comparing sql and nosql dbs

32
Comparing SQL and noSQL Databases 1

Transcript of Comparing sql and nosql dbs

Page 1: Comparing sql and nosql dbs

Comparing SQL and noSQL Databases

1

Page 2: Comparing sql and nosql dbs

The following is a short, incomplete history of the SQL1987 – Initial ISO/IEC Standard1989 – Referential Integrity1992 – SQL2

1995 SQL/CLI (ODBC)1996 SQL/PSM – Procedural Language extensions

1999 – User Defined Types2003 – SQL/XML2008 – Expansions and corrections2011 (or 2012) System Versioned and Application

Time Period Tables

2

Standard SQL

Page 3: Comparing sql and nosql dbs

Data stored in columns and tablesRelationships represented by dataData Manipulation LanguageData Definition Language TransactionsAbstraction from physical layer

3

SQL Characteristics

Page 4: Comparing sql and nosql dbs

Applications specify what, not howQuery optimization enginePhysical layer can change without modifying

applicationsCreate indexes to support queriesIn Memory databases

4

SQL Physical Layer Abstraction

Page 5: Comparing sql and nosql dbs

Data manipulated with Select, Insert, Update, & Delete statementsSelect T1.Column1, T2.Column2 …

From Table1, Table2 …Where T1.Column1 = T2.Column1 …

Data AggregationCompound statementsFunctions and ProceduresExplicit transaction control

5

Data Manipulation Language (DML)

Page 6: Comparing sql and nosql dbs

Schema defined at the startCreate Table (Column1 Datatype1, Column2

Datatype 2, …)Constraints to define and enforce relationships

Primary KeyForeign KeyEtc.

Triggers to respond to Insert, Update , & DeleteStored ModulesAlter …Drop …Security and Access Control

6

Data Definition Language

Page 7: Comparing sql and nosql dbs

Atomic – All of the work in a transaction completes (commit) or none of it completes

Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.

Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.

Durable – The results of a committed transaction survive failures

7

Transactions – ACID Properties

Page 8: Comparing sql and nosql dbs

CommercialIBM DB2Oracle RDMSMicrosoft SQL ServerSybase SQL Anywhere

Open Source (with commercial options) MySQLIngres

Significant portions of the world’s economy use SQL databases!

8

SQL Database Examples

Page 9: Comparing sql and nosql dbs

From www.nosql-database.org:Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

9

NoSQL Definition

Page 10: Comparing sql and nosql dbs

http://www.nosql-database.org/ lists 122 NoSQL DatabasesCassandraCouchDBHadoop & HbaseMongoDBStupidDBEtc.

10

NoSQL Products/Projects

Page 11: Comparing sql and nosql dbs

Large data volumesGoogle’s “big data”

Scalable replication and distributionPotentially thousands of machinesPotentially distributed around the world

Queries need to return answers quicklyMostly query, few updatesAsynchronous Inserts & UpdatesSchema-lessACID transaction properties are not needed – BASECAP TheoremOpen source development

11

NoSQL Distinguishing Characteristics

Page 12: Comparing sql and nosql dbs

Acronym contrived to be the opposite of ACIDBasically Available, Soft state,Eventually Consistent

CharacteristicsWeak consistency – stale data OKAvailability firstBest effortApproximate answers OKAggressive (optimistic)Simpler and faster

12

BASE Transactions

Page 13: Comparing sql and nosql dbs

A distributed system can support only two of the following characteristics: ConsistencyAvailabilityPartition toleranceThe slides from Brewer’s July 2000 talk do not define these characteristics.

13

Brewer’s CAP Theorem

Page 14: Comparing sql and nosql dbs

all nodes see the same data at the same time – Wikipedia

client perceives that a set of operations has occurred all at once – Pritchett

More like Atomic in ACID transaction properties

14

Consistency

Page 15: Comparing sql and nosql dbs

node failures do not prevent survivors from continuing to operate – Wikipedia

Every operation must terminate in an intended response – Pritchett

15

Availability

Page 16: Comparing sql and nosql dbs

the system continues to operate despite arbitrary message loss – Wikipedia

Operations will complete, even if individual components are unavailable – Pritchett

16

Partition Tolerance

Page 17: Comparing sql and nosql dbs

Discussing NoSQL databases is complicated because there are a variety of types:Column Store – Each storage block contains

data from only one columnDocument Store – stores documents made up

of tagged elementsKey-Value Store – Hash table of keys

17

NoSQL Database Types

Page 18: Comparing sql and nosql dbs

XML DatabasesGraph DatabasesCodasyl DatabasesObject Oriented DatabasesEtc…Will not address these today

18

Other Non-SQL Databases

Page 19: Comparing sql and nosql dbs

Each storage block contains data from only one column

Example: Hadoop/Hbasehttp://hadoop.apache.org/Yahoo, Facebook

Example: Ingres VectorWiseColumn Store integrated with an SQL databasehttp://www.ingres.com/products/vectorwise

19

NoSQL Example: Column Store

Page 20: Comparing sql and nosql dbs

More efficient than row (or document) store if:Multiple row/record/documents are inserted at

the same time so updates of column blocks can be aggregated

Retrievals access only some of the columns in a row/record/document

20

Column Store Comments

Page 21: Comparing sql and nosql dbs

Example: CouchDBhttp://couchdb.apache.org/BBC

Example: MongoDBhttp://www.mongodb.org/Foursquare, Shutterfly

JSON – JavaScript Object Notation

21

NoSQL Example: Document Store

Page 22: Comparing sql and nosql dbs

{ "_id": "guid goes here", "_rev": "314159",

"type": "abstract", "author": "Keith W. Hare"

"title": "SQL Standard and NoSQL Databases",

"body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.", "creation_timestamp": "2011/05/10 13:30:00 +0004"}

22

CouchDB JSON Example

Page 23: Comparing sql and nosql dbs

"_id" GUID – Global Unique IdentifierPassed in or generated by CouchDB

"_rev"Revision numberVersioning mechanism

"type", "author", "title", etc. Arbitrary tagsSchema-lessCould be validated after the fact by user-written

routine23

CouchDB JSON Tags

Page 24: Comparing sql and nosql dbs

Hash tables of KeysValues stored with KeysFast access to small data valuesExample – Project-Voldemort

http://www.project-voldemort.com/Linkedin

Example – MemCacheDBhttp://memcachedb.org/Backend storage is Berkeley-DB

24

NoSQL Examples: Key-Value Store

Page 25: Comparing sql and nosql dbs

Technique for indexing and searching large data volumes

Two Phases, Map and ReduceMap

Extract sets of Key-Value pairs from underlying data

Potentially in Parallel on multiple machinesReduce

Merge and sort sets of Key-Value pairsResults may be useful for other searches

25

Map Reduce

Page 26: Comparing sql and nosql dbs

Map Reduce techniques differ across products

Implemented by application developers, not by underlying software

26

Map Reduce

Page 27: Comparing sql and nosql dbs

Google granted US Patent 7,650,331, January 2010System and method for efficient large-scale data processing

A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.

27

Map Reduce Patent

Page 28: Comparing sql and nosql dbs

Syntax variesHTMLJava ScriptEtc.

Asynchronous – Inserts and updates do not wait for confirmation

VersionedOptimistic Concurrency

28

Storing and Modifying Data

Page 29: Comparing sql and nosql dbs

Syntax VariesNo set-based query languageProcedural program languages such as Java, C,

etc.Application specifies retrieval pathNo query optimizerQuick answer is importantMay not be a single “right” answer

29

Retrieving Data

Page 30: Comparing sql and nosql dbs

Small upfront software costsSuitable for large scale distribution on

commodity hardware

30

Open Source

Page 31: Comparing sql and nosql dbs

NoSQL databases reject:Overhead of ACID transactions“Complexity” of SQLBurden of up-front schema designDeclarative query expression Yesterday’s technology

Programmer responsible forStep-by-step procedural languageNavigating access path

31

NoSQL Summary

Page 32: Comparing sql and nosql dbs

SQL DatabasesPredefined SchemaStandard definition and interface languageTight consistencyWell defined semantics

NoSQL DatabaseNo predefined SchemaPer-product definition and interface languageGetting an answer quickly is more important

than getting a correct answer

32

Summary