Post on 12-May-2015
September 19, 2013
Speaker: David Wolfe
Topics
What is SQL? What is NoSQL?
Why have relational databases been
successful?
Why did NoSQL databases emerge?
How are their data models different?
SQL & relational databases
Relational databases are software
applications that store data
Data is stored in tables that have rows &
columns: think Excel spreadsheets
FirstName  LastName   Age  Zipcode  Gender
Bob        Smith      45   38444    M
Jane       Happy      23   15122    F
Fred       Jones      55   92102    M
Johnny     Appleseed  26   90025    M
SQL & relational databases
Relational databases typically have
many tables that are “related” to one
another
SQL & relational databases
Relational databases support access to data in tables through a language called “SQL” – Structured Query Language
SQL supports “set” based operations on tables – selection, projection, joining
SQL is based on relational algebra
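A minimal sketch of the three set-based operations, using Python's built-in sqlite3 module and the example table above. The zipcodes table and its city values are assumptions added only so there is something to join against.

```python
import sqlite3

# In-memory database holding the example rows from the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people "
    "(first_name TEXT, last_name TEXT, age INTEGER, zipcode TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", "Smith", 45, "38444", "M"),
        ("Jane", "Happy", 23, "15122", "F"),
        ("Fred", "Jones", 55, "92102", "M"),
        ("Johnny", "Appleseed", 26, "90025", "M"),
    ],
)

# Hypothetical second table (illustrative values), related to people by zipcode.
conn.execute("CREATE TABLE zipcodes (zipcode TEXT PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO zipcodes VALUES (?, ?)",
    [("92102", "San Diego"), ("90025", "Los Angeles")],
)

# Selection: keep only the rows that satisfy a condition.
selected = conn.execute(
    "SELECT * FROM people WHERE age > 40 ORDER BY age"
).fetchall()

# Projection: keep only some of the columns.
projected = conn.execute(
    "SELECT first_name, last_name FROM people ORDER BY last_name"
).fetchall()

# Join: combine rows from two tables that share a value.
joined = conn.execute(
    "SELECT p.first_name, z.city FROM people AS p "
    "JOIN zipcodes AS z ON p.zipcode = z.zipcode ORDER BY p.first_name"
).fetchall()

print(selected)
print(projected)
print(joined)
```

Each query takes whole tables as input and returns a table-shaped result, which is what "set based" means here.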
SQL & relational databases
Relational databases grew out of the relational model proposed at IBM in 1970; early implementations followed later in that decade
They have been the dominant approach to data management in the enterprise through the early 2000s
Examples include
Oracle
Sybase
MySQL
PostgreSQL
NoSQL databases
NoSQL databases are software applications that
store data
They, not surprisingly, do not use SQL or
the relational model (interrelated tables)
They are “less strict” about data
definition
They were developed in a “big-data”
world for applications needing massive
scalability (clustering)
NoSQL databases
There are many types of NoSQL databases
We will review the differences later
RDBMS value - persistence
During the 1990s and 2000s, as PCs
became ubiquitous, distributed
computing took off.
In the 1990s, client-server and n-tier
architectures dominated enterprise
development
The late 1990s and 2000s saw the
dominance of the web and distributed
applications that broke out of the enterprise
RDBMS value - persistence
In this distributed world, applications
needed to keep data around for
Many users
Extended periods
RDBMS emerged as the de facto choice for
persisting data.
RDBMS value - concurrency
Another challenge that distributed
applications presented was
concurrency:
many users viewing and potentially updating
the same data at the same time
Concurrency is notoriously difficult to
get right for even the best engineers.
Relational databases “helped” by
controlling data access with transactions
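A minimal sketch of what a transaction provides, using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical. It shows only the all-or-nothing side of transactions on a single connection, not isolation between many simultaneous users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts "
    "(name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two rows; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The application never sees a state where bob was credited but alice was not debited; the database handles that for the developer.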
RDBMS value - integration
Enterprise application ecosystems
require multiple integrated software
applications. Examples:
Customer Service app
Biz Intel app
E-Commerce app
Inventory management apps
A common approach was to integrate them
through a shared RDBMS.
RDBMS value – SQL
RDBMS providers all supported a core
SQL standard
In theory this would allow developers to
switch reliance on different RDBMS
providers without problems
In fact, different providers (Oracle,
Sybase, Microsoft) developed different
“dialects” or SQL extensions (PL/SQL vs.
T-SQL)
Crack #1 – impedance mismatch
Impedance mismatch is the difference
between the relational model and in-
memory data structures
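A minimal sketch of the mismatch, assuming a hypothetical Order object with nested line items. In memory it is one nested structure; to store it relationally it has to be flattened into rows for two tables.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)


def to_rows(order):
    """Flatten one nested Order into rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer)
    # Each nested item becomes its own row, tied back to the order by its id.
    item_rows = [(order.order_id, i.product, i.quantity) for i in order.items]
    return order_row, item_rows


order = Order(1, "Bob Smith", [LineItem("keyboard", 1), LineItem("cable", 2)])
order_row, item_rows = to_rows(order)
print(order_row)
print(item_rows)
```

The translation code in to_rows (and its reverse when loading) is the overhead that the term "impedance mismatch" refers to.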
Crack #1 – impedance mismatch
In the late 1990s people believed that
impedance mismatch would lead to
RDBMS being replaced by databases
that replicated in-memory structures to
disk (OODBMS)
While the 1990s saw the rise of OO
programming languages, OODBMS
never gained real traction
Crack #1 – impedance mismatch
OODBMS didn’t gain traction because
Impedance mismatch had been made easier
to deal with by Object-Relational (OR)
mapping frameworks like Hibernate, iBatis,
& Cocoon
There was a growing professional divide
between application developers and
database administrators
The value of RDBMS as an app integration
mechanism was large
Crack #2 – SOA
The 2000s saw a shift in how enterprise
applications interacted
Historically, many applications interacted
through a shared RDBMS.
This approach – shared integration
RDBMS – has serious problems
Overly complex schema
Can't change tables or add indices easily
Database has to preserve integrity
Crack #2 – SOA
Interactions between applications shifted
to web services
Web services are protocols for
moving documents (XML, JSON) over
HTTP using SOAP- or REST-based
approaches
SOA allowed applications to
encapsulate data and expose it through
services
The Final Crack #3 – Clusters
The internet saw several large web properties dramatically increase in scale
Websites started tracking activity and structure in a very detailed way
Social gestures
Social links
Log data
Purchase gestures
Increasing numbers of users appeared using more devices
The Final Crack #3 – Clusters
The problem with scaling out (clustering)
is that RDBMS are not designed to run
on clusters.
Oracle RAC & MS SQL Server both use
the concept of a shared disk subsystem
The shared disk is still a single point of failure and a scaling
limitation
The final crack – mismatch between
RDBMS & clusters
NoSQL Emergence
The emergence of NoSQL was really
about needing databases that run on
clusters
One exception is graph databases
Though problems with shared database
integration and impedance mismatch
existed, it was the need for scale that
drove the emergence of NoSQL
databases
Aggregate Data Models
A key characteristic of NoSQL databases is that they do not use the Relational data metamodel (relations & tuples)
There are four types of data metamodels in the NoSQL eco-system
Key-value
Document
Column-family
Graph
Aggregate Data Models
Key-value, document, and column-
family NoSQL databases share a
common characteristic of their data
models called “aggregate orientation”
We will not cover graph-based data metamodels in this presentation
Aggregates
The relational model takes information you want to store and divides it into rows.
In an RDBMS, rows are lists of simple data values.
In an RDBMS, rows are the unit of data operation.
Aggregate orientation recognizes that data units are often more complex and can have nested lists and record structures.
With aggregate orientation, the aggregate is the unit of data operation.
Relational Data Example
Aggregate Data Example
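The two example titles above can be made concrete with a small sketch. All names and values here are hypothetical. The same information is shown first as flat relational rows linked by id columns, then assembled into one nested aggregate.

```python
import json

# Relational form: flat rows in separate tables, linked by id columns.
customers = [(1, "Bob Smith")]                          # (customer_id, name)
orders = [(99, 1)]                                      # (order_id, customer_id)
order_lines = [(99, "keyboard", 1), (99, "cable", 2)]   # (order_id, product, qty)


def build_order_aggregate(order_id):
    """Assemble one nested aggregate from the flat rows (the work a join does)."""
    customer_id = next(c for o, c in orders if o == order_id)
    name = next(n for cid, n in customers if cid == customer_id)
    return {
        "order_id": order_id,
        "customer": {"id": customer_id, "name": name},
        "lines": [
            {"product": p, "qty": q} for o, p, q in order_lines if o == order_id
        ],
    }


# Aggregate form: one nested unit that can be stored and fetched as a whole.
aggregate = build_order_aggregate(99)
print(json.dumps(aggregate, indent=2))
```

In the relational form the order is spread over three tables; in the aggregate form it is a single document with a nested record and a nested list.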
Consequences of Aggregate
Orientation
Relations capture data elements and relations, but not aggregates.
Aggregates are really “chunks” of data that are typically retrieved and operated on as an interaction unit.
Aggregates are about how the data is being used.
RDBMS do not have knowledge of aggregate structure and cannot use it to store and distribute data
Consequences of Aggregate
Orientation
So, RDBMS are aggregate-ignorant. Is that a bad or good thing? It's both.
It's good if you need to access and use the data in many different ways – if you don’t have a primary structure for manipulating your data
It's bad if you want to run on a cluster.
Aggregates are great on clusters because you can distribute them across nodes
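A minimal sketch of why aggregates distribute well, assuming a hypothetical three-node cluster modeled as plain dictionaries. Because each aggregate is a self-contained unit, its key alone can decide which node stores it.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Each node holds its own key -> aggregate map.
cluster = {node: {} for node in NODES}


def put(key, aggregate):
    cluster[node_for(key)][key] = aggregate


def get(key):
    # The whole aggregate lives on one node, so one lookup is enough.
    return cluster[node_for(key)].get(key)


put("cust:1234", {"name": "Bob Smith", "orders": [{"product": "keyboard"}]})
put("cust:5678", {"name": "Jane Happy", "orders": []})
print(node_for("cust:1234"), get("cust:1234"))
```

This simple modulo scheme is only an illustration; real systems use more elaborate placement, but the idea of routing by aggregate key is the same.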
Consequences of Aggregate
Orientation
Aggregate orientation allows you to
operate on many logical data items (in the
aggregate) by updating the aggregate
atomically
Aggregate oriented NoSQL databases
can be said to support transactions on
single aggregates, but not across
aggregates
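A minimal in-process sketch of a single-aggregate atomic update. AggregateStore and its methods are hypothetical and do not represent any real database's API. A change that touches several fields of one aggregate either lands completely or not at all.

```python
import copy
import threading


class AggregateStore:
    """Toy store: atomic updates within one aggregate, none across aggregates."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply change() to a private copy, then swap it in as one step."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)            # may modify many fields; may raise
            self._data[key] = draft  # only reached if change() succeeded


store = AggregateStore()
store.put("order:99", {"lines": [{"product": "keyboard", "price": 10}], "total": 10})


def add_cable(order):
    order["lines"].append({"product": "cable", "price": 5})
    order["total"] += 5


def broken_change(order):
    order["lines"].clear()
    raise ValueError("failed halfway")


store.update("order:99", add_cable)
try:
    store.update("order:99", broken_change)
except ValueError:
    pass  # the stored aggregate is untouched by the failed change

print(store.get("order:99"))
```

Nothing in this store can update "order:99" and a second key together as one step, which mirrors the "not across aggregates" limitation.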
Key-Value & Document Data
Models
Both types of databases have a key or
ID that is mapped to an aggregate data
structure in a virtual table
With key-value NoSQL dbs, we can only
access the aggregate by looking up its
key
With document databases we can also
look up aggregates by fields in the
aggregate
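A minimal sketch of the difference in access, using a plain dictionary as a stand-in for either kind of store. The function names kv_get and doc_find are hypothetical.

```python
# Both stores map a key to an aggregate; they differ in what a query can see.
store = {
    "cust:1": {"name": "Bob Smith", "zipcode": "38444"},
    "cust:2": {"name": "Jane Happy", "zipcode": "15122"},
}


def kv_get(key):
    """Key-value style: the value is opaque, so the key is the only way in."""
    return store.get(key)


def doc_find(field, value):
    """Document style: the store looks inside aggregates and matches on a field."""
    return [key for key, doc in store.items() if doc.get(field) == value]


print(kv_get("cust:1"))
print(doc_find("zipcode", "15122"))
```

With only kv_get, finding customers by zipcode would require the application to fetch and inspect every aggregate itself.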
Key-Value & Document Data
Models
Examples of Key-Value NoSQL dbs include
Redis
Examples of Document NoSQL dbs include
MongoDB
Couchbase
SimpleDB
Column-Family Data Models
These NoSQL databases were
influenced by Google’s BigTable
The column-family model is a two-level aggregate
structure
There is a key (row identifier) that maps to
the aggregate of interest
The aggregate is a map of more detailed
values – these are referred to as columns
Column-Family Data Models
Column-family dbs organize columns into families
The data is row-oriented
Each row is an aggregate (e.g., Customer with id 1234)
The data is column-oriented
Each column family defines a record type (customer profile)
But, columns can also be dynamic and unique (to model lists)
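A minimal sketch of the two-level structure as nested dictionaries: row key, then column family, then column. The family names, column names, and helper functions are hypothetical; only the customer id 1234 comes from the example above.

```python
# row key -> column family -> column name -> value
rows = {
    "1234": {
        # Fixed, record-like columns: the family acts as a record type.
        "profile": {"name": "Bob Smith", "zipcode": "38444"},
        # Dynamic columns: one column per order, used to model a list.
        "orders": {"order:99": "keyboard", "order:100": "cable"},
    }
}


def get_family(row_key, family):
    """Column-oriented access: read one family of a row."""
    return rows[row_key][family]


def get_column(row_key, family, column):
    """Read a single column value."""
    return rows[row_key][family][column]


def add_column(row_key, family, column, value):
    """Columns can be added per row without changing any schema."""
    rows.setdefault(row_key, {}).setdefault(family, {})[column] = value


add_column("1234", "orders", "order:101", "monitor")
print(get_family("1234", "profile"))
print(sorted(get_family("1234", "orders")))
```

Reading rows["1234"] returns the whole aggregate (the row-oriented view), while get_family returns just one record type within it (the column-oriented view).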
Column-Family Data Models
Examples of Column-Family NoSQL dbs
are
HBase
Cassandra
Polyglot Persistence
The future?
Only NoSQL?
Only SQL?
Probably both – Polyglot Persistence