NoSQL – Rise of the Clusters

September 19, 2013

Speaker: David Wolfe

Topics

What is SQL? What is NoSQL?

Why have relational databases been successful?

Why did NoSQL databases emerge?

How are their data models different?

SQL & relational databases

Relational databases are software applications that store data

Data is stored in tables that have rows & columns: think Excel spreadsheets

FirstName  LastName   Age  Zipcode  Gender
Bob        Smith      45   38444    M
Jane       Happy      23   15122    F
Fred       Jones      55   92102    M
Johnny     Appleseed  26   90025    M

Relational databases typically have many tables that are “related” to one another

Relational databases support access to data in tables through a language called “SQL” – Structured Query Language

SQL supports “set” based operations on tables – selection, projection, joining

SQL is based on relational algebra
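The three set-based operations named above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original talk: the person table mirrors the sample table from the earlier slide, while the region table and its contents are hypothetical and exist only so there is something to join against.

```python
import sqlite3

# In-memory database; `person` mirrors the sample table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "first_name TEXT, last_name TEXT, age INTEGER, zipcode TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        ("Bob", "Smith", 45, "38444", "M"),
        ("Jane", "Happy", 23, "15122", "F"),
        ("Fred", "Jones", 55, "92102", "M"),
        ("Johnny", "Appleseed", 26, "90025", "M"),
    ],
)

# Hypothetical second table (an assumption for illustration only).
conn.execute("CREATE TABLE region (zipcode TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO region VALUES (?, ?)",
    [("15122", "PA"), ("90025", "CA"), ("92102", "CA")],
)

# Selection: keep only the rows that satisfy a predicate.
selection = conn.execute(
    "SELECT * FROM person WHERE age > 40 ORDER BY age"
).fetchall()

# Projection: keep only some of the columns.
projection = conn.execute(
    "SELECT first_name, age FROM person ORDER BY age"
).fetchall()

# Join: combine rows from two tables that share a zipcode.
joined = conn.execute(
    "SELECT p.first_name, r.state FROM person p "
    "JOIN region r ON p.zipcode = r.zipcode ORDER BY p.first_name"
).fetchall()
```

Each query takes whole tables as input and returns a set of rows, which is what "set-based" means here. Bob does not appear in the join result because the hypothetical region table has no row for his zipcode.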

The relational model was proposed at IBM in 1970, and the first relational database systems were built during the 1970s

They were the dominant approach to data management in the enterprise through the early 2000s

Examples include

Oracle

Sybase

PostgreSQL

NoSQL databases

NoSQL databases are software applications that store data

They, not surprisingly, do not use SQL or the relational model (interrelated tables)

They are “less strict” about data definition

They were developed in a “big-data” world for applications needing massive scalability (clustering)

NoSQL databases

There are many types of NoSQL databases

We will review the differences later

RDBMS value - persistence

During the 1990s and 2000s, as PCs became ubiquitous, distributed computing took off.

In the 1990s, client-server and n-tier architectures dominated enterprise development

The late 1990s and 2000s saw the dominance of the web and distributed applications that broke out of the enterprise

RDBMS value - persistence

In this distributed world, applications needed to keep data around for

Many users

Extended periods

RDBMS emerged as the de facto choice for persisting data.

RDBMS value - concurrency

Another challenge that distributed applications presented was concurrency:

many users viewing and potentially updating the same data at the same time

Concurrency is notoriously difficult to get right for even the best engineers.

Relational databases “helped” by controlling data access with transactions
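As a rough sketch of what a transaction buys the application, the example below uses Python's built-in sqlite3 module. The account table, the names, and the transfer function are all hypothetical. It demonstrates only the all-or-nothing part of a transaction; isolation between truly concurrent users is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failed transfer the credit to bob has already been executed when the debit is rejected, yet the rollback undoes it, so no reader ever sees money that appeared from nowhere.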

RDBMS value - integration

Enterprise application ecosystems necessitate multiple integrated software applications. Examples:

Customer Service app

Biz Intel app

E-Commerce app

Inventory management apps

A common approach was to integrate these applications through a shared RDBMS database.

RDBMS value – SQL

RDBMS providers all supported a core SQL standard

In theory this would allow developers to switch between RDBMS providers without problems

In fact, different providers (Oracle, Sybase, Microsoft) developed different “dialects” or SQL extensions (PL/SQL vs. T-SQL)

Crack #1 – impedance mismatch

Impedance mismatch is the difference between the relational model and in-memory data structures
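A small sketch can make the mismatch concrete. The Order and LineItem classes and the two-table layout below are hypothetical: one nested in-memory object has to be flattened into rows for storage and stitched back together on the way out, and that translation is the work a mapping layer takes on.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)


def to_rows(order):
    """Flatten one nested object into rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.product, i.quantity) for i in order.items]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Rebuild the nested object from its flat rows."""
    order_id, customer = order_row
    items = [LineItem(p, q) for (oid, p, q) in item_rows if oid == order_id]
    return Order(order_id, customer, items)


order = Order(1, "Jane Happy", [LineItem("widget", 2), LineItem("gadget", 1)])
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)
```

The object is a single nested value in memory, but it becomes one row in an orders table plus several rows in an items table, linked only by the repeated order id.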

In the late 1990s people believed that impedance mismatch would lead to RDBMS being replaced by databases that replicated in-memory structures to disk (OODBMS)

While the 1990s saw the rise of OO programming languages, OODBMS never gained real traction

OODBMS didn’t gain traction because

Impedance mismatch had been made easier to deal with by Object-Relational (OR) mapping frameworks like Hibernate, iBatis, & Cocoon

There was a growing professional divide between application developers and database administrators

The value of RDBMS as an app integration mechanism was large

Crack #2 – SOA

The 2000s saw a shift in how enterprise applications interacted

Historically, many applications interacted through a shared RDBMS.

This approach – shared integration RDBMS – has serious problems

Overly complex schema

Can’t change tables or add indices easily

Database has to preserve integrity

Crack #2 – SOA

Interactions between applications shifted to web services

Web services were protocols for moving documents (XML, JSON) over HTTP using SOAP- or REST-based approaches

SOA allowed applications to encapsulate data and expose it through services

The Final Crack #3 – Clusters

The internet saw several large web properties dramatically increase in scale

Websites started tracking activity and structure in a very detailed way

Social gestures

Social links

Log data

Purchase gestures

Increasing numbers of users appeared using more devices

The Final Crack #3 – Clusters

The problem with scaling out (clustering) is that RDBMS are not designed to run on clusters.

Oracle RAC & MS SQL Server both use the concept of a shared disk sub-system

This is still a single point of failure and a scaling limitation

The final crack – mismatch between RDBMS & clusters

NoSQL Emergence

The emergence of NoSQL was really about needing databases that run on clusters

One exception is graph databases

Though problems with shared database integration and impedance mismatch existed, it was the need for scale that drove the emergence of NoSQL databases

Aggregate Data Models

A key characteristic of NoSQL databases is that they do not use the Relational data metamodel (relations & tuples)

There are four types of data metamodels in the NoSQL eco-system

Key-value

Document

Column-family

Graph

Aggregate Data Models

Key-value, document, and column-family NoSQL databases share a common characteristic of their data models called “aggregate orientation”

We will not cover graph-based data metamodels in this presentation

Aggregates

The relational model takes information you want to store and divides it into rows.

In RDBMS rows are lists of simple data values.

In RDBMS rows are the unit of data operation

Aggregate orientation recognizes that often times data units can be more complex and can have nested lists and record structures

With Aggregates, aggregates are the unit of data operation

Relational Data Example
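As a stand-in illustration with hypothetical data, here is how one customer's order might look when split into flat rows across three tables. The table layouts and the helper function are assumptions made for this sketch.

```python
# Each table is a list of flat rows; every value is a simple scalar.
customers = [(1, "Jane Happy")]                        # (customer_id, name)
orders = [(99, 1)]                                     # (order_id, customer_id)
order_items = [(99, "widget", 2), (99, "gadget", 1)]   # (order_id, product, quantity)


def items_for_customer(customer_id):
    """Join orders to order_items by hand to reassemble one customer's purchases."""
    order_ids = {oid for (oid, cid) in orders if cid == customer_id}
    return [(p, q) for (oid, p, q) in order_items if oid in order_ids]


jane_items = items_for_customer(1)
```

No single row describes the whole order. The pieces are connected only through shared ids and have to be joined back together at read time.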

Aggregate Data Example
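As a stand-in illustration with hypothetical data, the order below is kept as one nested unit, with the customer details and line items stored inside it. The field names are assumptions made for this sketch.

```python
import json

# One aggregate: a nested record with an embedded list, stored and read as a unit.
order_aggregate = {
    "order_id": 99,
    "customer": {"id": 1, "name": "Jane Happy"},
    "items": [
        {"product": "widget", "quantity": 2},
        {"product": "gadget", "quantity": 1},
    ],
}

# The whole aggregate serializes to a single document and comes back intact.
serialized = json.dumps(order_aggregate)
restored = json.loads(serialized)
total_quantity = sum(item["quantity"] for item in restored["items"])
```

Everything needed to work with the order arrives in one read, with no join step.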

Consequences of Aggregate Orientation

Relations capture data elements and relationships, but not aggregates.

Aggregates are really “chunks” of data that are typically retrieved and operated on as an interaction unit.

Aggregates are about how the data is being used.

RDBMS do not have knowledge of aggregate structure and can’t use it to store and distribute data

Consequences of Aggregate Orientation

So, RDBMS are aggregate-ignorant. Is that a bad or good thing? It’s both

It’s good if you need to access and use the data in many different ways – if you don’t have a primary structure for manipulating your data

It’s bad if you want to run on a cluster.

Aggregates are great on clusters because you can distribute them across nodes
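One simple way to spread aggregates over nodes is to hash each aggregate's key and use the hash to pick a node. The sketch below assumes three hypothetical nodes, each simulated by a plain dictionary. It is only one possible placement scheme, not something the talk prescribes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

# Each node's storage is simulated with a dict.
cluster = {name: {} for name in NODES}


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def put(key, aggregate):
    """Store the whole aggregate on the single node its key maps to."""
    cluster[node_for(key)][key] = aggregate


def get(key):
    """Read the aggregate back from the one node that can hold it."""
    return cluster[node_for(key)].get(key)


for order_id in range(10):
    put(f"order:{order_id}", {"order_id": order_id, "items": []})

stored_total = sum(len(store) for store in cluster.values())
fetched = get("order:7")
```

Because an aggregate is read and written as a unit, any request for it touches exactly one node. Plain modulo placement like this reshuffles most keys when the node list changes, which is why real systems often use consistent hashing instead.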

Consequences of Aggregate Orientation

Aggregate orientation allows you to operate on many logical data items (in the aggregate) by updating the aggregate atomically

Aggregate-oriented NoSQL databases can be said to support transactions on single aggregates, but not across aggregates
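The toy in-memory store below sketches what "atomic per aggregate" can mean. The AggregateStore class and its methods are hypothetical and do not reflect any particular product's API. A change to one aggregate is applied to a private copy and swapped in only if it completes.

```python
import copy
import threading


class AggregateStore:
    """Toy store: each update to ONE aggregate is all-or-nothing."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply `change` to a private copy and swap it in only if it succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)  # may raise; the stored aggregate is then left untouched
            self._data[key] = draft


store = AggregateStore()
store.put("order:99", {"status": "open", "items": [{"product": "widget", "quantity": 2}]})


def add_item_and_close(order):
    # Two logical changes inside one aggregate, applied together.
    order["items"].append({"product": "gadget", "quantity": 1})
    order["status"] = "closed"


def broken_change(order):
    order["status"] = "cancelled"
    raise ValueError("simulated failure midway through the change")


store.update("order:99", add_item_and_close)
try:
    store.update("order:99", broken_change)
except ValueError:
    pass

final = store.get("order:99")
```

The failed change leaves no partial result behind. Nothing here coordinates a change that spans two keys, though: updating two aggregates means two separate update calls, and the second can fail after the first has already taken effect.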

Key-Value & Document Data Models

Both types of databases have a key or ID that is mapped to an aggregate data structure in a virtual table

With key-value NoSQL dbs, we can only access the aggregate by looking up its key

With document databases we can also look up aggregates by fields in the aggregate
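The difference in access paths can be sketched with a plain dictionary standing in for the database. The keys, the field names, and the kv_get and doc_find helpers are hypothetical. A real document database would normally answer the field query from an index rather than by scanning every aggregate.

```python
# Hypothetical aggregates, keyed by id.
store = {
    "cust:1": {"name": "Bob Smith", "zipcode": "38444", "age": 45},
    "cust:2": {"name": "Jane Happy", "zipcode": "15122", "age": 23},
    "cust:3": {"name": "Fred Jones", "zipcode": "92102", "age": 55},
}


def kv_get(key):
    """Key-value style: the value is opaque, so the key is the only way in."""
    return store.get(key)


def doc_find(**criteria):
    """Document style: the store can see inside aggregates and match on fields."""
    return [
        key
        for key, doc in store.items()
        if all(doc.get(name) == value for name, value in criteria.items())
    ]


by_key = kv_get("cust:2")
by_field = doc_find(zipcode="92102")
```

With the key-value style, finding "the customer in zipcode 92102" requires already knowing that customer's key. With the document style, the query can be expressed directly against the contents.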

Key-Value & Document Data Models

Examples of Key-Value NoSQL dbs are

Examples of Document NoSQL dbs are

MongoDB

Couchbase

SimpleDB

Column-Family Data Models

These NoSQL databases were influenced by Google’s BigTable

The column-family model is a two-level aggregate structure

There is a key (row identifier) that maps to the aggregate of interest

The aggregate is a map of more detailed values – these are referred to as columns

Column-family dbs organize columns into families

The data is row-oriented

Each row is an aggregate (e.g. Customer with id 1234)

The data is column-oriented

Each column family defines a record type (customer profile)

But, columns can also be dynamic and unique (to model lists)
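The two-level structure described above can be pictured as nested maps: row key, then column family, then column name. The sketch below uses hypothetical family and column names and is a mental model only, not how any particular column-family database lays out its storage.

```python
# row key -> column family -> column name -> value
rows = {
    "1234": {
        # Fixed, record-like columns: the family acts as a record type.
        "profile": {"name": "Jane Happy", "zipcode": "15122"},
        # Dynamic columns: each new order adds a column, so the family models a list.
        "orders": {"order:99": "widget x2", "order:100": "gadget x1"},
    }
}


def get_column(row_key, family, column):
    """Read one value by walking row -> family -> column."""
    return rows[row_key][family][column]


def add_column(row_key, family, column, value):
    """Add a column on the fly; no schema change is needed for a new column name."""
    rows.setdefault(row_key, {}).setdefault(family, {})[column] = value


add_column("1234", "orders", "order:101", "gizmo x3")
customer_name = get_column("1234", "profile", "name")
order_count = len(rows["1234"]["orders"])
```

Reading the row "1234" returns the whole customer aggregate (the row-oriented view), while reading just its "profile" family returns one record type without touching the order history (the column-oriented view).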

Examples of Column-Family NoSQL dbs

Cassandra

Polyglot Persistence

The future?

Only NoSQL?

Only SQL?

Probably both – Polyglot Persistence
