SQL, NoSQL, and Next Generation DBMSs

Shahram Ghandeharizadeh

Director of the USC Database Lab

Outline

A brief history of DBMSs.

1960/70s: OSs

1980+: SQL

2000+: NoSQL

Before Computers

Database

DBMS/Data Store

Digital Era

Database

File System/Data Store

0011101000000101110101

Application programs

Before DBMSs: 1960/70s

[Figure: Developer 1 and Developer 2 each write application programs that access data directly.]

After DBMSs

[Figure: Developer 1 and Developer 2 write application programs that access data through a DBMS.]

Physical Data Independence.

SQL as a “what”-oriented language.

SQL Data Stores

Manage records/tuples

A record/tuple is a row in a table where attribute names are pre-defined in a schema.

Alternative physical designs:

Column-store versus Row-store.

Transactions with ACID properties
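The three properties on this slide can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `account` table, its rows, and the `transfer` and `balances` helpers are hypothetical and exist only for illustration. The sketch shows attribute names fixed in a schema, a "what"-oriented query that names no access path, and an all-or-nothing transaction.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)", [(1, "alice", 100), (2, "bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn):
    # The query states what is wanted; the engine decides how to fetch it.
    return conn.execute("SELECT owner, balance FROM account ORDER BY id").fetchall()
```

A failed transfer leaves both rows untouched, which is the atomicity part of ACID; the sketch does not exercise isolation or durability.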

SQL IS OVERHYPED

Why?

Marketing campaigns have become too exaggerated!

Relational vendors claim RDBMS is the answer to all data management needs.

What are some counter examples?

Seltzer. Beyond Relational Databases. Communications of the ACM, July 2008.

Web Search

Semi-structured data

HTML pages instead of raw data.

Queries are keyword lookups and the desired response is a sorted list of possible answers.

Need for efficient inverted indices.

Bulk updates, read mostly.

Need for nontraditional indexing.
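As a rough illustration of the inverted index mentioned above, the following sketch builds a term-to-documents map in one bulk pass and answers keyword queries with a ranked list. The function names and the scoring rule (summed term frequency) are assumptions made for this example; real search engines use far more elaborate ranking.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}. Built in bulk, read mostly."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, query):
    """Return doc ids sorted by a toy relevance score (summed term frequency)."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc id for a stable order.
    return [d for d, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]
```

The response is a sorted list of candidates rather than an exact answer set, which is the mismatch with relational query semantics that the slide points at.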

Directory Services

International organizations with distributed resources and personnel.

Requirement: fast lookup of entities arranged in a hierarchical structure that corresponds to the hierarchy of the organization.

LDAP standard.

Core of identification and authentication systems from a number of vendors, e.g., IBM Tivoli, Microsoft Active Directory Server, Sun ONE Directory Server.

Bulk updates similar to data warehousing.

Multi-valued attributes.

Queries are single-row retrieval or lookups based on attribute values.

Other Examples

Mobile device caching

Your cell phone’s directory as a transient cache of a global directory.

Stream management

Real-time filtering of streams for interesting patterns. Example: identify hotly traded stock, or a stock that is not traded as heavily as expected.

Filters look like SQL selection predicates, causing developers to mistake an RDBMS for the right choice.

XML management
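The stream management example above (spotting a hotly traded stock) can be sketched as a standing filter that sees each trade once, rather than a one-shot query over stored rows. The function name, the window size, and the "hot" threshold are all assumptions chosen for illustration.

```python
from collections import deque


def hot_stocks(trades, window=3, factor=2.0):
    """Yield (symbol, volume) when volume exceeds `factor` times the mean of
    the previous `window` volumes for that symbol. Each trade is seen once."""
    history = {}
    for symbol, volume in trades:
        past = history.setdefault(symbol, deque(maxlen=window))
        if len(past) == window and volume > factor * (sum(past) / window):
            yield symbol, volume
        past.append(volume)  # the deque drops the oldest volume automatically
```

The predicate looks like a SQL selection, but it runs continuously over data that is never stored in full, so only a small per-symbol window is kept in memory.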

Summary

Relational DBMSs were designed for transaction processing and for workloads consisting of ad hoc queries and a significant amount of updates.

25 years ago, there was one market for DBMSs: business data processing. This has changed to include different applications with different requirements.

The example applications are read-dominated: no need for transactional guarantees.

SQL is the wrong choice for stream processing.

One software architecture will not support the diverse needs of these applications. Possible solutions:

1) each application re-builds its own storage manager from scratch,

2) provide a flexible solution that can be tailored to the needs of a particular application.

Past 25 Years

Two trends:

1. Bloated systems.

Need for a specialist, a trained DBA, to keep a system and its applications running.

2. Few applications need all the features available in today’s RDBMSs.

The application must pay for all the features even though it requires a small subset.

NOSQL DATA STORES

NoSQL Data Stores

Scale horizontally for “simple operations” using many servers.

Replicate and distribute (partition) data across many servers.

Provide a simple call level interface or protocol.

A weaker concurrency model than ACID:

Basically Available, Soft state, Eventually consistent (BASE).

Efficient use of distributed indexes and DRAM for data storage.

Ability to dynamically add new attributes to data records.
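A minimal sketch of two of the properties listed above: partitioning and replicating records across servers, and adding attributes to a record at runtime. The class `PartitionedStore` and its parameters are hypothetical. The "servers" are plain in-process dictionaries and writes reach all replicas synchronously, so eventual consistency is not modeled.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store spread over in-process 'servers' (plain dicts)."""

    def __init__(self, num_servers=4, replicas=2):
        self.servers = [dict() for _ in range(num_servers)]
        self.replicas = replicas

    def _owners(self, key):
        # A stable hash, so a key always maps to the same servers.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        first = h % len(self.servers)
        return [(first + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, key, record):
        for s in self._owners(key):
            self.servers[s][key] = dict(record)  # each replica keeps its own copy

    def get(self, key):
        # Read from the first replica only; a simple call-level interface.
        return self.servers[self._owners(key)[0]].get(key)
```

Adding servers raises aggregate capacity for simple operations because each get or put touches only the servers that own the key.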

Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Record 39(4), 2010.

Ghandeharizadeh, Boghrati, and Barahmand. An Evaluation of Graph Data Models. TPCTC 2014.

NoSQL Data Model

A “key-value” store:

A distributed hash table,

A key/value may be an arbitrary sequence of bytes,

E.g., memcached, Voldemort, Riak, Redis, Tokyo Cabinet, Membase, Membrain.

A “document” store:

A value may be a scalar, lists, nested documents,

Attribute names might be dynamically defined at runtime,

E.g., SimpleDB, CouchDB, MongoDB, Terrastore.

An “Extensible record” store:

A hybrid between a SQL store and a document store,

Families of attributes are defined in a schema and new attributes can be added,

Attributes may be list-valued,

E.g., BigTable, HBase, HyperTable, Cassandra, PNUTS.
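The difference between the three data models can be sketched with plain Python structures. This is only an illustration of the shapes involved and not the API of any system named above; the names `kv`, `doc`, `FAMILIES`, and `put_column` are made up for the example.

```python
# Key-value store: both key and value are opaque byte sequences.
kv = {b"user:42": b'{"name": "Ada"}'}

# Document store: values may be scalars, lists, or nested documents,
# and attribute names can be introduced at runtime.
doc = {"_id": 42, "name": "Ada", "tags": ["db", "cache"], "address": {"city": "LA"}}
doc["nickname"] = "ada"  # new attribute, no schema change required

# Extensible record store: families of attributes are fixed in a schema,
# while attributes inside a family can be added freely.
FAMILIES = {"profile", "friends"}


def put_column(row, family, column, value):
    """Set row[family][column], rejecting families missing from the schema."""
    if family not in FAMILIES:
        raise KeyError(f"unknown attribute family: {family}")
    row.setdefault(family, {})[column] = value
    return row
```

The store can interpret less and less of the data as one moves from the extensible record to the key-value model, which shifts more work to the application.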

MIDDLEWARE: CACHE AUGMENTED DATA STORES

Simple Operations

Operations that read and write a small amount of data.

Challenge: High volume of requests with a low latency requirement.

Person-to-person service providers in 1 Minute:

147K page views

100M queries

7K user visits

347K Tweets

Facebook, http://thenextweb.com/facebook/2014/10/28/facebook-1-35-billion-users/

Google, http://expandedramblings.com/index.php/google-plus-statistics/

Twitter, https://about.twitter.com/company

Wikipedia, http://stats.wikimedia.org/EN/Sitemap.htm

How?

Look up query result instead of query processing.

Ideal for applications with workloads that exhibit a high read to write ratio.

Key-value store as the cache manager.

Query result caching:

Key: query string, Value: result set

Trillions of cached key-value pairs.

Cache Augmented DBMSs

1. Value = Get(Key)

2. If Value is found, go to Step 6.

3. Issue SQL queries.

4. Receive query results; the application constructs Value using the results.

5. Put(Key, Value)

6. Use Value to generate the HTML result page.
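The six steps above can be sketched as follows. This is a minimal single-process illustration: an in-memory SQLite database stands in for the RDBMS server, a Python dictionary stands in for the key-value store (e.g., memcached), and the `profile` table, the key format, and `render_profile` are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the RDBMS server
db.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO profile VALUES (1, 'Ada')")
db.commit()

cache = {}  # stands in for the cache server (KVS)
stats = {"db_reads": 0}  # counts how often the RDBMS is queried


def render_profile(user_id):
    key = f"profile:{user_id}"
    value = cache.get(key)  # Step 1: Value = Get(Key)
    if value is None:  # Step 2: on a hit, skip to Step 6
        row = db.execute(  # Steps 3 and 4: SQL query and its result
            "SELECT name FROM profile WHERE id = ?", (user_id,)
        ).fetchone()
        stats["db_reads"] += 1
        value = {"name": row[0]} if row else {}
        cache[key] = value  # Step 5: Put(Key, Value)
    return f"<h1>{value.get('name', 'unknown')}</h1>"  # Step 6: build the page
```

Calling `render_profile(1)` twice queries the RDBMS only once; the second call is served entirely from the cache.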

[Figure: the application, the RDBMS server, and the cache server (KVS, e.g., memcached), with arrows numbered 1 to 5 for the steps above.]

CADBMS: Update

1. SQL DML Command: Insert, Delete, Update

2. Invalidate key-value pairs: Delete

Alternatives to invalidate include Refill/Refresh and incremental update
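A minimal sketch of the invalidate approach: the write goes to the RDBMS first, then the affected key-value pair is deleted so the next read recomputes it. As before, SQLite and a dictionary stand in for the RDBMS and the cache, and all names are hypothetical. The sketch is single-threaded; with concurrent readers and writers a stale value can be reinserted after the delete, which this example does not address.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the RDBMS server
db.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO profile VALUES (1, 'Ada')")
db.commit()

cache = {"profile:1": {"name": "Ada"}}  # a previously cached key-value pair


def rename(user_id, new_name):
    with db:  # Step 1: SQL DML command, committed on success
        db.execute("UPDATE profile SET name = ? WHERE id = ?", (new_name, user_id))
    cache.pop(f"profile:{user_id}", None)  # Step 2: invalidate (delete the key)


def read_name(user_id):
    key = f"profile:{user_id}"
    if key not in cache:  # miss: recompute from the RDBMS and repopulate
        row = db.execute(
            "SELECT name FROM profile WHERE id = ?", (user_id,)
        ).fetchone()
        cache[key] = {"name": row[0]}
    return cache[key]["name"]
```

A refill or refresh variant would replace the `pop` with a write of the new value into the cache, trading an extra cache write for one fewer miss.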

[Figure: the RDBMS server and the cache server (KVS, e.g., memcached), with arrows numbered 1 and 2 for the steps above.]

CADBMS Today

[Figure: Developer 1 and Developer 2 write application programs that access both the data store (persistent data) and a memcached cache server (in-memory copy of data, possibly stale).]

Physical Data Independence.

A “what”-oriented language.

Future CADBMSs

[Figure: Developer 1 and Developer 2 write application programs against a CADBMS that contains the data store and the key-value cache server.]

Physical Data Independence.

SQL as a “what”-oriented language.

KOSAR

[Figure: Developer 1 and Developer 2 write application programs against KOSAR, which contains the RDBMS and the key-value cache server.]

Ghandeharizadeh et al. A Demonstration of KOSAR. Middleware 2014.

Architecture

A database driven application:

Data Store Server

Data Store Client

Application

Architecture: Example

An RDBMS driven application authored using Java:

MySQL Server

JDBC

Application

SQL Result Set

KOSAR: Transparent Caching

Simply replace the client component of your application with KOSAR and see it run much faster.

Data Store Server

Data Store Client

Application

Ghandeharizadeh, Yap, and Nguyen. Strong Consistency in Cache Augmented SQL Systems. Middleware 2014.

Ghandeharizadeh, Irani, Lam, Yap. CAMP: A Multi-Queue Eviction Policy for Key-Value Stores. Middleware 2014.

How?

1. Look up query result instead of query processing.

Data Store Server

Data Store Client

Application

memcached Servers

Ideal for workloads that exhibit a high read to write ratio.

Client-Server Architecture

[Figure: bar chart of SoAR (Actions/Second), y-axis from 0 to 12,000, comparing SQL-X and CADBMS under 0.1% write and 10% write workloads.]

SLA: 95% of actions to observe a response time faster than 100 msec.

Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.

Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.

BG Benchmark, http://bgbenchmark.org

BG is a macro benchmark for interactive social networking actions.

BG quantifies the Social Action Rating (SoAR) of a data store:

For a given workload, the maximum number of simultaneous actions performed by a data store while satisfying a pre-specified SLA.
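To make the definition concrete, the sketch below applies it to measurements that are assumed to be already collected: each trial pairs an offered rate with the response times observed at that rate, and the rating is the highest rate whose trial met the SLA quoted on the chart slides (95% of actions faster than 100 msec). The function names and the trial format are assumptions for this example; it does not reproduce how BG itself searches for the rating.

```python
def satisfies_sla(latencies_ms, fraction=0.95, limit_ms=100.0):
    """True if at least `fraction` of actions finished faster than `limit_ms`."""
    if not latencies_ms:
        return False
    fast = sum(1 for t in latencies_ms if t < limit_ms)
    return fast / len(latencies_ms) >= fraction


def soar(trials):
    """trials: iterable of (actions_per_second, latencies_ms), one per run.
    Returns the highest rate whose run met the SLA, or 0 if none did."""
    return max((rate for rate, lat in trials if satisfies_sla(lat)), default=0)
```

For example, if runs at 1,000 and 2,000 actions per second meet the SLA and a run at 3,000 does not, the rating under this sketch is 2,000.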

Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.

Barahmand and Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. SIGMOD DBTest 2013.

Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.

Alabdulkarim, Barahmand and Ghandeharizadeh. A Scalable Benchmark for Interactive Social Networking Actions.

Ph.D. Fellowship

Shared Address Space

1. Avoid overhead of serialization and network communication.

Data Store Server

Data Store Client

Application
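The contrast between the two architectures can be sketched in a single process. In the sketch below, `pickle` stands in for the serialization a client-server cache performs on every request, while the shared address space variant hands back a reference to the cached object. Both class names are hypothetical and network communication is not modeled at all.

```python
import pickle


class ClientServerCache:
    """Mimics a remote cache: values cross the boundary as bytes."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = pickle.dumps(value)  # serialize on the way in

    def get(self, key):
        raw = self._store.get(key)
        return None if raw is None else pickle.loads(raw)  # deserialize on the way out


class SharedAddressSpaceCache:
    """Cache lives inside the application's process: get returns a reference."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)
```

The shared variant does no copying on a hit, which is where the saving comes from; the trade-off is that a caller who mutates the returned object also changes the cached value.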

Shared Address Space

[Figure: bar chart of SoAR (Actions/Second), y-axis from 0 to 140,000, comparing SQL-X and CADBMS under 0.1% write and 10% write workloads.]

SLA: 95% of actions to observe a response time faster than 100 msec.

Why?

1. CPU overhead of query processing is more than 85% [1, 2].

Data Store Server

Data Store Client

Application

Cache Servers

[1] Harizopoulos et al. OLTP: Through the Looking Glass and What We Found There. SIGMOD 2008.

[2] Stonebraker and Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. CACM 2011.

Architectures

Client-Server, Shared-Address Space, and Hybrids.

[Figure: Client-Server and Shared-Address Space architectures shown side by side.]

Ghandeharizadeh and Yap. Cache Augmented Data Stores. SIGMOD DBSocial 2013.

NON VOLATILE MEMORY

Non Volatile Memory

[Figure: storage hierarchies labeled Traditional, 2010, and 2017 (late 2016), built from CPU, DRAM, HDD, Flash, and NVM.]

Non-Volatile Memory

Byte-addressable

Time to rewrite the key-value stores & database engine!

Configurable:

Time to re-design algorithms

[Figure: configurations in which NVM serves as emulated HDD, emulated Flash, and emulated DRAM.]

Digital Era

Database

File System/Data Store

0011101000000101110101

Future (Biological) Computers

Database

DBMS/Data Store