VoltDB - Stonebraker Live! - New York City 2013

Stonebraker Live! Navigating the Database Universe

VoltDB presents

BRUCE READING

President and CEO

Agenda

• Traditional RDBMS is all wrong

– Presented by Dr. Michael Stonebraker, Co-founder

• Making sense of the database universe

– Presented by Bruce Reading, President and CEO

• Hello VoltDB 3.0

– Presented by Ryan Betts, Field CTO

TRADITIONAL RDBMS WISDOM IS ALL WRONG

Dr. Michael Stonebraker

Traditional RDBMS Wisdom

• Data is in disk block format (heavily encoded)

• With a main memory buffer pool of blocks

• Query plans

– Optimize CPU, I/O

– Fundamental operation is read a row

• Indexing via B-trees

– Clustered or unclustered


• Dynamic row-level locking

• Aries-style write-ahead log

• Replication (asynchronous or synchronous)

– Update the primary first

– Then move the log to other sites

– And roll forward at the secondary(s)


• Describes MySQL, DB2, Postgres, SQLServer, Oracle…

• Focus of most college-level DBMS courses

– Including M.I.T.

• Focus of most DBMS textbooks


• Is completely wrong

• (More charitably) is obsolete

The DBMS Marketplace

• About 1/3 “data warehouses”

– Lots of big reads

– Bulk-loaded from OLTP systems

• About 1/3 “OLTP”

– Lots of small updates

– And a few reads

• About 1/3 “everything else”

– Hadoop, NoSQL, graph DBMS, Array DBMS…


• Data warehouses

– Market already moving strongly in the direction of column stores

– Which have nothing to do with the traditional wisdom

– Because column stores are 50–100X faster than row stores

The Participants

• Native column store vendors

– HP/Vertica, SAP/HANA, Redshift (Amazon/ParAccel), SAP/Sybase IQ

• Native row store vendors

– Microsoft, Oracle, DB2, Netezza

• In transition

– Teradata, Aster Data, Greenplum

• If you are running a row store, then be prepared to switch!


• OLTP

– NewSQL systems are wildly faster than the traditional wisdom

• Everything else

– Not an RDBMS market

OLTP Databases – 3 Big Decisions

• Main memory vs. disk orientation

• Replication strategy

• Concurrency control strategy

Reality Check on OLTP Databases

• TP database size grows at the rate transactions increase

• 1 Tbyte of main memory buyable for around $30K (or less)

– (say) 64 Gbytes per server in 16 servers

• 10+ Tbytes possible

• If your data doesn’t fit in main memory now, then wait a couple of years and it will…
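The sizing on this slide is simple arithmetic. A minimal sketch using only the figures quoted above (16 servers, 64 Gbytes each, around $30K in total); the per-Gbyte price is derived from those numbers and is not a quoted market price.

```python
# Figures taken from the slide: 16 servers with 64 Gbytes each, about $30K total.
SERVERS = 16
GBYTES_PER_SERVER = 64
TOTAL_COST_USD = 30_000  # "around $30K (or less)"

total_gbytes = SERVERS * GBYTES_PER_SERVER
cost_per_gbyte = TOTAL_COST_USD / total_gbytes

print(f"{total_gbytes} Gbytes total (about {total_gbytes / 1024:.0f} Tbyte)")
print(f"about ${cost_per_gbyte:.2f} per Gbyte")
```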

Reality Check – Main Memory Performance

• TPC-C CPU cycles

• On the Shore DBMS prototype

• “Elephants” should be similar

To Go Fast

• Must focus on overhead

– B-trees affect a small fraction of the path length

• Must get rid of all four pie slices

– Anything less gives you a marginal win

– TimesTen as an example
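The "marginal win" claim is Amdahl's-law arithmetic. The pie chart itself is not reproduced in this transcript, so the sketch below assumes four equal overhead slices of 24% each, leaving 4% useful work. The slice names and fractions are hypothetical placeholders, not measured values.

```python
# Hypothetical cycle breakdown: the slide's pie chart is not reproduced here,
# so these slice names and fractions are assumptions for illustration only.
overhead = {"slice_1": 0.24, "slice_2": 0.24, "slice_3": 0.24, "slice_4": 0.24}

def speedup(removed):
    """Speedup from eliminating the given overhead slices entirely."""
    remaining = 1.0 - sum(overhead[name] for name in removed)
    return 1.0 / remaining

# Remove one more slice at each step and report the resulting speedup.
for k in range(1, len(overhead) + 1):
    removed = list(overhead)[:k]
    print(f"remove {k} slice(s): {speedup(removed):.1f}x")
```

Under these assumed fractions, removing a single slice yields roughly 1.3x, while removing all four yields about 25x. That is the shape of the argument: most of the gain arrives only with the last slice.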


Buffer Pool Overhead

• Get rid of the buffer pool

• i.e., run a main-memory DBMS

– Like VoltDB

Single Threading

• Hosed unless you do this

– Unless you get rid of queuing (somehow)

– Or eliminate shared data structures (somehow)

• VoltDB statically divides shared memory among the cores

– And cores are single threaded
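A minimal sketch of the partitioning idea, not VoltDB's implementation. Data is divided into partitions that share nothing, and each transaction runs to completion against the single partition that owns its key, so no locks or latches are taken. The partition count, hash scheme, and function names are assumptions for illustration. In a real system each partition would be driven by its own single thread on its own core.

```python
NUM_PARTITIONS = 4  # assumption: one partition per core

# Each partition owns a private dict; nothing is shared between partitions.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_for(key: str) -> int:
    # Deterministic hash of the partitioning key (hypothetical scheme).
    return sum(key.encode()) % NUM_PARTITIONS

def run_single_partition(key, txn):
    """Run txn to completion against the one partition that owns key."""
    return txn(partitions[partition_for(key)])

def deposit(account, amount):
    """Example transaction: add amount to an account and return the balance."""
    def txn(rows):
        rows[account] = rows.get(account, 0) + amount
        return rows[account]
    return run_single_partition(account, txn)

deposit("alice", 50)
deposit("alice", 25)
deposit("bob", 10)
print(deposit("alice", 0))  # current balance for alice
```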

Concurrency Control

• MVCC popular (NuoDB, Hekaton)

• Time stamp order popular (VoltDB)

• I don’t know anybody who is doing normal dynamic locking

– It’s too slow!!!!

Reality Check – High Availability (HA)

• Requirement in today’s OLTP systems

• Nobody will accept downtime

• Must be solved through replication

How to Implement HA

• I am only interested in ACID outcomes!!!!

• Eventual consistency actually means “creates garbage”

– Consider 2 customers at 2 sites, each buying the last “widget”

• Even Jeff Dean (Google) has come around to this point of view
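The "last widget" case can be sketched in a few lines. This is an illustrative toy, not any particular system: under eventual consistency each site decides from its own replica and the results are merged later, while under ACID both purchases are serialized against one logical copy. All names are hypothetical.

```python
def buy(stock, customer):
    """Sell one widget if the local view of stock says one is available."""
    if stock["widgets"] > 0:
        stock["widgets"] -= 1
        stock["buyers"].append(customer)
        return True
    return False

# Eventual consistency: each site acts on its own replica and merges later.
site_a = {"widgets": 1, "buyers": []}
site_b = {"widgets": 1, "buyers": []}
sold_a = buy(site_a, "customer_1")
sold_b = buy(site_b, "customer_2")
eventual_buyers = site_a["buyers"] + site_b["buyers"]  # after the merge

# ACID: both purchases are serialized against a single logical copy.
acid = {"widgets": 1, "buyers": []}
results = [buy(acid, "customer_1"), buy(acid, "customer_2")]

print("eventual:", eventual_buyers)        # two buyers for one widget
print("acid:", acid["buyers"], results)    # one buyer, second purchase refused
```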

How to Implement HA

• Active-Passive

– Effectively requires you to write a log

– One of the four pie slices

• Active-Active (VoltDB solution)

– Send only the transaction, not the effect of the transaction

– Allows read-queries to be sent to any replica
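A minimal sketch of "send only the transaction", assuming deterministic procedures and a single agreed order. Each replica executes the same calls in the same order and arrives at the same state, so no log of row changes needs to be shipped. The procedure and account names are hypothetical.

```python
def transfer(state, src, dst, amount):
    """A deterministic transaction: same inputs and order give the same result."""
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount

initial = {"alice": 100, "bob": 0}
replica_1 = dict(initial)
replica_2 = dict(initial)

# Only the transaction (procedure plus arguments) is shipped, in one agreed order.
ordered_txns = [
    (transfer, ("alice", "bob", 30)),
    (transfer, ("bob", "alice", 50)),   # refused on both replicas: bob has only 30
    (transfer, ("alice", "bob", 70)),
]

for replica in (replica_1, replica_2):
    for procedure, args in ordered_txns:
        procedure(replica, *args)

print(replica_1 == replica_2, replica_1)
```

Because both replicas hold identical state after every transaction, a read-only query can be answered by either one.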

Reality Check – Power Failures

• What to do if you don’t have UPS…

• Cannot lose data on a power failure!!!!

• Two options

– Bring back the log (and the pie slice)

– Command log plus asynchronous checkpoints
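A minimal sketch of the second option, assuming deterministic procedures. The log records the command (procedure name and arguments) rather than the rows it changes, and recovery loads the last checkpoint and replays the commands logged after it. The class, its in-memory "log", and the procedure are hypothetical stand-ins; a real system would write the log to durable storage and take checkpoints in the background.

```python
import copy

def add_stock(db, item, qty):
    db[item] = db.get(item, 0) + qty

PROCEDURES = {"add_stock": add_stock}   # hypothetical stored procedures

class Database:
    def __init__(self):
        self.state = {}
        self.command_log = []      # stands in for a durable log on disk
        self.checkpoint = ({}, 0)  # (snapshot of state, log position covered)

    def execute(self, name, *args):
        # Log the command (name + arguments), not the rows it changes.
        self.command_log.append((name, args))
        PROCEDURES[name](self.state, *args)

    def take_checkpoint(self):
        # Synchronous here for simplicity; the slide's scheme is asynchronous.
        self.checkpoint = (copy.deepcopy(self.state), len(self.command_log))

    def recover(self):
        """After a crash: load the last checkpoint, then replay later commands."""
        snapshot, position = self.checkpoint
        self.state = copy.deepcopy(snapshot)
        for name, args in self.command_log[position:]:
            PROCEDURES[name](self.state, *args)

db = Database()
db.execute("add_stock", "widget", 5)
db.take_checkpoint()
db.execute("add_stock", "widget", 2)
db.execute("add_stock", "gadget", 1)

db.state = {}          # simulate losing main memory in a power failure
db.recover()
print(db.state)
```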

Some Data From Nirmesh Malviya

• Implemented Aries in VoltDB

• Compared against the VoltDB command logging

• Command logging about 3X faster in total throughput

The Nail in the Coffin

• Time stamp order compatible with active-active

– As are any deterministic schemes

• Locking and MVCC are not

– Need a 2 phase commit between the replicas

– Slow, slow, slow

Net-Net on OLTP

• Main memory DBMS

• Deterministic concurrency control

• HA via active-active

• Has nothing to do with the traditional wisdom

• Even if your data is too big for main memory

– The traditional wisdom is still wrong

– Stay tuned for a paper on this topic

Summary

• What we teach our DBMS students is all wrong

• Implementations from the “elephants” are all obsolete

– One-size-does-not-fit-all

– Several million lines of code per vendor are obsolete

• I expect a lot of turmoil in the market off into the future

MAKING SENSE OF THE DATABASE UNIVERSE

Bruce Reading

The fact is…

There’s only more and more to come.

And it’s not slowing down…

Record amounts of data are being created every day…

And if that data is most valuable at the moment it’s created, how do you put it to use NOW?

How do you automate decisioning against it NOW?

Imagine…

Nice story. So what?

Large, busy bank

Rogue trader

“Mistyped number” (three times, each followed by “Small sum lost”)

Oblivious

[Graphic: small losses (-$) accumulate across the slide]

-$2BN: Large sum lost

Third largest loss in banking history

UBS couldn't flag it among all the data... until it was too late.

This is our world now.

Same old, same old won’t cut it.

What’s a developer to do?

Data Value Chain

Stages by age of data:

• Interactive (milliseconds): place trade, serve ad, enrich stream, examine packet, approve transaction

• Real-time Analytics (hundredths of seconds): calculate risk, leaderboard, aggregate, count

• Record Lookup (second(s)): retrieve click stream, show orders

• Historical Analytics (minutes): backtest algo, BI, daily reports

• Exploratory Analytics (hours): algo discovery, log analysis, fraud pattern match

Data Value Chain

[Chart: the same five stages and time scales, plotted against Age of Data (horizontal axis) and Data Value (vertical axis), with two curves: Value of Individual Data Item and Aggregate Data Value.]

The Database Universe

[Chart: database categories plotted by Data Value (from Value of Individual Data Item to Aggregate Data Value, spanning Transactional to Analytic) and Application Complexity (labels: Simple, Slow, Small and Fast, Complex, Large). Categories and labels shown: Traditional RDBMS, Data Warehouse, Hadoop etc., NoSQL, NewSQL, Velocity.]

VoltDB

The fastest, most scalable database on the market today

Ingest massive quantities of data and perform automated decisioning in real time

3 MILLION transactions per second

Dramatically lowering your cost per transaction

A huge impact on the bottom line

VoltDB enables NOW.

PREVENT

ACHIEVE

Anything is possible…

Electrical smart grids

Micro-personalization

Real-time display targeting

Dynamic airline ticket purchasing

State-of-the-art social networking

Session management

Network monitoring

We enable NOW.

www.VoltDB.com

HELLO 3.0!

Ryan Betts

Introducing VoltDB 3.0

VoltDB 3.0

VoltDB: a modern OLTP database built for a high velocity world.

– Horizontal scalability

– Hundreds of thousands of transactions per second

– Relational SQL

Latency and Throughput, 50-50 Read/Write Workload

[Chart: latency (ms, 0 to 16) versus TPS (0 to 200,000) for VoltDB 3.0 vs. v2.8.4.1. Key/value 50/50 read/write workload on a 3-node, K=1 cluster.]

Read/Write Workload Latency/Throughput

[Chart: average latency (ms, 0 to 9) versus TPS (0 to 350,000) for VoltDB 3.0 under three key/value workloads: 10% read/90% write, 50% read/50% write, and 90% read/10% write. 3-node, K=1 cluster.]

Faster: Ad Hoc SQL Performance

• Conversational SQL

• Thousands to 10,000+ ad hoc SQL transactions/second

• Single or multiple (batch) SQL statement transactions

Easier Development: New SQL Support

• SQL LIKE and NOT LIKE

• UNION

• Column Functions

• Counting function (leaderboard ranking queries)

• Ability to define index using column functions

Easier Development: JSON Support

• JSON values stored in a varchar column

• Field() column function

• Indexing on JSON elements

CREATE INDEX session_site_moderator

ON user_session_table (field(json_data, 'site'),

field(json_data, 'moderator'), username);

• New JSON sample in kit
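To make the indexing idea concrete, here is a rough Python analogy for an index keyed on elements extracted from a JSON column. It is not VoltDB code: the Python `field` helper is a stand-in that assumes the SQL function returns one top-level element, and the sample rows are invented. Only the table, column, and element names come from the CREATE INDEX statement on the slide.

```python
import bisect
import json

def field(json_text, name):
    """Stand-in for the slide's field() function: pull one top-level element."""
    return json.loads(json_text).get(name)

# Hypothetical rows of user_session_table: (username, json_data)
rows = [
    ("carol", '{"site": "news", "moderator": "true"}'),
    ("dave",  '{"site": "news", "moderator": "false"}'),
    ("erin",  '{"site": "blog", "moderator": "true"}'),
]

# Sorted index whose key mirrors the CREATE INDEX column list.
keys = sorted(
    (field(json_data, "site"), field(json_data, "moderator"), username)
    for username, json_data in rows
)

# Range lookup on the (site, moderator) prefix, without re-parsing any JSON.
lo = bisect.bisect_left(keys, ("news", "true", ""))
hi = bisect.bisect_left(keys, ("news", "true", "\uffff"))
news_moderators = [username for _, _, username in keys[lo:hi]]
print(news_moderators)
```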

Easier Development: Online Operations


• Ability to re-join a failed node to cluster with no impact to existing operations

• Online schema update

• No service window

Easier Development: Streamlined Development

• Elimination of project.xml

• VoltDB-specific configuration now defined in DDL

• Defaulting of deployment.xml

• New Volt Compiler CLI:

voltdb compile


Expanded Reach: Cloud-Friendly

• Reduce impact of variable node performance and latency

• Elimination of strict NTP configuration

• Scales to a large number of nodes

Integration: High-Performance Export

• Parallelized export

• New connectors: JDBC, Netezza, Vertica

Integration: Client Library Updates

• New PHP Client

• Node.js client v1.0

• Go Client

• Coming soon: updated Erlang client


http://golang.org

Other Notable New Features

• Explain command

• CSV loader utility

• CSV snapshots

• New Administration CLI: voltadmin

– voltadmin save

– voltadmin restore

– voltadmin pause

– voltadmin resume

– voltadmin shutdown


More Samples Available for Download


http://voltdb.com/community/volt-labs.php



Volt University

• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly

• Curriculum and supporting material range from beginner to advanced

• Three types of instruction:

– Volt University Online

– Volt University Classroom

– Volt Vanguard Certification


Summary: VoltDB v3.0

• Run faster: transactions at high velocity scale.

• Create faster: write and scale your ACID application.

• Learn faster: Volt Labs & Volt University

VoltDB v3.0

DOWNLOAD 3.0 at www.voltdb.com

Imagine the Possibilities

More Information?

E-mail [email protected]

Visit our forums: http://community.voltdb.com/forum

Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index

Follow @VoltDB on Twitter


QUESTIONS?

THANK YOU
