VoltDB - Stonebraker Live! - New York City 2013
Stonebraker Live! Navigating the Database Universe
VoltDB presents
BRUCE READING
President and CEO
Agenda
• Traditional RDBMS wisdom is all wrong
– Presented by Dr. Michael Stonebraker, Co-founder
• Making sense of the database universe
– Presented by Bruce Reading, President and CEO
• Hello VoltDB 3.0
– Presented by Ryan Betts, Field CTO
TRADITIONAL RDBMS WISDOM IS ALL WRONG
Dr. Michael Stonebraker
Traditional RDBMS Wisdom
• Data is stored in disk-block format (heavily encoded)
• With a main memory buffer pool of blocks
• Query plans
– Optimize CPU, I/O
– Fundamental operation is “read a row”
• Indexing via B-trees
– Clustered or unclustered
Traditional RDBMS Wisdom
• Dynamic row-level locking
• ARIES-style write-ahead log
• Replication (asynchronous or synchronous)
– Update the primary first
– Then move the log to other sites
– And roll forward at the secondaries
Traditional RDBMS Wisdom
• Describes MySQL, DB2, Postgres, SQL Server, Oracle…
• Focus of most college-level DBMS courses
– Including M.I.T.
• Focus of most DBMS textbooks
Traditional RDBMS Wisdom
• Is completely wrong
• (More charitably) is obsolete
The DBMS Marketplace
• About 1/3 “data warehouses”
– Lots of big reads
– Bulk-loaded from OLTP systems
• About 1/3 “OLTP”
– Lots of small updates
– And a few reads
• About 1/3 “everything else”
– Hadoop, NoSQL, graph DBMS, Array DBMS…
The DBMS Marketplace
• Data warehouses
– Market already moving strongly in the direction of column stores
– Which have nothing to do with the traditional wisdom
– Because column stores are 50–100X faster than row stores
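To make the column-store argument concrete, here is a minimal Python sketch (the table and field names are hypothetical). It stores the same rows row-wise and column-wise, then counts how many values an aggregate over a single column has to read. It illustrates only the data-touched argument and does not reproduce or measure the 50–100X figure.

```python
from dataclasses import dataclass

# Hypothetical sales table with three attributes.
@dataclass
class Sale:
    region: str
    product: str
    amount: int

def to_columns(rows):
    """Pivot a list of row objects into one list per attribute (column layout)."""
    return {
        "region": [r.region for r in rows],
        "product": [r.product for r in rows],
        "amount": [r.amount for r in rows],
    }

def sum_amount_row_store(rows):
    # A row store reads every whole row, even though only 'amount' is needed.
    values_read = 0
    total = 0
    for r in rows:
        values_read += 3  # region, product and amount all come off the page
        total += r.amount
    return total, values_read

def sum_amount_column_store(columns):
    # A column store reads only the 'amount' column.
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

rows = [Sale("east", "widget", 10), Sale("west", "gadget", 25), Sale("east", "gizmo", 5)]
print(sum_amount_row_store(rows))                 # (40, 9)
print(sum_amount_column_store(to_columns(rows)))  # (40, 3)
```

With more columns per row, the gap in values read grows proportionally, which is the intuition behind the slide's claim.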
The Participants
• Native column store vendors
– HP/Vertica, SAP/HANA, Redshift (Amazon/ParAccel), SAP/Sybase IQ
• Native row store vendors
– Microsoft, Oracle, DB2, Netezza
• In transition
– Teradata, Aster Data, Greenplum
• If you are running a row store, then be prepared to switch!
The DBMS Marketplace
• OLTP
– NewSQL systems are wildly faster than systems built on the traditional wisdom
• Everything else
– Not an RDBMS market
OLTP Databases – 3 Big Decisions
• Main memory vs. disk orientation
• Replication strategy
• Concurrency control strategy
Reality Check on OLTP Databases
• TP database size grows at the rate transactions increase
• 1 Tbyte of main memory can be bought for around $30K (or less)
– (say) 64 Gbytes per server in 16 servers
• 10+ Tbytes possible
• If your data doesn’t fit in main memory now, then wait a couple of years and it will…
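The sizing example on this slide works out as follows. This takes 1 Tbyte ≈ 1024 Gbytes, and the per-Gbyte figure is simply the slide's $30K divided by that capacity, not a quoted market price:

```latex
16 \times 64\ \text{GB} = 1024\ \text{GB} \approx 1\ \text{TB},
\qquad
\frac{\$30{,}000}{1024\ \text{GB}} \approx \$29.30\ \text{per GB}
```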
Reality Check – Main Memory Performance
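• TPC-C CPU cycles, shown as a pie chart: four overhead slices (buffer pool, locking, latching, logging) versus useful work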
• On the Shore DBMS prototype
• “Elephants” should be similar
To Go Fast
• Must focus on overhead
– B-trees affect only a small fraction of the path length
• Must get rid of all four pie slices
– Anything less gives you a marginal win
– TimesTen as an example
Buffer Pool Overhead
• Get rid of the buffer pool
• i.e., run a main-memory DBMS
– Like VoltDB
Single Threading
• Hosed unless you do this
– Unless you get rid of queuing (somehow)
– Or eliminate shared data structures (somehow)
• VoltDB statically divides shared memory among the cores
– And cores are single threaded
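Here is a minimal Python sketch of the idea, under stated assumptions. The partition count, the route hash, and the deposit procedure are hypothetical, and every transaction touches a single key. Each partition owns its data outright and runs one transaction at a time, so no locks or latches are needed. Transactions that span partitions would need extra coordination, which this sketch leaves out.

```python
import zlib

class Partition:
    """One partition per core: its slice of the data is private, so
    transactions on it run one at a time with no locks or latches."""
    def __init__(self, pid):
        self.pid = pid
        self.data = {}
        self.executed = []  # order in which this partition ran work

    def execute(self, txn_id, proc, *args):
        # Single-threaded: each transaction runs to completion before the next.
        self.executed.append(txn_id)
        return proc(self.data, *args)

def route(key, num_partitions):
    # crc32 is stable across runs (Python's built-in hash() of str is salted).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def deposit(data, account, amount):
    data[account] = data.get(account, 0) + amount
    return data[account]

partitions = [Partition(i) for i in range(4)]  # hypothetical 4-core server

def run(txn_id, account, amount):
    """Send a single-key transaction to the one partition that owns the key."""
    p = partitions[route(account, len(partitions))]
    return p.execute(txn_id, deposit, account, amount)

print(run("t1", "alice", 100))  # 100
print(run("t2", "alice", -30))  # 70
```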
Concurrency Control
• MVCC popular (NuoDB, Hekaton)
• Time stamp order popular (VoltDB)
• I don’t know anybody who is doing normal dynamic locking
– It’s too slow!!!!
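Timestamp ordering can be sketched as follows for a single partition. This is an illustration, not VoltDB's protocol. The safe_timestamp parameter is an assumed bound standing in for however the system learns that no older transaction can still arrive. Transactions that arrive out of order are still executed serially in timestamp order, so no locks are taken, and any copy that sees the same inputs computes the same result.

```python
import heapq
import itertools

class TimestampOrderedExecutor:
    """Sketch of timestamp-order concurrency control for one partition:
    each transaction carries a timestamp, and the partition executes
    transactions serially in timestamp order, so no locks are taken."""
    def __init__(self):
        self.pending = []              # min-heap of (timestamp, seq, txn)
        self.seq = itertools.count()   # tie-breaker so txns are never compared
        self.state = {}

    def submit(self, timestamp, txn):
        heapq.heappush(self.pending, (timestamp, next(self.seq), txn))

    def run_ready(self, safe_timestamp):
        """Run every pending txn with timestamp <= safe_timestamp, oldest first.
        safe_timestamp is an assumed 'nothing older can still arrive' bound."""
        executed = []
        while self.pending and self.pending[0][0] <= safe_timestamp:
            ts, _, txn = heapq.heappop(self.pending)
            txn(self.state)
            executed.append(ts)
        return executed

ex = TimestampOrderedExecutor()
# Transactions arrive out of order...
ex.submit(20, lambda s: s.update(x=s.get("x", 0) * 2))
ex.submit(10, lambda s: s.update(x=5))
ex.submit(30, lambda s: s.update(x=s["x"] + 1))
# ...but execute in timestamp order: x becomes 5, then 10, then 11.
print(ex.run_ready(safe_timestamp=25))  # [10, 20]
print(ex.run_ready(safe_timestamp=30))  # [30]
print(ex.state)                         # {'x': 11}
```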
Reality Check – High Availability (HA)
• Requirement in today’s OLTP systems
• Nobody will take down time
• Must be solved through replication
How to Implement HA
• I am only interested in ACID outcomes!!!!
• Eventual consistency actually means “creates garbage”
– Consider 2 customers at 2 sites, each buying the last “widget”
• Even Jeff Dean (Google) has come around to this point of view
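The two-customer example can be sketched in Python. Both sites and the merge rule are hypothetical. The point is that whatever reconciliation rule is chosen, the replicas only converge on a single stock value after each has already sold the last widget.

```python
class Site:
    """A replica that accepts writes locally and syncs later (eventual consistency)."""
    def __init__(self, name, stock):
        self.name = name
        self.stock = stock
        self.sales = []

    def buy(self, customer):
        # Each site checks only its own local copy, so both can see stock == 1.
        if self.stock > 0:
            self.stock -= 1
            self.sales.append(customer)
            return True
        return False

def merge(a, b):
    """Hypothetical reconciliation: both sites converge on the same stock value,
    but sales already promised to customers cannot be undone."""
    converged = min(a.stock, b.stock)
    a.stock = b.stock = converged
    return a.sales + b.sales

east, west = Site("east", stock=1), Site("west", stock=1)
print(east.buy("customer A"), west.buy("customer B"))  # True True
sold = merge(east, west)
print(len(sold), east.stock)  # 2 0 -> two widgets sold, only one existed
```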
How to Implement HA
• Active-Passive
– Effectively requires you to write a log
– One of the four pie slices
• Active-Active (VoltDB solution)
– Send only the transaction, not the effect of the transaction
– Allows read-queries to be sent to any replica
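The "send only the transaction" idea can be sketched as follows. The procedure names are hypothetical. The sketch assumes every replica receives commands in the same order; how that order is agreed on is out of scope. Because the procedures are deterministic, replaying the same commands produces identical state on every replica, which is also why reads can be sent to any of them.

```python
# Deterministic procedures: same input state + same arguments -> same output state.
def transfer(state, src, dst, amount):
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount

def add_funds(state, account, amount):
    state[account] = state.get(account, 0) + amount

PROCEDURES = {"transfer": transfer, "add_funds": add_funds}

class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        name, args = command
        PROCEDURES[name](self.state, *args)

    def read(self, account):
        return self.state.get(account, 0)

replicas = [Replica(), Replica()]

def submit(command):
    # Ship the transaction itself (name + arguments), not the rows it changes.
    for r in replicas:
        r.apply(command)

submit(("add_funds", ("alice", 100)))
submit(("transfer", ("alice", "bob", 40)))
print(replicas[0].state == replicas[1].state)  # True
print(replicas[1].read("bob"))                 # 40: any replica can serve the read
```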
Reality Check – Power Failures
• What to do if you don’t have a UPS…
• Cannot lose data on a power failure!!!!
• Two options
– Bring back the log (and the pie slice)
– Command log plus asynchronous checkpoints
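Command logging with checkpoints can be sketched like this. The names are hypothetical, and the list standing in for the log is not actually durable. The checkpoint here is a synchronous deep copy, whereas the slide describes asynchronous checkpoints, and a real system would also force each log record to disk before acknowledging the commit. Recovery loads the latest snapshot and replays only the commands logged after it, which works because the procedures are deterministic.

```python
import copy

def add_funds(state, account, amount):
    state[account] = state.get(account, 0) + amount

PROCEDURES = {"add_funds": add_funds}

class Database:
    def __init__(self):
        self.state = {}
        self.command_log = []    # stands in for a durable, append-only file
        self.snapshot = ({}, 0)  # (state copy, number of log entries it covers)

    def execute(self, name, *args):
        # Log the command (name + arguments), not before/after images of rows.
        self.command_log.append((name, args))
        PROCEDURES[name](self.state, *args)

    def checkpoint(self):
        # A real system writes this in the background; a deep copy stands in here.
        self.snapshot = (copy.deepcopy(self.state), len(self.command_log))

    def recover(self):
        """After a power failure, memory is lost but the log and snapshot survive."""
        snap_state, covered = self.snapshot
        self.state = copy.deepcopy(snap_state)
        for name, args in self.command_log[covered:]:
            PROCEDURES[name](self.state, *args)  # replay only post-snapshot work

db = Database()
db.execute("add_funds", "alice", 100)
db.checkpoint()
db.execute("add_funds", "alice", 50)
db.execute("add_funds", "bob", 7)
expected = dict(db.state)
db.state = {}  # simulated power failure wipes main memory
db.recover()
print(db.state == expected)  # True
```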
Some Data From Nirmesh Malviya
• Implemented ARIES in VoltDB
• Compared against the VoltDB command logging
• Command logging about 3X faster in total throughput
The Nail in the Coffin
• Time stamp order compatible with active-active
– As are any deterministic schemes
• Locking and MVCC are not
– Need a two-phase commit between the replicas
– Slow, slow, slow
Net-Net on OLTP
• Main memory DBMS
• Deterministic concurrency control
• HA via active-active
• Has nothing to do with the traditional wisdom
• Even if your data is too big for main memory
– The traditional wisdom is still wrong
– Stay tuned for a paper on this topic
Summary
• What we teach our DBMS students is all wrong
• Implementations from the “elephants” are all obsolete
– One-size-does-not-fit-all
– Several million lines of code per vendor are obsolete
• I expect a lot of turmoil in the market off into the future
MAKING SENSE OF THE DATABASE UNIVERSE
Bruce Reading
The fact is…
There’s only more and more to come.
And it’s not slowing down…
Record amounts of data are being created every day…
And if that data is most valuable at the moment it’s created, how do you
put it to use NOW?
How do you automate decisioning against it NOW?
NOW
Imagine…
Nice story. So what?
Large, busy bank
Rogue trader
“Mistyped number”: -$ small sum lost
“Mistyped number”: -$ small sum lost
“Mistyped number”: -$ small sum lost
Oblivious… (many more small losses go unnoticed)
-$2BN: large sum lost
Third largest loss in banking history
UBS couldn't flag it among all the data... until it was too late.
This is our world now.
Same old, same old won’t cut it.
What’s a developer to do?
Data Value Chain (plotted against Age of Data)
• Interactive (milliseconds)
– Place trade, serve ad, enrich stream, examine packet, approve transaction
• Real-time analytics (hundredths of seconds)
– Calculate risk, leaderboard, aggregate, count
• Record lookup (second(s))
– Retrieve click stream, show orders
• Historical analytics (minutes)
– Backtest algo, BI, daily reports
• Exploratory analytics (hours)
– Algo discovery, log analysis, fraud pattern match
(Chart builds: Value of Individual Data Item and Aggregate Data Value plotted against Age of Data; a further build places Traditional RDBMS on an Application Complexity axis running from simple, slow, small to fast, complex, large)
The Database Universe
(Chart: the data value chain divided into Transactional and Analytic regions, populated with Traditional RDBMS, Data Warehouse, Hadoop, etc., and NoSQL; a final build adds NewSQL, labeled “Velocity”)
VoltDB
• The fastest, most scalable database on the market today
• Ingest massive quantities of data and perform automated decisioning in real time
• 3 MILLION transactions per second
• Dramatically lowering your cost per transaction
VoltDB enables NOW.
A huge impact on the bottom line
NOW
PREVENT
ACHIEVE
Anything is possible…
Electrical smart grids
Micro-personalization
Real-time display targeting
Dynamic airline ticket purchasing
State-of-the-art social networking
Session management
Network monitoring
We enable NOW.
www.VoltDB.com
HELLO 3.0!
Ryan Betts
Introducing VoltDB 3.0
VoltDB 3.0
VoltDB: a modern OLTP database built for a high velocity world.
– Horizontal scalability
– Hundreds of thousands of transactions per second
– Relational SQL
Latency and Throughput, 50-50 Read/Write Workload
(Chart: latency (ms) vs. throughput (TPS) for VoltDB 3.0 vs. v2.8.4.1; key/value 50/50 read/write workload on a 3-node, K=1 cluster)
Read/Write Workload Latency/Throughput
(Chart: average latency (ms) vs. throughput (TPS) for VoltDB 3.0 at 10% read/90% write, 50% read/50% write, and 90% read/10% write; key/value workload on a 3-node, K=1 cluster)
Faster: Ad Hoc SQL Performance
• Conversational SQL
• Thousands to 10,000+ ad hoc SQL transactions/second
• Single or multiple (batch) SQL statement transactions
Easier Development: New SQL Support
• SQL LIKE and NOT LIKE
• UNION
• Column Functions
• Counting function (leaderboard ranking queries)
• Ability to define index using column functions
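Leaderboard ranking reduces to counting: a player's rank is one plus the number of players with a strictly higher score (in SQL terms, a COUNT(*) of rows WHERE score > the player's score). The Python sketch below keeps the scores sorted so that count comes from a binary search rather than a full scan. It illustrates the technique and does not describe how VoltDB implements its counting function; the class and method names are hypothetical.

```python
import bisect

class Leaderboard:
    """Rank = 1 + number of players with a strictly higher score.
    Keeping scores sorted lets the rank query use binary search
    instead of scanning every row."""
    def __init__(self):
        self.scores = {}         # player -> score
        self.sorted_scores = []  # ascending list of all current scores

    def set_score(self, player, score):
        old = self.scores.get(player)
        if old is not None:
            # Remove one occurrence of the player's previous score.
            del self.sorted_scores[bisect.bisect_left(self.sorted_scores, old)]
        self.scores[player] = score
        bisect.insort(self.sorted_scores, score)

    def rank(self, player):
        score = self.scores[player]
        higher = len(self.sorted_scores) - bisect.bisect_right(self.sorted_scores, score)
        return higher + 1

lb = Leaderboard()
for player, score in [("ann", 50), ("bo", 80), ("cy", 50), ("di", 95)]:
    lb.set_score(player, score)
print(lb.rank("di"), lb.rank("bo"), lb.rank("ann"), lb.rank("cy"))  # 1 2 3 3
```

Tied scores share a rank here. Updates still shift list elements, so a balanced tree with per-node subtree counts would be the usual choice to make updates logarithmic as well.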
Easier Development: JSON Support
• JSON values stored in a varchar column
• Field() column function
• Indexing on JSON elements
CREATE INDEX session_site_moderator
ON user_session_table (field(json_data, 'site'),
field(json_data, 'moderator'), username);
• New JSON sample in kit
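What an index on field() expressions buys can be approximated in Python. The field helper here is a hypothetical stand-in, not VoltDB's FIELD() implementation: it returns a top-level value as a string, or None (playing the role of SQL NULL) when the key is absent. The dictionary keyed on (site, moderator), with usernames kept sorted inside each entry, mimics the composite key in the CREATE INDEX statement, so a lookup does not re-parse every row's JSON.

```python
import bisect
import json
from collections import defaultdict

def field(json_text, name):
    """Hypothetical FIELD()-like helper: return the named top-level value of a
    JSON object as a string, or None (standing in for SQL NULL) if absent."""
    value = json.loads(json_text).get(name)
    if value is None:
        return None
    # Non-string values are rendered as JSON text, e.g. true -> "true".
    return value if isinstance(value, str) else json.dumps(value)

# Rows of a hypothetical user_session_table: (username, json_data varchar).
rows = [
    ("alice", '{"site": "news", "moderator": true}'),
    ("bob", '{"site": "news", "moderator": false}'),
    ("carol", '{"site": "sports", "moderator": true}'),
    ("dave", '{"site": "news"}'),
    ("erin", '{"site": "news", "moderator": true}'),
]

# Index keyed on (field(json_data,'site'), field(json_data,'moderator')),
# with usernames kept in sorted order inside each entry.
index = defaultdict(list)
for username, json_data in rows:
    prefix = (field(json_data, "site"), field(json_data, "moderator"))
    bisect.insort(index[prefix], username)

def lookup(site, moderator):
    # One dictionary probe; no row's JSON is parsed at query time.
    return list(index.get((site, moderator), []))

print(lookup("news", "true"))  # ['alice', 'erin']
```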
Easier Development: Online Operations
• Ability to re-join a failed node to the cluster with no impact on existing operations
• Online schema update
• No service window
Easier Development: Streamlined Development
• Elimination of project.xml
• VoltDB-specific configuration now defined in DDL
• Defaulting of deployment.xml
• New Volt Compiler CLI:
voltdb compile
Expanded Reach: Cloud-Friendly
• Reduce impact of variable node performance and latency
• Elimination of strict NTP configuration
• Scales to large numbers of nodes
Integration: High-Performance Export
• Parallelized export
• New connectors: JDBC, Netezza, Vertica
Integration: Client Library Updates
• New PHP Client
• Node.js client v1.0
• Go Client
• Coming soon: updated Erlang client
http://golang.org
Other Notable New Features
• Explain command
• CSV loader utility
• CSV snapshots
• New Administration CLI: voltadmin
– voltadmin save
– voltadmin restore
– voltadmin pause
– voltadmin resume
– voltadmin shutdown
More Samples Available for Download
http://voltdb.com/community/volt-labs.php
Volt University
• Portfolio of instructional content, classes, tools, and other resources to help developers build applications quickly
• Curriculum and supporting material range from beginner to advanced
• Three types of instruction:
– Volt University Online
– Volt University Classroom
– Volt Vanguard Certification
Summary: VoltDB v3.0
• Run faster: transactions at high velocity scale.
• Create faster: write and scale your ACID application.
• Learn faster: Volt Labs & Volt University
VoltDB v3.0
DOWNLOAD 3.0 at
www.voltdb.com
Imagine the Possibilities
More Information?
E-mail [email protected]
Visit our forums: http://community.voltdb.com/forum
Read the VoltDB “Getting Started Guide”: http://community.voltdb.com/docs/GettingStarted/index
Follow @VoltDB on Twitter
QUESTIONS?
THANK YOU