Taming the Big Data Fire Hose

the NewSQL database you’ll never outgrow

John Hugg, Sr. Software Engineer, VoltDB

VoltDB 2

Big Data Defined

Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high velocity state

Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships

Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured

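Example Big Data Use Cases

Data source: Capital markets
+ High-frequency operations: Write/index all trades, store tick data
+ Lower-frequency operations: Show consolidated risk across traders

Data source: Call initiation request
+ High-frequency operations: Real-time authorization
+ Lower-frequency operations: Fraud detection/analysis

Data source: Inbound HTTP requests
+ High-frequency operations: Visitor logging, analysis, alerting
+ Lower-frequency operations: Traffic pattern analytics

Data source: Online game
+ High-frequency operations: Rank scores (defined intervals, player “bests”)
+ Lower-frequency operations: Leaderboard lookups

Data source: Real-time ad trading systems
+ High-frequency operations: Match form factor, placement criteria, bid/ask
+ Lower-frequency operations: Report ad performance from exhaust stream

Data source: Mobile device location sensor
+ High-frequency operations: Location updates, QoS, transactions
+ Lower-frequency operations: Analytics on transactions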

Big Data and You

Incoming data streams are different from traditional business apps
+ You need to write data quickly and reliably, but …

It’s not just about high-speed writes
+ You need to validate in real-time
+ You need to count and aggregate
+ You need to analyze in real-time
+ You need to scale on demand
+ You may need to transact

Big Data Management Infrastructure

Data sources: online gaming, ad serving, sensor data, Internet commerce, SaaS and Web 2.0, mobile platforms, financial trade

High Velocity
+ NewSQL: structured data, ACID guarantees, relational/SQL, real-time analytics
+ NoSQL: unstructured data, eventual consistency, schemaless KV, document

High Volume
+ Analytic datastore
+ Other OLAP data stores

High Velocity Data Management

High Velocity DBMS Requirements

Ingest at very high speeds and rates
Scale easily to meet growth and demand peaks
Support integrated fault tolerance
Support a wide range of real-time (or “near-time”) analytics
Integrate easily with high volume analytic datastores

High Speed Data Ingestion

Support millions of write operations per second at scale

Read and write latencies below 50 milliseconds
Provide ACID-level consistency guarantees (maybe)
Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document

Scale to Meet Growth and Demand

Scale-out on commodity hardware

Built-in database partitioning

+ Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare

Database must automatically implement defined partitioning strategy

+ Application should “see” a single database instance

Database should encourage scalability best practices

+ For example, replication of reference data minimizes need for multi-partition operations

A Look Inside Partitioning

table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name

Partition 1
orders: (1, 101, 2), (1, 101, 3), (4, 401, 2)

Partition 2
orders: (2, 201, 1), (5, 501, 3), (5, 502, 2)

Partition 3
orders: (3, 201, 1), (6, 601, 1), (6, 601, 2)

products (replicated): (1, knife), (2, spoon), (3, fork)

select count(*) from orders where customer_id = 5  -- single-partition

select count(*) from orders where product_id = 3  -- multi-partition

insert into orders (customer_id, order_id, product_id) values (3, 303, 2)  -- single-partition

update products set product_name = 'spork' where product_id = 3  -- multi-partition
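The difference between single-partition and multi-partition statements can be sketched in a few lines of Python. This is only an illustration: the modulo placement rule in `partition_for` is an assumption inferred from the rows shown on the slide (customers 1 and 4 in partition 1, 2 and 5 in partition 2, 3 and 6 in partition 3), not a description of how VoltDB actually hashes keys, and all function names are hypothetical.

```python
NUM_PARTITIONS = 3

# Rows from the slide: (customer_id, order_id, product_id)
ORDERS = [
    (1, 101, 2), (1, 101, 3), (4, 401, 2),
    (2, 201, 1), (5, 501, 3), (5, 502, 2),
    (3, 201, 1), (6, 601, 1), (6, 601, 2),
]


def partition_for(customer_id):
    """Assumed placement rule: simple modulo on the partition key."""
    return (customer_id - 1) % NUM_PARTITIONS + 1


def build_partitions(rows):
    """Distribute order rows across partitions by customer_id."""
    partitions = {p: [] for p in range(1, NUM_PARTITIONS + 1)}
    for row in rows:
        partitions[partition_for(row[0])].append(row)
    return partitions


def count_by_customer(partitions, customer_id):
    """Single-partition: the partition key says exactly where to look."""
    target = partition_for(customer_id)
    return sum(1 for row in partitions[target] if row[0] == customer_id)


def count_by_product(partitions, product_id):
    """Multi-partition: product_id is not the key, so every partition is scanned."""
    return sum(
        sum(1 for row in rows if row[2] == product_id)
        for rows in partitions.values()
    )


partitions = build_partitions(ORDERS)
print(count_by_customer(partitions, 5))  # touches partition 2 only
print(count_by_product(partitions, 3))   # touches all three partitions
```

The insert of customer 3 is single-partition for the same reason as the first query: the partition key is present, so only one partition is involved. The update to `products` is multi-partition because every partition holds a replica of that table.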

Integrated Fault Tolerance

Database should transparently support built-in “Tandem-style” HA

+ Users should be able to easily increase/decrease fault tolerance levels

Database should be easily and quickly recoverable in the event of severe hardware failures

Database should be able to automatically detect and manage a variety of partition fault conditions

Downed nodes should be “rejoinable” without the need for service windows

Partition Detection & Recovery

[Diagram: a three-node cluster of Server A, Server B and Server C]

Network fault protection
+ Detects partition event
+ Determines which side of fault to disable
+ Snapshots and disables orphaned node(s)

Live node rejoin
+ Allows “downed” nodes to rejoin live cluster
+ Automatically re-synchs all node data
+ Coordinates transactions during re-synch
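The slide does not say how the database decides which side of a network fault to disable. As a hedged sketch, one common rule is to keep the side that holds a strict majority of the cluster and to break an even split deterministically. The rule, the tie-break, and the function names below are assumptions for illustration, not VoltDB's documented behavior.

```python
def surviving_side(cluster, reachable_groups):
    """Pick the group of nodes that keeps running after a network fault.

    Assumed rule: a group with a strict majority of the cluster survives.
    If no group has a majority, the group containing the lowest-named node
    survives, so that every side reaches the same decision independently.
    """
    majority = len(cluster) // 2 + 1
    for group in reachable_groups:
        if len(group) >= majority:
            return set(group)
    tie_breaker = min(cluster)
    for group in reachable_groups:
        if tie_breaker in group:
            return set(group)
    return set()


def nodes_to_disable(cluster, reachable_groups):
    """Orphaned nodes: everything outside the surviving side.

    In the flow on the slide these nodes would be snapshotted before
    being disabled; that step is omitted here.
    """
    return set(cluster) - surviving_side(cluster, reachable_groups)


cluster = {"A", "B", "C"}
# Server C loses contact with A and B.
print(nodes_to_disable(cluster, [{"A", "B"}, {"C"}]))
```

A deterministic tie-break matters because the two sides cannot talk to each other during the fault; each must be able to work out locally whether it is the side that stays up.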

Real-time Analytics

Database should support a wide variety of high performance reads

+ High-frequency single-partition
+ Lower-frequency multi-partition

Common analytic queries should be optimized in the database

+ Multi-partition aggregations, limits, etc.

Database should accommodate a flexible range of relational data operations

+ Particularly relevant to structured data
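The multi-partition aggregations and limits mentioned above can be pictured as a scatter-gather: each partition computes a small partial result and a coordinator merges them. The sketch below uses the leaderboard use case from earlier in the deck; the player names, scores, and function names are invented for illustration and do not reflect any particular database's internals.

```python
import heapq


def top_n_per_partition(scores, n):
    """Each partition returns only its own n best (player, score) rows."""
    return heapq.nlargest(n, scores, key=lambda row: row[1])


def global_top_n(partitions, n):
    """The coordinator merges the partial results and applies the limit again."""
    partials = []
    for scores in partitions:
        partials.extend(top_n_per_partition(scores, n))
    return heapq.nlargest(n, partials, key=lambda row: row[1])


def global_count(partitions):
    """A multi-partition count is the sum of the per-partition counts."""
    return sum(len(scores) for scores in partitions)


# Illustrative sample data: three partitions of (player, score) rows.
partitions = [
    [("ann", 50), ("bob", 90)],
    [("cy", 70)],
    [("dee", 95), ("eli", 10)],
]
print(global_top_n(partitions, 2))
print(global_count(partitions))
```

Pushing the limit down to each partition keeps the data sent to the coordinator proportional to the limit times the number of partitions, rather than to the table size.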

Integration with Analytic Datastores

Database should offer high performance, transactional export

Export should allow a wide variety of common data enrichment operations

+ Normalize and de-normalize
+ De-duplicate
+ Aggregate

Architecture should support loosely-coupled integrations

+ Impedance mismatches
+ Durability

VoltDB Export Data Flow

Loosely-coupled, asynchronous
Queue must be durable
Bi-directional durability

[Diagram: High Velocity Database Cluster]
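One way to picture a loosely-coupled, durable export queue is a table of pending rows that are deleted only after the downstream datastore acknowledges them. The sketch below uses Python's standard `sqlite3` module as a stand-in for durable storage. The `ExportQueue` class, its methods, and the `deliver` callback are hypothetical names for illustration; this is not VoltDB's export API.

```python
import sqlite3


class ExportQueue:
    """Hypothetical durable queue: rows stay stored until acknowledged."""

    def __init__(self, path=":memory:"):
        # Pass a file path for real durability; ":memory:" keeps the demo self-contained.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS export_queue ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload):
        """Called on the database side; does not wait for the export target."""
        self.db.execute("INSERT INTO export_queue (payload) VALUES (?)", (payload,))
        self.db.commit()

    def peek(self, limit=10):
        """Return the oldest pending rows without removing them."""
        cur = self.db.execute(
            "SELECT seq, payload FROM export_queue ORDER BY seq LIMIT ?", (limit,)
        )
        return cur.fetchall()

    def ack(self, up_to_seq):
        """Drop rows only once the target has confirmed receipt."""
        self.db.execute("DELETE FROM export_queue WHERE seq <= ?", (up_to_seq,))
        self.db.commit()


def drain(queue, deliver):
    """Deliver pending rows in order; stop at the first failure."""
    for seq, payload in queue.peek():
        if not deliver(payload):
            break  # target is slow or down: the rest stays queued
        queue.ack(seq)
```

Because producer and consumer only share the queue, a slow or unavailable analytic datastore delays export without blocking writes, and unacknowledged rows are still there when it comes back.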

Summary

Big Data infrastructures will usually require more than one engine

+ High velocity engine for “fast” data
+ Analytic engine for “deep” data

Data characteristics will often determine which high velocity engine to use

+ NewSQL is often well-suited to structured data
+ NoSQL is often a good fit for unstructured data

Choose solutions that suit your needs and are designed for interoperability
