Taming the Big Data Fire Hose
the NewSQL database you’ll never outgrow
John Hugg, Sr. Software Engineer, VoltDB
VoltDB 2
Big Data Defined
Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high velocity state
Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships
Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured
Example Big Data Use Cases
Capital markets
+ High-frequency operations: write/index all trades, store tick data
+ Lower-frequency operations: show consolidated risk across traders
Call initiation request
+ High-frequency operations: real-time authorization
+ Lower-frequency operations: fraud detection/analysis
Inbound HTTP requests
+ High-frequency operations: visitor logging, analysis, alerting
+ Lower-frequency operations: traffic pattern analytics
Online game
+ High-frequency operations: rank scores (defined intervals, player “bests”)
+ Lower-frequency operations: leaderboard lookups
Real-time ad trading systems
+ High-frequency operations: match form factor, placement criteria, bid/ask
+ Lower-frequency operations: report ad performance from exhaust stream
Mobile device location sensor
+ High-frequency operations: location updates, QoS, transactions
+ Lower-frequency operations: analytics on transactions
Big Data and You
Incoming data streams are different than traditional business apps
+ You need to write data quickly and reliably, but …
It’s not just about high speed writes
+ You need to validate in real-time
+ You need to count and aggregate
+ You need to analyze in real-time
+ You need to scale on demand
+ You may need to transact
Big Data Management Infrastructure
[Diagram: data sources feed a high velocity tier, which feeds a high volume tier]
Data sources
+ Online gaming
+ Ad serving
+ Sensor data
+ Internet commerce
+ SaaS, Web 2.0
+ Mobile platforms
+ Financial trade
High velocity
+ NewSQL: structured data, ACID guarantees, relational/SQL, real-time analytics
+ NoSQL: unstructured data, eventual consistency, schemaless (KV, document)
High volume
+ Analytic datastore
+ Other OLAP data stores
High Velocity Data Management
High Velocity DBMS Requirements
Ingest at very high speeds and rates
Scale easily to meet growth and demand peaks
Support integrated fault tolerance
Support a wide range of real-time (or “near-time”) analytics
Integrate easily with high volume analytic datastores
High Speed Data Ingestion
Support millions of write operations per second at scale
Read and write latencies below 50 milliseconds
Provide ACID-level consistency guarantees (maybe)
Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document
Scale to Meet Growth and Demand
Scale-out on commodity hardware
Built-in database partitioning
+ Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
Database must automatically implement defined partitioning strategy
+ Application should “see” a single database instance
Database should encourage scalability best practices
+ For example, replication of reference data minimizes need for multi-partition operations
A Look Inside Partitioning
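Partition 1
+ orders (customer_id, order_id, product_id): (1, 101, 2), (1, 101, 3), (4, 401, 2)
+ products (product_id, product_name): (1, knife), (2, spoon), (3, fork)
Partition 2
+ orders (customer_id, order_id, product_id): (2, 201, 1), (5, 501, 3), (5, 502, 2)
+ products (product_id, product_name): (1, knife), (2, spoon), (3, fork)
Partition 3
+ orders (customer_id, order_id, product_id): (3, 201, 1), (6, 601, 1), (6, 601, 2)
+ products (product_id, product_name): (1, knife), (2, spoon), (3, fork)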
table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name
single-partition: select count(*) from orders where customer_id = 5
multi-partition: select count(*) from orders where product_id = 3
single-partition: insert into orders (customer_id, order_id, product_id) values (3,303,2)
multi-partition: update products set product_name = 'spork' where product_id = 3
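To make the single-partition versus multi-partition distinction concrete, here is a minimal Python sketch of the routing logic. It is not VoltDB code: the modulo rule in `partition_for`, the 0-based partition indexes, and all function names are assumptions chosen so that the rows land where the slide shows them (customers 1 and 4 together, 2 and 5 together, 3 and 6 together). Each function returns the list of partitions it had to touch.

```python
# Hypothetical sketch: modulo routing on the partition key, three partitions.
NUM_PARTITIONS = 3
PRODUCTS = {1: "knife", 2: "spoon", 3: "fork"}

# Each partition holds its own slice of orders plus a full copy of products.
partitions = [{"orders": [], "products": dict(PRODUCTS)}
              for _ in range(NUM_PARTITIONS)]

def partition_for(customer_id):
    """Map a partition-key value to a 0-based partition index (assumed rule)."""
    return (customer_id - 1) % NUM_PARTITIONS

def insert_order(customer_id, order_id, product_id):
    """Single-partition write: only the owning partition is touched."""
    p = partition_for(customer_id)
    partitions[p]["orders"].append((customer_id, order_id, product_id))
    return [p]

def count_by_customer(customer_id):
    """Single-partition read: the filter is on the partition key."""
    p = partition_for(customer_id)
    n = sum(1 for row in partitions[p]["orders"] if row[0] == customer_id)
    return n, [p]

def count_by_product(product_id):
    """Multi-partition read: every partition is scanned and the counts summed."""
    n = sum(1 for part in partitions
            for row in part["orders"] if row[2] == product_id)
    return n, list(range(NUM_PARTITIONS))

def rename_product(product_id, name):
    """Multi-partition write: every replica of the products table changes."""
    for part in partitions:
        part["products"][product_id] = name
    return list(range(NUM_PARTITIONS))

# Load the order rows from the slide.
for row in [(1, 101, 2), (1, 101, 3), (4, 401, 2),
            (2, 201, 1), (5, 501, 3), (5, 502, 2),
            (3, 201, 1), (6, 601, 1), (6, 601, 2)]:
    insert_order(*row)
```

Calling `count_by_customer(5)` touches one partition, while `count_by_product(3)` and `rename_product(3, "spork")` touch all three, which is why replicating small reference tables trades cheap local reads for more expensive updates.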
Integrated Fault Tolerance
Database should transparently support built-in “Tandem-style” HA
+ Users should be able to easily increase/decrease fault tolerance levels
Database should be easily and quickly recoverable in the event of severe hardware failures
Database should be able to automatically detect and manage a variety of partition fault conditions
Downed nodes should be “rejoinable” without the need for service windows
Partition Detection & Recovery
[Diagram: three-node cluster with Server A, Server B and Server C]
Network fault protection
+ Detects partition event
+ Determines which side of fault to disable
+ Snapshots and disables orphaned node(s)
Live node rejoin
+ Allows “downed” nodes to rejoin live cluster
+ Automatically re-syncs all node data
+ Coordinates transactions during re-sync
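The slide does not say how the cluster decides which side of a network fault to disable. One common approach, shown here purely as an assumption and not as VoltDB's documented rule, is to keep the side that holds a majority of nodes and to break an even split with a fixed tie-breaker (in this sketch, the side containing the lowest-named node). The function name `side_survives` and the node labels are hypothetical.

```python
def side_survives(cluster_nodes, reachable_nodes):
    """Decide whether the side that can still see `reachable_nodes` stays up.

    Assumed rule: the majority side survives; on an even split, the side
    holding the lowest-named node wins. `cluster_nodes` must be non-empty.
    """
    cluster = set(cluster_nodes)
    this_side = set(reachable_nodes) & cluster
    other_side = cluster - this_side
    if len(this_side) != len(other_side):
        return len(this_side) > len(other_side)
    # Even split: deterministic tie-break so exactly one side keeps running.
    return min(cluster) in this_side

# Server C loses contact with A and B in a three-node cluster.
print(side_survives(["A", "B", "C"], ["A", "B"]))  # majority side
print(side_survives(["A", "B", "C"], ["C"]))       # orphaned node
```

Because both sides evaluate the same rule over the same membership list, they reach complementary answers without needing to talk to each other, which is the property that prevents two halves from both accepting writes.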
Real-time Analytics
Database should support a wide variety of high performance reads
+ High-frequency single-partition
+ Lower-frequency multi-partition
Common analytic queries should be optimized in the database
+ Multi-partition aggregations, limits, etc.
Database should accommodate a flexible range of relational data operations
+ Particularly relevant to structured data
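As an illustration of a multi-partition aggregation with a limit, the sketch below computes "top N products by order count" over partitioned order rows. It is a simplified model, not VoltDB's query planner: each partition produces partial counts, and a coordinator merges them before applying the limit. The function names and the sample rows (customer_id, order_id, product_id) are assumptions.

```python
import heapq
from collections import Counter

def local_counts(rows):
    """Per-partition step: count orders per product_id."""
    return Counter(product_id for (_cust, _order, product_id) in rows)

def top_products(partition_rows, limit):
    """Coordinator step: merge partial counts, then apply the limit.

    The limit is applied only after merging, because a product that is
    rare on one partition can still be the overall leader.
    """
    total = Counter()
    for rows in partition_rows:
        total.update(local_counts(rows))
    # Sort by count descending, then by product_id ascending for stable ties.
    return heapq.nlargest(limit, total.items(),
                          key=lambda kv: (kv[1], -kv[0]))

sample = [
    [(1, 101, 2), (1, 101, 3), (4, 401, 2)],
    [(2, 201, 1), (5, 501, 3), (5, 502, 2)],
    [(3, 201, 1), (6, 601, 1), (6, 601, 2)],
]
print(top_products(sample, 2))
```

Pushing the per-partition counting down to the data keeps the amount shipped to the coordinator proportional to the number of distinct groups rather than the number of rows.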
Integration with Analytic Datastores
Database should offer high performance, transactional export
Export should allow a wide variety of common data enrichment operations
+ Normalize and de-normalize
+ De-duplicate
+ Aggregate
Architecture should support loosely-coupled integrations
+ Impedance mismatches
+ Durability
VoltDB Export Data Flow
Loosely-coupled, asynchronous
Queue must be durable
Bi-directional durability
[Diagram: high velocity database cluster exporting to an analytic datastore]
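One way to picture a loosely-coupled, asynchronous export is a queue that releases a row only after the downstream store acknowledges it. The sketch below is a toy model under stated assumptions: `ExportQueue` and its methods are hypothetical names, and the queue lives in memory, whereas the slide's requirement is that a real queue be durable (persisted so it survives restarts). The result is at-least-once delivery in order.

```python
from collections import deque

class ExportQueue:
    """Toy export queue: rows stay queued until the target acknowledges them.

    In-memory stand-in only; a real export queue would persist to disk.
    """

    def __init__(self):
        self._pending = deque()

    def enqueue(self, row):
        self._pending.append(row)

    def drain(self, deliver):
        """Send rows in order; stop at the first row the target rejects.

        `deliver(row)` returns True on acknowledgement, False otherwise.
        Returns the number of rows acknowledged in this pass.
        """
        delivered = 0
        while self._pending:
            if not deliver(self._pending[0]):
                break  # keep the row queued and retry on a later pass
            self._pending.popleft()
            delivered += 1
        return delivered

    def __len__(self):
        return len(self._pending)
```

Because the database only appends to the queue and the target only acknowledges from it, either side can slow down or go offline without blocking the other, which addresses the impedance mismatch between a fast ingest engine and a slower analytic store.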
Summary
Big Data infrastructures will usually require more than one engine
+ High velocity engine for “fast” data
+ Analytic engine for “deep” data
Data characteristics will often determine which high velocity engine to use
+ NewSQL is often well-suited to structured data
+ NoSQL is often a good fit for unstructured data
Choose solutions that suit your needs and are designed for interoperability