SQLFire at Strata 2012
Uploaded by carter-shanklin · Category: Technology
Transcript of SQLFire at Strata 2012
SQLFire
Fast meets scalable in VMware’s NewSQL database.
Strata 2012
Jags Ramnarayan – Chief Architect, SQLFire
Carter Shanklin – Product Manager, SQLFire
Sponsor Sessions Suck
• We Promise To:
  – Keep it relevant.
  – Keep it technical.
  – Keep it entertaining.
Speed Matters
Users demand fast applications and fast websites. The database is the hardest thing to scale.
SQLFire: Speed, Scale, SQL
Speed
• In-memory for maximum speed and minimum latency.
Scale
• Horizontally scalable.
• Add or remove nodes at any time for more capacity or availability.
SQL
• Familiar SQL interface.
• SQL-92 compliant.
• JDBC and ADO.NET interfaces.
How does SQLFire get scale and speed?
• Horizontal scale-out + dynamic partitioning
  – Appears to the app as a single database
• Tunable consistency
  – Including asynchronous global distribution
• In-memory architecture
  – “Memory is the new disk, disk the new tape”
Diverging needs for online and analytics
[Chart: the online layer and the analytics layer compared on user concurrency, update rate, query richness, and data volume.]
SQLFire: What does it really look like?
SQLFire Tables Are Replicated By Default.
CREATE TABLE sales
  (product_id int, store_id int, price float);
[Diagram: the sales table is stored as a full replica on both SQLFire Node 1 and SQLFire Node 2.]
Best for small and frequently accessed data.
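To make "replicated by default" concrete, here is a toy Python model (not SQLFire code; the `Node` and `ReplicatedTable` classes are hypothetical) in which every write lands on every node and any node can answer a read from its own copy.

```python
# Toy model of a replicated table (hypothetical classes, not SQLFire code).

class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}  # primary key -> row dict


class ReplicatedTable:
    def __init__(self, nodes):
        self.nodes = nodes

    def insert(self, key, row):
        # Every write is applied to every replica, so each node holds all rows.
        for node in self.nodes:
            node.rows[key] = dict(row)

    def read(self, key, local_node):
        # Any node can answer from its own copy: no network hop is needed.
        return local_node.rows.get(key)


nodes = [Node("node1"), Node("node2")]
sales = ReplicatedTable(nodes)
sales.insert(1, {"product_id": 1, "store_id": 10, "price": 9.99})
print(sales.read(1, nodes[1]))
```

The cost of local reads is that every write touches every node, which is why replication suits small, frequently read tables rather than large, write-heavy ones.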
Partitioned Tables Are Split Among Members.
CREATE TABLE sales
  (product_id int, store_id int, price float)
  PARTITION BY COLUMN (product_id);
[Diagram: SQLFire Node 1 and SQLFire Node 2 each hold a replica; the sales table is split into Partition 1 on Node 1 and Partition 2 on Node 2.]
Best for large data sets.
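A minimal sketch of hash partitioning, assuming rows are hashed into a fixed number of buckets and buckets are dealt out to nodes round-robin. The bucket count, the placement rule, and the class name are illustrative assumptions, not SQLFire internals.

```python
import zlib


def stable_hash(value):
    # Python's built-in hash() is randomized per process for strings, so use
    # CRC32 to get the same key -> bucket mapping on every run.
    return zlib.crc32(str(value).encode("utf-8"))


class HashPartitionedTable:
    def __init__(self, node_names, num_buckets=8):
        self.num_buckets = num_buckets  # hypothetical bucket count
        # Buckets are dealt out to nodes round-robin (an assumed placement rule).
        self.bucket_owner = {b: node_names[b % len(node_names)]
                             for b in range(num_buckets)}
        self.data = {name: [] for name in node_names}

    def node_for(self, product_id):
        # Deterministic: the same product_id always maps to the same node.
        return self.bucket_owner[stable_hash(product_id) % self.num_buckets]

    def insert(self, row):
        self.data[self.node_for(row["product_id"])].append(row)

    def rows_for_product(self, product_id):
        # A lookup on the partitioning column visits exactly one node.
        node = self.node_for(product_id)
        return [r for r in self.data[node] if r["product_id"] == product_id]


table = HashPartitionedTable(["node1", "node2"])
for pid in range(100):
    table.insert({"product_id": pid, "store_id": pid % 5, "price": 1.0})
print({node: len(rows) for node, rows in table.data.items()})
```

Hashing into buckets rather than directly onto nodes is a common design (assumed here): growing the cluster then means moving whole buckets instead of rehashing every row.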
Types Of Partitioning In SQLFire.
• Hash partitioning (default): a built-in hashing algorithm spreads data across the available servers. The placement looks random but is deterministic, so a given key always lands on the same server.
  Example: PARTITION BY COLUMN (customer_id);
• List: manually divide data across servers based on discrete criteria.
  Example: PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'));
• Range: manually divide data across servers based on continuous criteria.
  Example: PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30');
• Expression: fully dynamic division of data based on function execution; can use UDFs.
  Example: PARTITION BY (MONTH(date));
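The list, range, and expression strategies can be sketched as small routing functions that map a row's value to a partition index. This is an illustration only: the function names are hypothetical, and the half-open `[low, high)` range bounds are an assumption of the sketch, not a statement about SQLFire's `VALUES BETWEEN` semantics.

```python
from datetime import date


def list_partition(value, value_lists):
    # PARTITION BY LIST: each partition owns an explicit set of values.
    for i, values in enumerate(value_lists):
        if value in values:
            return i
    raise ValueError(f"no partition accepts {value!r}")


def range_partition(value, ranges):
    # PARTITION BY RANGE: each partition owns a contiguous interval.
    # Half-open [low, high) bounds are an assumption of this sketch.
    for i, (low, high) in enumerate(ranges):
        if low <= value < high:
            return i
    raise ValueError(f"no partition accepts {value!r}")


def expression_partition(row_date):
    # PARTITION BY (MONTH(date)): rows are grouped by the expression's value,
    # so every row from the same month lands in the same partition.
    return row_date.month


STATE_LISTS = [{"CA", "WA"}, {"TX", "OK"}]
YEAR_RANGES = [(date(2008, 1, 1), date(2009, 1, 1)),
               (date(2009, 1, 1), date(2010, 1, 1))]

print(list_partition("WA", STATE_LISTS))              # 0
print(range_partition(date(2009, 6, 1), YEAR_RANGES))  # 1
print(expression_partition(date(2009, 3, 15)))        # 3
```

Note that list and range partitioning reject values no partition claims, while an expression always yields some partition key.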
How does it scale for queries?
Partitioned table: PK queries per second (1 KB rows), # clients = 2 × N
Servers (N):       2      4      6      8      10
PK queries/sec:    200k   420k   604k   790k   1M
How does it scale for updates?
Partitioned table: updates per second (3 columns), # clients = 2 × N, 85% < 1 ms latency
Servers (N):       2      4      6      8      10
Updates/sec:       220k   490k   750k   950k   1.3M
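Dividing each throughput figure by the server count N is a quick way to check the linear-scaling claim. The numbers below are just the chart values from the query and update scaling slides; the helper names are ours.

```python
# Chart values from the scaling slides, in thousands of operations per second.
QUERY_KOPS = {2: 200, 4: 420, 6: 604, 8: 790, 10: 1000}
UPDATE_KOPS = {2: 220, 4: 490, 6: 750, 8: 950, 10: 1300}


def per_server(series):
    # Throughput divided by server count N; a flat result means linear scaling.
    return {n: round(total / n, 2) for n, total in series.items()}


def speedup(series):
    # Throughput relative to the smallest (2-server) run; ideal is N / 2.
    base_n = min(series)
    return {n: round(series[n] / series[base_n], 2) for n in series}


print(per_server(QUERY_KOPS))
print(per_server(UPDATE_KOPS))
print(speedup(QUERY_KOPS))
```

Per-server query throughput stays between roughly 99k and 105k per second from 2 to 10 servers (a 5x speedup for 5x the servers), while per-server update throughput grows from 110k to 130k per second.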
Redundancy Increases Availability.
CREATE TABLE sales
  (product_id int, store_id int, price float)
  PARTITION BY COLUMN (product_id)
  REDUNDANCY 1;
[Diagram: SQLFire Node 1 holds sales Partition 1 plus a redundant copy of Partition 2 (Partition 2*); SQLFire Node 2 holds Partition 2 plus a redundant copy of Partition 1 (Partition 1*). Each node also holds a replica.]
All data is available if Node 1 fails.
Partitioning and redundancy
• Redundancy = 2 (but tunable)
• Single owner for any row at a point in time
• Replication can be “rack aware”
• Replication is synchronous but done in parallel
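A minimal sketch of redundant, rack-aware bucket placement with failover, assuming each bucket has one primary (the single owner) plus redundant copies, and that copies prefer nodes on a different rack. The topology input, the placement rule, and all names are hypothetical; synchronous, parallel replication of writes is not modeled.

```python
class Bucket:
    def __init__(self, bucket_id, primary, secondaries):
        self.bucket_id = bucket_id
        self.primary = primary            # the single owner of these rows right now
        self.secondaries = list(secondaries)


def place_buckets(num_buckets, node_racks, redundant_copies=1):
    # node_racks maps node name -> rack name (hypothetical topology input).
    # redundant_copies=1 mirrors REDUNDANCY 1: two copies of every bucket.
    nodes = sorted(node_racks)
    buckets = []
    for b in range(num_buckets):
        start = b % len(nodes)
        copies = [nodes[start]]
        others = nodes[start + 1:] + nodes[:start]
        while len(copies) < redundant_copies + 1 and others:
            used_racks = {node_racks[n] for n in copies}
            # Rack awareness: prefer a node on a rack that holds no copy yet,
            # falling back to any remaining node.
            pick = next((n for n in others if node_racks[n] not in used_racks), others[0])
            copies.append(pick)
            others.remove(pick)
        buckets.append(Bucket(b, copies[0], copies[1:]))
    return buckets


def fail_node(buckets, dead):
    # Promote a surviving copy so every bucket keeps exactly one owner.
    for bucket in buckets:
        if dead in bucket.secondaries:
            bucket.secondaries.remove(dead)
        if bucket.primary == dead:
            if not bucket.secondaries:
                raise RuntimeError(f"bucket {bucket.bucket_id} lost every copy")
            bucket.primary = bucket.secondaries.pop(0)


racks = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
buckets = place_buckets(8, racks)
fail_node(buckets, "n1")
print([(b.bucket_id, b.primary) for b in buckets])
```

Because each copy sits on a different rack, losing a whole rack in this model still leaves one copy of every bucket.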
SQLFire: Derp-Proof Database
• Instant failover at protocol level
• Apps retain their connections
• Data remains available
(Photo caption: “Was that cord supposed to be in the wall?”)
Select * from Customer c, Sales s where c.cust_id = s.cust_id and c.cust_id = 'xxx';
• With hash partitioning, the join logic executes everywhere.
• Distributed joins are expensive and inhibit scaling
  – Joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes.
Linearly scaling joins
The designer thinks about how data maps to partitions. The main idea is to:
1) Minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions.
2) Collocate the transaction working set on partitions so that complex two-phase commit or Paxos commit is eliminated or minimized.
Read Pat Helland’s “Life beyond Distributed Transactions” and the Google Megastore paper.
Partition Aware DB Design
Collocate Data For Fast Joins.
CREATE TABLE sales
  (cust_id int, product_id int, store_id int, price float)
  PARTITION BY COLUMN (cust_id)
  COLOCATE WITH (customers);
[Diagram: Customer 1 (C1) and its sales rows are placed on SQLFire Node 1; Customer 2 (C2) and its sales rows are placed on SQLFire Node 2; each node also holds a replica.]
Related data is placed on the same node, so SQLFire can join tables without network hops.
Query pruned to Node 1:
Select * from Customer c, Sales s where c.cust_id = s.cust_id and c.cust_id = 'C1';
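A toy Python model of colocation and query pruning, assuming both tables hash `cust_id` with the same function so a customer and its sales rows always share a node. The bucket count and helper names are hypothetical and only illustrate the idea on the slide.

```python
import zlib

NODES = ["node1", "node2"]
NUM_BUCKETS = 4  # hypothetical


def node_for(cust_id):
    # Both tables hash the same column (cust_id) with the same function, so a
    # customer row and all of its sales rows always land on the same node.
    bucket = zlib.crc32(cust_id.encode("utf-8")) % NUM_BUCKETS
    return NODES[bucket % len(NODES)]


def load(customers, sales):
    store = {n: {"customer": [], "sales": []} for n in NODES}
    for c in customers:
        store[node_for(c["cust_id"])]["customer"].append(c)
    for s in sales:
        store[node_for(s["cust_id"])]["sales"].append(s)
    return store


def join_for_customer(store, cust_id):
    # The WHERE c.cust_id = ... predicate prunes the query to one node, and the
    # join then runs entirely on that node's local rows.
    node = node_for(cust_id)
    local = store[node]
    rows = [(c, s)
            for c in local["customer"] if c["cust_id"] == cust_id
            for s in local["sales"] if s["cust_id"] == cust_id]
    return node, rows


customers = [{"cust_id": "C1", "state": "CA"}, {"cust_id": "C2", "state": "WA"}]
sales = [{"cust_id": "C1", "value": 10}, {"cust_id": "C1", "value": 5},
         {"cust_id": "C2", "value": 7}]
store = load(customers, sales)
node, rows = join_for_customer(store, "C1")
print(node, len(rows))
```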
Parallel scatter-gather: in parallel, each node does the hash join and aggregation locally on its related data.
SELECT sum(value) AS total FROM sales s, customer c
  WHERE s.cust_id = c.cust_id AND c.state = 'CA'
  GROUP BY c.cust_id ORDER BY total;
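A sketch of the scatter-gather plan for the aggregate query above, assuming customer and sales rows are already colocated per node. Threads stand in for servers, and the function names are hypothetical; this is an illustration of the pattern, not SQLFire's query engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_join_and_sum(node_data, state):
    # Runs on one node against only its colocated rows: build a hash set of
    # matching customers, probe it with sales rows, and sum value per cust_id.
    matching = {c["cust_id"] for c in node_data["customer"] if c["state"] == state}
    partial = defaultdict(int)
    for s in node_data["sales"]:
        if s["cust_id"] in matching:
            partial[s["cust_id"]] += s["value"]
    return dict(partial)


def scatter_gather(nodes, state):
    # Scatter: every node computes its partial result in parallel.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda d: local_join_and_sum(d, state), nodes))
    # Gather: merge the partial sums, then apply ORDER BY total.
    totals = defaultdict(int)
    for partial in partials:
        for cust_id, value in partial.items():
            totals[cust_id] += value
    return sorted(totals.items(), key=lambda kv: kv[1])


nodes = [
    {"customer": [{"cust_id": "C1", "state": "CA"}, {"cust_id": "C3", "state": "NY"}],
     "sales": [{"cust_id": "C1", "value": 10}, {"cust_id": "C1", "value": 5},
               {"cust_id": "C3", "value": 7}]},
    {"customer": [{"cust_id": "C2", "state": "CA"}],
     "sales": [{"cust_id": "C2", "value": 3}]},
]
print(scatter_gather(nodes, "CA"))  # [('C2', 3), ('C1', 15)]
```

Only the small per-customer sums cross the network in this pattern; the joined rows never leave their node.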
Dynamic Data Colocation
• Redundancy = 2
• Single master for any entity group
• Dynamic entity group formation
  – Based on foreign key relationships
Data-Aware Stored Procs: Like Map/Reduce, But Different
• Procedure execution routed to the data
• Full scaled-out execution
• Highly available
• Use pure Java to access/store data
• Demo later on
Scaling Stored Procedures
CALL maxSales(arguments)
  ON TABLE sales WHERE (Location IN ('CA', 'WA', 'OR'))
  WITH RESULT PROCESSOR maxSalesReducer
SQLFire uses data-aware routing to route processing to the data.
[Diagram: maxSales runs on local data on each node; the partial results are combined by maxSalesReducer.]
Result processors give map/reduce functionality.
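The map/reduce shape of the call above can be sketched in Python: a per-node function runs next to its own rows, and a result processor combines the partial answers. This is not SQLFire's Java procedure API; `max_sales`, `max_sales_reducer`, and `call_on_table` are hypothetical analogs of `maxSales` and `maxSalesReducer`, and for simplicity every node runs the function rather than only the nodes that host matching data.

```python
from concurrent.futures import ThreadPoolExecutor


def max_sales(local_rows, locations):
    # "Map" side, analogous to maxSales: runs next to each node's data and
    # only sees local rows that pass the ON TABLE ... WHERE filter.
    amounts = [r["amount"] for r in local_rows if r["location"] in locations]
    return max(amounts) if amounts else None


def max_sales_reducer(partials):
    # "Reduce" side, analogous to maxSalesReducer: combines per-node results.
    values = [p for p in partials if p is not None]
    return max(values) if values else None


def call_on_table(node_rows, locations):
    # Each node's share runs in parallel; only the small partial results
    # travel back to the caller, not the rows themselves.
    with ThreadPoolExecutor(max_workers=len(node_rows)) as pool:
        partials = list(pool.map(lambda rows: max_sales(rows, locations), node_rows))
    return max_sales_reducer(partials)


node_rows = [
    [{"location": "CA", "amount": 10}, {"location": "NY", "amount": 99}],
    [{"location": "WA", "amount": 42}],
    [{"location": "TX", "amount": 5}],
]
print(call_on_table(node_rows, {"CA", "WA", "OR"}))  # 42
```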
Scalability: Consistency With Transactions And Without
- Row updates always atomic and isolated
- FIFO consistency
- Distributed transactions with 1-phase commit
- Coordinator per node
- Eager locking + fail fast
Assumes:
- Most transactions are small in space and time
- Write-write conflicts are rare
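A single-process sketch of eager locking with fail-fast conflict detection, assuming row-level write locks. It shows why commit can be a single step when conflicts are caught at write time; it does not model distribution or replicas, and all class names are hypothetical.

```python
class ConflictError(Exception):
    pass


class LockManager:
    def __init__(self):
        self.owners = {}  # row key -> id of the transaction holding its write lock

    def acquire(self, tx_id, key):
        holder = self.owners.get(key)
        if holder is not None and holder != tx_id:
            # Fail fast: abort immediately instead of waiting for the holder.
            raise ConflictError(f"{tx_id} conflicts with {holder} on {key!r}")
        self.owners[key] = tx_id

    def release_all(self, tx_id):
        for key in [k for k, owner in self.owners.items() if owner == tx_id]:
            del self.owners[key]


class Transaction:
    def __init__(self, tx_id, locks, table):
        self.tx_id = tx_id
        self.locks = locks
        self.table = table
        self.pending = {}

    def write(self, key, value):
        # Eager locking: the row lock is taken when the write happens,
        # so a write-write conflict surfaces here rather than at commit.
        self.locks.acquire(self.tx_id, key)
        self.pending[key] = value

    def commit(self):
        # Conflicts were already ruled out while writing, so commit is a
        # single apply step with no separate prepare/vote round.
        self.table.update(self.pending)
        self.locks.release_all(self.tx_id)

    def rollback(self):
        self.pending.clear()
        self.locks.release_all(self.tx_id)


table = {}
locks = LockManager()
t1 = Transaction("t1", locks, table)
t2 = Transaction("t2", locks, table)
t1.write("row1", 10)
try:
    t2.write("row1", 20)
except ConflictError as err:
    print("aborted:", err)
    t2.rollback()
t1.commit()
print(table)  # {'row1': 10}
```

The design bet matches the slide's assumptions: if most transactions are small and write-write conflicts are rare, aborting the loser immediately is cheaper than coordinating a vote at commit time.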
Scalability: High performance persistence
• Parallel log-structured storage
• Each partition writes in parallel
• Backups write to disk also
  – Increase reliability against h/w loss
[Diagram: memory tables write Record1, Record2, Record3, … through OS buffers into append-only operation logs; a log compressor compacts the logs.]
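A toy model of one partition's append-only operation log plus a log compressor, assuming compaction keeps only the newest surviving record per key. In the slide's design each partition would have its own log written in parallel; the `OpLog` class and its record layout are hypothetical.

```python
class OpLog:
    """Toy append-only operation log for a single partition."""

    def __init__(self):
        self.records = []   # (seq, op, key, value); appended, never edited in place
        self.next_seq = 0

    def append(self, op, key, value=None):
        self.records.append((self.next_seq, op, key, value))
        self.next_seq += 1

    def replay(self):
        # Recovery: rebuild the in-memory table by applying records in order.
        table = {}
        for _, op, key, value in self.records:
            if op == "put":
                table[key] = value
            elif op == "delete":
                table.pop(key, None)
        return table

    def compact(self):
        # Log compressor: keep only the newest record per key and drop keys
        # whose newest record is a delete, preserving log order.
        latest = {}
        for record in self.records:
            latest[record[2]] = record
        self.records = sorted((r for r in latest.values() if r[1] == "put"),
                              key=lambda r: r[0])


log = OpLog()
log.append("put", "a", 1)
log.append("put", "a", 2)
log.append("put", "b", 3)
log.append("delete", "b")
before = log.replay()
log.compact()
print(before, log.replay(), len(log.records))  # {'a': 2} {'a': 2} 1
```

Appending sequentially keeps disk writes fast, and compaction bounds log growth without changing what a replay reconstructs.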
Demos!
Demo: Distributed Procedures
• Autocorrelation of time series
• All pure Java, scaled out
• Tolerant of node failures
• Using the SuanShu Java library
Demo: Caching
• Read-only or…
• Read-through / Write-behind
• Cache analytics results
• Skip the ETL
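A minimal sketch of read-through and write-behind caching, assuming a loader callable that fetches a missing row from the source database and a writer callable that later persists queued updates. The class and both callables are hypothetical stand-ins, not SQLFire's row loader or writer interfaces.

```python
from collections import deque


class ReadThroughWriteBehindCache:
    def __init__(self, loader, writer):
        self.loader = loader   # hypothetical: fetches a row from the source DB on a miss
        self.writer = writer   # hypothetical: persists one queued update to the source DB
        self.rows = {}
        self.pending = deque()

    def get(self, key):
        if key not in self.rows:
            # Read-through: a miss is filled from the backing store.
            value = self.loader(key)
            if value is None:
                return None
            self.rows[key] = value
        return self.rows[key]

    def put(self, key, value):
        # Write-behind: the cache is updated now; the source DB later.
        self.rows[key] = value
        self.pending.append((key, value))

    def flush(self):
        # A real system would drain this queue from a background worker.
        while self.pending:
            key, value = self.pending.popleft()
            self.writer(key, value)


source_db = {"k1": "v1"}
cache = ReadThroughWriteBehindCache(source_db.get, source_db.__setitem__)
print(cache.get("k1"))        # v1
cache.put("k2", "v2")
print("k2" in source_db)      # False
cache.flush()
print(source_db["k2"])        # v2
```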
Try SQLFire Today!
Free for developer use on up to 3 nodes.
Download: http://vmware.com/go/sqlfire
Forum: http://vmware.com/vmtn/appplatform/vfabric_sqlfire (Got questions? Get answers.)
:sigh: Just Google it
Twitter: @vFabricSQLFire, @cshanklin, @jagsr (I need more followers to get a promotion.)
Demo Details
Scaling Stored Procs (1): Insert timeseries → Ubuntu (database).
Scaling Stored Procs (2): Insert timeseries → Compute autocorrelations → Complete, on one Ubuntu (database) node.
Scaling Stored Procs (3): Insert timeseries, then Compute autocorrelations → Complete on each of three Ubuntu (database) nodes, with Rebalance steps between them.
All using standard SQL APIs.
Caching Analytics (1): Continuous batch processing.
Caching Analytics (2): Continuous batch processing plus low-latency, in-memory caching on an Ubuntu (database) node, with a JDBC row loader.
Caching Analytics (3): The same setup with scalable + tunable cache policies.
• LRU count – Overflow to disk or destroy.
• Time to live – The counter ticks as soon as the row is loaded.
• Idle time – Destroy rows when they are not accessed for a while.
• Specified in CREATE TABLE syntax.
Caching Policies
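The three policies can be sketched as one small in-memory cache with an injectable clock. This is a hypothetical Python model, not SQLFire's CREATE TABLE eviction or expiration syntax: `max_rows` plays the LRU count, `overflow` (a dict standing in for disk) selects overflow versus destroy, `ttl` counts from load, and `idle` counts from last access.

```python
import time
from collections import OrderedDict


class PolicyCache:
    def __init__(self, max_rows=None, ttl=None, idle=None, overflow=None, clock=None):
        self.max_rows = max_rows    # LRU count limit (None = unbounded)
        self.ttl = ttl              # time to live, measured from load
        self.idle = idle            # max time since last access
        self.overflow = overflow    # dict standing in for disk; None = destroy
        self.clock = clock or time.monotonic
        self.rows = OrderedDict()   # key -> (value, loaded_at, last_access), LRU first

    def put(self, key, value):
        now = self.clock()
        self.rows[key] = (value, now, now)   # TTL clock starts when the row is loaded
        self.rows.move_to_end(key)
        while self.max_rows is not None and len(self.rows) > self.max_rows:
            old_key, (old_value, _, _) = self.rows.popitem(last=False)
            if self.overflow is not None:
                self.overflow[old_key] = old_value  # LRU count: overflow to disk
            # Otherwise the least recently used row is simply destroyed.

    def get(self, key):
        now = self.clock()
        entry = self.rows.get(key)
        if entry is None:
            # Read overflowed rows without faulting them back in (a simplification).
            return self.overflow.get(key) if self.overflow is not None else None
        value, loaded_at, last_access = entry
        expired_ttl = self.ttl is not None and now - loaded_at >= self.ttl
        expired_idle = self.idle is not None and now - last_access >= self.idle
        if expired_ttl or expired_idle:
            del self.rows[key]   # destroy the expired row
            return None
        self.rows[key] = (value, loaded_at, now)  # refresh idle time, not TTL
        self.rows.move_to_end(key)
        return value
```

Injecting the clock keeps the expiry logic testable; a fake clock lets the TTL and idle timers be advanced deterministically.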