SQLFire at Strata 2012

SQLFire: Fast meets scalable in VMware's NewSQL database. Strata 2012. Jags Ramnarayan – Chief Architect, SQLFire; Carter Shanklin – Product Manager, SQLFire

description

SQLFire is VMware's in-memory distributed NewSQL database. I delivered this presentation together with Jags, the product architect, and we covered the design choices SQLFire makes to achieve extreme scalability, as well as the connection between big data and fast data. The deck looks a little different in presenter mode, so for best results download it and enjoy.

Transcript of SQLFire at Strata 2012

Page 1: SQLFire at Strata 2012

SQLFire

Fast meets scalable in VMware's NewSQL database.

Strata 2012

Jags Ramnarayan – Chief Architect, SQLFire
Carter Shanklin – Product Manager, SQLFire

Page 2: SQLFire at Strata 2012

Sponsor Sessions Suck

• We Promise To:
– Keep it relevant.
– Keep it technical.
– Keep it entertaining.

Page 3: SQLFire at Strata 2012

Speed Matters

Users demand fast applications and fast websites.
The database is the hardest thing to scale.

Page 4: SQLFire at Strata 2012

SQLFire: Speed, Scale, SQL

Speed
• In-memory for maximum speed and minimum latency.

Scale
• Horizontally scalable.
• Add or remove nodes at any time for more capacity or availability.

SQL
• Familiar SQL interface.
• SQL 92 compliant.
• JDBC and ADO.NET interfaces.

Page 5: SQLFire at Strata 2012

How does SQLFire get scale and speed?

• Horizontal scaleout + Dynamic partitioning
– Appears to app as single database

• Tunable consistency
– Including asynchronous global distribution

• In-memory architecture
– “Memory is the new disk, disk the new tape”

Page 6: SQLFire at Strata 2012
Page 7: SQLFire at Strata 2012

Diverging needs for online and analytics

[Figure labels: Online Layer; Analytics Layer; User Concurrency; Update Rate; Query Richness; Data Volume.]

Page 8: SQLFire at Strata 2012
Page 9: SQLFire at Strata 2012
Page 10: SQLFire at Strata 2012
Page 11: SQLFire at Strata 2012
Page 12: SQLFire at Strata 2012
Page 13: SQLFire at Strata 2012

SQLFire: What does it really look like?

Page 14: SQLFire at Strata 2012

SQLFire Tables Are Replicated By Default.

CREATE TABLE sales
  (product_id int, store_id int,
   price float);

[Diagram labels: SQLFire Node 1 and SQLFire Node 2, each holding a Replica of sales.]

Best for small and frequently accessed data.

Page 15: SQLFire at Strata 2012

Partitioned Tables Are Split Among Members.

CREATE TABLE sales
  (product_id int, store_id int,
   price float)
  PARTITION BY
  COLUMN (product_id);

[Diagram labels: SQLFire Node 1; SQLFire Node 2; Replica; Replica; sales Partition 1; Partition 2.]

Best for large data sets.

Page 16: SQLFire at Strata 2012

Types Of Partitioning In SQLFire.

Type | Purpose | Example
Hash Partitioning (Default) | Built-in hashing algorithm splits data at random across available servers. | PARTITION BY COLUMN (customer_id);
List | Manually divide data across servers based on discrete criteria. | PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'));
Range | Manually divide data across servers based on continuous criteria. | PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30');
Expression | Fully dynamic division of data based on function execution. Can use UDFs. | PARTITION BY (MONTH(date));
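The four partitioning types can be pictured as four ways of turning a row into a partition number. The Python sketch below is only an illustration of that idea, not SQLFire's implementation: `zlib.crc32` stands in for the built-in hash (which the slides do not describe), the cluster size of 4 is made up, range bounds are treated as inclusive on both ends, and returning `None` for unlisted values is an assumption.

```python
import zlib
from datetime import date

NUM_SERVERS = 4  # assumed cluster size, for illustration only


def hash_partition(key, num_servers=NUM_SERVERS):
    """Hash partitioning: a deterministic hash of the key picks the server."""
    # crc32 is a stand-in; the slides do not say which hash SQLFire uses.
    return zlib.crc32(str(key).encode("utf-8")) % num_servers


# List partitioning: each explicit list of values maps to one partition.
LIST_PARTITIONS = [("CA", "WA"), ("TX", "OK")]


def list_partition(home_state):
    for index, values in enumerate(LIST_PARTITIONS):
        if home_state in values:
            return index
    return None  # unlisted value; the slides do not say where such rows go


# Range partitioning: each (low, high) pair maps to one partition.
RANGE_PARTITIONS = [
    (date(2008, 1, 1), date(2008, 12, 31)),
    (date(2009, 1, 1), date(2009, 12, 30)),
]


def range_partition(day):
    for index, (low, high) in enumerate(RANGE_PARTITIONS):
        if low <= day <= high:  # inclusive bounds are an assumption
            return index
    return None


def expression_partition(day, num_servers=NUM_SERVERS):
    """Expression partitioning: hash the result of an expression, here MONTH(date)."""
    return hash_partition(day.month, num_servers)
```

With expression partitioning on the month, rows from January 2008 and January 2009 land on the same server, which is the point of partitioning on a computed value rather than the raw column.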

Page 17: SQLFire at Strata 2012

How does it scale for queries?

[Chart: Partitioned Table, PK queries per second (1 KB rows), # Clients = 2*N]

Number Of Servers (N) | PK queries per second
2 | 200k
4 | 420k
6 | 604k
8 | 790k
10 | 1M
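One way to read the chart is to divide each throughput figure by the number of servers. The short Python calculation below does that with the values read off the slide (the "1M" point is taken as exactly 1,000,000, which is a rounding assumption).

```python
# (servers, primary-key queries per second) as read off the slide's chart.
measurements = [(2, 200_000), (4, 420_000), (6, 604_000), (8, 790_000), (10, 1_000_000)]

# Throughput contributed by each server at every cluster size.
per_server = {n: qps / n for n, qps in measurements}

# Ratio of each per-server rate to the 2-server rate; 1.0 means perfectly linear.
baseline = per_server[2]
efficiency = {n: rate / baseline for n, rate in per_server.items()}

for n in sorted(per_server):
    print(f"N={n:2d}  per-server={per_server[n]:>9.0f}  efficiency={efficiency[n]:.3f}")
```

Per-server throughput stays between roughly 99k and 105k queries per second at every cluster size, which is what "scales linearly" means for this workload.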

Page 18: SQLFire at Strata 2012

How does it scale for updates?

[Chart: Partitioned Table, updates per second (3 columns), # Clients = 2*N, 85% < 1 ms latency]

Number Of Servers (N) | Updates per second
2 | 220k
4 | 490k
6 | 750k
8 | 950k
10 | 1.3M

Page 19: SQLFire at Strata 2012

Redundancy Increases Availability.

CREATE TABLE sales
  (product_id int, store_id int,
   price float)
  PARTITION BY
  COLUMN (product_id)
  REDUNDANCY 1;

[Diagram labels: SQLFire Node 1 holds sales Partition 1 and the redundant copy Partition 2*; SQLFire Node 2 holds Partition 2 and the redundant copy Partition 1*.]

All data is available if Node 1 fails.
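The diagram's idea, that every partition has a primary on one node and a redundant copy on another, can be sketched in a few lines of Python. This is a toy model under stated assumptions: round-robin placement and "promote the first surviving copy" are made up for illustration, and the slides do not describe SQLFire's actual placement or failover policy.

```python
def place_partitions(num_partitions, nodes, redundancy=1):
    """Assign each partition a primary node plus `redundancy` copies on other nodes."""
    if redundancy >= len(nodes):
        raise ValueError("need more nodes than redundant copies")
    placement = {}
    for p in range(num_partitions):
        # Round-robin placement is an assumption made for this sketch.
        owners = [nodes[(p + i) % len(nodes)] for i in range(redundancy + 1)]
        placement[p + 1] = {"primary": owners[0], "copies": owners[1:]}
    return placement


def fail_node(placement, failed):
    """Promote a surviving copy wherever the failed node was the primary."""
    survivors = {}
    for p, entry in placement.items():
        holders = [n for n in [entry["primary"], *entry["copies"]] if n != failed]
        if not holders:
            raise RuntimeError(f"partition {p} lost")
        survivors[p] = {"primary": holders[0], "copies": holders[1:]}
    return survivors


placement = place_partitions(2, ["node1", "node2"], redundancy=1)
after_failure = fail_node(placement, "node1")
```

With two nodes and `redundancy=1`, node1 is primary for partition 1 and holds the copy of partition 2, matching the diagram; after node1 fails, node2 serves both partitions, with no spare copy left until another node joins.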

Page 20: SQLFire at Strata 2012

Partitioning and redundancy

Redundancy = 2 (but tunable)

Single owner for any row at point in time

Replication can be “rack aware”

Replication is synchronous but done in parallel

Page 21: SQLFire at Strata 2012

SQLFire: Derp-Proof Database

• Instant failover at protocol level
• Apps retain their connections
• Data remains available

Was that cord supposed to be in the wall?

Page 22: SQLFire at Strata 2012

Linearly scaling joins

Select * from Customer c, Sales s
where c.cust_id = s.cust_id
and c.cust_id = 'xxx';

• With Hash partitioning the join logic executes everywhere

• Distributed joins are expensive and inhibit scaling
– joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes

Page 23: SQLFire at Strata 2012

Partition Aware DB Design

Designer thinks about how data maps to partitions
– The main idea is to:

1) Minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions.

2) Collocate the transaction working set on partitions so that complex 2-phase commits/Paxos commit is eliminated or minimized.

Read Pat Helland’s “Life beyond Distributed Transactions” and the Google Megastore paper

Page 24: SQLFire at Strata 2012

Collocate Data For Fast Joins.

CREATE TABLE sales
  (product_id int, store_id int,
   price float)
  PARTITION BY
  COLUMN (product_id)
  COLOCATE WITH customers;

[Diagram labels: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.]

Related data placed on the same node.

SQLFire can join tables without network hops.

Page 25: SQLFire at Strata 2012

Collocate Data For Fast Joins.

Select * from Customer c, Sales s
where c.cust_id = s.cust_id
and c.cust_id = 'C1';

Query pruned to node 1

[Diagram labels: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.]

Related data placed on the same node.

SQLFire can join tables without network hops.

Page 26: SQLFire at Strata 2012

Collocate Data For Fast Joins.

SELECT sum(value) AS total FROM sales s, customer c
WHERE s.cust_id = c.cust_id and c.state = 'CA'
GROUP BY cust_id ORDER BY total

Parallel scatter-gather

In parallel, each node does hash join, aggregation locally

[Diagram labels: SQLFire Node 1 holds Customer 1 (C1) and Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) and Customer 2 Sales.]

Related data placed on the same node.
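The scatter-gather pattern on this slide can be sketched in Python as follows. This is a toy model, not SQLFire code: the node names and rows are made-up sample data, and the "nodes" are plain dictionaries visited in a loop rather than servers working in parallel. The sketch relies on the colocation property, so each `cust_id` appears on exactly one node and the per-node results can be merged without re-aggregating.

```python
from collections import defaultdict

# Each node holds the customers it owns plus those customers' sales (colocated).
# All rows below are made-up sample data.
nodes = {
    "node1": {
        "customer": [{"cust_id": "C1", "state": "CA"}],
        "sales": [{"cust_id": "C1", "value": 10.0}, {"cust_id": "C1", "value": 5.0}],
    },
    "node2": {
        "customer": [{"cust_id": "C2", "state": "CA"}, {"cust_id": "C3", "state": "TX"}],
        "sales": [{"cust_id": "C2", "value": 7.0}, {"cust_id": "C3", "value": 99.0}],
    },
}


def local_join_and_aggregate(node, state):
    """Scatter step: hash join and SUM(value) GROUP BY cust_id on one node's data."""
    wanted = {c["cust_id"] for c in node["customer"] if c["state"] == state}  # build side
    totals = defaultdict(float)
    for sale in node["sales"]:  # probe side
        if sale["cust_id"] in wanted:
            totals[sale["cust_id"]] += sale["value"]
    return dict(totals)


def scatter_gather(state):
    """Gather step: merge per-node results, then ORDER BY total."""
    merged = {}
    for node in nodes.values():  # a real cluster would run these in parallel
        merged.update(local_join_and_aggregate(node, state))
    return sorted(merged.items(), key=lambda item: item[1])


result = scatter_gather("CA")
```

Only the small per-node totals cross the network in the gather step; the join itself never needs rows from another node.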

Page 27: SQLFire at Strata 2012

Dynamic Data Colocation

Redundancy = 2

Single master for any entity group

Dynamic entity group formation

Based on foreign key relationships

Page 28: SQLFire at Strata 2012

Data-Aware Stored Procs

Like Map/Reduce But Different

• Procedure execution routed to the data
• Full scaled-out execution
• Highly available
• Use pure Java to access/store data
• Demo later on

Page 29: SQLFire at Strata 2012

Scaling Stored Procedures

CALL maxSales(arguments)
  ON TABLE sales
  WHERE (Location in ('CA','WA','OR'))
  WITH RESULT PROCESSOR
  maxSalesReducer

SQLFire uses data-aware routing to route processing to the data.

[Diagram labels: maxSales on local data (on each node); maxSalesReducer.]

Result Processors give map/reduce functionality.
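The division of labor between the procedure and its result processor can be sketched in Python. The names `max_sales` and `max_sales_reducer` mirror the slide's `maxSales` and `maxSalesReducer`, but their bodies, the sample rows, and the `call_procedure` helper are all hypothetical; the slides do not show what the real procedure computes or how SQLFire's Java API exposes it.

```python
# Made-up rows, grouped by the node that stores them.
sales_by_node = {
    "node1": [{"location": "CA", "price": 120.0}, {"location": "NV", "price": 900.0}],
    "node2": [{"location": "WA", "price": 310.0}, {"location": "OR", "price": 45.0}],
    "node3": [{"location": "TX", "price": 500.0}],
}


def max_sales(local_rows, locations):
    """Runs next to the data: the highest price among local rows in `locations`."""
    prices = [row["price"] for row in local_rows if row["location"] in locations]
    return max(prices) if prices else None


def max_sales_reducer(partial_results):
    """Result processor: combine the per-node maxima into one answer."""
    present = [value for value in partial_results if value is not None]
    return max(present) if present else None


def call_procedure(locations):
    # For simplicity this visits every node; data-aware routing would skip
    # nodes that cannot hold matching rows.
    partials = [max_sales(rows, locations) for rows in sales_by_node.values()]
    return max_sales_reducer(partials)


answer = call_procedure({"CA", "WA", "OR"})
```

Each node returns a single number rather than its matching rows, so the reducer's input grows with the number of nodes, not with the size of the table.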

Page 30: SQLFire at Strata 2012

Scalability: Consistency With Transactions And Without

- Row updates always atomic and isolated
- FIFO consistency

- Distributed transactions with 1-phase commit
- Coordinator per node
- Eager locking + Fail fast

Assumes:
Most x-actions small in space and time
Write-write conflicts rare
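"Eager locking + Fail fast" can be illustrated with a small Python sketch. This is an assumption-laden toy, not SQLFire's lock manager: `LockTable`, `WriteConflict`, and the row and transaction names are invented for the example. The point it shows is that a write lock is taken at write time and a conflicting writer is rejected immediately instead of waiting, which is cheap when write-write conflicts are rare.

```python
class WriteConflict(Exception):
    """Raised as soon as a transaction touches a row another transaction has locked."""


class LockTable:
    def __init__(self):
        self.owner_of = {}  # row key -> id of the transaction holding the write lock

    def lock_for_write(self, row, txn_id):
        # Eager: the lock is taken at write time, not deferred to commit.
        holder = self.owner_of.setdefault(row, txn_id)
        if holder != txn_id:
            # Fail fast: the second writer is rejected rather than queued.
            raise WriteConflict(f"row {row!r} is locked by {holder}")

    def release(self, txn_id):
        """Called on commit or rollback to drop every lock the transaction holds."""
        self.owner_of = {r: t for r, t in self.owner_of.items() if t != txn_id}


locks = LockTable()
locks.lock_for_write("sales:1", "txn-A")
try:
    locks.lock_for_write("sales:1", "txn-B")
    outcome = "acquired"
except WriteConflict:
    outcome = "conflict"
locks.release("txn-A")
locks.lock_for_write("sales:1", "txn-B")  # succeeds once txn-A has finished
```

Because nobody ever waits for a lock, there are no wait cycles to detect; the cost is that the losing transaction has to be retried by the application.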

Page 31: SQLFire at Strata 2012

Scalability: High performance persistence

• Parallel log structured storage
• Each partition writes in parallel
• Backups write to disk also
– Increase reliability against h/w loss

[Diagram labels, repeated for two nodes: Memory Tables; Append only Operation logs; OS Buffers; LOG Compressor; log records Record1, Record2, Record3.]
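An append-only operation log can be sketched in Python as below. This is a toy model under assumptions: the `AppendOnlyLog` class is invented for illustration, a Python list stands in for the on-disk file, and the `compact` step is only a guess at the kind of work the slide's "LOG Compressor" does, since the deck does not explain it.

```python
class AppendOnlyLog:
    """A toy operation log: every change is appended, never updated in place."""

    def __init__(self):
        self.records = []  # stands in for the on-disk log file

    def append(self, key, value):
        self.records.append((key, value))  # value None marks a delete

    def replay(self):
        """Rebuild the in-memory table by applying records in order."""
        table = {}
        for key, value in self.records:
            if value is None:
                table.pop(key, None)
            else:
                table[key] = value
        return table

    def compact(self):
        """Rewrite the log with only the latest live value per key."""
        self.records = list(self.replay().items())


log = AppendOnlyLog()
log.append("row1", "a")
log.append("row2", "b")
log.append("row1", "c")   # supersedes the first record
log.append("row2", None)  # deletes row2
before = len(log.records)
log.compact()
```

Writes only ever go to the end of the log, so they are sequential; replaying the log after a restart rebuilds the same in-memory table, and compaction keeps the log from growing without bound.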

Page 32: SQLFire at Strata 2012

Demos!

Page 33: SQLFire at Strata 2012

Demo: Distributed Procedures

• Autocorrelation of time series
• All pure Java scaled-out
• Tolerant of node failures
• Using SuanShu Java library

Page 34: SQLFire at Strata 2012

Demo: Caching

• Read-only or…
• Read-through / Write-behind
• Cache analytics results
• Skip the ETL

Page 35: SQLFire at Strata 2012

Try SQLFire Today!
Free for developer use up to 3 nodes.
Download: http://vmware.com/go/sqlfire

Got questions? Get answers.
Forum: http://vmware.com/vmtn/appplatform/vfabric_sqlfire

:sigh: Just Google it

I need more followers to get a promotion.
Twitter: @vFabricSQLFire, @cshanklin, @jagsr

Page 36: SQLFire at Strata 2012

Demo Details

Page 37: SQLFire at Strata 2012

Scaling Stored Procs (1)

[Diagram labels: Insert Timeseries; Ubuntu (database).]

Page 38: SQLFire at Strata 2012

Scaling Stored Procs (2)

[Diagram labels: Insert Timeseries; Compute Autocorrelations; Complete; Ubuntu (database).]

Page 39: SQLFire at Strata 2012

Scaling Stored Procs (3)

[Diagram labels: Insert Timeseries; three Ubuntu (database) nodes, each showing Compute Autocorrelations and Complete; Rebalance between nodes.]

All using standard SQL APIs

Page 40: SQLFire at Strata 2012

Caching Analytics (1)

[Diagram labels: Continuous Batch Processing.]

Page 41: SQLFire at Strata 2012

Caching Analytics (2)

[Diagram labels: Low latency; Ubuntu (database); Continuous Batch Processing; In-memory caching; JDBC row loader.]

Page 42: SQLFire at Strata 2012

Caching Analytics (3)

[Diagram labels: Low latency; Ubuntu (database); Continuous Batch Processing; In-memory caching; Scalable + Tunable Cache Policies.]

Page 43: SQLFire at Strata 2012

Caching Policies

• LRU Count
– Overflow to disk or destroy.

• Time To Live
– Counter ticks as soon as the row is loaded.

• Idle Time
– Destroy rows when they are not accessed for a while.

• Specified in CREATE TABLE syntax.
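Two of these policies, LRU Count with the destroy action and Time To Live, can be sketched together in Python. The `RowCache` class is a hypothetical illustration rather than SQLFire's eviction code; overflow to disk and idle-time expiry are left out, and the clock is passed in explicitly as `now` so that the behavior is easy to follow.

```python
from collections import OrderedDict


class RowCache:
    """Toy cache with an LRU row-count limit and an optional time-to-live."""

    def __init__(self, max_rows, ttl_seconds=None):
        self.max_rows = max_rows
        self.ttl_seconds = ttl_seconds
        self.rows = OrderedDict()  # key -> (value, load_time), least recently used first

    def put(self, key, value, now):
        self.rows[key] = (value, now)  # the TTL clock starts when the row is loaded
        self.rows.move_to_end(key)
        while len(self.rows) > self.max_rows:
            self.rows.popitem(last=False)  # destroy the least recently used row

    def get(self, key, now):
        entry = self.rows.get(key)
        if entry is None:
            return None
        value, loaded_at = entry
        if self.ttl_seconds is not None and now - loaded_at >= self.ttl_seconds:
            del self.rows[key]  # expired regardless of how recently it was read
            return None
        self.rows.move_to_end(key)  # reading counts as a use for LRU purposes
        return value


cache = RowCache(max_rows=2, ttl_seconds=10)
cache.put("a", 1, now=0)
cache.put("b", 2, now=1)
cache.get("a", now=2)     # "a" becomes the most recently used row
cache.put("c", 3, now=3)  # over the limit, so "b" is evicted
```

The difference from idle time is visible in `get`: a TTL row expires a fixed interval after it was loaded even if it is read constantly, whereas an idle-time policy would reset its timer on every access.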