SQLFire at Strata 2012

SQLFire

Fast meets scalable in VMware’s NewSQL database.

Strata 2012

Jags Ramnarayan – Chief Architect, SQLFire
Carter Shanklin – Product Manager, SQLFire

Sponsor Sessions Suck

• We Promise To:
– Keep it relevant.
– Keep it technical.
– Keep it entertaining.

Speed Matters

Users demand fast applications and fast websites.
The database is the hardest thing to scale.

SQLFire: Speed, Scale, SQL

Speed
• In-memory for maximum speed and minimum latency.

Scale
• Horizontally scalable.
• Add or remove nodes at any time for more capacity or availability.

SQL
• Familiar SQL interface.
• SQL 92 compliant.
• JDBC and ADO.NET interfaces.

How does SQLFire get scale and speed?

• Horizontal scaleout + Dynamic partitioning
– Appears to app as single database

• Tunable consistency
– Including asynchronous global distribution

• In-memory architecture
– “Memory is the new disk, disk the new tape”

Diverging needs for online and analytics

[Figure: Online Layer versus Analytics Layer, compared on User Concurrency, Update Rate, Query Richness, and Data Volume.]

SQLFire: What does it really look like?


SQLFire Tables Are Replicated By Default.

CREATE TABLE sales
  (product_id int, store_id int,
  price float);

[Diagram: SQLFire Node 1 and SQLFire Node 2 each hold a replica of sales.]

Best for small and frequently accessed data.


Partitioned Tables Are Split Among Members.

CREATE TABLE sales
  (product_id int, store_id int,
  price float)
  PARTITION BY
  COLUMN (product_id);

[Diagram: SQLFire Node 1 holds sales Partition 1; SQLFire Node 2 holds sales Partition 2.]

Best for large data sets.

Types Of Partitioning In SQLFire.

Hash Partitioning (Default)
  Purpose: Built-in hashing algorithm splits data at random across available servers.
  Example: PARTITION BY COLUMN (customer_id);

List
  Purpose: Manually divide data across servers based on discrete criteria.
  Example: PARTITION BY LIST (home_state) (VALUES ('CA', 'WA'), VALUES ('TX', 'OK'));

Range
  Purpose: Manually divide data across servers based on continuous criteria.
  Example: PARTITION BY RANGE (date) (VALUES BETWEEN '2008-01-01' AND '2008-12-31', VALUES BETWEEN '2009-01-01' AND '2009-12-30');

Expression
  Purpose: Fully dynamic division of data based on function execution. Can use UDFs.
  Example: PARTITION BY (MONTH(date));
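
To make the four partitioning types concrete, here is a minimal Python sketch of how a row's partitioning value could be mapped to a server under each scheme. It is an illustration only, not SQLFire's implementation: the server names, the use of CRC32 as the hash, the assignment of list and range groups to particular servers, the inclusive range bounds, and the fallback to hashing for unmatched values are all assumptions made for the example.

```python
import zlib
from datetime import date

SERVERS = ["server-1", "server-2", "server-3"]  # hypothetical member names


def hash_partition(key):
    """Hash: a stable hash of the key picks the server (CRC32 is a stand-in)."""
    return SERVERS[zlib.crc32(str(key).encode("utf-8")) % len(SERVERS)]


# List: discrete values are grouped by hand (group-to-server mapping is assumed).
LIST_GROUPS = [({"CA", "WA"}, "server-1"), ({"TX", "OK"}, "server-2")]


def list_partition(home_state):
    for states, server in LIST_GROUPS:
        if home_state in states:
            return server
    return hash_partition(home_state)  # assumption: unlisted values are hashed


# Range: continuous values are split by hand into intervals.
RANGES = [
    (date(2008, 1, 1), date(2008, 12, 31), "server-1"),
    (date(2009, 1, 1), date(2009, 12, 30), "server-2"),
]


def range_partition(day):
    for low, high, server in RANGES:
        if low <= day <= high:  # assumption: both bounds are inclusive
            return server
    return hash_partition(day.isoformat())  # assumption: out-of-range values are hashed


def expression_partition(day):
    """Expression: the result of a function (here MONTH) is what gets hashed."""
    return hash_partition(day.month)
```

With expression partitioning, rows from the same calendar month land on the same server regardless of year, because only the function result is hashed.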

How does it scale for queries?

Partitioned table, PK queries per second (1 KB rows). # Clients = 2*N.

Number of servers (N)    2      4      6      8      10
PK queries per second    200k   420k   604k   790k   1M

How does it scale for updates?

Partitioned table, updates per second (3 columns). # Clients = 2*N. 85% < 1 ms latency.

Number of servers (N)    2      4      6      8      10
Updates per second       220k   490k   750k   950k   1.3M
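
As a quick check on the query and update charts above, the following Python sketch divides each quoted throughput by the number of servers. Only the numbers come from the slides; the helper names are made up for the example. If scaling is close to linear, the per-server figure should stay roughly flat as N grows.

```python
servers = [2, 4, 6, 8, 10]
queries_per_sec = [200_000, 420_000, 604_000, 790_000, 1_000_000]
updates_per_sec = [220_000, 490_000, 750_000, 950_000, 1_300_000]


def per_server(throughput):
    """Throughput divided by cluster size, one value per data point."""
    return [total / n for total, n in zip(throughput, servers)]


def scaling_efficiency(throughput):
    """Per-server throughput relative to the smallest cluster (1.0 = linear)."""
    base = throughput[0] / servers[0]
    return [value / base for value in per_server(throughput)]


if __name__ == "__main__":
    for name, series in (("queries", queries_per_sec), ("updates", updates_per_sec)):
        print(name, [round(v) for v in per_server(series)])
```

On these figures, PK queries stay within a few percent of 100k per second per server, and updates stay between 110k and 130k per second per server, which is what near-linear scaling looks like.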


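Redundancy Increases Availability.

CREATE TABLE sales
  (product_id int, store_id int,
  price float)
  PARTITION BY
  COLUMN (product_id)
  REDUNDANCY 1;

[Diagram: SQLFire Node 1 holds sales Partition 1 and Partition 2*; SQLFire Node 2 holds sales Partition 2 and Partition 1*. The starred partitions are the redundant copies.]

All data is available if Node 1 fails.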

Partitioning and redundancy

• Redundancy = 2 (but tunable)
• Single owner for any row at any point in time
• Replication can be “rack aware”
• Replication is synchronous but done in parallel
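
The redundancy idea can be sketched in a few lines of Python: each bucket of a partitioned table gets one owner plus a configurable number of redundant copies on other nodes, so losing a node leaves every bucket reachable. This is a toy model under stated assumptions, not SQLFire's placement algorithm; the round-robin placement and the function names are invented for the illustration.

```python
def place_buckets(num_buckets, nodes, redundancy=1):
    """Give every bucket a single owner plus `redundancy` copies on other nodes.

    Assumption: simple round-robin placement; owners[0] is the single owner.
    """
    if redundancy + 1 > len(nodes):
        raise ValueError("need at least redundancy + 1 nodes")
    return {
        bucket: [nodes[(bucket + i) % len(nodes)] for i in range(redundancy + 1)]
        for bucket in range(num_buckets)
    }


def fail_node(placement, dead):
    """Drop a failed node; the next surviving copy takes over as owner."""
    return {
        bucket: [node for node in owners if node != dead]
        for bucket, owners in placement.items()
    }


placement = place_buckets(4, ["node1", "node2"], redundancy=1)
after_failure = fail_node(placement, "node1")
```

With two nodes and one redundant copy, every bucket still has a copy on node2 after node1 fails, which matches the claim on the redundancy slide.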

SQLFire: Derp-Proof Database

• Instant failover at protocol level
• Apps retain their connections
• Data remains available

Was that cord supposed to be in the wall?

Linearly scaling joins

Select * from Customer c, Sales s
where c.cust_id = s.cust_id
and c.cust_id = 'xxx';

• With hash partitioning, the join logic executes everywhere.

• Distributed joins are expensive and inhibit scaling.
– Joins across distributed nodes could involve distributed locks and potentially a lot of intermediate data transfer across nodes.

Partition Aware DB Design

The designer thinks about how data maps to partitions. The main idea is to:

1) Minimize excessive data distribution by keeping the most frequently accessed and joined data collocated on partitions.

2) Collocate the transaction working set on partitions so that complex 2-phase commit or Paxos commit is eliminated or minimized.

Read Pat Helland’s “Life beyond Distributed Transactions” and the Google Megastore paper.


Collocate Data For Fast Joins.

CREATE TABLE sales
  (... price float)
  PARTITION BY ...
  COLOCATE WITH customers;

[Diagram: SQLFire Node 1 holds Customer 1 (C1) together with Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) together with Customer 2 Sales.]

Related data placed on the same node.

SQLFire can join tables without network hops.

Collocate Data For Fast Joins.

Select * from Customer c, Sales s
where c.cust_id = s.cust_id
and c.cust_id = 'C1';

Query pruned to node 1.

[Diagram: SQLFire Node 1 holds Customer 1 (C1) together with Customer 1 Sales; SQLFire Node 2 holds Customer 2 (C2) together with Customer 2 Sales.]
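
Query pruning follows directly from colocation, and a small Python model shows why. In this sketch, customers and their sales are placed by hashing the same key, so a join restricted to one customer only has to touch the node that owns that customer. The class, the node names, and the CRC32 hash are assumptions for illustration and do not reflect SQLFire internals.

```python
import zlib

NODES = ["node1", "node2"]  # hypothetical cluster members


def node_for(cust_id):
    """Both tables use the same key and hash, so related rows share a node."""
    return NODES[zlib.crc32(cust_id.encode("utf-8")) % len(NODES)]


class Cluster:
    def __init__(self):
        self.customers = {node: {} for node in NODES}
        self.sales = {node: [] for node in NODES}

    def add_customer(self, cust_id, name):
        self.customers[node_for(cust_id)][cust_id] = name

    def add_sale(self, cust_id, amount):
        # Colocated: the sale is stored on the node that owns its customer.
        self.sales[node_for(cust_id)].append((cust_id, amount))

    def join_for_customer(self, cust_id):
        """Join Customer and Sales for one customer, visiting a single node."""
        node = node_for(cust_id)  # pruning: no other node is consulted
        name = self.customers[node].get(cust_id)
        rows = [(cust_id, name, amount)
                for sale_cust, amount in self.sales[node] if sale_cust == cust_id]
        return node, rows
```

Because the join never leaves the owning node, adding nodes adds join capacity without adding cross-node traffic for this kind of query.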

Collocate Data For Fast Joins.

SELECT sum(value) AS total
FROM sales s, customer c
WHERE s.cust_id = c.cust_id AND c.state = 'CA'
GROUP BY c.cust_id
ORDER BY total

Parallel scatter-gather: in parallel, each node does the hash join and aggregation locally.
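
The scatter-gather pattern for the aggregate query above can be sketched as follows. Each node joins and aggregates its own colocated rows, and a coordinator merges and sorts the partial results. The data layout, the function names, and the use of a thread pool to stand in for separate nodes are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def local_totals(node_data, state):
    """Runs on one node: join local sales to local customers, then aggregate."""
    customers, sales = node_data  # customers: cust_id -> state; sales: (cust_id, value)
    totals = {}
    for cust_id, value in sales:
        if customers.get(cust_id) == state:
            totals[cust_id] = totals.get(cust_id, 0) + value
    return totals


def scatter_gather(nodes, state):
    """Coordinator: fan the query out, then merge and order the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda data: local_totals(data, state), nodes))
    merged = {}
    for partial in partials:
        # With colocation each cust_id lives on exactly one node, so no key collides.
        merged.update(partial)
    return sorted(merged.items(), key=lambda item: item[1])  # ORDER BY total


node1 = ({"C1": "CA", "C3": "TX"}, [("C1", 10), ("C1", 5), ("C3", 7)])
node2 = ({"C2": "CA"}, [("C2", 3)])
result = scatter_gather([node1, node2], "CA")
```

The coordinator only handles one small row per matching customer, so the expensive join and aggregation work stays spread across the nodes.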

Dynamic Data Colocation

• Redundancy = 2
• Single master for any entity group
• Dynamic entity group formation
• Based on foreign key relationships

Data-Aware Stored Procs

• Procedure execution routed to the data
• Full scaled-out execution
• Highly available
• Use pure Java to access/store data
• Demo later on

Like Map/Reduce But Different


Scaling Stored Procedures

CALL maxSales(arguments)
  ON TABLE sales
  WHERE (Location IN ('CA', 'WA', 'OR'))
  WITH RESULT PROCESSOR maxSalesReducer

SQLFire uses data-aware routing to route processing to the data.

[Diagram: each node runs maxSales on local data; maxSalesReducer combines the results.]

Result Processors give map/reduce functionality.
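
The map/reduce shape of a data-aware procedure with a result processor can be modelled in plain Python. The slides describe the procedures as pure Java, so this is only a language-neutral sketch: the function names mirror maxSales and maxSalesReducer from the slide, while the data layout and the way routing is modelled (skipping nodes that hold no matching rows) are assumptions.

```python
def max_sales(local_rows, locations):
    """Procedure body: runs on a node against its local rows only (the map side)."""
    amounts = [amount for location, amount in local_rows if location in locations]
    return max(amounts) if amounts else None


def max_sales_reducer(partials):
    """Result processor: combines the per-node results (the reduce side)."""
    values = [value for value in partials if value is not None]
    return max(values) if values else None


def call_on_table(partitions, locations):
    """Assumed routing model: only nodes holding matching rows run the procedure."""
    targets = [rows for rows in partitions.values()
               if any(location in locations for location, _ in rows)]
    return max_sales_reducer(max_sales(rows, locations) for rows in targets)


partitions = {
    "node1": [("CA", 120.0), ("TX", 500.0)],
    "node2": [("WA", 80.0), ("OR", 200.0)],
    "node3": [("NY", 900.0)],
}
best = call_on_table(partitions, {"CA", "WA", "OR"})
```

Here node3 is never asked to run the procedure, and only one number per participating node travels back to the caller.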

Scalability: Consistency With Transactions And Without

- Row updates always atomic and isolated
- FIFO consistency
- Distributed transactions with 1-phase commit
- Coordinator per node
- Eager locking + Fail fast

Assumes:
- Most transactions small in space and time
- Write-write conflicts rare
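
Eager locking with fail-fast behaviour is easy to show on a single node. In the Python sketch below, a transaction takes the write lock at the moment it writes, and a second writer on the same row fails immediately instead of waiting; since every lock is already held at commit time, commit is a single step. This is a simplified single-process model, and the class and method names are hypothetical, not SQLFire APIs.

```python
class ConflictError(Exception):
    """Raised when a write-write conflict is detected."""


class Store:
    def __init__(self):
        self.rows = {}
        self.locks = {}  # row key -> id of the transaction holding the write lock


class Transaction:
    def __init__(self, store, txid):
        self.store = store
        self.txid = txid
        self.pending = {}

    def write(self, key, value):
        # Eager locking: the lock is taken at write time, not at commit time.
        holder = self.store.locks.setdefault(key, self.txid)
        if holder != self.txid:
            self.rollback()
            # Fail fast: report the conflict now rather than queueing behind it.
            raise ConflictError(f"{key!r} is locked by {holder}")
        self.pending[key] = value

    def commit(self):
        # Single phase: all locks are already held, so just apply and release.
        self.store.rows.update(self.pending)
        self._release()

    def rollback(self):
        self._release()

    def _release(self):
        for key in [k for k, t in self.store.locks.items() if t == self.txid]:
            del self.store.locks[key]
        self.pending = {}
```

The design pays off under the assumptions listed on the slide: when transactions are small and write-write conflicts are rare, failing fast costs little and avoids distributed lock waits.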

Scalability: High performance persistence

• Parallel log structured storage
• Each partition writes in parallel
• Backups write to disk also
– Increase reliability against h/w loss

[Diagram: two nodes, each showing Memory Tables, OS Buffers, append-only operation logs (Record1, Record2, Record3), and a LOG Compressor.]
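
An append-only operation log can be sketched in Python as follows: writes only ever append to the end of a file, recovery replays the log with later records winning, and a compaction step rewrites the file with just the latest value per key. The JSON line format and the class name are assumptions, and the compaction step is only a guess at the role of the "LOG Compressor" in the diagram.

```python
import json
import os
import tempfile


class AppendOnlyLog:
    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        """Record an operation with a sequential write; nothing is updated in place."""
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")

    def replay(self):
        """Rebuild the in-memory table from the log; later records win."""
        table = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    table[record["k"]] = record["v"]
        return table

    def compact(self):
        """Rewrite the log so it keeps only the latest record for each key."""
        table = self.replay()
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as log:
            for key, value in table.items():
                log.write(json.dumps({"k": key, "v": value}) + "\n")
        os.replace(tmp_path, self.path)  # swap in the compacted file


def demo():
    with tempfile.TemporaryDirectory() as folder:
        log = AppendOnlyLog(os.path.join(folder, "partition-1.log"))
        log.append("row1", 10)
        log.append("row2", 20)
        log.append("row1", 11)
        log.compact()
        with open(log.path, encoding="utf-8") as handle:
            return log.replay(), len(handle.readlines())
```

Giving each partition its own log file, as in this sketch, is what would let partitions write in parallel without contending for one shared file.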

Demos!

Demo: Distributed Procedures

• Autocorrelation of time series
• All pure Java scaled-out
• Tolerant of node failures
• Using SuanShu Java library
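
For readers unfamiliar with the computation in this demo, here is a minimal Python sketch of the standard sample autocorrelation of a time series at a given lag. The demo itself used the SuanShu Java library inside SQLFire procedures; this stand-in is not the demo's code and makes no claim about SuanShu's API.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation: lagged covariance divided by the lag-0 variance."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance
```

Each series can be processed independently of the others, which is why this workload suits procedures that run in parallel on the nodes holding the data.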

Demo: Caching

• Read-only or…
• Read-through / Write-behind
• Cache analytics results
• Skip the ETL

Try SQLFire Today!
Free for developer use on up to 3 nodes.
Download: http://vmware.com/go/sqlfire

Got questions? Get answers.
Forum: http://vmware.com/vmtn/appplatform/vfabric_sqlfire
:sigh: Just Google it

I need more followers to get a promotion.
Twitter: @vFabricSQLFire, @cshanklin, @jagsr

Demo Details

Scaling Stored Procs

[Diagram sequence across Ubuntu (database) nodes: Insert Timeseries, Compute Autocorrelations, Complete, Rebalance.]

All using standard SQL APIs.

Caching Analytics

[Diagram labels: Continuous Batch Processing; JDBC row loader; In-memory caching; Low latency; Ubuntu (database).]

Scalable + Tunable Cache Policies

Caching Policies

• LRU Count
– Overflow to disk or destroy.

• Time To Live
– Counter starts ticking as soon as the row is loaded.

• Idle Time
– Destroy rows when they are not accessed for a while.

• Specified in CREATE TABLE syntax.
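
The three policies differ only in what triggers removal, which a small Python sketch makes explicit. The class below applies an LRU row count, a time to live measured from load time, and an idle timeout measured from last access. It models the "destroy" action only (overflow to disk is left out), takes the current time as an explicit argument to keep the example deterministic, and uses invented names rather than any SQLFire API.

```python
from collections import OrderedDict


class PolicyCache:
    def __init__(self, max_rows=None, ttl=None, idle=None):
        self.max_rows = max_rows  # LRU count: maximum number of rows kept
        self.ttl = ttl            # time to live, counted from load time
        self.idle = idle          # idle time, counted from last access
        self.rows = OrderedDict() # key -> (value, loaded_at, last_access)

    def put(self, key, value, now):
        self.rows[key] = (value, now, now)
        self.rows.move_to_end(key)
        while self.max_rows is not None and len(self.rows) > self.max_rows:
            self.rows.popitem(last=False)  # destroy the least recently used row

    def get(self, key, now):
        entry = self.rows.get(key)
        if entry is None:
            return None
        value, loaded_at, last_access = entry
        too_old = self.ttl is not None and now - loaded_at >= self.ttl
        too_idle = self.idle is not None and now - last_access >= self.idle
        if too_old or too_idle:
            del self.rows[key]
            return None
        # A successful read refreshes the idle clock but not the TTL clock.
        self.rows[key] = (value, loaded_at, now)
        self.rows.move_to_end(key)
        return value
```

Reads reset the idle clock but never the TTL clock, which is the practical difference between the two time-based policies.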
