C* Summit 2013: Time is Money Jake Luciani and Carl Yeksigian

Time is Money: Financial Time Series. Jake Luciani and Carl Yeksigian, BlueMountain Capital

Description

This session will focus on our approach to building a scalable time series database for financial data using Cassandra 1.2 and CQL3. We will discuss how we handle a heavy mix of reads and writes, and how we monitor and track the performance of the system.

Transcript

Page 1

Time is Money

Financial Time Series

Jake Luciani and Carl Yeksigian

BlueMountain Capital

Page 2

About this talk
• Part 1: Our use case and architecture
• Part 2: Our deployment and tuning
• Part 3: Q&A

Page 3

Know your problem. Thousands of consumers create and read data as fast as possible. The data must be consistent for all readers, and ad-hoc user queries must be answered quickly, across data centers.

Page 4

Know your data.

[Chart: AAPL price and MSFT price series]

Page 5

Know your queries.

Time Series Query: a start time, an end time and a periodicity define the query.

[Figure: a price series sampled in 1-minute periods]
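A minimal Python sketch of how start, end and periodicity can define a time series query. It assumes each period reports the last observation at or before the period's end; the slides do not say which sampling rule Olympus uses, and `sample_periodic` is a hypothetical helper.

```python
from bisect import bisect_right


def sample_periodic(points, start, end, period):
    """Return (period_end, value) pairs for each period between start and end.

    points: list of (timestamp, value) tuples sorted by timestamp.
    Each period reports the last value observed at or before the period's
    end (an assumed sampling rule); periods with no earlier data are skipped.
    """
    times = [t for t, _ in points]
    result = []
    t = start + period
    while t <= end:
        i = bisect_right(times, t)  # count of points with timestamp <= t
        if i > 0:
            result.append((t, points[i - 1][1]))
        t += period
    return result


# Toy data: timestamps in seconds, illustrative prices only.
prices = [(0, 100.0), (30, 100.5), (70, 101.0), (150, 100.8)]
print(sample_periodic(prices, start=0, end=180, period=60))
# [(60, 100.5), (120, 101.0), (180, 100.8)]
```

Changing only `period` turns the same stored points into 1-minute, 5-minute or daily series, which is why the periodicity is part of the query rather than the storage.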

Page 6

Know your queries.

Cross Section Query: the as-of time defines the query.

[Figure: values across series as of 11am]
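A cross section asks a different question: for every series, what was the latest value at a single as-of time? A minimal Python sketch, assuming each series is an in-memory list sorted by timestamp (`cross_section` is a hypothetical name, not part of Olympus):

```python
from bisect import bisect_right


def cross_section(series, as_of):
    """Return {series_id: value} as of the given time.

    series: dict mapping a series id to (timestamp, value) tuples sorted by
    timestamp. Each series contributes its last observation at or before
    as_of; series with no such observation are left out.
    """
    snapshot = {}
    for sid, points in series.items():
        times = [t for t, _ in points]
        i = bisect_right(times, as_of)
        if i > 0:
            snapshot[sid] = points[i - 1][1]
    return snapshot


# Toy data: minutes since midnight, illustrative prices only.
series = {
    "AAPL": [(570, 430.0), (645, 431.2), (675, 432.0)],
    "MSFT": [(570, 34.1), (654, 34.3)],
}
print(cross_section(series, as_of=660))  # 11:00am -> {'AAPL': 431.2, 'MSFT': 34.3}
```

Because `as_of` can be any instant, the snapshot is computed at read time from the time series rather than stored ahead of time.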

Page 7

Know your queries. Cross sections are random, so storing every possible cross section is not feasible. We also support bi-temporality: each value carries both an as-of time and a knowledge time (when the value became known).

Page 8

Let's optimize for Time Series.

Page 9

Data Model (CQL 3)

CREATE TABLE tsdata (
    id blob,
    property text,
    asof_ticks bigint,
    knowledge_ticks bigint,
    value blob,
    PRIMARY KEY (id, property, asof_ticks, knowledge_ticks)
) WITH COMPACT STORAGE
  AND CLUSTERING ORDER BY (property ASC, asof_ticks DESC, knowledge_ticks DESC);
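To see what the clustering order means for the layout of a partition, here is a minimal Python sketch (not Cassandra code) that sorts one id's rows by property ascending, then asof_ticks and knowledge_ticks descending. `Row`, `clustering_key` and `partition_order` are hypothetical names. Within a property the newest as-of time comes first, and within an as-of time the newest knowledge time comes first, which is what lets a LIMIT 1 query return the latest revision.

```python
from typing import NamedTuple


class Row(NamedTuple):
    property: str
    asof_ticks: int
    knowledge_ticks: int
    value: bytes


def clustering_key(row: Row):
    # property ascending, then asof_ticks and knowledge_ticks descending
    return (row.property, -row.asof_ticks, -row.knowledge_ticks)


def partition_order(rows):
    """Return one partition's rows (a single id) in clustering order."""
    return sorted(rows, key=clustering_key)


rows = [
    Row("lastPrice", 100, 100, b"a"),
    Row("lastPrice", 200, 200, b"b"),
    Row("lastPrice", 100, 150, b"c"),
    Row("bid", 100, 100, b"d"),
]
print([r.value for r in partition_order(rows)])  # [b'd', b'b', b'c', b'a']
```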

Page 10

CQL3 Queries: Time Series

SELECT * FROM tsdata
WHERE id = 0x012345
  AND property = 'lastPrice'
  AND asof_ticks >= 1234567890
  AND asof_ticks <= 2345678901;
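The time series query returns every knowledge revision of each as-of point in the range. Assuming the caller wants only the most recently known value per point, a hypothetical client-side helper can rely on the knowledge_ticks DESC clustering order and keep the first row it sees for each asof_ticks. The slides do not say where Olympus performs this step; this is a sketch of the idea only.

```python
def latest_revisions(rows):
    """Keep one value per asof_ticks from a time series range query.

    rows: (asof_ticks, knowledge_ticks, value) tuples in the order the
    range query returns them (asof_ticks DESC, knowledge_ticks DESC), so
    the first row seen for each asof_ticks carries the newest knowledge.
    """
    result = []
    last_asof = None
    for asof, _knowledge, value in rows:
        if asof != last_asof:
            result.append((asof, value))
            last_asof = asof
    return result


rows = [
    (300, 310, "c2"), (300, 301, "c1"),
    (200, 205, "b"),
    (100, 150, "a2"), (100, 101, "a1"),
]
print(latest_revisions(rows))  # [(300, 'c2'), (200, 'b'), (100, 'a2')]
```

This is a single pass with no sorting, which only works because the storage order already puts the newest revision first.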

Page 11

CQL3 Queries: Cross Section

SELECT * FROM tsdata
WHERE id = 0x012345
  AND property = 'lastPrice'
  AND asof_ticks = 1234567890
  AND knowledge_ticks < 2345678901
LIMIT 1;
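A minimal Python emulation of this query's semantics for one id and property (`value_known_before` is a hypothetical name): filter on the exact as-of time and on knowledge times strictly before the cutoff, then take the first row in knowledge_ticks DESC order, which is the row LIMIT 1 returns. This answers the bi-temporal question "what did we believe the value was, as of this time, before this moment?"

```python
def value_known_before(rows, asof, cutoff):
    """Emulate: WHERE asof_ticks = asof AND knowledge_ticks < cutoff LIMIT 1.

    rows: (asof_ticks, knowledge_ticks, value) tuples for one id/property.
    Matches are ordered knowledge_ticks DESC, as they are stored, so the
    first match is the newest revision recorded before the cutoff.
    """
    matches = sorted(
        (r for r in rows if r[0] == asof and r[1] < cutoff),
        key=lambda r: r[1],
        reverse=True,
    )
    return matches[0][2] if matches else None


rows = [(100, 101, "first print"), (100, 150, "correction")]
print(value_known_before(rows, asof=100, cutoff=150))  # first print
print(value_known_before(rows, asof=100, cutoff=151))  # correction
print(value_known_before(rows, asof=100, cutoff=101))  # None
```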

Page 12

A Service, not an app

[Diagram: many Apps reach C* through Olympus, either via a Fat Client or via Olympus Thrift Service instances]

Page 13

Complex Value Types
• Not every value is a double
• Some values belong together (Bid and Ask should always come back together)
• Thrift structures as values
• Typed, extensible schema
• Union types give us a way to deserialize any type
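A minimal Python sketch of the union-type idea, assuming a simple hand-rolled encoding rather than Thrift's actual union and protocol machinery: each stored blob (what would go in the `value blob` column) starts with a tag naming its type, so a reader can deserialize any value without knowing its type in advance, and Bid and Ask travel together as one value. `BidAsk`, the tag constants and `encode`/`decode` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class BidAsk:
    bid: float
    ask: float


Value = Union[float, BidAsk]

# Hypothetical one-byte tags; real Thrift unions use field ids and
# Thrift's own wire protocol, which this sketch does not reproduce.
TAG_DOUBLE = 1
TAG_BID_ASK = 2


def encode(value: Value) -> bytes:
    """Serialize a value with a leading tag that names its type."""
    if isinstance(value, BidAsk):
        return struct.pack(">Bdd", TAG_BID_ASK, value.bid, value.ask)
    return struct.pack(">Bd", TAG_DOUBLE, float(value))


def decode(blob: bytes) -> Value:
    """Deserialize any tagged value without knowing its type in advance."""
    tag = blob[0]
    if tag == TAG_DOUBLE:
        (x,) = struct.unpack_from(">d", blob, 1)
        return x
    if tag == TAG_BID_ASK:
        bid, ask = struct.unpack_from(">dd", blob, 1)
        return BidAsk(bid, ask)
    raise ValueError(f"unknown value tag {tag}")


quote = BidAsk(bid=99.5, ask=100.0)
assert decode(encode(quote)) == quote  # bid and ask come back together
assert decode(encode(101.25)) == 101.25
```

Adding a new value type means adding a tag and a decoding case, which is one way to keep the schema typed yet extensible.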

Page 14

Ad-hoc querying UI

Page 15

But that's the easy part...

(cue transition)

Page 16

Scaling... The first rule of scaling: you do not just turn everything up to 11.

Page 17

Scaling...
• Step 1: Fast machines for your workload
• Step 2: Avoid Java GC for your workload
• Step 3: Tune Cassandra for your workload
• Step 4: Prefetch and cache for your workload

Page 18

Can't fix what you can't measure
• Riemann (http://riemann.io)
• Easily push application and system metrics into a single system
• We push 6k metrics per second to a single Riemann instance

Page 19

Metrics: Riemann
• Yammer Metrics with Riemann: https://gist.github.com/carlyeks/5199090

Page 20

Metrics: Riemann
• Push, stream-based metrics library
• Riemann Dash for "Why is it slow?"
• Graphite for "Why was it slow?"

Page 21

VisualVM: The greatest tool EVER
• Many useful plugins...
• Just start jstatd on each server and go!

Page 22

Scaling Reads: Machines
• SSDs for hot data
• JBOD config
• As many cores as possible (> 16)
• 10GbE network
• Bonded network cards
• Jumbo frames

Page 23

JBOD is a lifesaver
• SSDs are great until they aren't anymore
• JBOD allowed passive recovery in the face of simultaneous disk failures (the SSDs had bad firmware)

Page 24

Scaling Reads: Cassandra
Changes we've made:
• Configuration
• Compaction
• Compression

Page 25

Leveled Compaction
• Wide rows mean a row's data can be spread across a huge number of SSTables
• Leveled Compaction puts a bound on the worst case (*)
• Fewer SSTables to read means lower latency; in the diagram, the orange SSTables are the ones that get read

[Diagram: SSTables arranged in levels L0 through L5]

* In theory
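A minimal Python sketch of why leveling bounds a read (an illustration, not Cassandra's read path, which also consults bloom filters): count the SSTables whose key range could contain a key. The function name and toy ranges are hypothetical. Ranges within L1 and above are disjoint, so each of those levels contributes at most one file, and the worst case is the number of L0 files plus one per higher level.

```python
def sstables_to_check(levels, key):
    """Count SSTables whose key range may contain `key`.

    levels: list indexed by level (L0 first); each entry is a list of
    (first_key, last_key) ranges. L0 files may overlap each other; in L1 and
    above, ranges within a level are disjoint, so at most one file per level
    can match.
    """
    return sum(
        1
        for ranges in levels
        for first, last in ranges
        if first <= key <= last
    )


levels = [
    [(0, 100), (50, 150)],            # L0: overlapping flushes
    [(0, 49), (50, 99), (100, 199)],  # L1: disjoint ranges
    [(0, 99), (100, 299)],            # L2: disjoint ranges
]
print(sstables_to_check(levels, 75))  # 2 from L0 + 1 from L1 + 1 from L2 = 4
```

If writes outpace compaction and L0 accumulates, say, 30 files that all span the key, the same lookup has to check 32 files; the bound only holds while L0 stays small.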

Page 26

Leveled Compaction: Breaking Bad
• Under high write load, L0 files pile up faster than they can be compacted, and reads are forced to check all of them

[Diagram: levels L0 through L5]

Page 27

Hybrid Compaction: Breaking Better
• Size-tier level 0
• On by default in 2.0

[Diagram: hybrid compaction, with size-tiered compaction in L0 and leveled compaction in L1 through L5]
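A simplified Python sketch of size-tiered bucketing, the strategy hybrid compaction applies to L0: sort SSTables by size, group files whose size is close to a bucket's average, and treat any bucket with enough members as a compaction candidate. The thresholds mirror the defaults of Cassandra's SizeTieredCompactionStrategy options (bucket_low 0.5, bucket_high 1.5, min_threshold 4), but the function itself is hypothetical and omits details of the real implementation, such as its minimum SSTable size rule.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size, size-tiered style.

    A size joins the first bucket whose average it falls strictly within
    (bucket_low * avg, bucket_high * avg); otherwise it starts a new bucket.
    Buckets with at least min_threshold members are compaction candidates.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg < size < bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return [b for b in buckets if len(b) >= min_threshold]


# Toy SSTable sizes in MB: four small flushes, two medium files, one large file.
print(size_tiered_buckets([10, 11, 12, 9, 100, 105, 5000]))  # [[9, 10, 11, 12]]
```

Merging the four small, similar-sized flushes into one file means a read checks one L0 file instead of four, without paying to merge every flush into L1 immediately.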

Page 28

Overlapping Compaction
• Instead of forcing L0 files to be combined with L1, we can just push the files up
• This allows a higher level of concurrency in compactions
• We still know which SSTables might contain a given key
• We can force a proper compaction at any configurable level

[Diagram: levels L0 through L5]
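The slide's idea can be sketched in Python as a toy level structure in which lower levels may overlap: `push_up` moves files up a level without rewriting them, `candidates` still finds every SSTable whose key range could hold a key, and `strict_level` stands in for the configurable level where a proper, merging compaction is forced. All names here are hypothetical; this is an illustration of the concept, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SSTable:
    name: str
    first_key: int
    last_key: int


@dataclass
class OverlappingLevels:
    # Hypothetical: levels below strict_level may hold overlapping SSTables,
    # because files are pushed up without being merged.
    strict_level: int = 3
    levels: List[List[SSTable]] = field(default_factory=lambda: [[] for _ in range(6)])

    def push_up(self, level: int) -> None:
        """Move every SSTable at `level` to `level + 1` without rewriting it."""
        if level + 1 >= self.strict_level:
            raise ValueError(f"L{level + 1} requires a proper (merging) compaction")
        self.levels[level + 1].extend(self.levels[level])
        self.levels[level] = []

    def candidates(self, key: int) -> List[str]:
        """Names of SSTables whose key range may contain `key`."""
        return [t.name for level in self.levels for t in level
                if t.first_key <= key <= t.last_key]


lv = OverlappingLevels()
lv.levels[0] = [SSTable("a", 0, 100), SSTable("b", 50, 150)]
lv.push_up(0)              # L0 -> L1 with no merge, so no rewrite I/O
print(lv.candidates(75))   # ['a', 'b']
print(lv.candidates(120))  # ['b']
```

Because a push is just a metadata move, several pushes can proceed concurrently; the cost is that reads may check more than one file per overlapping level until the forced compaction runs.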

Page 29

C optimized library
• The read path needs to be fast for our workload
• CRC checks and composite comparisons eat a lot of cycles
• CRC is implemented on chip for some architectures (why not use it?)
• We want to move some of these operations into a JNI library to reduce latency and improve throughput
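To show what an on-chip CRC saves, here is a naive bit-at-a-time software CRC-32C in Python. The x86 SSE4.2 `crc32` instruction computes this CRC-32C (Castagnoli) variant, which uses a different polynomial from zlib's CRC-32, so the two checksums are not interchangeable. This sketch only illustrates the per-byte work; it says nothing about which checksum Cassandra or the authors' library actually use.

```python
import zlib

# Reflected form of the Castagnoli polynomial used by CRC-32C.
CRC32C_POLY = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bit-at-a-time software CRC-32C (eight shift/xor steps per input byte)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(hex(crc32c(b"123456789")))      # 0xe3069283, the CRC-32C check value
print(hex(zlib.crc32(b"123456789")))  # 0xcbf43926, zlib's CRC-32 (different polynomial)
```

Table-driven software versions cut the per-byte cost, but a hardware instruction that consumes several bytes at a time is cheaper still, which is the motivation for reaching it through JNI.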

Page 30

Current Stats
• 16 nodes
• 2 data centers
• Replication factor 6
• 200k writes/sec at EACH_QUORUM
• 150k reads/sec at LOCAL_QUORUM
• > 30 million time series
• > 15 billion points
• 10 TB on disk (compressed)
• Read latency (50th/95th percentile): 1 ms / 5 ms
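The consistency levels and replication factor translate into concrete acknowledgement counts. A small Python worked example, assuming (the slides do not say so) that replication factor 6 is split evenly as 3 replicas per data center and that tokens are evenly distributed; the data center names are placeholders.

```python
def quorum(replicas: int) -> int:
    """Number of replicas that must acknowledge for a quorum."""
    return replicas // 2 + 1


# Assumption: RF 6 is split evenly, 3 replicas per data center.
# "dc1"/"dc2" are placeholder names; the slides do not give the split.
rf_per_dc = {"dc1": 3, "dc2": 3}
nodes = 16
writes_per_sec = 200_000

each_quorum = {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
local_quorum = quorum(rf_per_dc["dc1"])

# Every write is sent to all replicas, whatever the consistency level;
# assuming an even token distribution, each node absorbs this many per second:
replica_writes_per_node = writes_per_sec * sum(rf_per_dc.values()) // nodes

print(each_quorum)              # {'dc1': 2, 'dc2': 2}
print(local_quorum)             # 2
print(replica_writes_per_node)  # 75000
```

Under these assumptions each write waits for 4 of its 6 replicas (2 in each data center), while each read waits for 2 replicas in the local data center.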

Page 31

Questions? Thank you! @tjake and @carlyeks