PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016
![Page 1: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/1.jpg)
PlayStation and Cassandra Streams
Cassandra Summit 2016
![Page 2: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/2.jpg)
Who are we?
• Alexander Filipchik (PSN: LaserToy), Principal Software Engineer at Sony Interactive Entertainment
• Dustin Pham (PSN: quibfan), Principal Software Engineer at Sony Interactive Entertainment
![Page 3: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/3.jpg)
Agenda
• Multi-regional deployment problem
• Proving C* replication will work for us
• Designing a test system
• Cassandra Streams as a result
![Page 4: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/4.jpg)
How it all started
We want multiple regions and always on
![Page 5: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/5.jpg)
A lot of unknowns
• Will it work?
• Will performance degrade?
• How eventual is multi-region eventual consistency?
• Will we hit any roadblocks?
• Well, how many roadblocks will we hit?
![Page 6: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/6.jpg)
What did we know?
Netflix is doing it and they actually tested it:
• They wrote 1M records in one region of a multi-region cluster
• 500 ms later, reads were initiated in the other regions
• All records were successfully read
![Page 7: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/7.jpg)
Well…
Some questions to answer:
Should we just trust Netflix’s results, replicate the data, and see what happens?
Is their experiment applicable to our situation?
Can we do better?
![Page 8: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/8.jpg)
![Page 9: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/9.jpg)
Some wants:
• Track replication latencies between regions
• Use close-to-production traffic (both load and data)
• Write/read in all the regions at the same time
• Be able to simulate different disruptions
• Have a reusable system we can use to test our future Cassandra deployments
• Do it in one month
![Page 10: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/10.jpg)
Tracking latencies
To track latencies, we need to record some information when a message arrives on a specific node:
17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000
17:06:53 Received from DC2, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613344000 at 1456333613345000. Diff is: 10000
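As a rough illustration of how such log lines could be turned into latency samples, here is a minimal Python sketch. The regular expression is inferred only from the two sample lines above, and the function name `replication_lag_micros` is hypothetical; the slides do not show the actual tooling. The timestamps are treated as microseconds since the epoch, which is what Cassandra uses for write timestamps.

```python
import re

# Pattern inferred from the sample log lines; the real log format may differ.
# The slides do not say what "R1" stands for, so it is captured as an opaque label.
LOG_PATTERN = re.compile(
    r"Received from (?P<dc>\w+), (?P<label>\w+): update "
    r"KS (?P<keyspace>\w+) CF (?P<table>\w+) K (?P<key>\S+) .*?"
    r"Timestamp (?P<written>\d+) at (?P<received>\d+)\."
)


def replication_lag_micros(line):
    """Return (source_dc, lag in microseconds), or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    # Lag = time the update arrived on this node minus the write timestamp.
    lag = int(match.group("received")) - int(match.group("written"))
    return match.group("dc"), lag


sample = (
    "17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello "
    "Size 76 Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000"
)
print(replication_lag_micros(sample))  # ('DC1', 6000)
```

Recomputing the difference from the two timestamps, rather than trusting the printed "Diff" field, keeps the analysis independent of how the log line was formatted.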
![Page 11: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/11.jpg)
Using real data
• We need something we can use as a buffer, so we can store production-sized data in it and then replay it when we want
![Page 12: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/12.jpg)
Results
Bad Idea
We need a way to store all the latencies and something to analyze the results
![Page 13: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/13.jpg)
Putting everything together.
![Page 14: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/14.jpg)
Preparation
[Diagram labels: Exporter; Thrift, CQL, Thrift, JSON; Ingester (×2); Region 1, Region 2]
![Page 15: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/15.jpg)
Test
[Diagram: a Read/Write Loader in Region 1 and a Read/Write Loader in Region 2]
![Page 16: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/16.jpg)
Analysis
![Page 17: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/17.jpg)
How did we extract latencies?
![Page 18: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/18.jpg)
Just injected code here and there
[Diagram: write path stages StorageProxy, Messaging Service, Keyspace, CommitLog, Memtable, etc.; labels: Store Context info, Fire Async event, Use, Write]
![Page 19: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/19.jpg)
Example of results
[Chart: Two-DC connection cut-off and recovery. Latency on a logarithmic scale (axis ticks from 10 to 10,000,000); series: Pct95, Pct99, Pct999, MaxLag]
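The series names on the chart (Pct95, Pct99, Pct999, MaxLag) suggest that each point summarizes a batch of replication-lag samples. A minimal sketch of such a summary, assuming a nearest-rank percentile and hypothetical function names (the slides do not say how the percentiles were actually computed):

```python
def percentile(ordered, per_mille):
    """Nearest-rank percentile of an ascending list; per_mille=950 means the 95th percentile."""
    if not ordered:
        raise ValueError("no latency samples")
    # Integer ceiling of per_mille * n / 1000 avoids floating-point rounding surprises.
    rank = (per_mille * len(ordered) + 999) // 1000
    return ordered[max(rank, 1) - 1]


def summarize(lags_micros):
    """Summarize one batch of lag samples with the series shown on the chart."""
    ordered = sorted(lags_micros)
    return {
        "Pct95": percentile(ordered, 950),
        "Pct99": percentile(ordered, 990),
        "Pct999": percentile(ordered, 999),
        "MaxLag": ordered[-1],
    }


# Made-up samples for illustration only: lags of 1..1000 microseconds.
print(summarize(range(1, 1001)))
```

Applying `summarize` to consecutive time windows would give one point per series per window, which is one plausible way to obtain a chart like the one described.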
![Page 20: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/20.jpg)
Looking at the Bigger Picture
![Page 21: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/21.jpg)
What now?
• The information gathered through the tests was extremely useful, but also not easily reachable in Cassandra’s current state
• Could we somehow ‘tap’ off of Cassandra’s data streams?
![Page 22: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/22.jpg)
Cassandra streams
queues logs metrics
![Page 23: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/23.jpg)
Why????
• Why not use triggers?
• Why not put data routing ahead of Cassandra?
• Wouldn’t this cause a performance impact?
• Wouldn’t this result in data bloat somewhere else?
![Page 24: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/24.jpg)
Knowing what happens at different points can power different use cases
[Diagram: write path stages (message, storage, keyspace, Commitlog, memtable, etc.) with data streams tapped off them]
![Page 25: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/25.jpg)
Use Cases
• Building personalized search indices
• In-place migrations at the data tier level
• Cache invalidation
• Building analytic views
• Data into read-optimized views (transformations)
• Smart backups
• Disabling hints and providing alternative mechanisms
• Providing more failure handling possibilities
• Production-level tests (stress tests)
![Page 26: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/26.jpg)
Tap flow: In-place Migration
[Diagram: write path stages (message, storage, keyspace, Commitlog, memtable, etc.) tapped into consumers]
Consumers: Read Schema A → Transform → Write Schema B
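The consumer flow (Read Schema A, Transform, Write Schema B) can be sketched in a few lines of Python. Everything here is hypothetical: the two schemas, the `transform_a_to_b` rule, and the in-memory dictionary standing in for the target table are illustrative assumptions, not the framework described in the talk.

```python
def transform_a_to_b(row_a):
    """Hypothetical transform: schema A stores a full name, schema B splits it in two."""
    first, _, last = row_a["name"].partition(" ")
    return {"id": row_a["id"], "first_name": first, "last_name": last}


def migrate(tapped_events, write_b):
    """Consume tapped writes in schema A and write each one out in schema B."""
    count = 0
    for row_a in tapped_events:
        write_b(transform_a_to_b(row_a))
        count += 1
    return count


# An in-memory dict stands in for the schema B table.
table_b = {}


def write_to_table_b(row_b):
    table_b[row_b["id"]] = row_b


tapped = [{"id": 1, "name": "Ada Lovelace"}, {"id": 2, "name": "Linus"}]
migrated = migrate(tapped, write_to_table_b)
print(migrated, table_b[1])
```

Keeping the transform a pure function makes it easy to test separately from whatever actually delivers the tapped events.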
![Page 27: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/27.jpg)
Tap flow: Alternative Failure
[Diagram: write path stages (message, storage, keyspace, Commitlog, memtable, etc.) with a tap; labels: Failure!, Replay Log]
Hints cause Cassandra to die faster
![Page 28: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/28.jpg)
Tap flow: Production load test
[Diagram: write path stages (message, storage, keyspace, Commitlog, memtable, etc.) tapped into consumers]
Formalizing the previous Cassandra multi-regional latency tests into the ‘Streams’ framework
![Page 29: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/29.jpg)
High level framework
• Cassandra configuration to enable ‘Streams’ per keyspace
– Tap hooks (+after StorageProxy => topic)
– Sampling / throttling capability / circuit breaking
– Request Log mode (not recommended) / Kafka mode
• Common interfaces for consumers with common reference implementations
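To make the sampling and circuit-breaking ideas concrete, here is a minimal Python sketch of a tap hook. The `Tap` class, its parameters, and the callable sink are all hypothetical; the slides do not show the actual hook API, and a real implementation would live inside Cassandra's write path and publish to a topic rather than call a Python function.

```python
import random


class Tap:
    """Hypothetical tap: forwards a sample of write events to a sink and
    stops forwarding after repeated consecutive sink failures."""

    def __init__(self, sink, sample_rate=1.0, max_failures=3, rng=None):
        self.sink = sink
        self.sample_rate = sample_rate    # fraction of events to forward, 0.0 to 1.0
        self.max_failures = max_failures  # consecutive failures before the circuit opens
        self.rng = rng or random.Random()
        self.failures = 0

    @property
    def circuit_open(self):
        # An open circuit means the tap has given up on the sink.
        return self.failures >= self.max_failures

    def on_write(self, event):
        """Called after a write; returns True if the event was forwarded.
        It never raises, so a broken sink cannot fail the write itself."""
        if self.circuit_open or self.rng.random() >= self.sample_rate:
            return False
        try:
            self.sink(event)
        except Exception:
            self.failures += 1
            return False
        self.failures = 0
        return True


received = []
tap = Tap(received.append, sample_rate=1.0)
tap.on_write({"keyspace": "Test", "key": 1000})
print(received)
```

This sketch never closes the circuit again once it opens; a fuller version would retry the sink after a cool-down period. The important property is that the tap degrades by dropping events rather than by slowing or failing writes.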
![Page 30: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/30.jpg)
Still a W.I.P.
• ‘Cassandra Streams’ for our Cassandra clusters is still a W.I.P. and used only for measurement/analysis
• Introducing a tap off of the write path introduces a new set of complexities
– Consistency
– Paxos
– Etc.
• However, depending on the use case, it is a useful tool that can be enabled and disabled via configuration
![Page 31: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016](https://reader035.fdocuments.in/reader035/viewer/2022070516/586f75e81a28ab10258b6251/html5/thumbnails/31.jpg)
PlayStation is hiring:
hackitects.com