Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016
![Page 1: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/1.jpg)
Cold Storage that isn’t glacial
Joshua Hollander, Principal Software Engineer
ProtectWise
1 / 41
![Page 7: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/7.jpg)
Who are we and what do we do?
We record full fidelity Network Data
Every network conversation (NetFlow)
Full fidelity packet data (PCAP)
Searchable detailed protocol data for:
HTTP
DNS
DHCP
Files
Security Events (IDS)
and more...
We analyze all that data and detect threats others can't see
2 / 41
![Page 10: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/10.jpg)
Network data piles up fast!
Over 300 C* servers in production
Over 1200 servers in EC2
Over 150TB in C*
About 90TB of SOLR indexes
100TB of cold storage data
2 PB of PCAP
And we are growing rapidly!
3 / 41
![Page 12: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/12.jpg)
So what?
All those servers cost a lot of money
4 / 41
![Page 15: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/15.jpg)
Right sizing our AWS bill
This is all time series data:
Lots of writes/reads to recent data
Some reads and very few writes to older data
So just move all that older, cold data to cheaper storage.
How hard could that possibly be?
5 / 41
![Page 18: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/18.jpg)
Problem 1:
Data distributed evenly across all these expensive servers
Using Size Tiered Compaction
Can't just migrate old SSTables to new cheap servers
Solution: use Date Tiered Compaction?
We update old data regularly
What SSTables can you safely migrate?
Result:
6 / 41
![Page 21: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/21.jpg)
Problem 2: DSE/SOLR re-index takes forever!
We have nodes with up to 300GB of SOLR indexes
Bootstrapping a new node requires re-index after streaming
Re-index can take a week or more!!!
At that pace we simply cannot bootstrap new nodes
Solution: Time sharded clusters in application code
Fanout searches for large time windows
Assemble results in code (using same algorithm SOLR does)
Result:
¯\_(ツ)_/¯
7 / 41
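The fanout-and-assemble idea can be sketched in a few lines. This is only an illustration, assuming each time shard returns its hits already sorted by the sort key; `search_shard`, `fanout_search`, and the sample documents are hypothetical names and data, and the merge shown is a generic top-N merge rather than the exact algorithm the talk refers to.

```python
import heapq
from itertools import islice

def search_shard(shard_docs, predicate, limit):
    """Hypothetical stand-in for one time-sharded cluster: filter, sort newest first, cut to limit."""
    hits = [d for d in shard_docs if predicate(d)]
    hits.sort(key=lambda d: d["ts"], reverse=True)
    return hits[:limit]

def fanout_search(shards, predicate, limit):
    """Query every shard, then merge the sorted partial results into one top-N list."""
    partials = [search_shard(docs, predicate, limit) for docs in shards]
    # heapq.merge lazily merges inputs that are each already sorted (descending here).
    merged = heapq.merge(*partials, key=lambda d: d["ts"], reverse=True)
    return list(islice(merged, limit))

shards = [
    [{"ts": 1, "ip": "10.0.0.1"}, {"ts": 4, "ip": "10.0.0.1"}],  # older time shard
    [{"ts": 7, "ip": "10.0.0.1"}, {"ts": 9, "ip": "8.8.8.8"}],   # newer time shard
]
top = fanout_search(shards, lambda d: d["ip"] == "10.0.0.1", limit=2)
```

Each shard only needs to return `limit` hits, because no shard can contribute more than that to the final top-N.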
![Page 24: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/24.jpg)
What did we gain?
We can now migrate older timeshards to cheaper servers!
However:
Cold data servers are still too expensive
DevOps time suck is massive
Product wants us to store even more data!
8 / 41
![Page 28: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/28.jpg)
Throwing ideas at the wall
We have this giant data warehouse, let's use that
Response time is too slow: 10 seconds to pull a single record by ID!
Complex ETL pipeline where latency is measured in hours
Data model is different
Read only
Reliability, etc, etc
What about Elasticsearch, HBase, Hive/Parquet, MongoDB, etc, etc?
Wait. Hold on! This Parquet thing is interesting...
9 / 41
![Page 32: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/32.jpg)
Parquet
Columnar
Projections are very efficient
Enables vectorized execution
Compressed
Using Run Length Encoding
Throw Snappy, LZO, or LZ4 on top of that
Schema
Files encode schema and other meta-data
Support exists for merging disparate schema amongst files
10 / 41
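To make the "columnar" and "run length encoding" points concrete, here is a toy sketch. It is not Parquet's actual on-disk encoding (which combines several encodings); it only shows why storing a column contiguously makes repeated values cheap and projections narrow. The sample rows are made up.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in pairs for _ in range(n)]

# Row-oriented records, as an application would produce them (made-up data).
rows = [
    {"proto": "HTTP", "port": 80},
    {"proto": "HTTP", "port": 80},
    {"proto": "HTTP", "port": 8080},
    {"proto": "DNS", "port": 53},
    {"proto": "DNS", "port": 53},
]

# Columnar layout: one list per field, so a projection reads only that list.
columns = {name: [r[name] for r in rows] for name in rows[0]}
encoded_proto = rle_encode(columns["proto"])
```

A query that only needs `proto` touches one list, and that list shrinks to two pairs once runs are collapsed.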
![Page 33: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/33.jpg)
Parquet: Some details
Row Group
Horizontal grouping of columns
Within each row group data is arranged by column in chunks
Column Chunk
Chunk of data for an individual column
Unit of parallelization for I/O
Page
Column chunks are divided up into pages for compression and encoding
11 / 41
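The nesting of row groups, column chunks, and pages can be pictured with a small in-memory sketch. It is an assumption-laden toy: real Parquet writers size row groups and pages in bytes, while the hypothetical `to_row_groups` helper below uses row and value counts purely to keep the structure visible.

```python
def to_row_groups(rows, rows_per_group, values_per_page):
    """Split rows into row groups; inside each, keep one chunk per column, cut into pages."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        group_rows = rows[start:start + rows_per_group]
        chunks = {}
        for name in group_rows[0]:
            column = [r[name] for r in group_rows]  # the column chunk for this row group
            chunks[name] = [column[i:i + values_per_page]  # pages within the chunk
                            for i in range(0, len(column), values_per_page)]
        groups.append(chunks)
    return groups

rows = [{"ip": f"10.0.0.{i}", "port": 80 + i} for i in range(5)]
layout = to_row_groups(rows, rows_per_group=4, values_per_page=2)
```

Here `layout[0]["port"]` is the `port` column chunk of the first row group, split into two pages.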
![Page 34: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/34.jpg)
So you have a nice file format...
Now what?
12 / 41
![Page 40: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/40.jpg)
Need to get data out of Cassandra
Spark seems good for that
Need to put the data somewhere
S3 is really cheap and fairly well supported by Spark
Need to be able to query the data
Spark seems good for that too
13 / 41
![Page 41: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/41.jpg)
So we are using Spark and S3...
Now what?
14 / 41
![Page 46: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/46.jpg)
Lots and lots of files
Sizing parquet
Parquet docs recommend 1GB files for HDFS
For S3 the sweet spot appears to be 128 to 256MB
We have terabytes of files
Scans take forever!
15 / 41
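A quick back-of-the-envelope calculation shows why file counts get large at these sizes. It takes the 100TB cold storage figure quoted earlier in the talk as an illustrative input and assumes binary units; the `file_count` helper is hypothetical.

```python
def file_count(total_bytes, file_bytes):
    """Number of files needed to hold total_bytes at a given target file size."""
    return -(-total_bytes // file_bytes)  # ceiling division

TB = 1024 ** 4
MB = 1024 ** 2

files_at_256mb = file_count(100 * TB, 256 * MB)  # 409,600 files
files_at_128mb = file_count(100 * TB, 128 * MB)  # 819,200 files
```

Hundreds of thousands of objects is far too many to open for every query, which is what motivates pruning by partition.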
![Page 49: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/49.jpg)
Partitioning
Our queries are always filtered by:
1. Customer
2. Time
So we partition by:
├── cid=X
|   ├── year=2015
|   └── year=2016
|       └── month=0
|           └── day=0
|               └── hour=0
└── cid=Y
Spark understands and translates query filters to this folder structure
16 / 41
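The mapping from query filters to folders can be sketched without Spark. This is a minimal illustration, not Spark's implementation: `partition_path` and `prune` are hypothetical helpers, and the sketch assumes calendar-valued months and days in UTC, whereas the tree above shows `month=0` and `day=0`.

```python
from datetime import datetime, timezone

def partition_path(cid, ts):
    """Build the cid/year/month/day/hour folder for an event timestamp."""
    return f"cid={cid}/year={ts.year}/month={ts.month}/day={ts.day}/hour={ts.hour}"

def prune(paths, **filters):
    """Keep only partition folders whose key=value segments match every filter."""
    wanted = {f"{k}={v}" for k, v in filters.items()}
    return [p for p in paths if wanted <= set(p.split("/"))]

paths = [
    partition_path(1, datetime(2015, 12, 31, 23, tzinfo=timezone.utc)),
    partition_path(1, datetime(2016, 1, 1, 0, tzinfo=timezone.utc)),
    partition_path(2, datetime(2016, 1, 1, 0, tzinfo=timezone.utc)),
]
selected = prune(paths, cid=1, year=2016)
```

Only folders that match both `cid=1` and `year=2016` survive, so files under every other customer and year are never opened.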
![Page 50: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/50.jpg)
Big Improvement!
Now a customer can query a time range quickly
17 / 41
![Page 55: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/55.jpg)
Partitioning problems
Customers generally ask questions such as:
"Over the last 6 months, how many times did I see IP X using protocol Y?"
"When did IP X not use port 80 for HTTP?"
"Who keeps scanning server Z for open SSH ports?"
Queries would take minutes.
18 / 41
![Page 57: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/57.jpg)
Queries spanning large time windows
select count(*) from events where ip = '192.168.0.1' and cid = 1 and year = 2016
├── cid=X
|   ├── year=2015
|   └── year=2016
|       |── month=0
|       |   └── day=0
|       |       └── hour=0
|       |           └── 192.168.0.1_was_NOT_here.parquet
|       └── month=1
|           └── day=0
|               └── hour=0
|                   └── 192.168.0.1_WAS_HERE.parquet
└── cid=Y
19 / 41
![Page 59: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/59.jpg)
Problem #1
Requires listing out all the sub-dirs for large time ranges.
Remember S3 is not really a file system
Slow!
Problem #2
Pulling potentially thousands of files from S3.
Slow and Costly!
20 / 41
![Page 62: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/62.jpg)
Solving problem #1
Put partition info and file listings in a db à la Hive.
Why not just use Hive?
Still not fast enough
Also does not help with Problem #2
21 / 41
![Page 66: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/66.jpg)
DSE/SOLR to the Rescue!
Store file metadata in SOLR
Efficiently skip elements of partition hierarchy!
select count(*) from events where month = 6
Avoids pulling all metadata into the Spark driver
1. Get partition counts and schema info from SOLR driver-side
2. Submit SOLR RDD to cluster
3. Run mapPartitions on SOLR RDD and turn into Parquet RDDs
As an optimization for small file sets we pull the SOLR rows driver side
22 / 41
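Why a search index lets a query skip levels of the partition hierarchy can be shown with a toy inverted index over file metadata. This is a sketch only: `FileMetadataIndex` is a hypothetical in-memory stand-in for the search index, and the bucket and file names are invented.

```python
from collections import defaultdict

class FileMetadataIndex:
    """Toy inverted index over per-file partition metadata."""

    def __init__(self):
        self.files = {}                   # file path -> metadata dict
        self.postings = defaultdict(set)  # (field, value) -> file paths

    def add(self, path, **meta):
        self.files[path] = meta
        for field, value in meta.items():
            self.postings[(field, value)].add(path)

    def find(self, **filters):
        """Return files matching every filter; any subset of fields may be given."""
        sets = [self.postings.get(item, set()) for item in filters.items()]
        return sorted(set.intersection(*sets)) if sets else sorted(self.files)

index = FileMetadataIndex()
index.add("s3://bucket/a.parquet", cid=1, year=2016, month=6)
index.add("s3://bucket/b.parquet", cid=2, year=2015, month=6)
index.add("s3://bucket/c.parquet", cid=1, year=2016, month=7)

june_files = index.find(month=6)  # no cid or year needed
```

A directory layout forces a walk through `cid` and `year` before reaching `month`, while an index keyed on each field answers `month = 6` directly.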
![Page 67: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/67.jpg)
Boxitecture
23 / 41
![Page 68: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/68.jpg)
Performance gains!

| Source | Scan/Filter Time |
| --- | --- |
| SOLR | < 100 milliseconds |
| Hive | > 5 seconds |
| S3 directory listing | > 5 minutes!!! |
24 / 41
![Page 70: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/70.jpg)
Problem #1 Solved!
What about Problem #2?
25 / 41
![Page 73: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/73.jpg)
Solving problem #2
Still need to pull potentially thousands of files to answer our query!
Can we partition differently?

| Field | Cardinality | Result |
| --- | --- | --- |
| Protocol | Medium (9000) | ❌ |
| Port | High (65535) | ❌❌ |
| IP Addresses | Astronomically High (340 undecillion) | ❌❌❌ |

Nope! Nope! Nope!
26 / 41
![Page 76: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/76.jpg)
Searching High Cardinality Data
Assumptions
1. Want to reduce # of files pulled for a given query
2. Cannot store all exact values in SOLR
3. We are okay with a few false positives
This sounds like a job for...
27 / 41
![Page 77: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/77.jpg)
Bloom Filters!
28 / 41
![Page 81: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/81.jpg)
Towards a "Searchable" Bloom Filter
Normal SOLR index looks vaguely like

| Term | Doc IDs |
| --- | --- |
| 192.168.0.1 | 1,2,3,5,8,13... |
| 10.0.0.1 | 2,4,6,8... |
| 8.8.8.8 | 1,2,3,4,5,6 |

Terms are going to grow out of control
If only we could constrain to a reasonable number of values?
29 / 41
![Page 85: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/85.jpg)
Terms as a "bloom filter"
What if our terms were the offsets of the Bloom Filter values?

| Term | Doc IDs |
| --- | --- |
| 0 | 1,2,3,5,8,13... |
| 1 | 2,4,6,8... |
| 2 | 1,2,3,4,5,6 |
| 3 | 1,2,3 |
| ... | ... |
| N | 1,2,3,4,5... |

30 / 41
![Page 86: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/86.jpg)
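Searchable Bloom Filters

Index

| Term | Doc IDs |
| --- | --- |
| 0 | 0,2 |
| 1 | 1,2 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1,2 |
| 5 | 0 |

Indexing

| Field | Value | Indexed Values | Doc ID |
| --- | --- | --- | --- |
| ip | 192.168.0.1 | {0, 3, 5} | 0 |
| ip | 10.0.0.1 | {1, 2, 4} | 1 |
| ip | 8.8.8.8 | {0, 1, 4} | 2 |

Queries

| Field | Query String | Actual Query |
| --- | --- | --- |
| ip | ip:192.168.0.1 | ip_bits:0 AND 3 AND 5 |
| ip | ip:10.0.0.1 | ip_bits:1 AND 2 AND 4 |
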
31 / 41
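The "bit offsets as terms" idea can be sketched as a small inverted index. This is a minimal illustration under stated assumptions, not the production implementation: `NUM_BITS`, `NUM_HASHES`, the salted SHA-256 hashing, and the `BloomTermIndex` class are all choices made for the example.

```python
import hashlib
from collections import defaultdict

NUM_BITS = 1024  # size of the bit space, which caps the number of distinct terms (assumed)
NUM_HASHES = 3   # bit positions derived per value (assumed)

def bit_positions(value):
    """Map a value to at most NUM_HASHES bit offsets using salted SHA-256 digests."""
    positions = set()
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        positions.add(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions

class BloomTermIndex:
    """Inverted index whose terms are bit offsets rather than raw values."""

    def __init__(self):
        self.postings = defaultdict(set)  # bit offset -> doc (file) ids

    def index(self, doc_id, values):
        for value in values:
            for bit in bit_positions(value):
                self.postings[bit].add(doc_id)

    def candidates(self, value):
        """Docs that may contain value: the AND of the posting lists for its bits."""
        lists = [self.postings.get(bit, set()) for bit in bit_positions(value)]
        return set.intersection(*lists)

idx = BloomTermIndex()
idx.index(0, ["192.168.0.1"])
idx.index(1, ["10.0.0.1"])
idx.index(2, ["8.8.8.8", "192.168.0.1"])
hits = idx.candidates("192.168.0.1")
```

The result always includes every document that really holds the value and may include a few that do not, which matches the "okay with a few false positives" assumption. The term dictionary never grows beyond `NUM_BITS` entries, no matter how many distinct IPs are indexed.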
![Page 89: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/89.jpg)
Problem #2 Solved!
Enormous filtering power
Relatively minimal cost in space and computation
32 / 41
![Page 94: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/94.jpg)
Key Lookups
Need to retain this C* functionality
Spark/Parquet has no direct support
What partition would it choose?
The partition would have to be encoded in the key?!
Solution:
Our keys have time encoded in them
Enables us to generate the partition path containing the row
That was easy!
33 / 41
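A minimal sketch of how a key with an embedded timestamp could be turned into a partition path. The talk does not describe the actual key layout or directory scheme, so everything here is an assumption: the key is taken to start with an 8-byte big-endian epoch-millisecond timestamp, and the `year=/month=/day=/hour=` layout, the `cold/netflow` root, and the `partition_path` name are hypothetical.

```python
import struct
from datetime import datetime, timezone

def partition_path(key: bytes, root: str = "cold/netflow") -> str:
    """Derive the partition directory from the timestamp inside the key.

    Assumes (hypothetically) that the first 8 bytes of the key are a
    big-endian signed epoch timestamp in milliseconds.
    """
    (millis,) = struct.unpack(">q", key[:8])
    ts = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return f"{root}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"

# Build an example key: timestamp prefix followed by an opaque suffix.
when = datetime(2016, 9, 7, 13, 30, tzinfo=timezone.utc)
key = struct.pack(">q", int(when.timestamp() * 1000)) + b"\x00\x01"
print(partition_path(key))
```

With the path in hand, a lookup only has to read the files under that one partition and filter for the exact key, rather than scanning the whole data set.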
![Page 100: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/100.jpg)
Other reasons to "customize"
Parquet has support for filter pushdown
Spark has support for Parquet filter pushdown, but...
Uses INT96 for Timestamp
No pushdown support: SPARK-11784
All our queries involve timestamps!
IP Addresses
Spark, Impala, Presto have no direct support
Use string or binary
Wanted to be able to push down CIDR range comparisons
Lack of pushdown for these leads to wasted I/O and GC pressure.
34 / 41
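One way to see why CIDR comparisons can be pushed down: a CIDR block is a contiguous range of addresses, so a CIDR match becomes a numeric range test that can be checked against the min/max statistics Parquet keeps per column chunk. The sketch below assumes IPv4 addresses stored as integers; the talk does not say how the addresses were actually encoded, and `cidr_to_range` and `may_contain` are hypothetical names.

```python
import ipaddress

def cidr_to_range(cidr: str) -> tuple[int, int]:
    """Return the inclusive (low, high) integer range covered by a CIDR block."""
    net = ipaddress.ip_network(cidr, strict=False)
    return int(net.network_address), int(net.broadcast_address)

def may_contain(chunk_min: int, chunk_max: int, cidr: str) -> bool:
    """True if a chunk whose values span [chunk_min, chunk_max] could hold
    an address inside the CIDR block; False means the chunk can be skipped."""
    low, high = cidr_to_range(cidr)
    return chunk_max >= low and chunk_min <= high

low, high = cidr_to_range("192.168.0.0/16")
addr = int(ipaddress.ip_address("192.168.0.1"))
print(low <= addr <= high)
```

Chunks for which `may_contain` returns False never need to be read or decoded, which is where the savings in I/O and garbage collection would come from.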
![Page 104: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/104.jpg)
Archiving
Currently, when a time shard fills up:
1. Roll new hot time shard
2. Run Spark job to archive data to S3
3. Swap out "warm" shard for cold storage (automagical)
4. Drop the "warm" shard
Not an ideal process, but deals with legacy requirements
TODO:
1. Stream data straight to cold storage
2. Materialize customer edits into hot storage
3. Merge hot and cold data at query time (already done)
35 / 41
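The talk does not show how hot and cold data are merged at query time. As a rough sketch of one possible approach, assume both stores return rows sorted by timestamp and that a row present in hot storage (for example a customer edit) overrides the cold row with the same key. The `Row` type and `merge_hot_cold` function are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple

class Row(NamedTuple):
    ts: int        # event time
    key: str       # row key
    payload: str   # row contents

def merge_hot_cold(hot: Iterable[Row], cold: Iterable[Row]) -> Iterator[Row]:
    """Merge two time-sorted result sets, letting hot rows override cold ones.

    Assumption: the hot result set is small enough to hold in memory.
    """
    hot_rows = list(hot)
    overridden = {row.key for row in hot_rows}
    cold_kept = (row for row in cold if row.key not in overridden)
    return heapq.merge(hot_rows, cold_kept, key=lambda row: row.ts)

hot = [Row(2, "b", "edited")]
cold = [Row(1, "a", "x"), Row(2, "b", "original"), Row(3, "c", "y")]
merged = list(merge_hot_cold(hot, cold))
print(merged)
```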
![Page 108: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/108.jpg)
What have we done so far?
1. Time sharded C* clusters with SOLR
2. Cheap speedy Cold storage based on S3 and Spark
3. A mechanism for archiving data to S3
36 / 41
![Page 112: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/112.jpg)
That's cool, but...
How do we handle queries to 3 different stores?
C*
SOLR
Spark
Handle Timesharding and Functional Sharding?
37 / 41
![Page 116: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/116.jpg)
Lots of Scala query DSL libraries:
Quill
Slick
Phantom
etc
AFAIK nobody supports:
1. Simultaneously querying heterogeneous data stores
2. Stitching together time series data from multiple stores
3. Managing sharding:
Configuration
Discovery
38 / 41
![Page 120: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/120.jpg)
Enter Quaero:
Abstracts away data store differences
Query AST (Algebraic Data Type in Scala)
Command/Strategy pattern for easily plugging in new data stores
Deep understanding of sharding patterns
Handles merging of time/functional sharded data
Adding shards is configuration driven
Exposes a "typesafe" query DSL similar to Phantom or Rogue
Reduce/eliminate bespoke code for retrieving the same data from all 3 stores
39 / 41
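Quaero's code is not shown in the talk, and it is written in Scala. The Python sketch below only illustrates the general shape described on the slide: a small query AST plus one translation strategy per data store. All class and function names (`Eq`, `Between`, `And`, `to_cql`, `to_solr`, `render`) are hypothetical and are not Quaero's API.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Eq:
    field: str
    value: str

@dataclass(frozen=True)
class Between:
    field: str
    low: int
    high: int

@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"

# The query AST: a closed set of node types, like a Scala ADT.
Query = Union[Eq, Between, And]

def to_cql(q: Query) -> str:
    """Render the AST as a CQL-style WHERE clause."""
    if isinstance(q, Eq):
        return f"{q.field} = '{q.value}'"
    if isinstance(q, Between):
        return f"{q.field} >= {q.low} AND {q.field} <= {q.high}"
    if isinstance(q, And):
        return f"{to_cql(q.left)} AND {to_cql(q.right)}"
    raise TypeError(f"unsupported node: {q!r}")

def to_solr(q: Query) -> str:
    """Render the same AST as a Solr-style query string."""
    if isinstance(q, Eq):
        return f'{q.field}:"{q.value}"'
    if isinstance(q, Between):
        return f"{q.field}:[{q.low} TO {q.high}]"
    if isinstance(q, And):
        return f"({to_solr(q.left)} AND {to_solr(q.right)})"
    raise TypeError(f"unsupported node: {q!r}")

# One strategy per store; supporting a new store means adding an entry here.
RENDERERS = {"cassandra": to_cql, "solr": to_solr}

def render(store: str, q: Query) -> str:
    return RENDERERS[store](q)

query = And(Eq("ip", "10.0.0.1"), Between("ts", 100, 200))
print(render("cassandra", query))
print(render("solr", query))
```

The caller builds the query once, and each store receives it in its own dialect, which is what removes the bespoke per-store retrieval code mentioned above.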
![Page 121: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/121.jpg)
Open Sourcing? Maybe!?
There is still a lot of work to be done
40 / 41
![Page 122: Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra Summit 2016](https://reader031.fdocuments.in/reader031/viewer/2022030317/586f75d71a28ab10258b61db/html5/thumbnails/122.jpg)
We are hiring!
Interwebs: Careers @ Protectwise
Twitter: @ProtectWise
41 / 41