Scylla Summit 2016: Why Kenshoo is about to displace Cassandra with Scylla

Cassandra & Scylla at Kenshoo

About Me

• Wrote BASIC code as a kid

• 17 years in the internet industry

• Big data fanatic for the last 6 years

• Big data team leader at Kenshoo

• Our team acts as Kenshoo’s big data DBAs

• Programming: ETL and administration tools

Kenshoo

• 10-year-old, Tel Aviv-based startup

• Industry leader in digital marketing

• 500+ employees

• Data-heavy shop

Kenshoo Legacy Architecture

Big data at Kenshoo

Cassandra at Kenshoo

• DataStax Community edition

• 5 clusters

• 70 nodes

• Biggest cluster, version 1.2.19
  ▪ 40 physical nodes
  ▪ 4 TB compressed to 1 TB per node
  ▪ 14 billion records
  ▪ 1,500-byte values
  ▪ IOPS: 5K average, 30K burst
  ▪ Processes clicks and conversions to track user behavior

Cassandra Cost

• Cassandra is great at writes (at first)

• It defers the “cost” to background processes (there is no free lunch)
  ▪ Compaction
  ▪ Repair
  ▪ Cleanup
  ▪ Adding / removing nodes
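
The “write now, pay later” behavior comes from log-structured storage: a write lands in an in-memory memtable, which is later flushed to an immutable SSTable, and reads may have to consult several SSTables until compaction merges them. The Python sketch below is a toy model of that idea only, not Cassandra’s implementation; the class name, the tiny flush threshold, and the merge-everything compaction are simplifying assumptions.

```python
class ToyLSMStore:
    """Toy log-structured store: cheap writes, merge cost deferred (illustrative only)."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}                    # newest writes, held in memory
        self.sstables = []                    # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        # A write only updates the memtable (plus an occasional flush);
        # existing SSTables are never rewritten on the write path.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new immutable "SSTable".
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, key):
        # Reads check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def compact(self):
        # The deferred background cost: merge all SSTables, newest value wins.
        merged = {}
        for table in self.sstables:  # oldest first, so later tables overwrite
            merged.update(table)
        self.sstables = [merged] if merged else []


store = ToyLSMStore()
for i in range(7):
    store.write(f"user{i % 4}", i)  # keys repeat, so stale versions pile up
print(len(store.sstables), store.read("user1"))  # 2 5
store.compact()
print(len(store.sstables), store.read("user1"))  # 1 5
```

Writes stay cheap because they never touch older files; the price is paid later, when compaction rereads and rewrites the accumulated SSTables.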

War & Peace: Knowing your Cassandra

Day to day

• Requires a lot of in-depth knowledge

• Lack of documentation

• Per-application tuning

• Lots of custom maintenance scripts

• Don’t run repair, only rebuild

Migration to Leveled Compaction Strategy

• Compaction didn’t complete during the maintenance window

• Leveled CS is a cluster-wide ColumnFamily configuration

• Found a JMX “hack” for a per-server config

• It took us a month to switch manually

• Needed to reconfigure each server after a restart
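
The slide does not spell out the JMX “hack”. One commonly described approach for Cassandra 1.2/2.0 sets the `CompactionStrategyClass` attribute on a table’s `ColumnFamilies` MBean with a JMX client such as jmxterm; the change lives only in memory, which would explain why each server needed reconfiguring after a restart. The Python sketch below only generates jmxterm commands for one server. The MBean attribute is an assumption to verify against your Cassandra version, and the host, keyspace, and table names are hypothetical.

```python
LCS_CLASS = "org.apache.cassandra.db.compaction.LeveledCompactionStrategy"


def jmxterm_commands(host, jmx_port, keyspace, tables, strategy=LCS_CLASS):
    """Build jmxterm commands that switch the compaction strategy on ONE node.

    Assumes the Cassandra 1.2/2.0 ColumnFamilyStoreMBean exposes a writable
    CompactionStrategyClass attribute. The value is not persisted, so it must
    be re-applied after every restart of that node.
    """
    commands = [f"open {host}:{jmx_port}"]
    for table in tables:
        bean = (
            "org.apache.cassandra.db:type=ColumnFamilies,"
            f"keyspace={keyspace},columnfamily={table}"
        )
        commands.append(f"set -b {bean} CompactionStrategyClass {strategy}")
    commands.append("close")
    return commands


# Hypothetical host and names; 7199 is Cassandra's default JMX port.
for line in jmxterm_commands("10.0.0.11", 7199, "clicks_ks", ["clicks"]):
    print(line)
```

Applying the change one node at a time, rather than through a cluster-wide schema change, limits how many nodes are rewriting their data into levels at once.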

GC Hell

• Lots of GC under load

• GC degrades 95th-percentile latency
  ▪ One problematic node affects the whole cluster’s performance

• Full cluster restart every week

• Changed rpc_server_type to hsha

• Tuning is black magic; it takes days to see the effects
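
A little arithmetic shows why a few GC pauses dominate the 95th percentile. The Python sketch below uses the nearest-rank percentile definition and made-up latencies (not Kenshoo’s measurements): when 6% of requests stall behind a pause, p95 jumps to the pause length while the median does not move.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical read latencies in milliseconds for 100 requests.
healthy = [2.0] * 100
# Same load, but 6 requests waited behind a 250 ms stop-the-world GC pause.
with_gc = [2.0] * 94 + [250.0] * 6

print(percentile(healthy, 95), percentile(with_gc, 50), percentile(with_gc, 95))
# 2.0 2.0 250.0
```

Because a coordinator may have to wait on a paused replica, one node with long pauses can push up the percentile seen by clients across the whole cluster in the same way.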

Maintenance is delicate

• Need to wait between adding and removing nodes for the cluster to rebalance

• Turn off a node’s Thrift service

• Run partial cleanups

• Tune parameters
  ▪ compaction_throughput_mb_per_sec
  ▪ concurrent_compactors
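
The slide lists these steps without the commands. Below is a minimal Python sketch of one node’s maintenance pass, assuming the standard `nodetool` subcommands `disablethrift`, `setcompactionthroughput`, `cleanup`, and `enablethrift`. It only builds the plan (a dry run). The host, keyspace, table names, and throughput value are hypothetical, and `concurrent_compactors` is configured in `cassandra.yaml`, so it is not touched here.

```python
import shlex


def maintenance_plan(host, keyspace, tables, throughput_mb=64):
    """Return nodetool invocations for a partial cleanup on one node (dry run)."""

    def nodetool(*args):
        return ["nodetool", "-h", host, *args]

    plan = [
        nodetool("disablethrift"),  # stop serving Thrift clients on this node
        nodetool("setcompactionthroughput", str(throughput_mb)),  # cap background compaction I/O (MB/s)
    ]
    # Partial cleanup: one table at a time instead of the whole node at once.
    for table in tables:
        plan.append(nodetool("cleanup", keyspace, table))
    plan.append(nodetool("enablethrift"))  # resume client traffic
    return plan


for argv in maintenance_plan("10.0.0.11", "clicks_ks", ["clicks", "conversions"]):
    print(shlex.join(argv))
```

Passing each argv list to `subprocess.run` would execute it; keeping the plan as data makes it easy to review before touching a production node.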

Scylla Evaluation

Self-tuning features

Lab setup

• AWS i2.8xlarge instances

• Cassandra 2.1.15 (defaults in parentheses)
  ▪ compaction_throughput_mb_per_sec = 0 (16)
  ▪ stream_throughput_outbound_megabits_per_sec = 10000 (200)
  ▪ inter_dc_stream_throughput_outbound_megabits_per_sec = 10000 (200)

• Scylla 1.3
  ▪ Out of the box
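
In Cassandra, a compaction throughput of 0 disables throttling, and the streaming settings cap outbound streaming in megabits per second. A back-of-the-envelope Python estimate shows why the streaming caps matter: the minimum time to stream 1 TB (the per-node size of the production cluster above) under the default cap versus the lab setting. It assumes a single sender limited only by the cap and decimal units (1 GB = 8,000 megabits); disks and the network impose their own limits, so these are lower bounds.

```python
def min_stream_hours(data_gb, cap_megabits_per_sec):
    """Lower bound on streaming time when outbound streaming is capped.

    Uses decimal units: 1 GB = 8,000 megabits.
    """
    if cap_megabits_per_sec <= 0:
        raise ValueError("cap must be positive")
    megabits = data_gb * 8_000
    seconds = megabits / cap_megabits_per_sec
    return seconds / 3600


for cap in (200, 10_000):  # default cap vs. the lab setting
    print(f"{cap:>6} Mbit/s: {min_stream_hours(1_000, cap):.1f} h")
```

At the 200 Mbit/s cap, 1 TB needs at least 40,000 seconds (about 11 hours); at 10,000 Mbit/s, the floor drops to 800 seconds (about 13 minutes).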

GC at Scylla

Repair

• 3 x i2.8xlarge

• RF 3

• 72 GB of data per node

• 10M rows
  ▪ 5 columns
  ▪ 1,500-byte values

• Delete all the data from one node
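
The data sizes in this and the following tests follow roughly from the row shape. Here is a Python sketch of the estimate, assuming raw value bytes only (no keys, timestamps, compression, or SSTable overhead) and an even token distribution, so that each node holds RF/N of all replicas.

```python
def raw_gb_per_node(rows, columns, value_bytes, rf, nodes):
    """Approximate raw payload per node in decimal GB (values only, no overhead)."""
    total_bytes = rows * columns * value_bytes  # one copy of the data set
    per_node = total_bytes * rf / nodes         # RF copies spread over N nodes
    return per_node / 1e9


print(raw_gb_per_node(10_000_000, 5, 1500, rf=3, nodes=3))  # 75.0
```

That gives about 75 GB per node, close to the 72 GB on the slide. With RF 3 on 3 nodes every node holds a full copy, so after one node’s data is deleted, repair has to restore all of it from the other two.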

Repair results

Cleanup at Scylla

• 4 x i2.8xlarge

• RF 3

• 270 GB of data per node

• 50M rows
  ▪ 5 columns
  ▪ 1,500-byte values

• Decommission a node and join it back
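
Cleanup is needed after the node rejoins because of token ownership. With evenly distributed tokens, each of N nodes owns RF/N of the data. While the node is decommissioned, the 3 remaining nodes each own everything (RF 3 on 3 nodes). Once it joins back, each owns only 3/4, and the quarter they no longer own stays on disk until cleanup removes it. The Python sketch below works through that arithmetic using raw value sizes only and assuming even ownership.

```python
from fractions import Fraction


def ownership(rf, nodes):
    """Fraction of the full data set each node owns, assuming even token distribution."""
    return min(Fraction(rf, nodes), Fraction(1))


def cleanable_gb(dataset_gb, rf, nodes_before, nodes_after):
    """Data per existing node that becomes un-owned when the cluster grows."""
    drop = ownership(rf, nodes_before) - ownership(rf, nodes_after)
    return float(dataset_gb * max(drop, Fraction(0)))


# One copy of the data set: 50M rows x 5 columns x 1,500 bytes = 375 GB (raw).
dataset_gb = 50_000_000 * 5 * 1500 / 1e9
print(ownership(3, 3), ownership(3, 4))   # 1 3/4
print(cleanable_gb(dataset_gb, 3, 3, 4))  # 93.75
```

So each of the three existing nodes carries roughly 94 GB of data it no longer owns until cleanup runs, while the rejoined node receives about 281 GB, in line with the ~270 GB per node on the slide.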

Cleanup results

Compaction at Scylla

• 3 x i2.8xlarge

• RF 3

• 30-minute stress run

• 10M rows
  ▪ 5 columns
  ▪ 1,500-byte values

Compaction results

Latency & Throughput

• 3 x i2.8xlarge

• RF 3

• 72 GB of data per node

• 10M rows
  ▪ 5 columns
  ▪ 1.5 KB values
  ▪ Mixed workload: 3 writes for every 2 reads

• 4 x cassandra-stress client nodes
  ▪ 30 minutes
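
The slide gives the workload shape but not the exact command. The Python sketch below assembles one plausible cassandra-stress (2.1-style) invocation for that shape: a 30-minute mixed run with a 3:2 write:read ratio, 5 columns of 1,500-byte values, and a 10M-key population. This is not the command Kenshoo used. The thread count and target hosts are hypothetical, and the option spelling should be checked against `cassandra-stress help` for your version.

```python
import shlex


def stress_argv(nodes, threads=200, duration="30m", keys=10_000_000):
    """Build a cassandra-stress mixed-workload command for one client node."""
    return [
        "cassandra-stress", "mixed",
        "ratio(write=3,read=2)",                  # 3 writes for every 2 reads
        f"duration={duration}",
        "-pop", f"dist=UNIFORM(1..{keys})",       # key population: 10M rows
        "-col", "n=FIXED(5)", "size=FIXED(1500)",  # 5 columns, 1,500-byte values
        "-rate", f"threads={threads}",            # hypothetical client concurrency
        "-node", ",".join(nodes),
    ]


argv = stress_argv(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
# shlex.join quotes the parenthesised options so a POSIX shell accepts them.
print(shlex.join(argv))
```

Running the same command from each of the four client nodes would approximate the setup on the slide.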

Conclusion

• Scylla has lower cost
  ▪ Compaction, repair, and cleanup are much more efficient
  ▪ Consistent latency under much higher load

• Moving forward
  ▪ Integrate Scylla into an internal production monitoring system

Q&A

Thank You!

Contact: [email protected]