When every millisecond counts
Matija Gobec (matija.gobec@smartcat.io, @mad_max0204)
Cassandra meetup Amsterdam, July 2016
Why this talk
We were challenged with an interesting requirement...
What makes a distributed system?
A bunch of stuff that magically works together
How to start?
Investigate the current setup (if any)
Understand your use case
Understand your data
Set a base configuration
Define the goal
Investigate the current setup
● What type of deployment are you working with?
● What is the available hardware?
○ CPU cores and threads
○ Memory amount and type
○ Storage size and type
○ Network interfaces amount and type
○ Limitations
Hardware configuration
8-16 cores
32GB RAM
Commit log SSD
Data drive SSD
1GbE
Placement groups
Availability zones
Enhanced networking
OS - Swap, storage, cpu
Swap is bad
● remove swap from fstab
● disable swap: swapoff -a
Optimize block layer
echo 1 > /sys/block/XXX/queue/nomerges
echo 8 > /sys/block/XXX/queue/read_ahead_kb
echo deadline > /sys/block/XXX/queue/scheduler
Disable cpu scaling
for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
do
  echo performance > "$sysfs_cpu/cpufreq/scaling_governor"
done
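One way to confirm that the swap and CPU governor settings above actually took effect is to read them back from the kernel. The following is a minimal sketch, not something shown in the talk: the function names are made up for illustration, and it assumes a Linux host that exposes /proc/swaps and the cpufreq sysfs files.

```python
from pathlib import Path
from typing import Dict, List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap devices listed in the text of /proc/swaps."""
    lines = proc_swaps_text.strip().splitlines()
    # The first line is the column header (Filename, Type, Size, Used, Priority).
    return [line.split()[0] for line in lines[1:] if line.strip()]


def non_performance_cpus(governors: Dict[str, str]) -> List[str]:
    """Return the CPUs whose scaling governor is not 'performance'."""
    return sorted(cpu for cpu, gov in governors.items()
                  if gov.strip() != "performance")


def read_governors(root: Path = Path("/sys/devices/system/cpu")) -> Dict[str, str]:
    """Read the current governor of every CPU that exposes cpufreq."""
    result = {}
    for gov_file in root.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        try:
            result[gov_file.parent.parent.name] = gov_file.read_text()
        except OSError:
            continue  # file vanished or is not readable; skip this CPU
    return result


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        print("active swap:", active_swap_devices(swaps.read_text()) or "none")
    print("CPUs not in performance mode:",
          non_performance_cpus(read_governors()) or "none")
```

Both lists should come back empty on a node that was prepared as described on this slide.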
sysctl.d - network
# min, default and max TCP receive buffer size, in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
# min, default and max TCP send buffer size, in bytes
net.ipv4.tcp_wmem = 4096 65536 16777216
# disable explicit congestion notification
net.ipv4.tcp_ecn = 0
# enable window scaling (higher throughput)
net.ipv4.tcp_window_scaling = 1
# allowed local port range
net.ipv4.ip_local_port_range = 10000 65535
# enable fast TIME-WAIT recycling (unsafe for clients behind NAT; removed in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1
# max socket receive buffer, in bytes
net.core.rmem_max = 16777216
# max socket send buffer, in bytes
net.core.wmem_max = 16777216
# max length of the listen (accept) queue per socket
net.core.somaxconn = 4096
# max packets queued on input when a NIC receives faster than the kernel can process
net.core.netdev_max_backlog = 16384
sysctl.d - vm and fs
# swap tendency (1 = swap as little as possible without disabling it)
vm.swappiness = 1
# max memory map areas a process can have
vm.max_map_count = 1073741824
# dirty memory, in bytes, at which background writeback starts
vm.dirty_background_bytes = 10485760
# dirty memory, in bytes, at which a writing process must flush its own data
vm.dirty_bytes = 1073741824
# max number of open file handles system-wide
fs.file-max = 1073741824
# minimum amount of memory kept free, in kilobytes
vm.min_free_kbytes = 1048576
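Settings dropped into sysctl.d are easy to lose after an image rebuild or a kernel upgrade, so it can help to compare the wanted values with what the kernel reports. The sketch below is one hedged way to do that; the helper names are hypothetical, it relies only on the standard mapping of a sysctl key to a file under /proc/sys, and the parser is deliberately lenient in that it also drops trailing `#` annotations like the ones on the slides.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, ignoring blanks and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so multi-value settings compare reliably.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key such as vm.swappiness to its /proc/sys file."""
    return Path(root, *key.split("."))


def current_value(key: str) -> Optional[str]:
    """Read the live value, or None if the key is missing or unreadable."""
    try:
        return " ".join(proc_path(key).read_text().split())
    except OSError:
        return None


def drift(wanted: Dict[str, str],
          read: Callable[[str], Optional[str]] = current_value
          ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted, actual)} for every setting that differs."""
    report = {}
    for key, value in wanted.items():
        actual = read(key)
        if actual != value:
            report[key] = (value, actual)
    return report


if __name__ == "__main__":
    example = "vm.swappiness = 1\nnet.core.somaxconn = 4096\n"
    for key, (want, have) in drift(parse_sysctl_conf(example)).items():
        print(f"{key}: wanted {want}, kernel reports {have}")
```

A key that reports `None` is usually one the running kernel no longer provides, which is worth knowing before copying an old tuning file to a new machine.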
JVM - G1GC
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16" # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16" # Set to number of full cores
JVM - HotSpot
MAX_HEAP_SIZE="8G" # Good starting point
HEAP_NEWSIZE="2G" # Good starting point
JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"
# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
Cassandra yaml
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
trickle_fsync: true
trickle_fsync_interval_in_kb: 1024
internode_compression: dc
Data model
● Data model impacts performance a lot
● Optimize so that you read from one partition
● Make sure your data can be distributed
● SSTable compression depending on the use case
● Compaction strategy
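The first two points pull in opposite directions: a read should touch one partition, yet the data still has to spread across the cluster. A common way to balance them for time-ordered data is to bucket the partition key by time. The sketch below is purely illustrative and not taken from the talk; the `sensor_id` column, the one-day bucket and the function names are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Hypothetical key: one partition per sensor per UTC day.

    `ts` must be timezone-aware so the day boundary is unambiguous.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


def partitions_for_range(sensor_id: str, start: datetime,
                         end: datetime) -> List[Tuple[str, str]]:
    """List every partition a read between start and end would touch."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((sensor_id, day.isoformat()))
        day += timedelta(days=1)
    return keys


if __name__ == "__main__":
    start = datetime(2016, 7, 20, 8, 0, tzinfo=timezone.utc)
    end = datetime(2016, 7, 20, 18, 0, tzinfo=timezone.utc)
    # A query within one day maps to a single partition.
    print(partitions_for_range("sensor-1", start, end))
```

The bucket size is the tuning knob: a bucket that is too wide produces large partitions, while one that is too narrow forces typical queries to fan out over several partitions.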
Ok, what now?
After we set the base configuration, it’s time for testing and observing
Test setup
● Make sure you have repeatable tests
● Fixed rate tests
● Variable rate tests
● Production-like tests
● cassandra-stress
● Various loadgen tools (gatling, wrk, loader, ...)
● Coordinated omission
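Coordinated omission is the least obvious item on this list, so a small simulation may help. It models a single blocking client that intends to send one request every `interval_ms` but cannot send while it is still waiting for the previous response. The naive latency is what such a client records; the corrected latency is measured from the moment the request was supposed to be sent. This is an illustrative sketch with made-up numbers, not a measurement from the talk.

```python
from typing import List, Tuple


def simulate(service_times_ms: List[float],
             interval_ms: float) -> Tuple[List[float], List[float]]:
    """Return (naive, corrected) latencies for a blocking fixed-rate client."""
    naive, corrected = [], []
    free_at = 0.0  # time at which the client can send again
    for i, service in enumerate(service_times_ms):
        intended = i * interval_ms       # when the request should be sent
        start = max(intended, free_at)   # a blocked client sends late
        done = start + service
        naive.append(done - start)         # timer started at actual send
        corrected.append(done - intended)  # timer started at intended send
        free_at = done
    return naive, corrected


if __name__ == "__main__":
    # One 100 ms stall in an otherwise 1 ms service, at one request per 10 ms.
    naive, corrected = simulate([1, 1, 100, 1, 1], interval_ms=10)
    print("naive:    ", naive)
    print("corrected:", corrected)
```

In this run the naive series contains a single slow sample, while the corrected series shows that the two requests queued behind the stall were also slow from the caller's point of view. A load generator that only reports the naive numbers understates tail latency.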
Tuning methodology
Metrics and reporting stack
● OS metrics (SmartCat)
● Metrics reporter config (AddThis)
● Cassandra diagnostics (SmartCat)
● Filebeat
● Riemann
● InfluxDB
● Grafana
● Elasticsearch
● Logstash
● Kibana
Grafana
Kibana
Slow queries
● Track query execution times above some threshold
● Gain insight into long-running queries
● Relate that to what’s going on on the node
● Compare app and cluster slow queries
https://github.com/smartcat-labs/cassandra-diagnostics
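On the application side, the same idea can be approximated by timing each statement and keeping only the ones that cross a threshold. The sketch below is a generic illustration and does not use the cassandra-diagnostics API; `SlowQueryLog`, `timed` and the injectable `clock` are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, List, Tuple


class SlowQueryLog:
    """Records statements whose execution time exceeds a threshold."""

    def __init__(self, threshold_ms: float,
                 clock: Callable[[], float] = time.perf_counter):
        self.threshold_ms = threshold_ms
        self.clock = clock  # returns seconds; injectable for testing
        self.entries: List[Tuple[str, float]] = []

    def timed(self, statement: str, execute: Callable[[], Any]) -> Any:
        """Run execute() and record the statement if it was slow."""
        started = self.clock()
        try:
            return execute()
        finally:
            # Recorded even when execute() raises, since timeouts matter too.
            elapsed_ms = (self.clock() - started) * 1000.0
            if elapsed_ms > self.threshold_ms:
                self.entries.append((statement, elapsed_ms))


if __name__ == "__main__":
    log = SlowQueryLog(threshold_ms=5)
    log.timed("SELECT fast", lambda: None)
    log.timed("SELECT slow", lambda: time.sleep(0.02))
    for statement, elapsed in log.entries:
        print(f"{elapsed:.1f} ms  {statement}")
```

Entries collected this way in the application can then be lined up against the slow queries reported by the cluster for the same time window, which is the comparison the slide describes.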
Slow queries - cluster
Slow queries - cluster vs app
OpsCenter
Pros:
● Great when starting out
● Everything you need in a nice GUI
● Cluster metrics
Cons:
● Metrics stored in the same cluster
● Issues with some of the services (repair, slow query, ...)
● Additional agents on the nodes
AWS
AWS deployment
● Choose your instance based on calculations
● Cost limits come second
● Use placement groups and availability zones
● Don’t overdo it just because you can ($$$)
● Go for EBS volumes (gp2)
● You don’t need ephemeral storage (mostly)
EBS volumes
Pros:
● 3.4TB+ volume has 10,000 IOPS
● Average latency is ~0.38ms
● Durable across reboots
● AWS snapshots
● Can be attached/detached
● Easy to recreate
Cons:
● Rare latency spikes
● Average latency is ~0.38ms
● Degrading factor
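The IOPS figure follows from the way gp2 performance scales with volume size. As a rough sketch: gp2 baseline performance is 3 IOPS per GiB with a floor of 100 IOPS. The 10,000 IOPS ceiling used below is the value quoted on the slide; AWS has changed its limits over time, so treat `cap` as an assumption to check against current documentation. The function names are made up for this example.

```python
def gp2_baseline_iops(size_gib: int, iops_per_gib: int = 3,
                      floor: int = 100, cap: int = 10_000) -> int:
    """Baseline IOPS of a gp2 volume of the given size."""
    return max(floor, min(cap, size_gib * iops_per_gib))


def min_size_for_iops(target_iops: int, iops_per_gib: int = 3) -> int:
    """Smallest volume size in GiB whose baseline reaches target_iops."""
    # Ceiling division without floating point.
    return -(-target_iops // iops_per_gib)


if __name__ == "__main__":
    for size in (100, 1000, 3334, 4000):
        print(f"{size:>5} GiB -> {gp2_baseline_iops(size)} IOPS")
    print("size needed for 10,000 IOPS:", min_size_for_iops(10_000), "GiB")
```

This is why sizing the volume for IOPS rather than for capacity is part of the instance calculation: a volume large enough to hit the ceiling may be far bigger than the data set actually needs.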
EBS volume problems
End result
● Did we meet our goal?
● Can we go any further?
● Torture testing
● Failure scenarios
● Latency and delay inducers
● Automate everything
Q&A