Cassandra from tarball to production
Uploaded by ron-kuris

Transcript of Cassandra from tarball to production
![Page 1: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/1.jpg)
Cassandra: From tarball to production
![Page 2: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/2.jpg)
Why talk about this?

You are about to deploy Cassandra and are looking for "best practices". You don't want:
● ... to scour through the documentation
● ... to do something known not to work well
● ... to forget to cover some important step
![Page 3: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/3.jpg)
What we won't cover
● Cassandra: how does it work?
● How do I design my schema?
● What's new in Cassandra X.Y?
![Page 4: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/4.jpg)
So many things to do

Monitoring · Snitch · DC/Rack Settings · Time Sync · Seeds/Autoscaling · Full/Incremental Backups · AWS Instance Selection · Disk - SSD? · Disk Space - 2x? · AWS AMI (Image) Selection · Periodic Repairs · Replication Strategy · Compaction Strategy · SSL/VPC/VPN · Authorization + Authentication · OS Conf - Users · OS Conf - Limits · OS Conf - Perms · OS Conf - FSType · OS Conf - Logs · C* Start/Stop · OS Conf - Path · Use case evaluation
![Page 5: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/5.jpg)
Chef to the rescue?

Chef community cookbook available:
https://github.com/michaelklishin/cassandra-chef-cookbook

● Installs Java
● Creates a "cassandra" user/group
● Downloads/extracts the tarball
● Fixes up ownership
● Builds the C* configuration files
● Sets the ulimits for file handles, processes, and memory locking
● Sets up an init script
● Sets up data directories
![Page 6: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/6.jpg)
Chef Cookbook Coverage

Monitoring · Snitch · DC/Rack Settings · Time Sync · Seeds/Autoscaling · Full/Incremental Backups · Disk - SSD? · Disk - How much? · AWS Instance Type · AWS AMI (Image) Selection · Periodic Repairs · Replication Strategy · Compaction Strategy · SSL/VPC/VPN · Authorization + Authentication · OS Conf - Users · OS Conf - Limits · OS Conf - Perms · OS Conf - FSType · OS Conf - Logs · C* Start/Stop · OS Conf - Path · Use case evaluation
![Page 7: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/7.jpg)
Monitoring

Is every node answering queries? Are nodes talking to each other? Are any nodes running slowly?

Push UDP! (statsd)
http://hackers.lookout.com/2015/01/cassandra-monitoring/
https://github.com/lookout/cassandra-statsd-agent
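The push-UDP idea can be sketched in a few lines. The metric name, host, and port below are illustrative; the cassandra-statsd-agent linked above collects real metrics from the JVM via JMX, but the transport principle is the same:

```python
import socket

def statsd_packet(name, value, mtype="c"):
    """Format one statsd metric line, e.g. 'cassandra.reads:1|c'."""
    return f"{name}:{value}|{mtype}"

def push_metric(name, value, mtype="c", host="127.0.0.1", port=8125):
    """Fire-and-forget UDP push: no connection, no ack, never blocks the app."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(name, value, mtype).encode(), (host, port))
    finally:
        sock.close()
```

Because the send is UDP, a down or slow statsd server can never stall the node being monitored, which is exactly why push beats pull here.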
![Page 8: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/8.jpg)
Monitoring - Synthetic

Health checks, bad and good:
● 'nodetool status' exit code
○ Might return 0 if the node is not accepting requests
○ Slow, cross-node reads
● cqlsh -u sysmon -p password < /dev/null
● Verifies this node can read the auth table
● https://github.com/lookout/cassandra-health-check
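A minimal wrapper around the cqlsh-based check might look like this (the `sysmon` credentials come from the slide; the injectable `run` hook is an assumption added so the logic can be exercised without a live cluster):

```python
import subprocess

def cassandra_healthy(run=subprocess.run):
    """Health check in the spirit of `cqlsh -u sysmon -p password < /dev/null`:
    a successful login forces a read of the auth table on this node, so exit
    code 0 means the node can actually serve reads, not just that the JVM
    process exists."""
    try:
        result = run(
            ["cqlsh", "-u", "sysmon", "-p", "password"],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=10,
        )
    except Exception:
        # timeout, missing binary, etc. all count as unhealthy
        return False
    return result.returncode == 0
```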
![Page 9: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/9.jpg)
What about OpsCenter?

We chose not to use it:
● We want a consistent interface for all monitoring
● GUI vs. command-line argument
● Didn't see good auditing capabilities
● Didn't interface well with our Chef solution
![Page 10: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/10.jpg)
Snitch

Use the right snitch!
● AWS? EC2MultiRegionSnitch
● Google? GoogleCloudSnitch
● GossipingPropertyFileSnitch
NOT
● SimpleSnitch (the default)

Community cookbook: set it!
![Page 11: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/11.jpg)
What is RF?

Replication Factor is how many copies of the data exist. The value is hashed to determine the primary host; additional copies always go to the next nodes.
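The "additional copies always next node" rule can be sketched as a walk around the ring (a toy model of SimpleStrategy placement; node indices stand in for token positions):

```python
def replica_nodes(primary, rf, n_nodes):
    """The key's hash picks the primary node; the remaining RF-1 copies
    go to the next nodes on the ring, wrapping around at the end."""
    return [(primary + i) % n_nodes for i in range(rf)]
```

For example, with 6 nodes and RF=3, a key whose hash lands on node 4 is also stored on nodes 5 and 0.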
![Page 12: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/12.jpg)
What is CL?

Consistency Level -- it's not RF! It describes how many nodes must respond before an operation is considered COMPLETE.
● CL_ONE - only one node responds
● CL_QUORUM - (RF/2)+1 nodes (round down)
● CL_ALL - all RF nodes respond
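The quorum arithmetic is worth making concrete, because it is also why odd replication factors are preferred later in the deck:

```python
def quorum(rf):
    """CL_QUORUM responders: floor(RF/2) + 1, a strict majority of replicas."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replicas you can lose while still satisfying CL_QUORUM."""
    return rf - quorum(rf)
```

Note that RF=4 needs 3 responders but still only tolerates one dead replica, the same as RF=3; the fourth copy costs disk and write traffic without buying extra quorum availability.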
![Page 13: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/13.jpg)
DC/Rack Settings

You might need to set these -- maybe you're not in Amazon. Rack == Availability Zone? Hard: renaming a DC or adding racks.
![Page 14: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/14.jpg)
Renaming DCs

Clients "remember" which DC they talk to, so renaming a single DC causes all clients to fail. Better to spin up a new DC than rename the old one.
![Page 15: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/15.jpg)
Adding a rack

Start with a 6-node cluster, all in rack R1, with replication factor 3. Add 1 node in R2 and rebalance: ALL the data ends up with a copy on the lone R2 node! It's a good idea to keep racks balanced.
![Page 16: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/16.jpg)
I don't have time for this

Clusters must have synchronized time. You will get lots of drift with [0-3].amazon.pool.ntp.org. The community cookbook doesn't cover anything here.
![Page 17: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/17.jpg)
Better make time for this

C* serializes write operations by timestamps, and clocks on virtual machines drift! It's the relative difference among clocks that matters, so C* nodes should synchronize with each other. Solution: use a pair of peered NTP servers (stratum 2 or 3) pointed at a small set of known upstream providers.
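A sketch of what the peered setup might look like in ntpd terms (hostnames are illustrative; the point is that the two internal servers peer with each other, and every Cassandra node syncs only to them):

```
# /etc/ntp.conf on each of the two internal NTP servers
server 0.pool.ntp.org iburst   # small, fixed set of upstreams
server 1.pool.ntp.org iburst
peer   ntp2.internal           # the other internal server; peering keeps
                               # the pair (and thus the fleet) drifting together

# /etc/ntp.conf on every Cassandra node
server ntp1.internal iburst
server ntp2.internal iburst
```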
![Page 18: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/18.jpg)
From a small seed...

● Seeds are used by new nodes to find the cluster
● Every new node should use the same seeds
● Seed nodes get topology changes faster
● Each seed node must be in the config file
● Multiple seeds per datacenter recommended
● Tricky to configure on AWS
![Page 19: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/19.jpg)
Backups - Full + Incremental

Nothing in the cookbooks for this. C* makes it "easy": snapshot, then copy. Snapshots might require a lot more space, so remove the snapshot after copying it.
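The snapshot-copy-clear cycle might be wrapped like this. The `copy` callable is a placeholder for whatever ships the snapshot off the node (rsync, an S3 upload, ...), and the `-t` flag on clearsnapshot assumes a reasonably recent nodetool:

```python
import subprocess

def backup_keyspace(keyspace, tag, copy, run=subprocess.run):
    """Snapshot, copy off-node, then clear. `run` is injectable so the
    ordering can be tested without a live node."""
    run(["nodetool", "snapshot", "-t", tag, keyspace], check=True)
    try:
        copy(keyspace, tag)  # ships .../snapshots/<tag> somewhere safe
    finally:
        # snapshots are hard links that pin old SSTables on disk,
        # so always remove them once the copy is safely elsewhere
        run(["nodetool", "clearsnapshot", "-t", tag, keyspace], check=True)
```

The `finally` matters: a failed copy must not leave a forgotten snapshot silently eating the disk headroom the next slides tell you to preserve.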
![Page 20: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/20.jpg)
Disk selection

| SSD (ephemeral) | Rotational (ephemeral) | EBS |
|---|---|---|
| Low latency | Any size instance | Any size instance |
| Recommended, not cheap | Less expensive | |
| Great random r/w perf | Good write performance | No node rebuilds |
| No network use for disk | No network use for disk | |
![Page 21: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/21.jpg)
AWS Instance Selection

We moved to EC2. c3.2xlarge (15 GiB mem, 160 GB disk)? i2.xlarge (30 GiB mem, 800 GB disk)?
Max recommended storage per node is 1 TB.
Use instance types that support HVM. Some previous-generation instance types, such as T1, C1, M1, and M2, do not support Linux HVM AMIs. Some current-generation instance types, such as T2, I2, R3, G2, and C4, do not support PV AMIs.
![Page 22: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/22.jpg)
How much can I use?

Snapshots take space (kind of). Best practice: keep disks half full! An 800 GB disk becomes 400 GB usable. Snapshots during repairs? Lots of uses for snapshots!
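The half-full rule turns into simple capacity math. A hypothetical planning helper (the 0.5 target and the node-count formula are this sketch's assumptions, built on the rule above):

```python
import math

def usable_gb(disk_gb, fill_target=0.5):
    """Keep disks at most half full so compactions, repairs, and
    snapshots have headroom."""
    return disk_gb * fill_target

def nodes_needed(data_gb, rf, disk_gb, fill_target=0.5):
    """Minimum node count: total replicated data over per-node usable space."""
    return math.ceil(data_gb * rf / usable_gb(disk_gb, fill_target))
```

So the i2.xlarge's 800 GB disk yields 400 GB usable, and 2 TB of data at RF=3 already needs 15 such nodes.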
![Page 23: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/23.jpg)
Periodic Repairs

Buried in the docs: "As a best practice, you should schedule repairs weekly"
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
● "-pr" (yes)
● "-par" (maybe)
● "--in-local-dc" (no)
![Page 24: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/24.jpg)
Repair Tips

● Raise gc_grace_seconds (tombstones?)
● Run on one node at a time
● Schedule for low-usage hours
● Use "-par" if you have dead time (it's faster)
● Tune with: nodetool setcompactionthroughput
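Putting the weekly cadence and the low-usage-hours tip together, the schedule might be nothing more than a cron entry per node (paths and the 3am-Sunday slot are illustrative; stagger the day or hour per node so only one repairs at a time):

```
# /etc/cron.d/cassandra-repair -- weekly primary-range repair, off-peak
0 3 * * 0  cassandra  nodetool repair -pr >> /var/log/cassandra/repair.log 2>&1
```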
![Page 25: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/25.jpg)
I thought I deleted that

Compaction removes "old" tombstones after a grace period, 10 days by default (gc_grace_seconds). After that, deletes will not be propagated! Run 'nodetool repair' at least every 10 days; once a week is perfect (3 days of slack). Node down more than 7 days? 'nodetool removenode' it!
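The relationship between repair cadence and tombstone expiry is simple arithmetic, and worth checking whenever either number changes:

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's 10-day default

def repair_slack_days(repair_every_days, gc_grace=GC_GRACE_SECONDS):
    """Days of safety margin between the repair cadence and tombstone expiry.
    If this goes negative, tombstones can expire before every replica has
    seen the delete, and deleted data can resurrect."""
    return gc_grace / 86400 - repair_every_days
```

Weekly repairs against the default grace leave 3 days of slack, matching the slide; repairing every two weeks would leave -4, which is exactly the resurrection scenario.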
![Page 26: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/26.jpg)
Changing RF within a DC?

Easy to decrease RF. Increasing RF is harder: until a repair populates the new replicas, reads at CL_ONE might fail!
![Page 27: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/27.jpg)
Replication Strategy

● How many replicas should we have?
● What happens if some data is lost?
● Are you write-heavy or read-heavy?
● Quorum considerations: odd is better!
● RF=1? RF=3? RF=5?
![Page 28: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/28.jpg)
Magic JMX setting: reduce traffic to a node

Great when a node is "behind" the 4-hour window. Used by the gossiper to divert traffic during repairs. Writes: OK; read repair: OK; nodetool repair: OK.

    $ java -jar jmxterm.jar -l localhost:7199
    $> set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 10000

Don't be too severe!
![Page 29: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/29.jpg)
Compaction Strategy

Solved by using a good C* design. SizeTiered or Leveled?
● Leveled has better guarantees for read times
● SizeTiered may require 10 (or more) reads!
● Leveled uses less disk space
● Leveled tombstone collection is slower
![Page 30: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/30.jpg)
Auth*

Cookbooks default to OFF. Turn the authenticator and authorizer on. The 'cassandra' user is super special: its signon requires QUORUM (cross-DC); all other users sign on at LOCAL_ONE!
![Page 31: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/31.jpg)
Users

● OS users vs. Cassandra users: 1 to 1?
● Shared credentials for apps? Nothing logs the user taking the action!
● The 'cassandra' user is created by the cookbook; all processes run as 'cassandra'
![Page 32: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/32.jpg)
Limits

Chef helps here! At startup:

    ulimit -l unlimited   # mem lock
    ulimit -n 48000       # fds

/etc/security/limits.d:

    cassandra - nofile 48000
    cassandra - nproc unlimited
    cassandra - memlock unlimited
![Page 33: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/33.jpg)
Filesystem Type

Officially supported: ext4 or XFS (XFS is slightly faster). Interesting options:
● ext4 without journal
● ext2
● ZFS
![Page 34: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/34.jpg)
Logs

To consolidate or not to consolidate? Push or pull? Usually push!
FOSS: syslogd, syslog-ng, logstash/kibana, heka, banana
Commercial: Splunk, SumoLogic, Loggly, Stackify
![Page 35: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/35.jpg)
Shutdown

Nice init script with the cookbook; the steps are:
● nodetool disablethrift (no more clients)
● nodetool disablegossip (stop talking to the cluster)
● nodetool drain (flush all memtables)
● kill the JVM
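The ordering of those steps is the whole point, so a wrapper should hard-code it. A sketch (the final `service cassandra stop` stands in for however your init system kills the JVM):

```python
import subprocess

def drain_and_stop(run=subprocess.run):
    """The init script's ordering: stop client traffic, stop gossiping,
    flush memtables, then stop the process."""
    for step in (["nodetool", "disablethrift"],   # no more clients
                 ["nodetool", "disablegossip"],   # stop talking to the cluster
                 ["nodetool", "drain"]):          # flush all memtables
        run(step, check=True)
    run(["service", "cassandra", "stop"], check=True)
```

Draining before the kill means the commit log is empty at startup, which makes restarts noticeably faster.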
![Page 36: Cassandra from tarball to production](https://reader031.fdocuments.in/reader031/viewer/2022030315/5884151b1a28ab95518b676b/html5/thumbnails/36.jpg)
Quick performance wins
● Disable assertions -- cookbook property
● No swap space (or vm.swappiness=1)
● max_concurrent_reads
● max_concurrent_writes