Cassandra Summit 2014: Deploying Cassandra for Call of Duty

Post on 15-Dec-2014

Presenters: Seán O Sullivan, Service Reliability Engineer, and Tim Czerniak, Software Engineer, at Demonware.

This presentation covers the eight-month evaluation process we underwent to migrate some of Call of Duty’s core services from MySQL to Cassandra. We will outline our requirements, the process we followed for the evaluation, the decisions we made about our schema, configuration and hardware, and some issues we encountered.

Transcript of Cassandra Summit 2014: Deploying Cassandra for Call of Duty

DEMONWARE

Deploying Cassandra for Call of Duty

#CassandraSummit

Tim Czerniak Software Engineer

DemonWare

Seán O Sullivan Operations Engineer

DemonWare

DEMON-WHO?

DemonWare is a subsidiary of Activision Blizzard

We write, deploy and maintain client and server applications for Activision and Blizzard games

SERVICES

• Matchmaking
• Leaderboards
• Chat
• File Storage
• Leagues
• Social Network Integration
• etc.

TECHNOLOGIES

Client: C++, HTTP

Server: Python, Erlang, MySQL, CentOS, Puppet

OUR UNUSUAL USE CASE

[Figure: timeline annotated Release, First weekend, Christmas, Peak]

“By failing to prepare, you are preparing to fail.”

– Benjamin Franklin

OUR PREDICAMENT

Needed to share data cross-DC…

…but MySQL isn’t so good at that.

SERVICES

• Progress store
  • High write, low read
  • File size ~4 KB
  • Persistent
• Presence
  • High write, high read
  • Data size minimal
  • Transient
• Messaging
  • Low write, low read
  • Transient

REQUIREMENTS

• Cross-DC
• Ease of consolidation and expansion
• Manageability for the operations teams
• Throughput
  • Storage: 1,500,000 reqs/min
  • Presence: 250,000 reqs/min
  • Messaging: 850,000 reqs/min
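For comparison with load-testing tools that report per-second rates, the per-minute throughput targets above can be converted with a short Python calculation. This is only arithmetic on the slide's figures, and it assumes requests are spread evenly across each minute, which real game traffic is not.

```python
# Throughput requirements from the slide, in requests per minute.
REQS_PER_MIN: dict[str, int] = {
    "Storage": 1_500_000,
    "Presence": 250_000,
    "Messaging": 850_000,
}


def per_second(reqs_per_min: int) -> float:
    """Average requests per second, assuming an even spread over the minute."""
    return reqs_per_min / 60


def summarize(reqs: dict[str, int]) -> dict[str, int]:
    """Per-service average rate plus a combined total, rounded to whole requests."""
    rates = {name: round(per_second(n)) for name, n in reqs.items()}
    rates["Total"] = round(per_second(sum(reqs.values())))
    return rates


if __name__ == "__main__":
    for name, rate in summarize(REQS_PER_MIN).items():
        print(f"{name:>9}: ~{rate:,} reqs/s")
```

Averages hide bursts, which is one reason the load tests also exercised peaks and troughs.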

EVALUATION

• Shortlisted suitable options
  • Riak
  • Cassandra
• Rewrote our application backend, twice

LOAD TESTING

• Two clusters
  • Single CPU, SSD and average memory
  • Dual CPU, spindles and high memory
• Used realistic user profiles
• Included peaks and troughs during testing
• Ran a soak test

THE WINNER???

• Initially Riak was a slam dunk
  • Erlang-based (we know Erlang)
  • Tooling is excellent
  • Performed well
  • Previously evaluated

THE WINNER

• Cassandra won in the end
  • Write performance
  • Richer feature set
  • Maturity of codebase and tooling
• Testing continued 24/7 until launch

SCHEMA

• Progress store
  • A perfect fit!
• Presence
  • More relational
  • High throughput (tombstones!)
  • TTLs
• Messaging
  • Time-series data, well suited
  • Tombstones!
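The messaging store is described as time-series data. To illustrate why partition and clustering keys matter for that kind of data, here is a minimal in-memory Python model of a Cassandra-style table: rows are grouped by a partition key and kept sorted within each partition by a clustering key. The layout (recipient as partition key, send timestamp as clustering key, newest first) is a hypothetical example, not Demonware's actual schema.

```python
import bisect


class MessageTable:
    """Toy in-memory model of a table with PRIMARY KEY ((recipient), sent_at).

    Hypothetical schema: `recipient` is the partition key and `sent_at` the
    clustering key, stored newest-first like CLUSTERING ORDER BY (sent_at DESC).
    """

    def __init__(self) -> None:
        # partition key -> (sorted negated clustering keys, clustering key -> row value)
        self._partitions: dict[str, tuple[list[int], dict[int, str]]] = {}

    def insert(self, recipient: str, sent_at: int, body: str) -> None:
        keys, rows = self._partitions.setdefault(recipient, ([], {}))
        if sent_at not in rows:
            # Negate so that ascending list order means newest message first.
            bisect.insort(keys, -sent_at)
        # Writing the same primary key again overwrites the row (an upsert).
        rows[sent_at] = body

    def latest(self, recipient: str, limit: int) -> list[tuple[int, str]]:
        """Newest `limit` messages for one recipient: reads a single partition."""
        keys, rows = self._partitions.get(recipient, ([], {}))
        return [(-k, rows[-k]) for k in keys[:limit]]


if __name__ == "__main__":
    table = MessageTable()
    table.insert("alice", 100, "gg")
    table.insert("alice", 300, "rematch?")
    table.insert("bob", 200, "hi")
    table.insert("alice", 200, "nice shot")
    print(table.latest("alice", 2))  # [(300, 'rematch?'), (200, 'nice shot')]
```

A query for one player's recent messages reads one partition in clustering order. A query for every player's messages from the last hour would have to visit every partition, which is the kind of relational-style access pattern this schema design avoids.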

SCHEMA: LESSONS LEARNED

• Keep it simple
• It’s not a relational DB
• Get your partition keys and clustering keys right
• Then C* will do what it does best
• Don’t ignore the CAP theorem
  • Cassandra has tunable consistency, but there will be trade-offs
• Load test with real numbers
  • Some issues aren’t evident in unit tests
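One concrete form of the consistency trade-off: with replication factor RF, a read is guaranteed to overlap the latest acknowledged write when the replicas consulted on read (R) plus the replicas acknowledging the write (W) exceed RF, because the two replica sets must then share at least one node. The Python sketch below checks that rule for a few consistency-level pairs. It assumes a single data centre (cross-DC levels such as LOCAL_QUORUM and EACH_QUORUM are not modelled), and RF = 3 is an example value, not one taken from the talk.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond for a single-DC consistency level."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    n = levels[level]
    if n > rf:
        raise ValueError(f"{level} needs {n} replicas but RF is {rf}")
    return n


def overlapping(read_level: str, write_level: str, rf: int) -> bool:
    """True when every possible read replica set intersects every write replica set."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


if __name__ == "__main__":
    rf = 3  # example replication factor
    for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(f"read={r:<6} write={w:<6} RF={rf}: overlap={overlapping(r, w, rf)}")
```

The cost shows up as availability: at RF 3, QUORUM keeps working with one replica down, while ALL fails as soon as any replica is unavailable.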

CONFIG

• Default settings are probably not what you want

• Changed many settings off the bat

• Reverted some (oops)

HARDWARE

• 2x Intel Xeon E5-2620 @ 2.0 GHz

• 2x 480 GB SSD (RAID 1)

• 32 GB RAM

• 1 Gb/s non-dedicated network

MONITORING

• Graphite

• Nagios

• Jolokia
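The appendix shows the Jolokia JVM agent listening on port 8080 of each node. Jolokia exposes JMX MBeans over HTTP, so a monitoring check (for instance one run by Nagios) could read an attribute with a plain GET request. Below is a minimal Python sketch that builds a Jolokia read URL and extracts the value from the JSON reply. It assumes the agent's default /jolokia context path; the host name in the usage line is hypothetical, and the standard JVM heap MBean is used only as an example.

```python
import json
import urllib.request


def escape_path_part(part: str) -> str:
    """Escape a path element for Jolokia's GET syntax ('!' -> '!!', '/' -> '!/')."""
    return part.replace("!", "!!").replace("/", "!/")


def read_url(host: str, mbean: str, attribute: str, port: int = 8080) -> str:
    """Build a Jolokia 'read' URL; assumes the agent's default /jolokia context."""
    return (f"http://{host}:{port}/jolokia/read/"
            f"{escape_path_part(mbean)}/{escape_path_part(attribute)}")


def parse_value(body: str) -> object:
    """Return the 'value' field of a Jolokia JSON reply, raising on a non-200 status."""
    reply = json.loads(body)
    if reply.get("status") != 200:
        raise RuntimeError(reply.get("error", "Jolokia request failed"))
    return reply["value"]


def read_attribute(host: str, mbean: str, attribute: str, timeout: float = 5.0) -> object:
    """Fetch one attribute over HTTP (requires a reachable Jolokia agent)."""
    with urllib.request.urlopen(read_url(host, mbean, attribute), timeout=timeout) as resp:
        return parse_value(resp.read().decode("utf-8"))
```

Usage, against a hypothetical node: `read_attribute("cass01.example", "java.lang:type=Memory", "HeapMemoryUsage")["used"]` would return the current heap usage in bytes, which a check could compare against a threshold.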

GOTCHAS• Vnodes and rack awareness

• Load balancers

• Dev differs from production (of course...)

• Launching in a DC we didn't load test in

LAUNCH

• Request to simulate a node failure

• Two nodes died over Christmas

• Expanding to other titles

QUESTIONS?

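APPENDIX

cassandra.yaml:

# New nodes join without streaming existing data from the ring
auto_bootstrap: false
# Hint replay throttle, in KB/s per delivery thread
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
# fsync at intervals during sequential writes (recommended on SSDs)
trickle_fsync: true
# Half-synchronous, half-asynchronous Thrift server
rpc_server_type: hsha
<% if @virtual == "physical" -%>
concurrent_reads: 128
<% else -%>
concurrent_reads: 32
<% end -%>
# 8 concurrent writes per processor reported by Facter
concurrent_writes: <%= @processorcount.to_i * 8 %>
multithreaded_compaction: false
# 0 disables compaction throttling
<% if @virtual == "physical" -%>
compaction_throughput_mb_per_sec: 0
<% else -%>
compaction_throughput_mb_per_sec: 16
<% end -%>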

cassandra-env.sh:

# Thread stack size differs between physical and virtual hosts
<% if @virtual == "physical" -%>
JVM_OPTS="$JVM_OPTS -Xss180k"
<% else -%>
JVM_OPTS="$JVM_OPTS -Xss228k"
<% end -%>
# Java agents: graphite-reporter-agent and Jolokia (JMX over HTTP on port 8080)
JVM_EXTRA_OPTS="$JVM_EXTRA_OPTS -javaagent:/usr/share/java/graphite-reporter-agent.jar -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8080,host=<%= @hostname %>"
EXTRA_CLASSPATH="/usr/share/java/metrics-graphite-2.0.3.jar"
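The ERB conditionals above key off two Puppet facts, virtual and processorcount. To make the rendered values easy to see, here is a Python sketch that mirrors the templates' branches. It is an illustration, not part of the deployment. The 24-processor example assumes Facter counts both sockets of the E5-2620 hardware listed earlier with Hyper-Threading enabled (6 cores, 12 threads per socket); with it disabled, the count would be 12.

```python
def rendered_settings(virtual: str, processorcount: int) -> dict[str, object]:
    """Values the templates above would emit for a host with these Facter facts.

    Facter may report processorcount as a string, which is why the template
    calls .to_i; int() here plays the same role.
    """
    physical = virtual == "physical"
    return {
        # Cassandra settings file
        "concurrent_reads": 128 if physical else 32,
        "concurrent_writes": int(processorcount) * 8,
        # 0 disables compaction throttling entirely
        "compaction_throughput_mb_per_sec": 0 if physical else 16,
        # cassandra-env.sh: JVM thread stack size (-Xss)
        "Xss": "180k" if physical else "228k",
    }


if __name__ == "__main__":
    print(rendered_settings("physical", 24))
    print(rendered_settings("kvm", 4))
```

On physical SSD-backed nodes the templates therefore remove the compaction throughput cap and allow four times as many concurrent reads as on virtual machines.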