Cassandra Summit 2014: Cassandra at Instagram 2014

Posted on 05-Dec-2014


Description

Presenter: Rick Branson, Infrastructure Engineer at Instagram. As Instagram has scaled to over 200 million users, so has our use of Cassandra. We've built new features and rebuilt old ones on Cassandra, and it has become a mission-critical foundation of our production infrastructure. Rick will deliver a refresh of our use cases and go deep on the technical challenges we faced during our expansion.

Transcript of Cassandra Summit 2014: Cassandra at Instagram 2014

CASSANDRA @ INSTAGRAM 2014

Rick Branson, Software Engineer, Backend Engineering

@rbranson Cassandra Summit

September 11th, 2014

2013 in a Nutshell

• Moving some log-style data from Redis to Cassandra

• Saving $, better availability, more elasticity

• Some problems, but we worked through them

Overview

• "The Trouble with Redis"

• A whole new category of functionality

• Useful consumer advice

"THE TROUBLE WITH REDIS"

REDIS: THE GÜD PARTS

The Network Part: It's a heap on the network with a great text-based protocol.

Extremely Efficient: It does a lot with every ounce of CPU.

Stable: Very much so. Sets the bar for the vast NoSQL space.

Useful, Rich Types: You won't find them anywhere else.

REDIS: ZE DARK SIDE

What's in there? Hello?

Snapshots (BGSAVE)

Single-Threaded (!)

Memory Allocator

(Lack of) Durability

Unsafe Replication

THE PHILOSOPHY

CACHE OR DATABASE? Neither.

Memory Cliff

Unlikely to Change :(

ZOOKEEPER???

ZOZO: *VERY* few writes, extremely high volume, no-latency reads without hotspots.

[Diagram: a Writer issues UPDATE /zozo/files/knobs against the ZooKeeper ensemble; the zozo agent holds a WATCH on /zozo/files and mirrors changes to /var/cache/zozo, which the consumer processes read locally.]
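The agent in the diagram is essentially a watch-and-mirror loop. A minimal sketch of that idea in Python, assuming the kazoo ZooKeeper client and the paths shown above (the talk does not name the client library or the on-disk layout):

import time
from kazoo.client import KazooClient

# Hypothetical ensemble addresses; the real hosts are not in the slides.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

CACHE_PATH = "/var/cache/zozo/knobs"  # consumer processes read this file locally

@zk.DataWatch("/zozo/files/knobs")
def on_change(data, stat):
    # Called once at registration and again on every update to the znode.
    if data is not None:
        with open(CACHE_PATH, "wb") as f:
            f.write(data)

# The agent just stays alive; ZooKeeper pushes changes through the watch.
while True:
    time.sleep(60)

Consumers never talk to ZooKeeper directly, which is how the agent stays dead simple and the read path avoids both latency and hotspots.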

Examples

• "Knobs" key/integer live configuration

• Small (<250K) User Lists

• Security Data Sets (e.g. Blocked IPs)

JSON sucks.

Stay tuned.

zozo postmortem

• Agent is dead simple, never needs updating

• ZooKeeper resists both machine & human errors

• Multi-region strategy coming

REDIS2CASSANDRA (omg, finally)

Some things we moved

• Ad Serving Metadata

• Account Suggestion Metadata

• Churn Tracking

• Follow Requests

Not a drop-in.

"User A has requested to follow User B" (A = source, B = target) is stored in both directions:

Outbound Requests: Key A => Columns [B, ...]

Inbound Requests: Key B => Columns [A, ...]
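A rough sketch of that double write, with plain dicts standing in for the two column families (names here are illustrative, not Instagram's schema). In Cassandra, both inserts would go into a single batch so the two directions stay consistent:

from collections import defaultdict

outbound_requests = defaultdict(dict)  # source user id -> {target id: timestamp}
inbound_requests = defaultdict(dict)   # target user id -> {source id: timestamp}

def request_follow(source_id, target_id, timestamp):
    # In Cassandra this is one batch touching both wide rows.
    outbound_requests[source_id][target_id] = timestamp
    inbound_requests[target_id][source_id] = timestamp

def inbound_for(user_id):
    # "Who has requested to follow this user?"
    return sorted(inbound_requests[user_id])

request_follow(source_id=1, target_id=2, timestamp=1410393600)
assert inbound_for(2) == [1]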

INSTAGRAM DIRECT

Y U CASSANDRA?

• Elasticity + Scalability

• Write Availability

• Storage Efficiency

• Good Data Model Fit

WE CHOSE THRIFT.

fan-out-on-write

• Caching doesn't work

• Copy of full data for each recipient

• Atomic batches super useful

#nocache
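A minimal sketch of fan-out-on-write as described above, with dicts standing in for per-recipient inbox rows. In Cassandra the per-recipient inserts would be grouped into one atomic batch so they succeed or fail together; the grouping here is only illustrative:

from collections import defaultdict

inboxes = defaultdict(dict)  # recipient id -> {column name: value}

def fan_out_post(post_id, header, recipient_ids):
    # Each recipient gets a full copy of the post header; no cache layer.
    mutations = [(rid, ("POST_HEADER", post_id), header) for rid in recipient_ids]
    # In Cassandra: apply all of these as one atomic batch.
    for rid, column, value in mutations:
        inboxes[rid][column] = dict(value)

fan_out_post(
    post_id=1234,
    header={"creator_id": 12938, "media_type": 1,
            "asset_path": "/objects/23904801..."},
    recipient_ids=[28910, 208192],
)
assert inboxes[28910][("POST_HEADER", 1234)]["creator_id"] == 12938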

CompositeType(IntegerType(reversed=true), IntegerType, IntegerType)

Components: postid (reversed), typecode, entryid

POST_HEADER => creator_id: 12938, media_type: 1, asset_path: "/objects/23904801...", recipients: [28910, 208192]

(SEEN, 28910) (SEEN, 208192) (LIKE, 208192)

(COMMENT, 9302...) => [19238, "George wishes he ha..."]

(COMMENT, 3499...) => [28910, "Weird"]

IMMUTABLE (mostly) LOG

FILTER-ON-READ

CQL3, please. no? ok. :(
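A toy model of the thread row above: column names are (postid reversed, typecode, entryid) composites, writes only append columns to the (mostly) immutable log, and reads sort and filter in application code. Type-code constants and helper names are illustrative, not Instagram's:

# Illustrative type codes; the real values are not given in the slides.
POST_HEADER, SEEN, LIKE, COMMENT = 0, 1, 2, 3

def column_key(post_id, type_code, entry_id=0):
    # IntegerType(reversed=true) on the first component: negate the post id
    # so an ascending sort returns the newest posts first.
    return (-post_id, type_code, entry_id)

thread = {}  # column name tuple -> value; stands in for one wide row

thread[column_key(1234, POST_HEADER)] = {
    "creator_id": 12938, "media_type": 1,
    "asset_path": "/objects/23904801...",
    "recipients": [28910, 208192],
}
thread[column_key(1234, SEEN, 28910)] = None
thread[column_key(1234, SEEN, 208192)] = None
thread[column_key(1234, LIKE, 208192)] = None
thread[column_key(1234, COMMENT, 9302)] = [19238, "George wishes he ha..."]

def read_thread(row, limit=50):
    # Filter-on-read: slice the newest columns and let the application
    # decide what to do with SEEN/LIKE/EXCLUDE markers.
    return sorted(row.items())[:limit]

for name, value in read_thread(thread):
    print(name, value)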

THE PENDING QUEUE

A (sender) sends a post to recipients B, C, and D. Each recipient's pending row gets a column holding A's PostID:

Key B => [A's PostID], ...

Key C => [A's PostID], ...

Key D => [A's PostID], ...

On first read, delete pending.
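A rough sketch of the pending queue, again with dicts in place of rows. That pending entries are merged into the recipient's inbox and then removed on the first read is my reading of the "first read, delete pending" note; the real logic may differ:

from collections import defaultdict

pending = defaultdict(set)   # recipient id -> post ids waiting to be applied
inboxes = defaultdict(dict)  # recipient id -> {post id: thread metadata}

def send_post(post_id, recipient_ids):
    # Fan-out: every recipient gets a pointer to the new post.
    for rid in recipient_ids:
        pending[rid].add(post_id)

def read_inbox(recipient_id):
    # First read drains the pending row into the inbox, then deletes it.
    for post_id in pending.pop(recipient_id, set()):
        inboxes[recipient_id].setdefault(post_id, {"applied_from_pending": True})
    return sorted(inboxes[recipient_id], reverse=True)  # newest first

send_post(post_id=1234, recipient_ids=["B", "C", "D"])
assert read_inbox("B") == [1234]
assert "B" not in pending  # pending row gone after the first read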

2000000: (POINTER, 1000000)

1500000: (POST_HEADER, 0) => ..., (SEEN, 902394), (SEEN, 94083), (LIKE, 902394)

1000000: (POST_HEADER, 0) => ..., (BUMPED, 0) => 2000000, (SEEN, 1298333), (SEEN, 848183), ...

Justin's Pending: 25000, 24000, 23000, 22000, 21000, 20000, 19000, ..., 1234

Justin's Inbox: ..., (1234, POST..), (1234, LIKE..), (1234, SEEN..), (1234, SEEN..), (1010, POST..), (1010, SEEN..), (1010, COMM..), ...

(1234, EXCLUDE, <JUSTIN>)

Rick's Unseen: 849300, 738939, 649019, 641192, 629012, 610392, 483201, 348931 (count: 8)
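The unseen row above appears to drive the unread count (eight entries, a count of 8). A toy version under that assumption, with the badge computed as the number of columns in the per-user unseen row:

unseen = {
    "rick": {849300, 738939, 649019, 641192, 629012, 610392, 483201, 348931},
}

def unseen_count(user_id):
    # The badge is just the number of columns left in the unseen row.
    return len(unseen.get(user_id, ()))

def mark_seen(user_id, post_id):
    # Seeing a post deletes its column from the unseen row.
    unseen.get(user_id, set()).discard(post_id)

assert unseen_count("rick") == 8
mark_seen("rick", 849300)
assert unseen_count("rick") == 7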

DEPLOYMENT

• 60 x hi1.4xlarge EC2 instances

• 2X our highest estimates

• Shrunk cluster after we knew load

• Peace of mind

• Data models held together

We ran C* in production. We learned some things.

young gc fun!

Young GC has problems

• Reads = young gen garbage

• Double-collection bug in JDK 1.7+

• Best practices were for nodes with more writes

Now we run: 10G young gen, 20G heap

<blink>DEAR GOD ALMIGHTY!!!</blink>

This won't work out of the box.

-XX:+CMSScavengeBeforeRemark

-XX:CMSMaxAbortablePrecleanTime=60000

-XX:CMSWaitDuration=30000

-XX:+CMSEdenChunksRecordAlways

-XX:+CMSParallelInitialMarkEnabled

20-core Ivy Bridge, 144GB RAM, PCIe flash

15,000 reads/sec, 2,000 writes/sec

TWO ISSUES

• Counters. Don't use them.

• Hundreds of thousands / millions of tombstones.

[Diagram: async storage port handler read path. Client READ 12349081 → RPC thread pool → queue → ReadStage thread pool → queue → GET 12349081.]
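A toy model of the staged read path in the diagram: RPC threads hand requests through a queue to a separate ReadStage pool that performs the storage GET. Thread counts, names, and the dict-backed storage are illustrative only:

import queue
import threading

read_stage_queue = queue.Queue()  # the queue between the two thread pools

def rpc_handler(keys):
    # RPC thread pool: parse the client's READ and enqueue it.
    for key in keys:
        read_stage_queue.put(key)

def read_stage_worker(storage):
    # ReadStage thread pool: perform the actual GET against storage.
    while True:
        key = read_stage_queue.get()
        print("GET", key, "->", storage.get(key))
        read_stage_queue.task_done()

storage = {12349081: "row data..."}
threading.Thread(target=read_stage_worker, args=(storage,), daemon=True).start()
rpc_handler([12349081])   # client issues READ 12349081
read_stage_queue.join()   # wait for the read stage to drain

When reads drag hundreds of thousands of tombstones through this path, the queues back up, which is why the two issues above (counters and tombstone-heavy rows) hurt so much.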

THANK YOU.