Netflix at-disney-09-26-2014

Slides from a presentation by Monal Daxini at Disney (Glendale, CA) about Netflix Open Source Software, Cloud Data Persistence, and Cassandra Best Practices.

Transcript of Netflix at-disney-09-26-2014

Cloud Data Persistence @ Netflix

Monal Daxini, Senior Software Engineer

Cloud Database Engineering

@monaldax

50m+ Subscribers

Summary

Netflix OSS

Microservices

Me @ Netflix: Season 1, 2

Cassandra @ Netflix

Cassandra Best Practices

Coming Soon…

Start with Zero To Cloud With @NetflixOSS

https://github.com/Netflix-Skunkworks/zerotocloud

Karyon/Governator

Hystrix

Ribbon/Eureka

Curator

EVCache

Astyanax

Turbine

Servo

Blitz4J

RxJava

Archaius

Building Apps and AMIs

WAR → App AMI → Deploy → ASG/Cluster → Launch Instances

Slide credit: @stonse

NetflixOSS

Suro Data Pipeline

Eureka

Zuul

Edda

Micro Services

Microservices DO NOT automatically mean better availability

You need a fault-tolerant architecture

Service Dependency View

Distributed Tracing (Dapper-inspired)

Micro Services

1 response, 1 monolithic service at 99.99% uptime

1 response touching 30 microservices, each at 99.99% uptime:

overall ≈ 99.7% uptime (20+ hours of downtime per year)
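As a worked version of the slide's arithmetic (assuming independent failures): $0.9999^{30} \approx 0.9970$, i.e. about $99.7\%$ availability, and $(1 - 0.9970) \times 8760\,\text{h/yr} \approx 26$ hours of downtime per year. A chain of 30 four-nines services yields roughly three-nines availability overall.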

Micro Services

Actual Scale

~2 Billion Edge Requests per day

Results in ~20 Billion fan-out requests to

~100 different MicroServices

Fault Tolerant Arch

Dependency Isolation

Aggressive timeouts

Circuit breakers
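All three patterns are what Hystrix (listed earlier) bundles. A minimal sketch of a command wrapping a downstream call - the command name, group name, and fallback value are illustrative, not from the slides:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Wraps a downstream dependency call with thread-pool isolation,
// an aggressive timeout (1,000 ms by default), and a circuit breaker.
public class GetTitleCommand extends HystrixCommand<String> {

    private final String titleId;

    public GetTitleCommand(String titleId) {
        // Commands in the same group share a thread pool (dependency isolation).
        super(HystrixCommandGroupKey.Factory.asKey("TitleService"));
        this.titleId = titleId;
    }

    @Override
    protected String run() {
        // Stand-in for the real remote call; Hystrix times it out
        // and opens the circuit if the error rate spikes.
        return "Title-" + titleId;
    }

    @Override
    protected String getFallback() {
        // Served when the call fails, times out, or the circuit is open.
        return "Unknown Title";
    }
}

// Usage: String title = new GetTitleCommand("70143836").execute();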

MicroServices Container

Synchronous: Tomcat - thread pool (1 thread per request)

Asynchronous: RxNetty (UDP, TCP, WebSockets, SSE) - event loops

MicroServices Container

Rx

ease async programming

avoid callback hell

Netty to leverage EventLoop

Rx + Netty = RxNetty

* Courtesy Brendan Gregg
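To make the "avoid callback hell" point concrete, a small RxJava 1.x sketch; fetchTitleAsync is a made-up stand-in for a service call that would really be backed by RxNetty event loops:

import java.util.concurrent.TimeUnit;
import rx.Observable;

public class RxDemo {

    // Hypothetical async remote call: emits one result after a delay
    // instead of blocking a request thread.
    static Observable<String> fetchTitleAsync(String id) {
        return Observable.just("Title-" + id)
                .delay(50, TimeUnit.MILLISECONDS); // simulated network latency
    }

    public static void main(String[] args) {
        Observable.just("70143836", "70153404")
                .flatMap(RxDemo::fetchTitleAsync)  // compose async calls, no nested callbacks
                .map(String::toUpperCase)
                .toBlocking()                      // block only so the demo can print
                .forEach(System.out::println);
    }
}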

AWS Maintenance

@Netflix Season-1

Media Cloud Engineering

Encoding PaaS

Master-Worker Pattern

Decoupled by priority queues with message leases

State in Cassandra

Oracle >> Cassandra

Data Model & Lack of ACID

Client Cluster Symbiosis

Embrace Eventual Consistency

Data Migration

Shadow Writes / Reads

Object To Cassandra Mapping

/**
 * @author mdaxini
 */
@CColumnFamily(name = "Sequence", shared = true)
@Audited(columnFamily = "sequence_audit")
public class SequenceBean {

    @CId(name = "id")
    private String sequenceName;

    @CColumn(name = "sequenceValue")
    private Long sequenceValue;

    @CColumn(name = "updated")
    @TemporalAutoUpdate
    @JsonProperty("updated")
    private Date updated;
    // ...
}

Object To Cassandra Mapping

@JsonAutoDetect(JsonMethod.NONE)
@JsonIgnoreProperties(ignoreUnknown = true)
@CColumnFamily(name = "task")
public class Job {

    @CId
    private JobKey jobKey;
    // ...
}

public final class TaskKey {

    @CId(order = 0)
    private Long packageId;

    @CId(order = 1)
    private UUID taskId;
    // ...
}

Priority-Scheduling Queue

Evolution:

1. One SQS queue per priority range
2. Store-and-forward (rate-adaptive) to SQS queues
3. Rule-based priority, leases, RDBMS-based with prefetch
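As an illustration of the message-lease idea (not Netflix's actual code), a sketch using the SQS visibility timeout as the lease; the queue URL and timeout are made up:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class Worker {

    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        // Hypothetical queue for one priority range.
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/encode-p1";

        ReceiveMessageRequest req = new ReceiveMessageRequest(queueUrl)
                .withMaxNumberOfMessages(1)
                .withVisibilityTimeout(300); // 5-minute lease: hidden from other workers

        for (Message m : sqs.receiveMessage(req).getMessages()) {
            encode(m.getBody()); // do the work while holding the lease
            // Deleting completes the task; if the worker dies first, the
            // lease expires and another worker picks the message up again.
            sqs.deleteMessage(queueUrl, m.getReceiptHandle());
        }
    }

    static void encode(String task) { /* stand-in for real encoding work */ }
}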

Encoding PaaS Farm

One command deployment and upgrade

Self Serve

Homogeneous View of Windows and Linux

Pioneered Ubuntu in production (since 2011)

Innovate Fast, Build for Pragmatic Scale

Innovate for the Business, Standardize Later*

@Netflix Season-2

Cloud Database Engineering

[CDE]

Platform / Big Data / Caching & Services

Cassandra, Astyanax, Priam, CassJMeter

Hadoop Platform as a Service, Genie, Lipstick

Adapted from a slide by @stonse

CDE Charter

Cassandra (1.2.x >> 2.0.x), Priam, Astyanax

Caching, Redis, ElasticSearch

Dynomite*, Spark*, Solr*, Skynet*, Inviso*

* Under Construction

All OLTP Data in Cassandra… Almost!

Cassandra Prod Footprint

90+ Clusters

2700+ Nodes

4 Datacenters (Amazon Regions)

>1 Trillion operations per day

Cassandra Best Practices*: Usage

*Practices I have found useful, YMMV

Use RandomPartitioner

Have at least 3 replicas (quorum)

Same number of replicas - simpler operations

create keyspace oracle
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {us-west-2 : 3, us-east : 3};
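The statement above is cassandra-cli (Thrift-era) syntax; the CQL3 equivalent would be roughly:

CREATE KEYSPACE oracle
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'us-west-2': 3,
    'us-east': 3
  };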

Move to CQL3 from thrift

Codifies best practices

Leverage Collections (albeit restricted cardinality)
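A small illustration of a CQL3 collection with bounded cardinality - the table and column names are made up for this example:

CREATE TABLE user_prefs (
  user_id text PRIMARY KEY,
  favorite_genres set<text>   -- fine for small, bounded sets
);

UPDATE user_prefs
  SET favorite_genres = favorite_genres + { 'drama' }
  WHERE user_id = 'u123';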

Use Key Caching

As a default turn off Row Caching
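For the Cassandra 1.2/2.0 line discussed here, that is a per-table setting (table name illustrative):

ALTER TABLE events WITH caching = 'keys_only';  -- key cache on, row cache off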

Rename all composite columns in one ALTER TABLE statement.

Watch length of column names

Use "COMPACT STORAGE" wisely

Compact storage cannot use collections - they depend on CompositeType

Non-compact storage costs 2 extra bytes per internal cell, but is still preferred.

CREATE TABLE events (
  key text,
  column1 int,
  column2 int,
  value text,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE;

cqlsh:test> SELECT * FROM events;

 key    | column1 | column2 | value
--------+---------+---------+---------
 tbomba |       4 |     120 | event 1
 tbomba |       4 |    2500 | event 2
 tbomba |       9 |     521 | event 3
 tbomba |      10 |    3525 | event 4

* Example and image courtesy of the DataStax blog

Prefer CL_ONE

data typically replicates within 500ms across the region

If using quorum reads and writes, set read_repair_chance to 0.0 or a very low value (see the statement after this list).

Make sure repairs are run often

Eventual Consistency does not mean hopeful consistency
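The read_repair_chance tuning mentioned above, as a statement (table name illustrative):

ALTER TABLE events WITH read_repair_chance = 0.0;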

Avoid secondary indexes for high cardinality values

In most cases we set gc_grace_seconds = 10 days (see the statement after this list)

Avoid hot rows

detect using node level latency metrics

Avoid heavy rows

Avoid too-wide rows (keep them under ~100K columns; smaller is better)

Don’t use C* as a Queue

Tombstones will bite you
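The gc_grace_seconds setting mentioned earlier, as a statement - 864,000 seconds is the 10 days from the slide (table name illustrative):

ALTER TABLE events WITH gc_grace_seconds = 864000;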

SizeTieredCompactionStrategy (STCS)

for write-heavy workloads

unpredictable I/O; needs 2x disk space headroom

LeveledCompactionStrategy (LCS)

for read-heavy workloads

predictable I/O, but roughly 2x the I/O of STCS

(image comparing LeveledCompactionStrategy and SizeTieredCompactionStrategy, courtesy of the DataStax blog)

Guesstimate and then validate sstable_size_in_mb

Hint: based on write rate and size

160MB for LeveledCompactionStrategy

SizeTieredCompactionStrategy: C* default of 50MB
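One way to apply the 160MB LCS sizing above (table name illustrative):

ALTER TABLE events
  WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
  };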

Atomic batches

no isolation; atomic only for rows within the same partition key

no automatic rollback

Lightweight transactions
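A minimal lightweight-transaction (compare-and-set) example in CQL - the table, keys, and values are made up:

INSERT INTO users (user_id, email)
  VALUES ('u123', 'a@example.com')
  IF NOT EXISTS;

UPDATE users
  SET email = 'b@example.com'
  WHERE user_id = 'u123'
  IF email = 'a@example.com';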

Cassandra Best Practices*: Operations

*Practices we have found useful, YMMV

If your C* cluster footprint is significant

you must have good automation

and at least a C* semi-expert

Use cstar_perf to validate your initial clusters

We don’t use vnodes

On each node, size disks at 2x the expected data - ephemeral SSDs, no EBS

Monitoring and alerting

read/write latency - coordinator & node level

Compaction stats

Heap Usage

Network

Max & Min Row sizes

Fixed tokens, double the cluster to expand

Important to size the cluster for app needs initially

the benefits of fixed tokens outweigh vnodes for us

Take backups of all the nodes

to allow for eventual consistency on restores

Note: the commitlog by default fsyncs only every 10 seconds

Run repairs before GCGraceSeconds expires

Throttle compactions and repairs

Repairs can take a long time

run one primary range and one keyspace at a time to avoid performance impact.
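The scoped repair described above, as a command (keyspace name illustrative):

nodetool repair -pr my_keyspace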

Schema disagreements: pick the nodes with the older schema date and restart them one at a time.

nodetool resetlocalschema is not persistent on 1.2

Recycle nodes in AWS to prevent staleness

Expanding to a new region

1. Launch nodes in the new region without bootstrapping
2. Change the keyspace replication settings
3. Run nodetool rebuild on the nodes in the new region
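A sketch of those three steps as settings and commands - the keyspace and region names are illustrative:

# 1. in cassandra.yaml on the new region's nodes, before starting them:
auto_bootstrap: false

# 2. add the new region to the keyspace's replication settings:
ALTER KEYSPACE oracle WITH replication = {
  'class': 'NetworkTopologyStrategy', 'us-east': 3, 'eu-west': 3 };

# 3. then, on each node in the new region, stream data from the old one:
nodetool rebuild us-east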

More Info

http://techblog.netflix.com/

http://netflix.github.io/

http://slideshare.net/netflix

https://www.youtube.com/user/NetflixOpenSource

https://www.youtube.com/user/NetflixIR $$$

Questions?