Cassandra Fundamentals - C* 2.0

Apache Cassandra Fundamentals or: How I stopped worrying and learned to love the CAP theorem Russell Spitzer @RussSpitzer Software Engineer in Test at DataStax

description

An overview of the basic concepts in CAP and how they relate to Apache Cassandra.

Transcript of Cassandra Fundamentals - C* 2.0

Page 1: Cassandra Fundamentals - C* 2.0

Apache Cassandra Fundamentals

or: How I stopped worrying and learned to love the CAP theorem

Russell Spitzer @RussSpitzer

Software Engineer in Test at DataStax

Page 2: Cassandra Fundamentals - C* 2.0

Who am I?

• Former Bioinformatics Student at UCSF

• Work on the integration of Cassandra (C*) with Hadoop, Solr, and Redacted!

• I spend a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time

• Developing new ways to make sure that C* scales

Page 3: Cassandra Fundamentals - C* 2.0

Apache Cassandra is a Linearly Scaling and Fault Tolerant NoSQL Database

Linearly Scaling: The throughput of the database increases linearly with the number of machines (2x machines = 2x throughput)

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

Fault Tolerant: Nodes down != Database down; Datacenter down != Database down

Page 4: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Consistency

When I ask the same question to any part of the system I should get the same answer

How many planes do we have?

Page 5: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Consistency

When I ask the same question to any part of the system I should get the same answer

How many planes do we have?

Answers from the nodes: 1, 1, 1, 1, 1, 1, 1

Consistent

Page 6: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Consistency

When I ask the same question to any part of the system I should get the same answer

How many planes do we have?

Answers from the nodes: 1, 4, 1, 2, 1, 8, 1

Not Consistent

Page 7: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Availability

When I ask a question I will get an answer

How many planes do we have?

1

zzzzz *snort* zzz

Available

Page 8: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Availability

When I ask a question I will get an answer

How many planes do we have?

I have to wait for major snooze to wake up

zzzzz *snort* zzz

Not Available

Page 9: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Partition Tolerance

I can ask questions even when the system is having intra-system communication problems

How many planes do we have?

Team Edward Team Jacob

1

Tolerant

Page 10: Cassandra Fundamentals - C* 2.0

CAP Theorem Limits What Distributed Systems can do

Partition Tolerance

I can ask questions even when the system is having intra-system communication problems

How many planes do we have?

Team Edward, Team Jacob

I’m not sure without asking those vampire lovers and we aren’t speaking

Not Tolerant

Page 11: Cassandra Fundamentals - C* 2.0

Cassandra is an AP System which is Eventually Consistent

I don’t know without asking those vampire lovers and we aren’t speaking

How many planes do we have? How many planes do we have?

1 1 1 1 1 1

I just heard we actually have 2

2 2 2 2 2 2 2

Eventually consistent: New information will make it to everyone eventually

Page 12: Cassandra Fundamentals - C* 2.0

Two knobs control fault tolerance in C*: Replication and Consistency Level

Server Side - Replication: How many copies of each piece of data should exist in the cluster?

RF=3

SimpleStrategy: Replicas

NetworkTopologyStrategy: Replicas per Datacenter

[Figure: a client talks to the coordinator for this operation in a four-node ring; the nodes hold ranges ACD, ABC, ABD, and BCD]

Page 13: Cassandra Fundamentals - C* 2.0

Two knobs control fault tolerance in C*: Replication and Consistency Level

Client Side - Consistency Level: How many replicas should we check before acknowledgment?

CL = One

[Figure: a client talks to the coordinator for this operation in a four-node ring; the nodes hold ranges ACD, ABC, ABD, and BCD]

Page 14: Cassandra Fundamentals - C* 2.0

Two knobs control fault tolerance in C*: Replication and Consistency Level

Client Side - Consistency Level: How many replicas should we check before acknowledgment?

CL = Quorum

[Figure: a client talks to the coordinator for this operation in a four-node ring; the nodes hold ranges ACD, ABC, ABD, and BCD]
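The consistency levels above differ only in how many replicas must respond before the client gets its acknowledgment. A minimal sketch of that arithmetic, assuming a single datacenter; the class and method names (QuorumMath, quorum, tolerableFailures, readsSeeLatestWrite) are illustrative and not part of any driver API:

```java
// Sketch only: replica-count arithmetic for one datacenter. All names are hypothetical.
final class QuorumMath {
    // A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Number of replicas that may be down while a request at this level can still succeed.
    static int tolerableFailures(int replicationFactor, int requiredReplicas) {
        return replicationFactor - requiredReplicas;
    }

    // A read must overlap the latest acknowledged write when the replica counts
    // used for the write and for the read sum to more than RF.
    static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("RF=" + rf + " quorum=" + q);
        System.out.println("QUORUM tolerates " + tolerableFailures(rf, q) + " down replica(s)");
        System.out.println("ONE write + ONE read overlap: " + readsSeeLatestWrite(rf, 1, 1));
        System.out.println("QUORUM write + QUORUM read overlap: " + readsSeeLatestWrite(rf, q, q));
    }
}
```

With RF=3, CL = One keeps working with two replicas down but may return stale data, while CL = Quorum on both reads and writes gives up one node of fault tolerance in exchange for overlapping replica sets.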

Page 15: Cassandra Fundamentals - C* 2.0

Nodes own data whose partition key hashes to their token ranges

[Figure: four-node ring; the nodes hold ranges ACD, ABC, ABD, and BCD]

Every piece of data belongs on the node that owns the Murmur3(2.0) hash of its partition key, plus (RF-1) other nodes

Example row: ID: ICBM_432 (Partition Key), Time: 30 (Clustering Key), Loc: SF, Status: Idle (Rest of Data)

Murmur3 hash of ID: ICBM_432 falls in range A
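The placement rule above (owner of the hashed partition key, plus RF-1 further nodes) can be sketched as a walk around a sorted ring. This is a simplified illustration, assuming one token per node and SimpleStrategy-style placement; TokenRing, addNode, toyToken, and replicasFor are hypothetical names, and toyToken uses String.hashCode() as a stand-in rather than a real Murmur3 implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch only: one token per node, replicas chosen by walking the ring clockwise.
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node name

    void addNode(String name, long token) {
        ring.put(token, name);
    }

    // Stand-in hash for illustration; NOT Murmur3.
    static long toyToken(String partitionKey) {
        return partitionKey.hashCode();
    }

    // The owner is the first node whose token is >= the key's token (wrapping around);
    // the remaining rf - 1 replicas are the next nodes on the ring.
    List<String> replicasFor(long token, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(rf, ring.size());
        Long cursor = ring.ceilingKey(token);
        if (cursor == null) {
            cursor = ring.firstKey(); // wrap past the highest token
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(cursor));
            cursor = ring.higherKey(cursor);
            if (cursor == null) {
                cursor = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("A", 0L);
        ring.addNode("B", 100L);
        ring.addNode("C", 200L);
        ring.addNode("D", 300L);
        System.out.println(ring.replicasFor(50L, 3));
        System.out.println(ring.replicasFor(toyToken("ICBM_432"), 3));
    }
}
```

Because every client and node can compute the same walk from the partition key alone, no central lookup is needed to find where a row lives.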

Page 16: Cassandra Fundamentals - C* 2.0

Cassandra writes are FAST due to log-append storage

[Figure: incoming rows (partition key, clustering key, rest of data) are appended to the Commit Log on disk and written to a Memtable in memory; Memtables are flushed to SSTables on disk]
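The write path above can be sketched in a few lines: append to a log, update a sorted in-memory table, and flush that table as an immutable sorted run. This is a toy model under loud assumptions (everything lives in memory, values are plain strings, and ToyWritePath with its methods is a hypothetical name), not Cassandra's actual storage engine:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: log-append writes with a memtable that flushes to immutable sorted runs.
final class ToyWritePath {
    private final List<String> commitLog = new ArrayList<>();      // stands in for the on-disk commit log
    private TreeMap<String, String> memtable = new TreeMap<>();    // kept sorted by key
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>();
    private final int flushThreshold;

    ToyWritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. sequential append for durability
        memtable.put(key, value);         // 2. in-memory update, no read before write
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        // 3. the sorted memtable becomes an immutable run; it is never modified again
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed data no longer needs the log for recovery
    }

    int sstableCount() { return sstables.size(); }
    int memtableSize() { return memtable.size(); }
    int commitLogSize() { return commitLog.size(); }

    public static void main(String[] args) {
        ToyWritePath store = new ToyWritePath(2);
        store.write("ICBM_432:30", "SF,Idle");
        store.write("ICBM_432:45", "SF,Idle");
        store.write("ICBM_432:60", "SF,Idle");
        System.out.println("sstables=" + store.sstableCount() + " memtable=" + store.memtableSize());
    }
}
```

No step in write() seeks or rewrites existing data, which is the property the slide credits for fast writes.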

Page 17: Cassandra Fundamentals - C* 2.0

Deletes in a distributed System are Challenging

We need to keep records of deletions in case of network partitions

[Figure: timeline for Node1 and Node2; Node2 has a power outage while Tombstones are recorded over time]

Page 18: Cassandra Fundamentals - C* 2.0

Compactions merge and unify data in our SSTables

SSTable1 + SSTable2 → SSTable3

Since SSTables are immutable, this is our chance to consolidate rows and remove tombstones (after GC grace)
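A compaction of two SSTables can be sketched as a merge in which the newest timestamp wins and expired tombstones are dropped. This is a simplified illustration: ToyCompaction and Cell are hypothetical names, a null value stands in for a tombstone, and the write timestamp doubles as the deletion time used for the GC grace check, which is not how the real format stores it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: merge two sorted runs, newest timestamp wins, purge old tombstones.
final class ToyCompaction {
    // One version of a cell; a null value marks a tombstone (a recorded delete).
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    static TreeMap<String, Cell> compact(Map<String, Cell> a, Map<String, Cell> b,
                                         long now, long gcGrace) {
        TreeMap<String, Cell> merged = new TreeMap<>(a);
        for (Map.Entry<String, Cell> e : b.entrySet()) {
            // keep whichever version carries the newer timestamp
            merged.merge(e.getKey(), e.getValue(),
                    (oldCell, newCell) -> newCell.timestamp > oldCell.timestamp ? newCell : oldCell);
        }
        // tombstones are only safe to drop once they are older than GC grace
        merged.values().removeIf(c -> c.isTombstone() && now - c.timestamp > gcGrace);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("ICBM_432", new Cell("Idle", 10L));
        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("ICBM_432", new Cell(null, 20L)); // delete recorded later
        System.out.println(compact(sstable1, sstable2, 100L, 50L).keySet());
    }
}
```

Keeping the tombstone until GC grace has passed is what lets a replica that missed the delete (for example, during an outage) learn about it before the record of the deletion disappears.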

Page 19: Cassandra Fundamentals - C* 2.0

Layout of Data Allows for Rapid Queries Along Clustering Columns

ID: ICBM_432 | Time: 30, Loc: SF, Status: Idle | Time: 45, Loc: SF, Status: Idle | Time: 60, Loc: SF, Status: Idle

ID: ICBM_9210 | Time: 30, Loc: Boston, Status: Idle | Time: 45, Loc: Boston, Status: Idle | Time: 60, Loc: Boston, Status: Idle

ID: ICBM_900 | Time: 30, Loc: Tulsa, Status: Idle | Time: 45, Loc: Tulsa, Status: Idle | Time: 60, Loc: Tulsa, Status: Idle

Disclaimer: Not exactly like this (Use sstable2json to see real layout)
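Why this layout makes queries along the clustering column fast can be sketched with a map of sorted maps: the partition key finds the partition, and a contiguous range of clustering values is then read in order. ToyPartitionStore, insert, and slice are hypothetical names, and the integer time values mirror the slide rather than a real timestamp type:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch only: partition key -> rows kept sorted by clustering key.
final class ToyPartitionStore {
    private final Map<String, TreeMap<Integer, String>> partitions = new HashMap<>();

    void insert(String id, int time, String row) {
        partitions.computeIfAbsent(id, k -> new TreeMap<>()).put(time, row);
    }

    // Range scan inside one partition: rows with fromTime <= time <= toTime, in clustering order.
    SortedMap<Integer, String> slice(String id, int fromTime, int toTime) {
        TreeMap<Integer, String> rows = partitions.get(id);
        if (rows == null) {
            return new TreeMap<>();
        }
        return rows.subMap(fromTime, true, toTime, true);
    }

    public static void main(String[] args) {
        ToyPartitionStore store = new ToyPartitionStore();
        store.insert("ICBM_432", 30, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 45, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 60, "Loc: SF, Status: Idle");
        System.out.println(store.slice("ICBM_432", 40, 60));
    }
}
```

A range over time within one ID touches neighboring entries only; a range across many IDs would have to visit every partition, which is why the clustering column is the cheap axis to query.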

Page 20: Cassandra Fundamentals - C* 2.0

CQL allows easy definition of Table Structures

ID: ICBM_432 | Time: 30, Loc: SF, Status: Idle | Time: 45, Loc: SF, Status: Idle | Time: 60, Loc: SF, Status: Idle

CREATE TABLE icbmlog (
    name text,
    time timestamp,
    location text,
    status text,
    PRIMARY KEY (name, time)
);

Page 21: Cassandra Fundamentals - C* 2.0

Reading data is FAST but limited by disk IO

[Figure: a client read on a replica merges rows from the Memtable in memory and the SSTables on disk; conflicting versions are resolved by last write wins (LWW)]

Page 22: Cassandra Fundamentals - C* 2.0

Reading data is FAST but limited by disk IO

[Figure: a client read on a replica merges rows from the Memtable in memory and the SSTables on disk; conflicting versions are resolved by last write wins (LWW)]

Read Repair
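Last write wins and read repair can be sketched together: the coordinator compares the versions the replicas return, answers with the newest, and writes that version back to any replica that was stale. This is a toy model in which each replica is just a map; ToyReadRepair, Versioned, and read are hypothetical names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: pick the newest version across replicas, then repair the stale ones.
final class ToyReadRepair {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Returns the winning version (or null if no replica has the key).
    static Versioned read(String key, List<Map<String, Versioned>> replicas) {
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v; // last write wins
            }
        }
        if (newest != null) {
            for (Map<String, Versioned> replica : replicas) {
                Versioned v = replica.get(key);
                if (v == null || v.timestamp < newest.timestamp) {
                    replica.put(key, newest); // read repair: push the winner to stale replicas
                }
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        Map<String, Versioned> r1 = new HashMap<>();
        r1.put("planes", new Versioned("1", 10L));
        Map<String, Versioned> r2 = new HashMap<>();
        r2.put("planes", new Versioned("2", 20L));
        List<Map<String, Versioned>> replicas = new ArrayList<>();
        replicas.add(r1);
        replicas.add(r2);
        System.out.println("planes = " + read("planes", replicas).value);
    }
}
```

This is one way the "eventually" in eventually consistent gets resolved: a read that notices disagreement leaves the replicas it touched in agreement.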

Page 23: Cassandra Fundamentals - C* 2.0

New Clients provide a holistic view of the C* cluster

[Figure: the client makes Initial Contact with one node of the four-node ring (ACD, ABC, ABD, BCD)]

Cluster.builder().addContactPoint("127.0.0.1").build()

Page 24: Cassandra Fundamentals - C* 2.0

Session Objects Are used for Executing Requests

Session session = cluster.connect();
session.execute("DROP KEYSPACE IF EXISTS icbmkey");
session.execute("CREATE KEYSPACE icbmkey WITH replication = {'class':'SimpleStrategy','replication_factor':'1'}");

For highest throughput use asynchronous methods:

ResultSetFuture executeAsync(Query query)

Then add a callback or queue the ResultSetFutures
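The "queue the futures" pattern can be sketched with the JDK alone: issue every request first, keep the futures, and only then wait for results. Here CompletableFuture stands in for ResultSetFuture and fakeExecuteAsync is a hypothetical placeholder for session.executeAsync; no driver calls are made:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: fire all requests, queue the futures, then drain them in order.
final class AsyncSketch {
    // Hypothetical stand-in for an asynchronous driver call.
    static CompletableFuture<String> fakeExecuteAsync(ExecutorService pool, String query) {
        return CompletableFuture.supplyAsync(() -> "ok: " + query, pool);
    }

    static List<String> runAll(List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (String q : queries) {
                inFlight.add(fakeExecuteAsync(pool, q)); // do not block between requests
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : inFlight) {
                results.add(f.join()); // wait only once everything has been sent
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(Arrays.asList("INSERT 1", "INSERT 2", "INSERT 3")));
    }
}
```

In a real client the number of in-flight requests would usually be bounded so that a large workload does not overwhelm the cluster; that bound is left out here for brevity.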

Page 25: Cassandra Fundamentals - C* 2.0

Token Aware Policies reduce the number of intra-network requests made

[Figure: the client sends a request for range A directly to a node holding A in the four-node ring (ACD, ABC, ABD, BCD)]

Page 26: Cassandra Fundamentals - C* 2.0

Prepared statements allow for sending less data over the wire

Prepared batch statements can further improve throughput

PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)");
BatchStatement batch = new BatchStatement();
batch.add(ps.bind(uid, mid1, title1, body1));
batch.add(ps.bind(uid, mid2, title2, body2));
batch.add(ps.bind(uid, mid3, title3, body3));
session.execute(batch);

The query is prepared on all nodes by the driver

Page 27: Cassandra Fundamentals - C* 2.0

Avoid

• Preparing statements more than once
• Creating batches which are too large
• Running statements in serial
• Using consistency levels above your need
• Secondary indexes in your main queries
  • or really at all unless you are doing analytics

Page 28: Cassandra Fundamentals - C* 2.0

Have fun with C*

Questions?