Cassandra Fundamentals - C* 2.0
Apache Cassandra Fundamentals
or: How I stopped worrying and learned to love the CAP theorem
Russell Spitzer @RussSpitzer
Software Engineer in Test at DataStax
Who am I?
• Former Bioinformatics Student at UCSF
• Work on the integration of Cassandra (C*) with Hadoop, Solr, and Redacted!
• I spend a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time
• Developing new ways to make sure that C* scales
Apache Cassandra is a Linearly Scaling and Fault Tolerant noSQL Database
Linearly Scaling: The power of the database increases linearly with the number of machines 2x machines = 2x throughput
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Fault Tolerant: Nodes down != Database Down Datacenter down != Database Down
CAP Theorem Limits What Distributed Systems can do
Consistency: When I ask the same question to any part of the system, I should get the same answer.
How many planes do we have?
Every node answers "1": Consistent
Nodes answer "1 4 1 2 1 8 1": Not Consistent
Availability: When I ask a question, I will get an answer.
How many planes do we have?
One node is asleep (zzzzz *snort* zzz), but another still answers "1": Available
If I have to wait for Major Snooze to wake up: Not Available
Partition Tolerance: I can ask questions even when the system is having intra-system communication problems.
How many planes do we have?
Team Edward and Team Jacob aren't speaking, but each side can still answer "1": Tolerant
"I'm not sure without asking those vampire lovers, and we aren't speaking": Not Tolerant
Cassandra is an AP System which is Eventually Consistent
How many planes do we have? At first some nodes answer "1"; then the news spreads ("I just heard! We actually have 2!") and every node answers "2".
Eventually consistent: New information will make it to everyone eventually.
Two knobs control fault tolerance in C*: Replication and Consistency Level
Server Side - Replication: How many copies of the data should exist in the cluster?
[Ring diagram: four nodes, each storing three of the replicas A/B/C/D with RF=3; the Client talks to the Coordinator for this operation]
SimpleStrategy: a number of replicas
NetworkTopologyStrategy: replicas per datacenter
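SimpleStrategy's placement rule can be sketched in a few lines: hash the partition key to a token, then walk the ring clockwise and take the next RF distinct nodes. This is a simplified sketch, not the actual server code; the ring tokens and node names below are invented for illustration.

```java
import java.util.*;

public class SimpleStrategySketch {
    // Ring: token -> node name, sorted by token (a toy stand-in for the real ring).
    static List<String> replicasFor(long token, TreeMap<Long, String> ring, int rf) {
        List<String> replicas = new ArrayList<>();
        // Start at the first node whose token is >= the data's token,
        // wrapping back to the start of the ring when we run off the end.
        Iterator<String> it = ring.tailMap(token).values().iterator();
        while (replicas.size() < rf) {            // assumes rf <= number of nodes
            if (!it.hasNext()) it = ring.values().iterator(); // wrap around
            String node = it.next();
            if (!replicas.contains(node)) replicas.add(node);
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(-100L, "A"); ring.put(0L, "B");
        ring.put(100L, "C");  ring.put(200L, "D");
        // A token of 50 lands between B and C, so C owns it; D and A hold the copies.
        System.out.println(replicasFor(50L, ring, 3));
    }
}
```

With RF=3 every piece of data has two extra homes, which is what lets a node go down without the database going down.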
Client Side - Consistency Level: How many replicas should we check before acknowledgment?
[Ring diagram: with CL = One, the Coordinator for this operation acknowledges the Client after hearing from a single replica]
[Ring diagram: with CL = Quorum, the Coordinator for this operation acknowledges the Client only after hearing from a majority of replicas]
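The quorum size the coordinator waits for is the usual majority formula, quorum = floor(RF/2) + 1. A tiny sketch:

```java
public class QuorumMath {
    // Majority of RF replicas: floor(RF/2) + 1 (integer division does the floor).
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++)
            System.out.println("RF=" + rf + " -> QUORUM=" + quorum(rf));
    }
}
```

So with RF=3, QUORUM waits for 2 replicas: one replica can be down and reads/writes still succeed.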
Nodes own data whose primary key hashes into their token ranges
[Ring diagram: four nodes, each owning a range of the token space]
Every piece of data belongs on the node that owns the Murmur3 hash of its partition key, plus (RF-1) other nodes.
Example row: ID: ICBM_432 is the Partition Key; Time: 30 is the Clustering Key; Loc: SF and Status: Idle are the rest of the data.
Murmur3Hash(ID: ICBM_432) falls in the token range owned by node A.
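Only the partition key participates in placement: rows that share a partition key land on the same replicas no matter what their clustering key is. A minimal sketch, using String.hashCode as a stand-in for the real Murmur3 partitioner:

```java
public class PartitionKeyHash {
    // Stand-in for Cassandra's Murmur3 partitioner; only the partition key goes in.
    static int placementHash(String partitionKey) {
        return partitionKey.hashCode();
    }

    public static void main(String[] args) {
        // Two rows of the icbmlog table: same ID, different Time values.
        int h1 = placementHash("ICBM_432"); // row with Time: 30
        int h2 = placementHash("ICBM_432"); // row with Time: 45
        System.out.println(h1 == h2);       // same partition -> same token -> same replicas
    }
}
```

This is why a whole partition (all Time entries for ICBM_432) lives together on one set of nodes, which makes the clustering-column queries later in the deck fast.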
Cassandra writes are FAST due to log-append storage
[Diagram: each write is appended to the Commit Log on disk and applied to a Memtable in memory; full Memtables are flushed to immutable SSTables on disk]
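The write path above can be sketched as a toy in-memory model: append to a commit log (sequential IO only), update a memtable, and flush full memtables to immutable sorted tables. The structures and flush threshold here are simplifications, not Cassandra's real formats.

```java
import java.util.*;

public class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();                 // durable, append-only
    final TreeMap<String, String> memtable = new TreeMap<>();         // in-memory, sorted by key
    final List<SortedMap<String, String>> sstables = new ArrayList<>(); // immutable flushed tables
    final int flushThreshold;

    WritePathSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);  // 1. append to the commit log (no seeks)
        memtable.put(key, value);          // 2. fast in-memory update
        if (memtable.size() >= flushThreshold) flush();
    }

    void flush() {
        sstables.add(new TreeMap<>(memtable)); // 3. write an immutable, sorted SSTable
        memtable.clear();
    }
}
```

No step requires a random-access read-before-write, which is the reason the slide calls writes FAST.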
Deletes in a distributed System are Challenging
We need to keep records of deletions in case of network partitions
[Diagram: Node2 suffers a power outage and misses a delete; the Tombstone recorded on Node1 later tells it the data was deleted]
Compactions merge and unify data in our SSTables
SSTable1 + SSTable2 → SSTable3
Since SSTables are immutable, this is our chance to consolidate rows and remove tombstones (after GC grace).
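A compaction pass can be sketched as a merge that keeps the newest version of each key and purges tombstones older than the gc_grace cutoff. The Cell shape and raw long timestamps below are simplified stand-ins for the real SSTable format.

```java
import java.util.*;

public class CompactionSketch {
    static class Cell {
        final String value; final long timestamp; final boolean tombstone;
        Cell(String value, long timestamp, boolean tombstone) {
            this.value = value; this.timestamp = timestamp; this.tombstone = tombstone;
        }
    }

    static Map<String, Cell> compact(Map<String, Cell> sstable1,
                                     Map<String, Cell> sstable2,
                                     long gcGraceCutoff) {
        Map<String, Cell> merged = new TreeMap<>(sstable1);
        // For keys in both tables, the newest write (highest timestamp) wins.
        sstable2.forEach((k, cell) -> merged.merge(k, cell,
                (oldC, newC) -> oldC.timestamp >= newC.timestamp ? oldC : newC));
        // Tombstones may only be purged once gc_grace has elapsed.
        merged.values().removeIf(c -> c.tombstone && c.timestamp < gcGraceCutoff);
        return merged;
    }
}
```

Dropping a tombstone too early would let a partitioned replica "resurrect" the deleted row, which is why the GC grace check matters.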
Layout of Data Allows for Rapid Queries Along Clustering Columns
Disclaimer: Not exactly like this (use sstable2json to see the real layout)
ID: ICBM_432 → (Time: 30, Loc: SF, Status: Idle) (Time: 45, Loc: SF, Status: Idle) (Time: 60, Loc: SF, Status: Idle)
ID: ICBM_9210 → (Time: 30, Loc: Boston, Status: Idle) (Time: 45, Loc: Boston, Status: Idle) (Time: 60, Loc: Boston, Status: Idle)
ID: ICBM_900 → (Time: 30, Loc: Tulsa, Status: Idle) (Time: 45, Loc: Tulsa, Status: Idle) (Time: 60, Loc: Tulsa, Status: Idle)
CQL allows easy definition of Table Structures
ID: ICBM_432 → (Time: 30, Loc: SF, Status: Idle) (Time: 45, Loc: SF, Status: Idle) (Time: 60, Loc: SF, Status: Idle)
CREATE TABLE icbmlog (
  name text,
  time timestamp,
  location text,
  status text,
  PRIMARY KEY (name, time)
);
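Because rows inside a partition are stored in clustering-column order, slicing one partition by `time` is efficient. A hypothetical query against this table (the limit and the shape of the slice are invented for illustration) might be:

```sql
-- Fast: hits one partition (name) and scans along the clustering column (time).
SELECT time, location, status
  FROM icbmlog
 WHERE name = 'ICBM_432'
 ORDER BY time DESC
 LIMIT 10;
```

Queries that restrict the partition key and then range over the clustering key read contiguous data; queries without the partition key cannot be routed to one set of replicas.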
Reading data is FAST but limited by disk IO
[Diagram: a read consults the Memtable in memory plus SSTables on disk; versions of a row from different replicas are reconciled last-write-wins (LWW)]
Read Repair: replicas found to be out of date during a read are sent the winning version.
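Reconciling replica responses is last-write-wins: the cell with the highest write timestamp is returned, and (via read repair) pushed back to stale replicas. A minimal sketch of just the resolution step:

```java
public class LastWriteWins {
    static class Versioned {
        final String value; final long timestamp;
        Versioned(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Pick the version with the highest write timestamp (last write wins).
    static Versioned resolve(Versioned... versions) {
        Versioned best = versions[0];
        for (Versioned v : versions)
            if (v.timestamp > best.timestamp) best = v;
        return best;
    }

    public static void main(String[] args) {
        Versioned stale = new Versioned("Idle", 30);     // an out-of-date replica
        Versioned fresh = new Versioned("Launched", 60); // the newest write
        System.out.println(resolve(stale, fresh).value);
    }
}
```

The stale replica would then receive the winning ("Launched", 60) version as a read repair.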
New Clients provide a holistic view of the C* cluster
[Ring diagram: the Client makes Initial Contact with one node and discovers the rest of the cluster]
Cluster.builder().addContactPoint("127.0.0.1").build()
Session Objects Are used for Executing Requests
session = cluster.connect()
session.execute("DROP KEYSPACE IF EXISTS icbmkey")
session.execute("CREATE KEYSPACE icbmkey WITH replication = {'class':'SimpleStrategy', 'replication_factor':'1'}")
For highest throughput, use asynchronous methods:
ResultSetFuture executeAsync(Query query)
Then add a callback or queue the ResultSetFutures.
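The queue-the-futures pattern looks roughly like this. CompletableFuture and fakeQuery below are stand-ins for the driver's ResultSetFuture and session.executeAsync, since a real call needs a live cluster; the point is the shape: fire everything off first, then drain.

```java
import java.util.*;
import java.util.concurrent.*;

public class AsyncQueueSketch {
    // Stand-in for session.executeAsync(query): returns immediately with a future.
    static CompletableFuture<String> fakeQuery(int i) {
        return CompletableFuture.supplyAsync(() -> "row-" + i);
    }

    public static void main(String[] args) {
        Queue<CompletableFuture<String>> pending = new ArrayDeque<>();
        for (int i = 0; i < 3; i++)
            pending.add(fakeQuery(i));                // don't block between requests
        while (!pending.isEmpty())
            System.out.println(pending.poll().join()); // drain results as they complete
    }
}
```

Blocking once per request (the serial pattern) caps throughput at one round-trip per request; queueing futures keeps many requests in flight at once.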
Token Aware Policies allow a reduction in the number of intra-network requests made
[Ring diagram: the Client sends a request for partition A directly to a node that owns a replica of A]
Prepared statements allow for sending less data over the wire
Prepared batch statements can further improve throughput
PreparedStatement ps = session.prepare(
    "INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)");
BatchStatement batch = new BatchStatement();
batch.add(ps.bind(uid, mid1, title1, body1));
batch.add(ps.bind(uid, mid2, title2, body2));
batch.add(ps.bind(uid, mid3, title3, body3));
session.execute(batch);
The query is prepared on all nodes by the driver.
Avoid:
• Preparing statements more than once
• Creating batches which are too large
• Running statements in serial
• Using consistency levels above your need
• Secondary indexes in your main queries (or really at all, unless you are doing analytics)
Have fun with C*
Questions?