Apache Cassandra in Bangalore - Cassandra Internals and Performance
Cassandra 101
Uploaded by: nader-ganayem
Category: Technology
Transcript of Cassandra 101
![Page 1: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/1.jpg)
Cassandra 101
Introduction to Apache Cassandra
![Page 2: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/2.jpg)
What is Cassandra?
● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open sourced by Facebook in 2008
● Monolithic kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace, CloudKick and others
![Page 3: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/3.jpg)
Etymology
● In Greek mythology, Cassandra (also known as Alexandra) was the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her so that no one would ever believe her predictions
![Page 4: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/4.jpg)
Why Cassandra?
● Minimal administration
● No single point of failure
● Scales horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be updated online
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, rack and DC aware
![Page 5: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/5.jpg)
Commercial Support
![Page 6: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/6.jpg)
Data Model
A Column is the basic unit, consisting of a Key, a Value and a Timestamp
![Page 7: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/7.jpg)
Data Model
A Column is the basic unit, consisting of a Key, a Value and a Timestamp
![Page 8: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/8.jpg)
RDBMS vs Cassandra
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
![Page 9: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/9.jpg)
Cassandra is good at
Reading data from a row in the order it is stored, i.e. by Column Name!
Understand the queries your application requires before building the data model
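The Map<RowKey, SortedMap<ColumnKey, ColumnValue>> picture can be sketched in a few lines of Python. This is only a toy model: the `Row` class and its `put` and `slice` methods are hypothetical names made up for illustration, not Cassandra APIs. It shows why reading a range of columns by name is cheap when columns are stored in name order.

```python
from bisect import bisect_left, insort


class Row:
    """Toy row: columns kept sorted by name (a stand-in for SortedMap)."""

    def __init__(self):
        self._names = []   # column names, always kept in sorted order
        self._cells = {}   # column name -> (value, timestamp)

    def put(self, name, value, timestamp):
        if name not in self._cells:
            insort(self._names, name)  # keep the name list sorted on insert
        self._cells[name] = (value, timestamp)

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name < end, in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_left(self._names, end)
        return [(n, self._cells[n][0]) for n in self._names[lo:hi]]


table = {}  # the outer map: row key -> Row
row = table.setdefault("foo", Row())
row.put("vegetable", "cucumber", 10)  # inserted out of order on purpose
row.put("fruit", "apple", 10)

print(row.slice("a", "z"))  # columns come back ordered by column name
```

A slice is two binary searches plus a contiguous scan, which is why queries that follow the stored column order are the ones this model serves well.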
![Page 10: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/10.jpg)
Consistent Hashing
Load Balancing in a changing world ...
● Evenly map keys to nodes
● Minimize key movement when nodes join or leave
![Page 11: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/11.jpg)
The Partitioner:
● RandomPartitioner transforms Keys to Tokens using MD5
● In C* 1.2 the default partitioner is Murmur3Partitioner, which hashes keys with the Murmur3 algorithm
![Page 12: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/12.jpg)
Keys and Tokens?
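[Figure: a token line running from 0 to 99, with ‘fop’ at token 10 and ‘foo’ at token 90]
The MD5 hash of ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b
The MD5 hash of ‘foo’ is d3b07384d113edec49eaa6238ad5ff00
(The ‘foo’ digest shown is that of the key followed by a newline, as `echo foo | md5sum` prints; the bare string ‘foo’ hashes to acbd18db4cc2f85cedef654fccc4a4d8.)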
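A rough sketch of turning a key into a token with MD5 follows. The function names `md5_token` and `toy_ring_position` are hypothetical, and folding the digest onto a 0..99 ring is an assumption made only to mirror the small ring on the slides; the positions it prints will not match the illustrative tokens 10 and 90, and the real RandomPartitioner works on a far larger token space.

```python
import hashlib


def md5_token(key: str) -> int:
    """Read the 128-bit MD5 digest of the key as a non-negative integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def toy_ring_position(key: str, ring_size: int = 100) -> int:
    """Fold the token onto a small ring of ring_size slots (toy assumption)."""
    return md5_token(key) % ring_size


for key in ("fop", "foo"):
    print(key, "->", toy_ring_position(key))
```

The point of hashing is that similar keys such as 'fop' and 'foo' get unrelated tokens, which spreads neighbouring keys across the ring.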
![Page 13: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/13.jpg)
Token Ring.
[Figure: token ring running from 0 to 99; ‘fop’ has token 10 and ‘foo’ has token 90]
![Page 14: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/14.jpg)
Token Ranges (Pre 1.2)
[Figure: token ring with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50) and Node 4 (token 75); token ranges 76-0, 1-25, 26-50 and 51-75; ‘foo’ has token 90]
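The lookup from token to owning node can be sketched with a binary search over the node tokens. The sketch assumes each node owns the range that ends at its own token (so Node 1, token 0, owns 76-0); the `owner` function is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left

# Node tokens from the slide, sorted by token.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the first node whose token is >= the key's token, wrapping at the end."""
    idx = bisect_left(TOKENS, token)
    if idx == len(TOKENS):  # past the highest token: wrap around to the first node
        idx = 0
    return RING[idx][1]


print(owner(90))  # 'foo' falls in range 76-0
print(owner(10))  # 'fop' falls in range 1-25
```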
![Page 15: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/15.jpg)
Token Ranges With Virtual Nodes in 1.2
[Figure: many small token ranges spread across Node 1, Node 2 and Node 3]
● Easier to enlarge or shrink the cluster
● The cluster can grow in steps of one node
● Node recovery is much faster
![Page 16: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/16.jpg)
Replication Strategy
[Figure: token ring with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50) and Node 4 (token 75); token ranges 76-0, 1-25, 26-50 and 51-75; ‘foo’ has token 90]
Selects as many nodes as the Replication Factor to hold each row.
![Page 17: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/17.jpg)
Replication Strategy
[Figure: token ring with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50) and Node 4 (token 75); token ranges 76-0, 1-25, 26-50 and 51-75; ‘foo’ has token 90]
SimpleStrategy with RF 3
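SimpleStrategy placement can be sketched as: find the node that owns the token, then keep walking the ring clockwise until RF nodes are collected. The function name `simple_strategy_replicas` is hypothetical, and the sketch assumes each node owns the range ending at its own token.

```python
from bisect import bisect_left

RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def simple_strategy_replicas(token: int, rf: int) -> list:
    """Owner of the token plus the next rf-1 nodes clockwise on the ring."""
    start = bisect_left(TOKENS, token) % len(RING)  # wrap past the last token
    count = min(rf, len(RING))                      # cannot exceed cluster size
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(simple_strategy_replicas(90, 3))  # replicas for 'foo' with RF 3
```

For 'foo' (token 90) this yields Node 1, Node 2 and Node 3.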
![Page 18: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/18.jpg)
Replication Strategy
NetworkTopologyStrategy uses a Replication Factor per Data Center
[Figure: two four-node token rings labeled EAST and WEST, each with Node 1 (token 0), Node 2 (token 25), Node 3 (token 50) and Node 4 (token 75) and token ranges 76-0, 1-25, 26-50 and 51-75; ‘foo’ has token 90]
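A per-data-center replication factor can be sketched by repeating the ring walk once per data center. This is a simplification under stated assumptions: the two rings are modelled exactly as drawn (EAST and WEST with the same tokens), rack awareness is ignored, and the replication factors 3 and 2 are hypothetical values chosen for the example.

```python
from bisect import bisect_left

# One toy ring per data center, as drawn on the slide.
DC_RINGS = {
    "EAST": [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")],
    "WEST": [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")],
}


def replicas_per_dc(token: int, rf_by_dc: dict) -> dict:
    """For each data center, pick its own RF nodes by walking that DC's ring."""
    placement = {}
    for dc, rf in rf_by_dc.items():
        ring = DC_RINGS[dc]
        tokens = [t for t, _ in ring]
        start = bisect_left(tokens, token) % len(ring)
        count = min(rf, len(ring))
        placement[dc] = [ring[(start + i) % len(ring)][1] for i in range(count)]
    return placement


print(replicas_per_dc(90, {"EAST": 3, "WEST": 2}))
```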
![Page 19: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/19.jpg)
SimpleSnitch
Places all nodes in the same DC & RACK (Default)
![Page 20: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/20.jpg)
Ec2Snitch/Ec2MultiRegionSnitch
The DC is set to the AWS Region and the Rack to the Availability Zone
![Page 21: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/21.jpg)
PropertyFileSnitch
Each node's DC and Rack are maintained in a property file
![Page 22: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/22.jpg)
GossipPropertyFileSnitch
Uses gossip as the first source of node information and falls back to the property file if that is not available
![Page 23: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/23.jpg)
The Client and the Coordinator
[Figure: a Client and a ring of Node 1, Node 2, Node 3 and Node 4; ‘foo’ has token 90]
![Page 24: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/24.jpg)
Multi DC Client and Coordinator
[Figure: a Client, a ring of Node 1, Node 2, Node 3 and Node 4, plus Node 10 and Node 20; ‘foo’ has token 90]
![Page 25: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/25.jpg)
Gossip
Nodes share information with a small number of neighbours, who share information with a small number of other neighbours …
● Used for intra-cluster communication
● Routes client requests
● Detects node failure
● Bootstrap peers are listed as seeds in the config file
![Page 26: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/26.jpg)
Cassandra Objects
● CommitLog
● MemTable
● SSTable
● Index
● Bloom Filter
![Page 27: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/27.jpg)
Consistency
● CAP theorem
  ○ Trade consistency for availability
  ○ Consistency is a choice
* It doesn't matter if you are good at something, as long as you are consistent.
[Figure: CAP diagram: under Partition, Consistency OR Availability]
![Page 28: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/28.jpg)
Consistency Level
● Specified for each request
● Number of nodes to wait for

WRITE

| Level | Description |
|---|---|
| ZERO | Cross fingers |
| ANY | 1st to respond (a Hinted Handoff counts) |
| ONE, TWO, THREE | nth replica to respond |
| QUORUM | N/2+1 replicas |
| ALL | All replicas |

READ

| Level | Description |
|---|---|
| ZERO | N/A |
| ANY | N/A |
| ONE, TWO, THREE | nth replica to respond |
| QUORUM* | N/2+1 replicas |
| ALL | All replicas |

* QUORUM, LOCAL_QUORUM, EACH_QUORUM
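The N/2+1 quorum rule uses integer division, which is easy to misread. A small sketch (the `quorum` helper is a hypothetical name) shows the quorum size, and how many replicas can be down while a quorum is still reachable, for a few replication factors:

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: floor(N / 2) + 1 replicas."""
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, replicas that may be down={rf - q}")
```

With RF 3 a quorum is 2 replicas, so one replica can be down; with RF 2 a quorum is also 2, so no replica can be down.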
![Page 29: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/29.jpg)
Write ‘foo’ at Quorum with Hinted Handoff
[Figure: Client, Node 1, Node 2, Node 3 (down) and Node 4, which holds ‘foo’ (token 90) for Node 3]
![Page 30: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/30.jpg)
Read ‘foo’ at Quorum
[Figure: Client, Node 1, Node 2, Node 3 (down) and Node 4, which holds ‘foo’ (token 90) for Node 3]
![Page 31: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/31.jpg)
Column TimeStamps
● Used to resolve differences
● Stored for each Column Value
● 64-bit integers

| Column | Node 1 | Node 2 | Node 3 |
|---|---|---|---|
| Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | <missing> |
| Fruit | ‘Apple’ (timestamp 10) | ‘banana’ (timestamp 15) | ‘Apple’ (timestamp 10) |
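Resolving the differences between replicas comes down to keeping the value with the highest timestamp. A minimal sketch using the replies from the table above; the `resolve` function is a hypothetical helper, `None` stands for a missing column, and tie-breaking between equal timestamps is not modelled.

```python
def resolve(replies):
    """Return the (value, timestamp) pair with the highest timestamp, or None."""
    present = [reply for reply in replies if reply is not None]
    if not present:
        return None
    return max(present, key=lambda reply: reply[1])


# Replies from Node 1, Node 2 and Node 3, as in the table.
vegetable = [("cucumber", 10), ("cucumber", 10), None]
fruit = [("Apple", 10), ("banana", 15), ("Apple", 10)]

print(resolve(vegetable))  # the missing replica is simply ignored
print(resolve(fruit))      # the newer 'banana' write wins
```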
![Page 32: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/32.jpg)
Strong Consistency
W + R > N
#Write Nodes + #Read Nodes > Replication Factor
● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
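The W + R > N rule can be checked mechanically: if the write set and the read set together are larger than the replication factor, they must share at least one replica. A small sketch with N = 3 (the helper name and the ONE + ONE counter-example are additions for illustration):

```python
def is_strongly_consistent(write_nodes: int, read_nodes: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    return write_nodes + read_nodes > replication_factor


N = 3
QUORUM = N // 2 + 1

combos = {
    "QUORUM write + QUORUM read": (QUORUM, QUORUM),
    "ONE write + ALL read": (1, N),
    "ALL write + ONE read": (N, 1),
    "ONE write + ONE read": (1, 1),  # counter-example: no guaranteed overlap
}
for name, (w, r) in combos.items():
    print(name, "->", is_strongly_consistent(w, r, N))
```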
![Page 33: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/33.jpg)
Achieving Consistency
● Consistency Level
● Hinted Handoff
● Read Repair
● Anti Entropy (User triggered Repairs)
![Page 34: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/34.jpg)
Write Path
● Append to the Commit Log file
● Merge Columns into the Memtable
● Asynchronously flush the Memtable to a new file (never update existing files)
● Data is stored in immutable files called SSTables (Sorted String Tables)
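The write path steps can be sketched as a toy in-memory node. Everything here is a simplification under stated assumptions: `ToyNode` and `flush_threshold` are hypothetical names, the commit log is a Python list rather than a file, the flush happens synchronously instead of asynchronously, and the row key "bar" is made up for the example.

```python
class ToyNode:
    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only list standing in for the commit log file
        self.memtable = {}    # row key -> {column name: (value, timestamp)}
        self.sstables = []    # each flush appends one immutable, sorted snapshot
        self.flush_threshold = flush_threshold  # rows in memtable before a flush

    def write(self, row_key, column, value, timestamp):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((row_key, column, value, timestamp))
        # 2. Merge the column into the memtable, keeping the newest timestamp.
        columns = self.memtable.setdefault(row_key, {})
        current = columns.get(column)
        if current is None or timestamp >= current[1]:
            columns[column] = (value, timestamp)
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write a new sorted, immutable snapshot; existing SSTables are never updated.
        sstable = tuple(
            (row_key, tuple(sorted(columns.items())))
            for row_key, columns in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}


node = ToyNode(flush_threshold=2)
node.write("foo", "fruit", "apple", 10)
node.write("foo", "vegetable", "cucumber", 15)
node.write("bar", "fruit", "banana", 12)  # second row triggers the flush
print(len(node.commit_log), len(node.sstables), node.memtable)
```

Tuples are used for the flushed snapshot so that the immutability of SSTables is visible in the code itself.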
![Page 35: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/35.jpg)
SSTables Files
*-Data.db
*-Index.db
*-Filter.db
(And others)
![Page 36: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/36.jpg)
Read Path
[Figure: read path for row ‘foo’ across two SSTables.
Memory: a Bloom Filter (cache) and an Index/Key Cache for each SSTable.
Disk: SSTable-1-Data.db holds foo: fruit (ts:10) apple, vegetable (ts:15) cucumber, next to SSTable-1-Index.db; SSTable-2-Data.db holds foo: fruit (ts:10) apple, vegetable (ts:10) Pepper, next to SSTable-2-Index.db.]
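The read path in the figure can be sketched as: ask each SSTable's Bloom filter whether the row might be there, read only from the SSTables that say yes, and merge the columns by timestamp. The `BloomFilter`, `SSTable` and `read_row` names are hypothetical, the filter size and hash construction are arbitrary assumptions, and the index and key cache are left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = rows  # row key -> {column name: (value, timestamp)}
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)


def read_row(sstables, row_key):
    merged = {}
    for sstable in sstables:
        if not sstable.bloom.might_contain(row_key):
            continue  # definitely absent: skip the disk lookup entirely
        for column, (value, ts) in sstable.rows.get(row_key, {}).items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)  # newest timestamp wins
    return merged


sstable_1 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("cucumber", 15)}})
sstable_2 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("Pepper", 10)}})
print(read_row([sstable_1, sstable_2], "foo"))
```

With the figure's data, 'vegetable' resolves to 'cucumber' because its timestamp 15 is newer than the 10 on 'Pepper'.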
![Page 37: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/37.jpg)
Compactions
Compaction merges the truth from multiple SSTables into one SSTable with the same truth
(Manual and continuous background process)

| Column | SSTable 1 | SSTable 2 | New |
|---|---|---|---|
| Vegetable | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) | ‘cucumber’ (timestamp 10) |
| Fruit | ‘Apple’ (timestamp 10) | <tombstone> (timestamp 15) | <tombstone> (timestamp 15) |
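The merge in the table can be sketched as a newest-timestamp-wins fold over the input SSTables, with a tombstone treated as just another value. The `compact` function and the `TOMBSTONE` marker are hypothetical names; as in the table, the tombstone is carried into the new SSTable, and purging old tombstones is not modelled.

```python
TOMBSTONE = object()  # marker for a deleted column


def compact(*sstables):
    """Merge several SSTables into one, keeping the newest cell per column."""
    merged = {}
    for sstable in sstables:
        for column, (value, ts) in sstable.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}

new = compact(sstable_1, sstable_2)
print(new["Vegetable"], new["Fruit"][0] is TOMBSTONE)
```

The delete at timestamp 15 shadows the older 'Apple' write, which is why the new SSTable holds the tombstone rather than the value.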
![Page 38: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/38.jpg)
Writes and Reads
![Page 39: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/39.jpg)
Managing Cassandra
● Single configuration file: /etc/cassandra/cassandra.yaml
● Single control command /usr/bin/nodetool
● Monitoring done by DataStax OpsCenter
![Page 40: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/40.jpg)
Troubleshooting Cassandra
Always inspect these files:
● /var/log/cassandra/cassandra.log (Startup)
● /var/log/cassandra/system.log (Normal work)
![Page 41: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/41.jpg)
Backup
Use Cassandra snapshots...
And God said to Noah, Noah make me a backup ... 'cause I shall format
![Page 42: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/42.jpg)
Client (API) Choices
● Thrift, the original and still fully supported API:
  ○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
  ○ Python: Pycassa, Telephus, …
  ○ Ruby: Fauna
  ○ PHP: PHP Client Library
  ○ C#
  ○ Node.JS
  ○ Go
  ○ Simba ODBC
  ○ C++: LibQtCassandra
  ○ ORM
  ○ ….
● CQL3: a table-oriented, schema-driven data model, similar to SQL
![Page 43: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/43.jpg)
CQL3 Create KeySpace
● Using CQL3 via the cqlsh command tool ($CASSANDRA_HOME/bin/cqlsh)
● Create a new Keyspace with a Replication Factor of 3 and NetworkTopologyStrategy:
CREATE KEYSPACE kenshoo_cass_fans
WITH replication = {'class':'NetworkTopologyStrategy', 'us_east_dc':3};
![Page 44: Cassandra 101](https://reader033.fdocuments.in/reader033/viewer/2022052523/55509a2db4c9058b208b486f/html5/thumbnails/44.jpg)
CQL3 Working with Tables
● CQL3 Example
● A Table is a sparse collection of well-known, ordered columns
CREATE TABLE User(
    user_name text,
    password text,
    real_name text,
    PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User
(user_name, password, real_name)
VALUES
('nader','sekr8t','MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';
 user_name | password | real_name
-----------+----------+-----------
 nader     | sekr8t   | MR NADER