Cassandra introduction at FinishJUG
@doanduyhai
Who Am I?
Duy Hai DOAN, Cassandra technical advocate
• talks, meetups, confs
• open-source dev (Achilles, …)
• OSS Cassandra point of contact
☞ [email protected] ☞ @doanduyhai
2
@doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
3
@doanduyhai
Agenda
Architecture • Cluster, Replication, Consistency
Data model • Last Write Win (LWW), CQL basics, From SQL to CQL, Lightweight Transaction
DSE
Use Cases
4
@doanduyhai
Cassandra history
NoSQL database
• created at Facebook
• open-sourced since 2008
• current version = 2.1
• column-oriented ☞ distributed table
5
@doanduyhai
Cassandra 5 key facts
Key fact 1: linear scalability
[Diagram: clusters of every size, from NetcoSports with 3 nodes and ≈3 GB up to 1k+ nodes holding PB+ of data, and YOU somewhere in between]
6
@doanduyhai
Cassandra 5 key facts
Key fact 2: continuous availability (≈100% up-time)
• resilient architecture (Dynamo)
7
@doanduyhai
Cassandra 5 key facts
Key fact 3: multi-data centers
• out-of-the-box (config only)
• AWS conf for multi-region DCs
• GCE/CloudStack support
• Microsoft Azure
8
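A minimal sketch of the config-only part: cross-data-center replication is declared per keyspace. The keyspace name is a placeholder, and the DC names must match what the cluster's snitch reports (DC1/DC2 follow the labels used on the next slides).
CREATE KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
-- 3 replicas of every partition are kept in each data center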
@doanduyhai
Multi-DC usages
Data-locality, disaster recovery
[Diagram: an 8-node cluster in New York (DC1) and a 5-node cluster in London (DC2), kept in sync by async replication]
9
@doanduyhai
Multi-DC usages
Workload segregation / virtual DC
[Diagram: a Production (Live) cluster and an Analytics (Spark/Hadoop) cluster in the same room, kept in sync by async replication]
10
@doanduyhai
Multi-DC usages
Prod data copy for testing/benchmarking
[Diagram: the production cluster pushes a data copy to a tiny test cluster; use LOCAL consistency on the production side and NEVER WRITE to the test cluster !!!]
11
@doanduyhai
Cassandra 5 key facts
Key fact 4: operational simplicity
• 1 node = 1 process + 2 config files (main + IP)
• deployment automation
• OpsCenter for monitoring
12
@doanduyhai
Cassandra 5 key facts
Key fact 5: analytics combo
• Cassandra + Spark = awesome!
• realtime streaming/analytics/aggregation …
14
@doanduyhai
Cassandra architecture
Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google BigTable paper
• columns / column families
16
@doanduyhai
Data distribution
Random distribution: token = hash(#partition)
Hash range: ]-X, X], where X is a huge number (2^64 / 2 = 2^63)
[Diagram: ring of 8 nodes n1 … n8]
17
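As an illustration of the hashing above, CQL exposes the token directly through the token() function (the users table keyed by login is the one used later in this deck):
SELECT login, token(login) FROM users WHERE login = 'jdoe';
-- returns the hash value that decides which node(s) own this partition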
@doanduyhai
Token Ranges
A: ]0, X/8]  B: ]X/8, 2X/8]  C: ]2X/8, 3X/8]  D: ]3X/8, 4X/8]  E: ]4X/8, 5X/8]  F: ]5X/8, 6X/8]  G: ]6X/8, 7X/8]  H: ]7X/8, X]
[Diagram: ring of 8 nodes n1 … n8, each owning one of the token ranges A … H]
18
@doanduyhai
Distributed Table
[Diagram: rows user_id1 … user_id5 are hashed onto the ring of nodes n1 … n8 and their token ranges A … H]
19
@doanduyhai
Distributed Table
[Same diagram: each user_id row lands on the node owning the token range its hash falls into]
20
@doanduyhai
Linear scalability
[Diagram: growing the ring from 8 nodes to 10 nodes; token ranges are rebalanced over the new nodes n9 and n10]
21
@doanduyhai
Failure tolerance
Replication Factor (RF) = 3
[Diagram: on the 8-node ring, each partition is stored on 3 nodes; the three example partitions map to replica sets {B, A, H}, {C, B, A} and {D, C, B}]
22
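For reference, the replication factor is declared per keyspace; a minimal single-DC sketch (keyspace name is illustrative):
CREATE KEYSPACE demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};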
@doanduyhai
Coordinator node
Incoming requests (read/write): the coordinator node handles the request
Every node can be coordinator → masterless
[Diagram: a client request hits one node of the ring, which acts as coordinator for the replicas]
23
@doanduyhai
Consistency
Tunable at runtime
• ONE
• QUORUM (strict majority w.r.t. RF)
• ALL
Applies to both reads & writes
24
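For illustration, the consistency level is chosen per request or per session rather than in the schema; in cqlsh for example:
CONSISTENCY QUORUM;
SELECT * FROM users WHERE login = 'jdoe';
Drivers expose the same setting per statement or per connection.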
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read ONE
[Diagram: B is written with ONE while data replication is still in progress, so the replicas hold B, A, A; a Read ONE can hit a stale replica and return A]
25
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read QUORUM
[Diagram: replicas hold B, A, A while replication is in progress; a Read QUORUM can still return A]
26
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read ALL
[Diagram: replicas hold B, A, A while replication is in progress; a Read ALL sees the replica holding B and returns B]
27
@doanduyhai
Consistency in action
RF = 3, Write QUORUM, Read ONE
[Diagram: B is written with QUORUM, so the replicas hold B, B, A while replication finishes; a Read ONE can still return A]
28
@doanduyhai
Consistency in action
RF = 3, Write QUORUM, Read QUORUM
[Diagram: replicas hold B, B, A while replication finishes; a Read QUORUM returns B]
29
@doanduyhai
Consistency summary
ONE Read + ONE Write
☞ available for read/write even with (N-1) replicas down
QUORUM Read + QUORUM Write
☞ read and write quorums overlap, so reads see the latest write
☞ still available for read/write with 1 replica down
34
@doanduyhai
Cassandra Write Path
[Diagram: (1) the mutation is appended to a commit log on disk, (2) then written to the in-memory MemTable of the target table]
38
@doanduyhai
Cassandra Write Path
[Diagram: (3) when a MemTable is full it is flushed to disk as an immutable SSTable for that table (SSTable1, SSTable2, SSTable3)]
39
@doanduyhai
Cassandra Write Path
[Diagram: the full picture, commit logs and SSTables on disk, one MemTable per table in memory]
40
@doanduyhai
Cassandra Write Path
[Diagram: over time each table accumulates several SSTables on disk (SSTable1, SSTable2, SSTable3, …)]
41
@doanduyhai
Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
[Row stored under #partition jdoe: age = 33, name = John DOE]
42
@doanduyhai
Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
[Each column value is stored with an auto-generated timestamp: jdoe → age (t1) = 33, name (t1) = John DOE]
43
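As a side note (illustrative queries, not from the deck), that internal timestamp can be read back with WRITETIME() and overridden by the client with USING TIMESTAMP:
SELECT age, WRITETIME(age) FROM users WHERE login = 'jdoe';
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) USING TIMESTAMP 1412345678000000;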
@doanduyhai
Last Write Win (LWW)
UPDATE users SET age = 34 WHERE login = 'jdoe';
[SSTable1 holds jdoe → age (t1) = 33, name (t1) = John DOE; SSTable2 holds jdoe → age (t2) = 34]
44
@doanduyhai
Last Write Win (LWW)
DELETE age FROM users WHERE login = 'jdoe';
[SSTable1 holds jdoe → age (t1) = 33, name (t1) = John DOE; SSTable2 holds jdoe → age (t2) = 34; SSTable3 holds jdoe → age (t3) = ✕ tombstone]
45
@doanduyhai
Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe';
[Which version wins? The three SSTables hold age (t1) = 33, age (t2) = 34 and the tombstone age (t3)]
46
@doanduyhai
Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe';
[The version with the highest timestamp wins ✓: the tombstone at t3; the older values at t1 and t2 are discarded ✕]
47
@doanduyhai
Compaction
[SSTable1, SSTable2 and SSTable3 are merged into a new SSTable that keeps only the winning versions: jdoe → age (t3) = ✕ tombstone, name (t1) = John DOE]
48
@doanduyhai
Historical data
You want to keep data history?
• do not rely on the internally generated timestamp !!!
• ☞ time-series data modeling
[Diagram: a history table whose columns date1 (t1) … date9 (t9) sit in SSTable1 and date10 (t10), date11 (t11) … in SSTable2]
49
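A minimal sketch of the time-series modeling hinted at above, with the date carried as an explicit clustering column instead of relying on the internal timestamp (table and column names are illustrative):
CREATE TABLE user_history (
    id text,
    date timestamp,
    value text,
    PRIMARY KEY ((id), date)
) WITH CLUSTERING ORDER BY (date DESC);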
@doanduyhai
CRUD operations
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
50
@doanduyhai
Simple Table
CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login));
partition key (#partition)
51
@doanduyhai
What about joins?
How can I join data between tables? How can I model 1 – N relationships? How do I model a mailbox?
[Diagram: one User to many Emails (1 → n)]
52
@doanduyhai
Clustered table (1 – N)
CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id));
login = partition key (#partition), message_id = clustering column (sorted); together they guarantee primary-key uniqueness
53
@doanduyhai
On-disk layout
[SSTable1: partition jdoe with columns message_id1, message_id2 … message_id104, and partition hsue with message_id1 … message_id78; SSTable2: partition jdoe with message_id105, message_id106 … message_id169]
54
@doanduyhai
Queries
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe' AND message_id = '2014-09-25 16:00:00';
Get messages by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe' AND message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
55
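Since message_id is a timeuuid, the date literals above are a readability shortcut; in practice the interval would typically be bounded with minTimeuuid()/maxTimeuuid(), for example:
SELECT * FROM mailbox
WHERE login = 'jdoe'
  AND message_id >= minTimeuuid('2014-09-20 16:00:00')
  AND message_id <= maxTimeuuid('2014-09-25 16:00:00');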
@doanduyhai
Queries
Get message by message_id only? ❓
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get messages by date interval only? ❓
SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
56
@doanduyhai
Queries
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get messages by date interval only (#partition not provided)
SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
57
@doanduyhai
Without #partition
[Diagram: every node of the ring answers with a question mark]
No #partition ☞ no token ☞ where is my data?
58
@doanduyhai
Queries
Get messages by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' AND login <= 'jdoe';
Get messages by user pattern (non-exact match on #partition)
SELECT * FROM mailbox WHERE login LIKE '%doe%';
61
@doanduyhai
WHERE clause restrictions
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition
Only exact match (=) on the #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would imply a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match
WHERE clause only possible
• on columns defined in the PRIMARY KEY
• on indexed columns (⚠)
62
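For the record, the indexed-columns case refers to secondary indexes; a quick sketch (index name is illustrative, and the ⚠ stands: secondary indexes are easy to misuse):
CREATE INDEX users_age_idx ON users(age);
SELECT * FROM users WHERE age = 33;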
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
63
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL!
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ same JVM, 1 cluster, 2 products (Solr & Cassandra)
64
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL!
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ same JVM, 1 cluster, 2 products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
65
@doanduyhai
Collections & maps
CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, … PRIMARY KEY(login));
Keep the cardinality low (≈ 1000 elements)
66
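A minimal usage sketch for those collection columns (values are illustrative):
INSERT INTO users(login, name, age, friends, hobbies, languages)
VALUES('jdoe', 'John DOE', 33,
       {'hsue', 'alice'},
       ['guitar', 'running'],
       {1: 'English', 2: 'French'});
UPDATE users SET friends = friends + {'bob'} WHERE login = 'jdoe';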
@doanduyhai
User Defined Type (UDT)
Instead of flattening the address into the table:
CREATE TABLE users ( login text, … street_number int, street_name text, postcode int, country text, … PRIMARY KEY(login));
67
@doanduyhai
User Defined Type (UDT)
CREATE TYPE address ( street_number int, street_name text, postcode int, country text);
CREATE TABLE users ( login text, … location frozen <address>, … PRIMARY KEY(login));
68
@doanduyhai
UDT insert
INSERT INTO users(login, name, location) VALUES ( 'jdoe', 'John DOE', { street_number: 124, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' });
69
@doanduyhai
UDT update
UPDATE users SET location = { street_number: 125, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' } WHERE login = 'jdoe';
UDTs can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
70
@doanduyhai
From SQL to CQL: Normalized
[Diagram: one User to many Comments (1 → n)]
CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_login text, // typical join id content text, PRIMARY KEY((article_id), comment_id));
71
@doanduyhai
From SQL to CQL
1 SELECT returns the 10 last comments, and with them 10 author_login values
What to do with those 10 author_login values?
[Diagram: one User to many Comments (1 → n)]
72
@doanduyhai
From SQL to CQL
1 SELECT returns the 10 last comments, and with them 10 author_login values
What to do with those 10 author_login values? 10 extra SELECTs → the N+1 SELECT problem!
[Diagram: one User to many Comments (1 → n)]
73
@doanduyhai
From SQL to CQL: De-normalized
[Diagram: one User to many Comments (1 → n)]
CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author frozen<person>, // person is a UDT content text, PRIMARY KEY((article_id), comment_id));
74
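The person UDT used above is not defined in the deck; based on the fields listed on the Person UDT slide further on, a plausible definition (field types are assumptions) would be:
CREATE TYPE person (
    firstname text,
    lastname text,
    birthdate timestamp,
    gender text,
    mood text,
    location text
);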
@doanduyhai
Data modeling best practices
Start from the queries
• identify the core functional read paths
• 1 read scenario ≈ 1 SELECT
75
@doanduyhai
Data modeling best practices
Start from the queries
• identify the core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
76
@doanduyhai
Data modeling best practices
Person UDT: firstname/lastname, date of birth, gender, mood, location
77
@doanduyhai
Data modeling best practices
[Example profile card built from the denormalized Person UDT: John DOE, male, birthdate 21/02/1981, subscribed since 03/06/2011, San Mateo, CA, ''Impossible is not John DOE'']
Full detail is read from the User table on click
78
@doanduyhai
Data modeling trade-off
What if …
• it is not possible to de-normalize with immutable data?
• you have to duplicate mutable data?
79
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
80
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
But always keep those scenarios rare (5%-10% max), focus on the 90%
81
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
But always keep those scenarios rare (5%-10% max), focus on the 90%
Example: Twitter tweet deletion
82
@doanduyhai
Lightweight Transaction (LWT)
What? ☞ make operations linearizable
Why? ☞ solves a class of race conditions in Cassandra that would otherwise require an external lock manager
84
@doanduyhai
Lightweight Transaction (LWT)
[Diagram: two clients race to create the same account. Both run SELECT * FROM account WHERE id = 'jdoe' and get (0 rows), then both run INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]'); the last insert silently wins]
85
@doanduyhai
Lightweight Transaction (LWT)
How? ☞ a Paxos implementation inside Cassandra
Syntax?
INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]') IF NOT EXISTS;
UPDATE account SET email = '[email protected]' WHERE id = 'jdoe' IF email = '[email protected]';
86
@doanduyhai
Lightweight Transaction (LWT)
Recommendations
• if you insert with LWT ☞ the matching delete must also use LWT
INSERT INTO my_table … IF NOT EXISTS ☞ DELETE FROM my_table … IF EXISTS
87
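A concrete pairing on the account table from the previous slides (a sketch of the rule above):
INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]') IF NOT EXISTS;
DELETE FROM account WHERE id = 'jdoe' IF EXISTS;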
@doanduyhai
Lightweight Transaction (LWT)
Recommendations
• LWT is expensive (4 round trips), do not abuse it
• reserve it for the 1% – 5% of use cases that need it
88
@doanduyhai
Use Cases
Messaging
Collections/ Playlists
Fraud detection
Recommendation/ Personalization
Internet of things/ Sensor data
92