Cassandra introduction at FinishJUG
@doanduyhai
Who Am I?
Duy Hai DOAN, Cassandra technical advocate
• talks, meetups, confs
• open-source dev (Achilles, …)
• OSS Cassandra point of contact
☞ [email protected] ☞ @doanduyhai
2
@doanduyhai
Datastax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features
3
@doanduyhai
Agenda
Architecture • Cluster, Replication, Consistency
Data model • Last Write Win (LWW), CQL basics, From SQL to CQL, Lightweight Transaction
DSE
Use Cases
4
@doanduyhai
Cassandra history
NoSQL database
• created at Facebook
• open-sourced since 2008
• current version = 2.1
• column-oriented ☞ distributed table
5
@doanduyhai
Cassandra 5 key facts
Key fact 1: linear scalability
[Diagram: clusters of every size, from NetcoSports with 3 nodes and ≈3 GB up to 1k+ nodes holding PB+ of data, and YOU somewhere in between]
6
@doanduyhai
Cassandra 5 key facts
Key fact 2: continuous availability (≈100% up-time)
• resilient architecture (Dynamo)
7
@doanduyhai
Cassandra 5 key facts
Key fact 3: multi-data centers
• out-of-the-box (config only)
• AWS conf for multi-region DCs
• GCE/CloudStack support
• Microsoft Azure
8
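A minimal sketch of the config-only part: cross-data-center replication is declared per keyspace. The keyspace name is a placeholder, and the DC names must match what the cluster's snitch reports (DC1/DC2 follow the labels used on the next slides).
CREATE KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
-- 3 replicas of every partition are kept in each data center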
@doanduyhai
Multi-DC usages
Data-locality, disaster recovery
[Diagram: an 8-node cluster in New York (DC1) and a 5-node cluster in London (DC2), kept in sync by async replication]
9
@doanduyhai
Multi-DC usages
Workload segregation / virtual DC
[Diagram: a Production (Live) cluster and an Analytics (Spark/Hadoop) cluster in the same room, kept in sync by async replication]
10
@doanduyhai
Multi-DC usages
Prod data copy for testing/benchmarking
[Diagram: the production cluster pushes a data copy to a tiny test cluster; use LOCAL consistency on the production side and NEVER WRITE to the test cluster !!!]
11
@doanduyhai
Cassandra 5 key facts
Key fact 4: operational simplicity
• 1 node = 1 process + 2 config files (main + IP)
• deployment automation
• OpsCenter for monitoring
12
@doanduyhai
Cassandra 5 key facts
Key fact 5: analytics combo
• Cassandra + Spark = awesome!
• realtime streaming/analytics/aggregation …
14
@doanduyhai
Cassandra architecture
Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google BigTable paper
• columns / column families
16
@doanduyhai
Data distribution
Random distribution: token = hash(#partition)
Hash range: ]-X, X], where X is a huge number (2^64 / 2 = 2^63)
[Diagram: ring of 8 nodes n1 … n8]
17
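As an illustration of the hashing above, CQL exposes the token directly through the token() function (the users table keyed by login is the one used later in this deck):
SELECT login, token(login) FROM users WHERE login = 'jdoe';
-- returns the hash value that decides which node(s) own this partition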
@doanduyhai
Token Ranges
A: ]0, X/8]  B: ]X/8, 2X/8]  C: ]2X/8, 3X/8]  D: ]3X/8, 4X/8]  E: ]4X/8, 5X/8]  F: ]5X/8, 6X/8]  G: ]6X/8, 7X/8]  H: ]7X/8, X]
[Diagram: ring of 8 nodes n1 … n8, each owning one of the token ranges A … H]
18
@doanduyhai
Distributed Table
[Diagram: rows user_id1 … user_id5 are hashed onto the ring of nodes n1 … n8 and their token ranges A … H]
19
@doanduyhai
Distributed Table
[Same diagram: each user_id row lands on the node owning the token range its hash falls into]
20
@doanduyhai
Linear scalability
[Diagram: growing the ring from 8 nodes to 10 nodes; token ranges are rebalanced over the new nodes n9 and n10]
21
@doanduyhai
Failure tolerance
Replication Factor (RF) = 3
[Diagram: on the 8-node ring, each partition is stored on 3 nodes; the three example partitions map to replica sets {B, A, H}, {C, B, A} and {D, C, B}]
22
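For reference, the replication factor is declared per keyspace; a minimal single-DC sketch (keyspace name is illustrative):
CREATE KEYSPACE demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};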
@doanduyhai
Coordinator node
Incoming requests (read/write): the coordinator node handles the request
Every node can be coordinator → masterless
[Diagram: a client request hits one node of the ring, which acts as coordinator for the replicas]
23
@doanduyhai
Consistency
Tunable at runtime
• ONE
• QUORUM (strict majority w.r.t. RF)
• ALL
Applies to both reads & writes
24
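For illustration, the consistency level is chosen per request or per session rather than in the schema; in cqlsh for example:
CONSISTENCY QUORUM;
SELECT * FROM users WHERE login = 'jdoe';
Drivers expose the same setting per statement or per connection.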
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read ONE
[Diagram: B is written with ONE while data replication is still in progress, so the replicas hold B, A, A; a Read ONE can hit a stale replica and return A]
25
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read QUORUM
[Diagram: replicas hold B, A, A while replication is in progress; a Read QUORUM can still return A]
26
@doanduyhai
Consistency in action
RF = 3, Write ONE, Read ALL
[Diagram: replicas hold B, A, A while replication is in progress; a Read ALL sees the replica holding B and returns B]
27
@doanduyhai
Consistency in action
RF = 3, Write QUORUM, Read ONE
[Diagram: B is written with QUORUM, so the replicas hold B, B, A while replication finishes; a Read ONE can still return A]
28
@doanduyhai
Consistency in action
RF = 3, Write QUORUM, Read QUORUM
[Diagram: replicas hold B, B, A while replication finishes; a Read QUORUM returns B]
29
@doanduyhai
Consistency summary
ONE Read + ONE Write
☞ available for read/write even with (N-1) replicas down
QUORUM Read + QUORUM Write
☞ read and write quorums overlap, so reads see the latest write
☞ still available for read/write with 1 replica down
34
@doanduyhai
Cassandra Write Path
[Diagram: (1) the mutation is appended to a commit log on disk, (2) then written to the in-memory MemTable of the target table]
38
@doanduyhai
Cassandra Write Path
[Diagram: (3) when a MemTable is full it is flushed to disk as an immutable SSTable for that table (SSTable1, SSTable2, SSTable3)]
39
@doanduyhai
Cassandra Write Path
[Diagram: the full picture, commit logs and SSTables on disk, one MemTable per table in memory]
40
@doanduyhai
Cassandra Write Path
[Diagram: over time each table accumulates several SSTables on disk (SSTable1, SSTable2, SSTable3, …)]
41
@doanduyhai
Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
[Row stored under #partition jdoe: age = 33, name = John DOE]
42
@doanduyhai
Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
[Each column value is stored with an auto-generated timestamp: jdoe → age (t1) = 33, name (t1) = John DOE]
43
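As a side note (illustrative queries, not from the deck), that internal timestamp can be read back with WRITETIME() and overridden by the client with USING TIMESTAMP:
SELECT age, WRITETIME(age) FROM users WHERE login = 'jdoe';
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33) USING TIMESTAMP 1412345678000000;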
@doanduyhai
Last Write Win (LWW)
UPDATE users SET age = 34 WHERE login = 'jdoe';
[SSTable1 holds jdoe → age (t1) = 33, name (t1) = John DOE; SSTable2 holds jdoe → age (t2) = 34]
44
@doanduyhai
Last Write Win (LWW)
DELETE age FROM users WHERE login = 'jdoe';
[SSTable1 holds jdoe → age (t1) = 33, name (t1) = John DOE; SSTable2 holds jdoe → age (t2) = 34; SSTable3 holds jdoe → age (t3) = ✕ tombstone]
45
@doanduyhai
Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe';
[Which version wins? The three SSTables hold age (t1) = 33, age (t2) = 34 and the tombstone age (t3)]
46
@doanduyhai
Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe';
[The version with the highest timestamp wins ✓: the tombstone at t3; the older values at t1 and t2 are discarded ✕]
47
@doanduyhai
Compaction
[SSTable1, SSTable2 and SSTable3 are merged into a new SSTable that keeps only the winning versions: jdoe → age (t3) = ✕ tombstone, name (t1) = John DOE]
48
@doanduyhai
Historical data
You want to keep data history?
• do not rely on the internally generated timestamp !!!
• ☞ time-series data modeling
[Diagram: a history table whose columns date1 (t1) … date9 (t9) sit in SSTable1 and date10 (t10), date11 (t11) … in SSTable2]
49
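A minimal sketch of the time-series modeling hinted at above, with the date carried as an explicit clustering column instead of relying on the internal timestamp (table and column names are illustrative):
CREATE TABLE user_history (
    id text,
    date timestamp,
    value text,
    PRIMARY KEY ((id), date)
) WITH CLUSTERING ORDER BY (date DESC);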
@doanduyhai
CRUD operations
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';
SELECT age FROM users WHERE login = 'jdoe';
50
@doanduyhai
Simple Table
CREATE TABLE users ( login text, name text, age int, … PRIMARY KEY(login));
partition key (#partition)
51
@doanduyhai
What about joins?
How can I join data between tables? How can I model 1 – N relationships? How do I model a mailbox?
[Diagram: one User to many Emails (1 → n)]
52
@doanduyhai
Clustered table (1 – N)
CREATE TABLE mailbox ( login text, message_id timeuuid, interlocutor text, message text, PRIMARY KEY((login), message_id));
login = partition key (#partition), message_id = clustering column (sorted); together they guarantee primary-key uniqueness
53
@doanduyhai
On-disk layout
[SSTable1: partition jdoe with columns message_id1, message_id2 … message_id104, and partition hsue with message_id1 … message_id78; SSTable2: partition jdoe with message_id105, message_id106 … message_id169]
54
@doanduyhai
Queries
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe' AND message_id = '2014-09-25 16:00:00';
Get messages by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe' AND message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
55
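Since message_id is a timeuuid, the date literals above are a readability shortcut; in practice the interval would typically be bounded with minTimeuuid()/maxTimeuuid(), for example:
SELECT * FROM mailbox
WHERE login = 'jdoe'
  AND message_id >= minTimeuuid('2014-09-20 16:00:00')
  AND message_id <= maxTimeuuid('2014-09-25 16:00:00');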
@doanduyhai
Queries
Get message by message_id only? ❓
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get messages by date interval only? ❓
SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
56
@doanduyhai
Queries
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get messages by date interval only (#partition not provided)
SELECT * FROM mailbox WHERE message_id <= '2014-09-25 16:00:00' AND message_id >= '2014-09-20 16:00:00';
57
@doanduyhai
Without #partition
[Diagram: every node of the ring answers with a question mark]
No #partition ☞ no token ☞ where is my data?
58
@doanduyhai
Queries
Get messages by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' AND login <= 'jdoe';
Get messages by user pattern (non-exact match on #partition)
SELECT * FROM mailbox WHERE login LIKE '%doe%';
61
@doanduyhai
WHERE clause restrictions
All queries (INSERT/UPDATE/DELETE/SELECT) must provide the #partition
Only exact match (=) on the #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would imply a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match
WHERE clause only possible
• on columns defined in the PRIMARY KEY
• on indexed columns (⚠)
62
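For the record, the indexed-columns case refers to secondary indexes; a quick sketch (index name is illustrative, and the ⚠ stands: secondary indexes are easy to misuse):
CREATE INDEX users_age_idx ON users(age);
SELECT * FROM users WHERE age = 33;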
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
63
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL!
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ same JVM, 1 cluster, 2 products (Solr & Cassandra)
64
@doanduyhai
WHERE clause restrictions
What if I want to perform an « arbitrary » WHERE clause?
• search-form scenario, dynamic search fields
DO NOT RE-INVENT THE WHEEL!
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
☞ same JVM, 1 cluster, 2 products (Solr & Cassandra)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';
65
@doanduyhai
Collections & maps
CREATE TABLE users ( login text, name text, age int, friends set<text>, hobbies list<text>, languages map<int, text>, … PRIMARY KEY(login));
Keep the cardinality low (≈ 1000 elements)
66
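A minimal usage sketch for those collection columns (values are illustrative):
INSERT INTO users(login, name, age, friends, hobbies, languages)
VALUES('jdoe', 'John DOE', 33,
       {'hsue', 'alice'},
       ['guitar', 'running'],
       {1: 'English', 2: 'French'});
UPDATE users SET friends = friends + {'bob'} WHERE login = 'jdoe';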
@doanduyhai
User Defined Type (UDT)
Instead of flattening the address into the table:
CREATE TABLE users ( login text, … street_number int, street_name text, postcode int, country text, … PRIMARY KEY(login));
67
@doanduyhai
User Defined Type (UDT)
CREATE TYPE address ( street_number int, street_name text, postcode int, country text);
CREATE TABLE users ( login text, … location frozen <address>, … PRIMARY KEY(login));
68
@doanduyhai
UDT insert
INSERT INTO users(login, name, location) VALUES ( 'jdoe', 'John DOE', { street_number: 124, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' });
69
@doanduyhai
UDT update
UPDATE users SET location = { street_number: 125, street_name: 'Congress Avenue', postcode: 95054, country: 'USA' } WHERE login = 'jdoe';
UDTs can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)
70
@doanduyhai
From SQL to CQL: Normalized
[Diagram: one User to many Comments (1 → n)]
CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author_login text, // typical join id content text, PRIMARY KEY((article_id), comment_id));
71
@doanduyhai
From SQL to CQL
1 SELECT returns the 10 last comments, and with them 10 author_login values
What to do with those 10 author_login values?
[Diagram: one User to many Comments (1 → n)]
72
@doanduyhai
From SQL to CQL
1 SELECT returns the 10 last comments, and with them 10 author_login values
What to do with those 10 author_login values? 10 extra SELECTs → the N+1 SELECT problem!
[Diagram: one User to many Comments (1 → n)]
73
@doanduyhai
From SQL to CQL: De-normalized
[Diagram: one User to many Comments (1 → n)]
CREATE TABLE comments ( article_id uuid, comment_id timeuuid, author frozen<person>, // person is a UDT content text, PRIMARY KEY((article_id), comment_id));
74
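The person UDT used above is not defined in the deck; based on the fields listed on the Person UDT slide further on, a plausible definition (field types are assumptions) would be:
CREATE TYPE person (
    firstname text,
    lastname text,
    birthdate timestamp,
    gender text,
    mood text,
    location text
);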
@doanduyhai
Data modeling best practices
Start from the queries
• identify the core functional read paths
• 1 read scenario ≈ 1 SELECT
75
@doanduyhai
Data modeling best practices
Start from the queries
• identify the core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off
76
@doanduyhai
Data modeling best practices
Person UDT: firstname/lastname, date of birth, gender, mood, location
77
@doanduyhai
Data modeling best practices
[Example profile card built from the denormalized Person UDT: John DOE, male, birthdate 21/02/1981, subscribed since 03/06/2011, San Mateo, CA, ''Impossible is not John DOE'']
Full detail is read from the User table on click
78
@doanduyhai
Data modeling trade-off
What if …
• it is not possible to de-normalize with immutable data?
• you have to duplicate mutable data?
79
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
80
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
But always keep those scenarios rare (5%-10% max), focus on the 90%
81
@doanduyhai
Data modeling trade-off
2 strategies
• either accept to normalize some data (an extra SELECT is required)
• or de-normalize and update everywhere upon data mutation
But always keep those scenarios rare (5%-10% max), focus on the 90%
Example: Twitter tweet deletion
82
@doanduyhai
Lightweight Transaction (LWT)
What? ☞ make operations linearizable
Why? ☞ solves a class of race conditions in Cassandra that would otherwise require an external lock manager
84
@doanduyhai
Lightweight Transaction (LWT)
[Diagram: two clients race to create the same account. Both run SELECT * FROM account WHERE id = 'jdoe' and get (0 rows), then both run INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]'); the last insert silently wins]
85
@doanduyhai
Lightweight Transaction (LWT)
How? ☞ a Paxos implementation inside Cassandra
Syntax?
INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]') IF NOT EXISTS;
UPDATE account SET email = '[email protected]' WHERE id = 'jdoe' IF email = '[email protected]';
86
@doanduyhai
Lightweight Transaction (LWT)
Recommendations
• if you insert with LWT ☞ the matching delete must also use LWT
INSERT INTO my_table … IF NOT EXISTS ☞ DELETE FROM my_table … IF EXISTS
87
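A concrete pairing on the account table from the previous slides (a sketch of the rule above):
INSERT INTO account (id, email) VALUES ('jdoe', '[email protected]') IF NOT EXISTS;
DELETE FROM account WHERE id = 'jdoe' IF EXISTS;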
@doanduyhai
Lightweight Transaction (LWT)
Recommendations
• LWT is expensive (4 round trips), do not abuse it
• reserve it for the 1% – 5% of use cases that need it
88
@doanduyhai
Use Cases
Messaging
Collections/ Playlists
Fraud detection
Recommendation/ Personalization
Internet of things/ Sensor data
92