Post on 26-Jan-2015
description
The CQL3/Cassandra
MappingJohn Berryman
OpenSource Connections
OpenSource Connections
Outline
• What Problem does CQL Solve?• The Cassandra Data Model• Pain Points of “Old” Cassandra• Introducing CQL• Understanding the CQL/Cassandra Mapping• CQL for Sets, Lists, and Maps• Putting it All Together
OpenSource Connections
What Problem does CQL Solve?
• The Awesomeness that is Cassandra:o Distributed columnar data storeo No single point of failureo Optimized for availability (though “Tunably” consistent)o Optimized for writeso Easily maintainableo Almost infinitely scalable
.
OpenSource Connections
What Problem does CQL Solve?
• The Awesomeness that is Cassandra:o Distributed columnar data storeo No single point of failureo Optimized for availability (though “Tunably” consistent)o Optimized for writeso Easily maintainableo Almost infinitely scalable
• Cassandra’s usability challengeso NoSQL – “Where are my JOINS? No Schema? De-normalize!?”o BigTable – “Tables with millions of columns!?”
.
OpenSource Connections
What Problem does CQL Solve?
• The Awesomeness that is Cassandra:o Distributed columnar data storeo No single point of failureo Optimized for availability (though “Tunably” consistent)o Optimized for writeso Easily maintainableo Almost infinitely scalable
• Cassandra’s usability challengeso NoSQL – “Where are my JOINS? No Schema? De-
normalize!?”o BigTable – “Tables with millions of columns!?”
• CQL saves the day!o A best-practices interface to Cassandrao Uses familiar SQL-like language
OpenSource Connections
C* Data Model
OpenSource Connections
Keyspace
C* Data Model
OpenSource Connections
Keyspace
Column Family Column Family
C* Data Model
OpenSource Connections
Keyspace
Column Family Column Family
C* Data Model
OpenSource Connections
Keyspace
Column Family Column Family
C* Data Model
OpenSource Connections
Row Key
C* Data Model
OpenSource Connections
Row Key
Column Name
Column Value (or Tombstone)
Timestamp
Time-to-live
Column
C* Data Model
OpenSource Connections
● Row Key, Column Name, Column Value have types
● Column Name has comparator● RowKey has partitioner● Rows can have any number of
columns - even in same column family
● Rows can have many columns● Column Values can be omitted● Time-to-live is useful!● Tombstones
Row Key
Column Name
Column Value (or Tombstone)
Timestamp
Time-to-live
Column
C* Data Model: Writes
OpenSource Connections
MemTable
CommitLog
Row Cache
● Insert into MemTable
● Dump to CommitLog
● No read● Very Fast!● Blocks on CPU
before O/I!
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
C* Data Model: Writes
OpenSource Connections
MemTable
CommitLog
Row Cache
● Insert into MemTable
● Dump to CommitLog
● No read● Very Fast!● Blocks on CPU
before O/I!
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
C* Data Model: Writes
OpenSource Connections
MemTable
CommitLog
Row Cache
● Insert into MemTable
● Dump to CommitLog
● No read● Very Fast!● Blocks on CPU
before O/I!
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
OpenSource Connections
MemTable
CommitLog
Row Cache
Key Cache
SSTable
SSTable
SSTable
SSTableKey
CacheKey
CacheKey
Cache
BloomFilter
● Get values from Memtable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable Search
● Get values from SSTables● Repopulate Row Cache● Super Fast Col.
retrieval● Fast row slicing
C* Data Model:Reads
Cassandra Pain Points
• Twitter Example• My tweets
o SET tweets[JnBrymn][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”
o SET tweets[JnBrymn][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”
o SET tweets[JnBrymn][2013-07-19 T 09:51] = “Now my coffee is cold :-(”
• Get John’s tweetso GET tweets[JnBrymn] (output is as expected)
OpenSource Connections
Cassandra Pain Points
• Twitter Example• My tweets
o SET tweets[JnBrymn][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”
o SET tweets[JnBrymn][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”
o SET tweets[JnBrymn][2013-07-19 T 09:51] = “Now my coffee is cold :-(”
• Get John’s tweetso GET tweets[JnBrymn] (output is as expected)
• Pain-point: schema-less means that you have to read code to understand data model
OpenSource Connections
Cassandra Pain Points
• My timeline (other’s tweets)• More complicated – must store corresponding
user names• Bad Option 1: keep multiple column families
o SET timeline_from[JnBrymn][2013-07-19 T 09:20] = “softwaredoug”
o SET timeline_text[JnBrymn][2013-07-19 T 09:20] = “Hey John I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline_from[JnBrymn]o GET timeline_text[JnBrymn]
OpenSource Connections
Cassandra Pain Points
• My timeline (other’s tweets)• More complicated – must store corresponding
user names• Bad Option 1: keep multiple column families
o SET timeline_from[JnBrymn][2013-07-19 T 09:20] = “softwaredoug”
o SET timeline_text[JnBrymn][2013-07-19 T 09:20] = “Hey John I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline_from[JnBrymn]o GET timeline_text[JnBrymn]
• Pain-point: Multiple queries required.
OpenSource Connections
Cassandra Pain Points
• My timeline• Bad Option 2: shove into single column value
o SET timeline[JnBrymn][2013-07-19 T 09:20] = {from:”softwaredoug”, text: “Hey John I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline[JnBrymn] (…not too bad.)
OpenSource Connections
Cassandra Pain Points
• My timeline• Bad Option 2: shove into single column value
o SET timeline[JnBrymn][2013-07-19 T 09:20] = {from:”softwaredoug”, text: “Hey John I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline[JnBrymn] (…not too bad.)
• Pain-point: Updates require a read-then-modify
OpenSource Connections
Cassandra Pain Points
• My timeline• Best Option: composite column names
o SET timeline[JnBrymn][2013-07-19 T 09:20|from] = ”softwaredoug”
o SET timeline[JnBrymn][2013-07-19 T 09:20|text] = “Hey John, I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline[JnBrymn] (extract from and text in
client)
• Resolves prior pain points! Scales well!
OpenSource Connections
Cassandra Pain Points
• My timeline• Best Option: composite column names
o SET timeline[JnBrymn][2013-07-19 T 09:20|from] = ”softwaredoug”
o SET timeline[JnBrymn][2013-07-19 T 09:20|text] = “Hey John, I posted on reddit, upvote me!”
• Get John’s timelineo GET timeline[JnBrymn] (extract from and text in
client)
• Resolves prior pain points! Scales well!• Pain-point: Even more code reading to
understand data model!
OpenSource Connections
Cassandra Pain Points
• Justin Bieber’s timeline (e.g. many tweets)• Previous solution fails if number of columns >
2Billion• Best Option: composite row names
o SET timeline[bieber|2013-07][19 T 09:20|from] = ”softwaredoug”
o SET timeline[bieber|2013-07][19 T 09:20|text] = “Justin Bieber, you complete me.”
• Get Justin’s timelineo GET timeline[bieber|2013-07] (get other months
too)
OpenSource Connections
Cassandra Pain Points
• Justin Bieber’s timeline (e.g. many tweets)• Previous solution fails if number of columns >
2Billion• Best Option: composite row names
o SET timeline[bieber|2013-07][19 T 09:20|from] = ”softwaredoug”
o SET timeline[bieber|2013-07][19 T 09:20|text] = “Justin Bieber, you complete me.”
• Get Justin’s timelineo GET timeline[bieber|2013-07] (get other months
too)
• Pain-point: Even more code reading to understand data model!
OpenSource Connections
Introducing CQL
• CQL is a reintroduction of schema so that you don’t have to read code to understand the data model.
• CQL creates a common language so that details of the data model can be easily communicated.
• CQL is a best-practices Cassandra interface and hides the messy details.
OpenSource Connections
Introducing CQL
• CQL is a reintroduction of schema so that you don’t have to read code to understand the data model.
• CQL creates a common language so that details of the data model can be easily communicated.
• CQL is a best-practices Cassandra interface and hides the messy details.
OpenSource Connections
Let’s see it!
Introducing CQL
CREATE TABLE users ( id timeuuid PRIMARY KEY, lastname varchar, firstname varchar, dateOfBirth timestamp );
OpenSource Connections
Introducing CQL
CREATE TABLE users ( id timeuuid PRIMARY KEY, lastname varchar, firstname varchar, dateOfBirth timestamp );
INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),'Berryman',’John','1975-09-15');
OpenSource Connections
Introducing CQL
CREATE TABLE users ( id timeuuid PRIMARY KEY, lastname varchar, firstname varchar, dateOfBirth timestamp );
INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),’Berryman’,’John’,’1975-09-15’);
UPDATE users SET firstname = ’John’ WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;
OpenSource Connections
Introducing CQL
CREATE TABLE users ( id timeuuid PRIMARY KEY, lastname varchar, firstname varchar, dateOfBirth timestamp );
INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),'Berryman',’John','1975-09-15');
UPDATE users SET firstname = 'John’ WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;
SELECT dateofbirth,firstname,lastname FROM users ;
dateofbirth | firstname | lastname--------------------------+-----------+---------- 1975-09-15 00:00:00-0400 | John | Berryman
OpenSource Connections
Introducing CQL
“Hey sweet! It’s exactly the same as MySQL!”
OpenSource Connections
Introducing CQL
“Hey sweet! It’s exactly the same as MySQL!”Hold your horses. There are some
important differences.
OpenSource Connections
Introducing CQL
“Hey sweet! It’s exactly the same as MySQL!”Hold your horses. There are some
important differences.
“Wait? What happened to theCassandra’s wide rows?”
OpenSource Connections
Introducing CQL
“Hey sweet! It’s exactly the same as MySQL!”Hold your horses. There are some
important differences.
“Wait? What happened to theCassandra’s wide rows?”
There’s still there. Understandingthe mapping is crucial!
OpenSource Connections
Introducing CQL
“Hey sweet! It’s exactly the same as MySQL!”Hold your horses. There are some
important differences.
“Wait? What happened to theCassandra’s wide rows?”
There’s still there. Understandingthe mapping is crucial!
OpenSource Connections
Remember this:• Cassandra finds rows fast
• Cassandra scans columns fast• Cassandra does not scan rows
The CQL/Cassandra Mapping
CREATE TABLE employees ( name text PRIMARY KEY, age int, role text);
OpenSource Connections
The CQL/Cassandra Mapping
CREATE TABLE employees ( name text PRIMARY KEY, age int, role text);
OpenSource Connections
name | age | role-----+-----+-----john | 37 | deveric | 38 | ceo
The CQL/Cassandra Mapping
CREATE TABLE employees ( name text PRIMARY KEY, age int, role text);
OpenSource Connections
name | age | role-----+-----+-----john | 37 | deveric | 38 | ceo
age role
john 37 dev
age role
eric 38 ceo
The CQL/Cassandra Mapping
CREATE TABLE employees ( company text, name text, age int, role text, PRIMARY KEY (company,name));
OpenSource Connections
The CQL/Cassandra Mapping
CREATE TABLE employees ( company text, name text, age int, role text, PRIMARY KEY (company,name));
OpenSource Connections
company | name | age | role--------+------+-----+-----OSC | eric | 38 | ceoOSC | john | 37 | devRKG | anya | 29 | leadRKG | ben | 27 | devRKG | chad | 35 | ops
The CQL/Cassandra Mapping
CREATE TABLE employees ( company text, name text, age int, role text, PRIMARY KEY (company,name));
OpenSource Connections
company | name | age | role--------+------+-----+-----OSC | eric | 38 | ceoOSC | john | 37 | devRKG | anya | 29 | leadRKG | ben | 27 | devRKG | chad | 35 | ops
eric:age eric:role john:age john:role
OSC 38 dev 37 dev
anya:age anya:role ben:age ben:role chad:age chad:role
RKG 29 lead 27 dev 35 ops
The CQL/Cassandra Mapping
CREATE TABLE example ( A text, B text, C text, D text, E text, F text, PRIMARY KEY ((A,B),C,D));
OpenSource Connections
The CQL/Cassandra Mapping
CREATE TABLE example ( A text, B text, C text, D text, E text, F text, PRIMARY KEY ((A,B),C,D));
OpenSource Connections
A | B | C | D | E | F --+---+---+---+---+---a | b | c | d | e | fa | b | c | g | h | ia | b | j | k | l | ma | n | o | p | q | rs | t | u | v | w | x
The CQL/Cassandra Mapping
CREATE TABLE example ( A text, B text, C text, D text, E text, F text, PRIMARY KEY ((A,B),C,D));
OpenSource Connections
A | B | C | D | E | F --+---+---+---+---+---a | b | c | d | e | fa | b | c | g | h | ia | b | j | k | l | ma | n | o | p | q | rs | t | u | v | w | x
c:d:E c:d:F c:g:E c:g:F j:k:E j:k:F
a:b e f h i l m
o:p:E o:p:F
a:n q r
u:v:E u:v:F
s:t w x
CQL for Sets, Lists, and Maps
• Collection Semanticso Sets hold list of unique elementso Lists hold ordered, possibly repeating elementso Maps hold a list of key-value pairs
• Uses same old Cassandra data structure
OpenSource Connections
CQL for Sets, Lists, and Maps
• Collection Semanticso Sets hold list of unique elementso Lists hold ordered, possibly repeating elementso Maps hold a list of key-value pairs
• Uses same old Cassandra data structure• Declaring
OpenSource Connections
CREATE TABLE mytable( X text, Y text, myset set<text>, mylist list<int>, mymap map<text, text>, PRIMARY KEY (X,Y));
CQL for Sets, Lists, and Maps
• Collection Semanticso Sets hold list of unique elementso Lists hold ordered, possibly repeating elementso Maps hold a list of key-value pairs
• Uses same old Cassandra data structure• Declaring
OpenSource Connections
CREATE TABLE mytable( X text, Y text, myset set<text>, mylist list<int>, mymap map<text, text>, PRIMARY KEY (X,Y));
Collection fields can not be used in primary keys
CQL for Sets, Lists, and Maps
• Inserting
INSERT INTO mytable (row, myset) VALUES (123, { ‘apple’, ‘banana’});
OpenSource Connections
CQL for Sets, Lists, and Maps
• Inserting
INSERT INTO mytable (row, myset) VALUES (123, { ‘apple’, ‘banana’});
OpenSource Connections
INSERT INTO mytable (row, mylist) VALUES (123, [‘apple’,’banana’,’apple’]);
CQL for Sets, Lists, and Maps
• Inserting
INSERT INTO mytable (row, myset) VALUES (123, { ‘apple’, ‘banana’});
OpenSource Connections
INSERT INTO mytable (row, mylist) VALUES (123, [‘apple’,’banana’,’apple’]);
INSERT INTO mytable (row, mymap) VALUES (123, {1:’apple’,2:’banana’})
CQL for Sets, Lists, and Maps• UpdatingUPDATE mytable SET myset = myset + {‘apple’,‘banana’} WHERE row = 123;UPDATE mytable SET myset = myset - { ‘apple’ } WHERE row = 123;
OpenSource Connections
CQL for Sets, Lists, and Maps• UpdatingUPDATE mytable SET myset = myset + {‘apple’,‘banana’} WHERE row = 123;UPDATE mytable SET myset = myset - { ‘apple’ } WHERE row = 123;
OpenSource Connections
UPDATE mytable SET mylist = mylist + [‘apple’,‘banana’] WHERE row = 123;UPDATE mytable SET mylist = [‘banana’] + mylist WHERE row = 123;
CQL for Sets, Lists, and Maps• UpdatingUPDATE mytable SET myset = myset + {‘apple’,‘banana’} WHERE row = 123;UPDATE mytable SET myset = myset - { ‘apple’ } WHERE row = 123;
OpenSource Connections
UPDATE mytable SET mylist = mylist + [‘apple’,‘banana’] WHERE row = 123;UPDATE mytable SET mylist = [‘banana’] + mylist WHERE row = 123;
UPDATE mytable SET mymap[‘fruit’] = ‘apple’ WHERE row = 123UPDATE mytable SET mymap = mymap + { ‘fruit’:‘apple’} WHERE row = 123
CQL for Sets, Lists, and Maps
SETS
CREATE TABLE mytable( X text, Y text, myset set<int>, PRIMARY KEY (X,Y));
OpenSource Connections
CQL for Sets, Lists, and Maps
SETS
CREATE TABLE mytable( X text, Y text, myset set<int>, PRIMARY KEY (X,Y));
OpenSource Connections
X | Y | myset ---+---+------------ a | b | {1,2} a | c | {3,4,5}
CQL for Sets, Lists, and Maps
SETS
CREATE TABLE mytable( X text, Y text, myset set<int>, PRIMARY KEY (X,Y));
OpenSource Connections
X | Y | myset ---+---+------------ a | b | {1,2} a | c | {3,4,5}
b:myset:1 b:myset:2 c:myset:3 c:myset:4 c:myset:5
a
CQL for Sets, Lists, and Maps
OpenSource Connections
LISTS
CREATE TABLE mytable( X text, Y text, mylist list<int>, PRIMARY KEY (X,Y));
CQL for Sets, Lists, and Maps
OpenSource Connections
X | Y | mylist ---+---+------------ a | b | [1,2]
LISTS
CREATE TABLE mytable( X text, Y text, mylist list<int>, PRIMARY KEY (X,Y));
CQL for Sets, Lists, and Maps
OpenSource Connections
X | Y | mylist ---+---+------------ a | b | [1,2]
b:mylist:f7e5450039..8d b:mylist:f7e5450139..8d
a 1 2
LISTS
CREATE TABLE mytable( X text, Y text, mylist list<int>, PRIMARY KEY (X,Y));
CQL for Sets, Lists, and Maps
OpenSource Connections
X | Y | mylist ---+---+------------ a | b | [1,2]
b:mylist:f7e5450039..8d b:mylist:f7e5450139..8d
a 1 2
LISTS
CREATE TABLE mytable( X text, Y text, mylist list<int>, PRIMARY KEY (X,Y));
CQL for Sets, Lists, and Maps
MAPS
CREATE TABLE mytable( X text, Y text, mymap map<text,int>, PRIMARY KEY (X,Y));
OpenSource Connections
CQL for Sets, Lists, and Maps
MAPS
CREATE TABLE mytable( X text, Y text, mymap map<text,int>, PRIMARY KEY (X,Y));
OpenSource Connections
X | Y | mymap ---+---+------------ a | b | {m:1,n:2} a | c |{n:3,p:4,q:5}
CQL for Sets, Lists, and Maps
MAPS
CREATE TABLE mytable( X text, Y text, mymap map<text,int>, PRIMARY KEY (X,Y));
OpenSource Connections
X | Y | mymap ---+---+------------ a | b | {m:1,n:2} a | c |{n:3,p:4,q:5}
b:mymap:m b:mymap:n c:mymap:n c:mymap:p c:mymap:q
a 1 2 3 4 5
Peek Behind the Scenes! Do it!
(in cqlsh)CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};USE test;CREATE TABLE stuff ( a int, b int, myset set<int>, mylist list<int>, mymap map<int,int>, PRIMARY KEY (a,b));UPDATE stuff SET myset = {1,2}, mylist = [3,4,5], mymap = {6:7,8:9} WHERE a = 0 AND b = 1;SELECT * FROM stuff;
(in cassandra-cli)use test;list stuff ;
(in cqlsh)SELECT key_aliases,column_aliases from system.schema_columnfamilies WHERE keyspace_name = 'test' AND columnfamily_name = 'stuff';
OpenSource Connections
Putting it All Together…you already know• CQL is a reintroduction of schema• CQL creates a common data modeling language• CQL is a best-practices Cassandra interface
.OpenSource Connections
Putting it All Together…you already know• CQL is a reintroduction of schema• CQL creates a common data modeling language• CQL is a best-practices Cassandra interface
…now you know• CQL let’s you take advantage of the C* Data structure
.OpenSource Connections
Putting it All Together…you already know• CQL is a reintroduction of schema• CQL creates a common data modeling language• CQL is a best-practices Cassandra interface
…now you know• CQL let’s you take advantage of the C* Data structure
…but also• CQL protocol is binary and therefore interoperable with
any language• CQL is asynchronous and fast (Thrift transport layer is
synchronous)• CQL allows the possibility for prepared statements
OpenSource Connections
Thanks!
Follow me on Twitter @JnBrymn
Check out the OpenSource Connection
Blog
http://www.opensourceconnections.com/blog/
OpenSource Connections