Understanding How CQL3 Maps to Cassandra's Internal Data Structure

The CQL3/Cassandra

MappingJohn Berryman

OpenSource Connections

Outline

• What Problem does CQL Solve?• The Cassandra Data Model• Pain Points of “Old” Cassandra• Introducing CQL• Understanding the CQL/Cassandra Mapping• CQL for Sets, Lists, and Maps• Putting it All Together

What Problem does CQL Solve?

• The Awesomeness that is Cassandra:o Distributed columnar data storeo No single point of failureo Optimized for availability (though “Tunably” consistent)o Optimized for writeso Easily maintainableo Almost infinitely scalable

• Cassandra’s usability challengeso NoSQL – “Where are my JOINS? No Schema? De-normalize!?”o BigTable – “Tables with millions of columns!?”

• Cassandra’s usability challengeso NoSQL – “Where are my JOINS? No Schema? De-

normalize!?”o BigTable – “Tables with millions of columns!?”

• CQL saves the day!o A best-practices interface to Cassandrao Uses familiar SQL-like language

C* Data Model

Keyspace

C* Data Model

Keyspace

Column Family Column Family

C* Data Model

Keyspace

C* Data Model

Keyspace

C* Data Model

Row Key

C* Data Model

Row Key

Column Name

Column Value (or Tombstone)

Timestamp

Time-to-live

Column

C* Data Model

● Row Key, Column Name, Column Value have types

● Column Name has comparator● RowKey has partitioner● Rows can have any number of

columns - even in same column family

● Rows can have many columns● Column Values can be omitted● Time-to-live is useful!● Tombstones

Row Key

Column Name

Column Value (or Tombstone)

Timestamp

Time-to-live

Column

C* Data Model: Writes

MemTable

CommitLog

Row Cache

● Insert into MemTable

● Dump to CommitLog

● No read● Very Fast!● Blocks on CPU

before O/I!

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

MemTable

CommitLog

Row Cache

before O/I!

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

MemTable

CommitLog

Row Cache

before O/I!

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

● Get values from Memtable

● Get values from row cache if present

● Otherwise check bloom filter to find appropriate SSTables

● Check Key Cache for fast SSTable Search

● Get values from SSTables● Repopulate Row Cache● Super Fast Col.

retrieval● Fast row slicing

C* Data Model:Reads

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

C* Data Model:Reads

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

C* Data Model:Reads

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

C* Data Model:Reads

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

C* Data Model:Reads

MemTable

CommitLog

Row Cache

Key Cache

SSTable

SSTableKey

CacheKey

BloomFilter

C* Data Model:Reads

Cassandra Pain Points

• Twitter Example• My tweets

o SET tweets[JnBrymn][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”

o SET tweets[JnBrymn][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”

o SET tweets[JnBrymn][2013-07-19 T 09:51] = “Now my coffee is cold :-(”

• Get John’s tweetso GET tweets[JnBrymn] (output is as expected)

• Twitter Example• My tweets

o SET tweets[JnBrymn][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”

o SET tweets[JnBrymn][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”

o SET tweets[JnBrymn][2013-07-19 T 09:51] = “Now my coffee is cold :-(”

• Get John’s tweetso GET tweets[JnBrymn] (output is as expected)

• Pain-point: schema-less means that you have to read code to understand data model

• My timeline (other’s tweets)• More complicated – must store corresponding

user names• Bad Option 1: keep multiple column families

o SET timeline_from[JnBrymn][2013-07-19 T 09:20] = “softwaredoug”

o SET timeline_text[JnBrymn][2013-07-19 T 09:20] = “Hey John I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline_from[JnBrymn]o GET timeline_text[JnBrymn]

• My timeline (other’s tweets)• More complicated – must store corresponding

user names• Bad Option 1: keep multiple column families

o SET timeline_from[JnBrymn][2013-07-19 T 09:20] = “softwaredoug”

o SET timeline_text[JnBrymn][2013-07-19 T 09:20] = “Hey John I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline_from[JnBrymn]o GET timeline_text[JnBrymn]

• Pain-point: Multiple queries required.

• My timeline• Bad Option 2: shove into single column value

o SET timeline[JnBrymn][2013-07-19 T 09:20] = {from:”softwaredoug”, text: “Hey John I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline[JnBrymn] (…not too bad.)

• My timeline• Bad Option 2: shove into single column value

o SET timeline[JnBrymn][2013-07-19 T 09:20] = {from:”softwaredoug”, text: “Hey John I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline[JnBrymn] (…not too bad.)

• Pain-point: Updates require a read-then-modify

• My timeline• Best Option: composite column names

o SET timeline[JnBrymn][2013-07-19 T 09:20|from] = ”softwaredoug”

o SET timeline[JnBrymn][2013-07-19 T 09:20|text] = “Hey John, I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline[JnBrymn] (extract from and text in

client)

• Resolves prior pain points! Scales well!

• My timeline• Best Option: composite column names

o SET timeline[JnBrymn][2013-07-19 T 09:20|from] = ”softwaredoug”

o SET timeline[JnBrymn][2013-07-19 T 09:20|text] = “Hey John, I posted on reddit, upvote me!”

• Get John’s timelineo GET timeline[JnBrymn] (extract from and text in

client)

• Resolves prior pain points! Scales well!• Pain-point: Even more code reading to

understand data model!

• Justin Bieber’s timeline (e.g. many tweets)• Previous solution fails if number of columns >

2Billion• Best Option: composite row names

o SET timeline[bieber|2013-07][19 T 09:20|from] = ”softwaredoug”

o SET timeline[bieber|2013-07][19 T 09:20|text] = “Justin Bieber, you complete me.”

• Get Justin’s timelineo GET timeline[bieber|2013-07] (get other months

• Justin Bieber’s timeline (e.g. many tweets)• Previous solution fails if number of columns >

2Billion• Best Option: composite row names

o SET timeline[bieber|2013-07][19 T 09:20|from] = ”softwaredoug”

o SET timeline[bieber|2013-07][19 T 09:20|text] = “Justin Bieber, you complete me.”

• Get Justin’s timelineo GET timeline[bieber|2013-07] (get other months

• Pain-point: Even more code reading to understand data model!

Introducing CQL

• CQL is a reintroduction of schema so that you don’t have to read code to understand the data model.

• CQL creates a common language so that details of the data model can be easily communicated.

• CQL is a best-practices Cassandra interface and hides the messy details.

Introducing CQL

• CQL is a reintroduction of schema so that you don’t have to read code to understand the data model.

• CQL creates a common language so that details of the data model can be easily communicated.

• CQL is a best-practices Cassandra interface and hides the messy details.

Let’s see it!

Introducing CQL

CREATE TABLE users ( id timeuuid PRIMARY KEY, lastname varchar, firstname varchar, dateOfBirth timestamp );

Introducing CQL

INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),'Berryman',’John','1975-09-15');

Introducing CQL

INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),’Berryman’,’John’,’1975-09-15’);

UPDATE users SET firstname = ’John’ WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;

Introducing CQL

INSERT INTO users (id,lastname, firstname, dateofbirth) VALUES (now(),'Berryman',’John','1975-09-15');

UPDATE users SET firstname = 'John’ WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;

SELECT dateofbirth,firstname,lastname FROM users ;

dateofbirth | firstname | lastname--------------------------+-----------+---------- 1975-09-15 00:00:00-0400 | John | Berryman

Introducing CQL

“Hey sweet! It’s exactly the same as MySQL!”

Introducing CQL

“Hey sweet! It’s exactly the same as MySQL!”Hold your horses. There are some

important differences.

Introducing CQL

“Wait? What happened to theCassandra’s wide rows?”

Introducing CQL

There’s still there. Understandingthe mapping is crucial!

Introducing CQL

There’s still there. Understandingthe mapping is crucial!

Remember this:• Cassandra finds rows fast

• Cassandra scans columns fast• Cassandra does not scan rows

The CQL/Cassandra Mapping

CREATE TABLE employees ( name text PRIMARY KEY, age int, role text);

name | age | role-----+-----+-----john | 37 | deveric | 38 | ceo

age role

john 37 dev

age role

eric 38 ceo

CREATE TABLE employees ( company text, name text, age int, role text, PRIMARY KEY (company,name));

company | name | age | role--------+------+-----+-----OSC | eric | 38 | ceoOSC | john | 37 | devRKG | anya | 29 | leadRKG | ben | 27 | devRKG | chad | 35 | ops

eric:age eric:role john:age john:role

OSC 38 dev 37 dev

anya:age anya:role ben:age ben:role chad:age chad:role

RKG 29 lead 27 dev 35 ops

CREATE TABLE example ( A text, B text, C text, D text, E text, F text, PRIMARY KEY ((A,B),C,D));

A | B | C | D | E | F --+---+---+---+---+---a | b | c | d | e | fa | b | c | g | h | ia | b | j | k | l | ma | n | o | p | q | rs | t | u | v | w | x

c:d:E c:d:F c:g:E c:g:F j:k:E j:k:F

a:b e f h i l m

o:p:E o:p:F

a:n q r

u:v:E u:v:F

s:t w x

CQL for Sets, Lists, and Maps

• Collection Semanticso Sets hold list of unique elementso Lists hold ordered, possibly repeating elementso Maps hold a list of key-value pairs

• Uses same old Cassandra data structure

• Uses same old Cassandra data structure• Declaring

CREATE TABLE mytable( X text, Y text, myset set<text>, mylist list<int>, mymap map<text, text>, PRIMARY KEY (X,Y));

• Uses same old Cassandra data structure• Declaring

CREATE TABLE mytable( X text, Y text, myset set<text>, mylist list<int>, mymap map<text, text>, PRIMARY KEY (X,Y));

Collection fields can not be used in primary keys

• Inserting

INSERT INTO mytable (row, myset) VALUES (123, { ‘apple’, ‘banana’});

• Inserting

INSERT INTO mytable (row, mylist) VALUES (123, [‘apple’,’banana’,’apple’]);

• Inserting

INSERT INTO mytable (row, mylist) VALUES (123, [‘apple’,’banana’,’apple’]);

INSERT INTO mytable (row, mymap) VALUES (123, {1:’apple’,2:’banana’})

CQL for Sets, Lists, and Maps• UpdatingUPDATE mytable SET myset = myset + {‘apple’,‘banana’} WHERE row = 123;UPDATE mytable SET myset = myset - { ‘apple’ } WHERE row = 123;

UPDATE mytable SET mylist = mylist + [‘apple’,‘banana’] WHERE row = 123;UPDATE mytable SET mylist = [‘banana’] + mylist WHERE row = 123;

UPDATE mytable SET mymap[‘fruit’] = ‘apple’ WHERE row = 123UPDATE mytable SET mymap = mymap + { ‘fruit’:‘apple’} WHERE row = 123

CREATE TABLE mytable( X text, Y text, myset set<int>, PRIMARY KEY (X,Y));

X | Y | myset ---+---+------------ a | b | {1,2} a | c | {3,4,5}

b:myset:1 b:myset:2 c:myset:3 c:myset:4 c:myset:5

CREATE TABLE mytable( X text, Y text, mylist list<int>, PRIMARY KEY (X,Y));

X | Y | mylist ---+---+------------ a | b | [1,2]

b:mylist:f7e5450039..8d b:mylist:f7e5450139..8d

X | Y | mylist ---+---+------------ a | b | [1,2]

b:mylist:f7e5450039..8d b:mylist:f7e5450139..8d

CREATE TABLE mytable( X text, Y text, mymap map<text,int>, PRIMARY KEY (X,Y));

X | Y | mymap ---+---+------------ a | b | {m:1,n:2} a | c |{n:3,p:4,q:5}

b:mymap:m b:mymap:n c:mymap:n c:mymap:p c:mymap:q

a 1 2 3 4 5

Peek Behind the Scenes! Do it!

(in cqlsh)CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};USE test;CREATE TABLE stuff ( a int, b int, myset set<int>, mylist list<int>, mymap map<int,int>, PRIMARY KEY (a,b));UPDATE stuff SET myset = {1,2}, mylist = [3,4,5], mymap = {6:7,8:9} WHERE a = 0 AND b = 1;SELECT * FROM stuff;

(in cassandra-cli)use test;list stuff ;

(in cqlsh)SELECT key_aliases,column_aliases from system.schema_columnfamilies WHERE keyspace_name = 'test' AND columnfamily_name = 'stuff';

Putting it All Together…you already know• CQL is a reintroduction of schema• CQL creates a common data modeling language• CQL is a best-practices Cassandra interface

.OpenSource Connections

…now you know• CQL let’s you take advantage of the C* Data structure

.OpenSource Connections

…now you know• CQL let’s you take advantage of the C* Data structure

…but also• CQL protocol is binary and therefore interoperable with

any language• CQL is asynchronous and fast (Thrift transport layer is

synchronous)• CQL allows the possibility for prepared statements

Thanks!

Follow me on Twitter @JnBrymn

Check out the OpenSource Connection

http://www.opensourceconnections.com/blog/

Understanding How CQL3 Maps to Cassandra's Internal Data Structure

Technology

Transcript of Understanding How CQL3 Maps to Cassandra's Internal Data Structure

Maps, Maps, Maps! What types of maps can we see?.

Nalini Malani's Cassandra's Gift

SPONS AGEliCY 36p. ($3.25). - ERIC · explores relationships between Cassandra's knowledge and beliefs about mathematics, how it is taught and learned, and how she transforms and

Apache Cassandra Lesson: Data Modelling and CQL3

CASSANDRA - KnowGravity Readme.pdf · An alternative way to increase CASSANDRA's memory configuration is to amend the corresponding entries in the CASSANDRA.INI file e.g.: [pro386w]

I-maps and perfect maps

Maps of Malaysia PowerPoint Maps

Maps and More Maps

K-Maps. Outline 2-variable K-maps 3-variable K-maps 4-variable K-maps 5-variable and larger K-maps.

Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

The Geographer’s Tools Maps, Maps, & More Maps YAY!

Maps in the Brain – Introduction. Overview A few words about Maps Cortical Maps: Development and (Re-)Structuring Auditory Maps Visual Maps Place Fields.

Street maps-and-other-maps-signed

Antique Maps (History Maps eBook)

Session 2 Outline Background Aerial Maps USGS Maps Digitized CADD Maps Data base support Background Aerial Maps USGS Maps Digitized CADD Maps Data base.

Google Maps vs iPhone Maps

On cassandra's evolution @ Berlin buzzwords

greets Argos and the gods ... Ostensibly trying to help Cassandra adjust to her new slave status she in fact rubs ... Cassandra's body has been ...

Chapter Eight: Mapping Earth 8.1 Maps 8.2 Topographic Maps 8.3 Bathymetric Maps.

Driving with Google Maps - Association of Personal ......Compass, Maps, GPS( Garmin), Google Maps, Android Auto for Cars. Maps (Garmin) Streaming Maps (Google Maps) ... Promoted by