C* Summit EU 2013: The State of CQL

The State of CQL

Sylvain Lebresne (DataStax)

A short CQL primer

New in Cassandra 2.0

Native protocol

What's next?


A better API for Cassandra

Thrift is not satisfactory:

· Not user friendly, hard to use.
· Low level, very little abstraction.
· Hard to evolve (in a backward compatible way).
· Unreadable without driver abstraction.

Cassandra has often been regarded as hard to develop against.

It doesn't have to be that way!

Quick historical notes

· CQL1 was first introduced in Cassandra 0.8 and became CQL2 in Cassandra 1.0.
· "These aren't the CQL you are looking for."
· CQL3 (simply "CQL" from here on) was introduced in Cassandra 1.2.
· Semantically, CQL1/CQL2 are closer to the Thrift API than to CQL3.
· CQL3 is the version that's here to stay: no plan for a CQL4 any time soon.

A short CQL primer

The Cassandra Query Language

· Syntactically, a subset of SQL (with a few extensions)

    CREATE TABLE users (
        user_id uuid,
        name text,
        password text,
        email text,
        picture_profile blob,
        PRIMARY KEY (user_id)
    );

· INSERT and UPDATE are both upserts
· No joins, no sub-queries, no aggregation, ...
· Denormalization is the norm: do the work at write time, not read time

Denormalization: Cassandra modeling 101

Efficient queries in Cassandra are based on 2 principles:

· the data queried is collocated on one replica set
· the data queried is collocated on disk on those replicas

Denormalization is the technique that makes this achievable in practice.

But this means CQL exposes:

· how to collocate data on the same replica set
· how to collocate data on disk (for a given replica)

This is done in CQL through the primary key

    CREATE TABLE inboxes (
        user_id uuid,
        email_id timeuuid,
        sender text,
        recipients set<text>,
        subject text,
        is_read boolean,
        PRIMARY KEY (user_id, email_id)
    );

CQL distinguishes 2 sub-parts in the PRIMARY KEY:

· partition key: decides the node on which the data is stored
· clustering columns: within the same partition key, (CQL3) rows are physically ordered following the clustering columns

This is important, because CQL only allows queries for which an explicit index exists:

    -- Get the last 50 emails in the inbox of user 51b-23-ab8
    SELECT * FROM inboxes WHERE user_id=51b-23-ab8 ORDER BY email_id DESC LIMIT 50;
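To make the role of the two primary key parts concrete, here is a minimal in-memory sketch in Java. It is only an analogy, not how Cassandra stores data: the class name InboxSketch and its methods are hypothetical, a String stands in for the uuid partition key and a long for the timeuuid clustering column. The outer map plays the part of the partition key lookup, and the sorted inner map plays the part of rows ordered by the clustering column, which is why "the last N emails of one user" is cheap to answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a table with PRIMARY KEY (user_id, email_id).
// Illustration only: names and types are assumptions, not Cassandra internals.
class InboxSketch {
    // partition key -> rows of that partition, kept sorted by clustering column
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    // Upsert: writing the same (userId, emailId) twice overwrites the row.
    void insert(String userId, long emailId, String subject) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>()).put(emailId, subject);
    }

    // Analogue of: SELECT ... WHERE user_id = ? ORDER BY email_id DESC LIMIT ?
    List<String> latest(String userId, int limit) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, String> rows = partitions.get(userId);
        if (rows == null) {
            return result; // unknown partition: no rows
        }
        // Walk the partition from the highest clustering value downwards.
        for (String subject : rows.descendingMap().values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(subject);
        }
        return result;
    }

    public static void main(String[] args) {
        InboxSketch inbox = new InboxSketch();
        inbox.insert("alice", 1L, "hello");
        inbox.insert("alice", 2L, "meeting");
        inbox.insert("alice", 3L, "lunch?");
        inbox.insert("bob", 1L, "other inbox");
        System.out.println(inbox.latest("alice", 2)); // [lunch?, meeting]
    }
}
```

The query touches a single partition and reads a contiguous, already sorted range, which mirrors the two collocation principles above.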

CQL main features

· Collections (set, map and list)
· Secondary indexes
· Convenience functions (timeuuid, type conversions, ...)
· ...

For more details:

· http://cassandra.apache.org/doc/cql3/CQL.html
· http://www.datastax.com/documentation/cql/3.1/webhelp/index.html

New in Cassandra 2.0

Lightweight transactions:

    INSERT INTO test (id, name) VALUES (42, 'Tom') IF NOT EXISTS;
    UPDATE test SET password='newpass' WHERE id=42 IF password='oldpass';

Triggers:

    CREATE TRIGGER myTrigger ON test USING 'my.trigger.Class';

ALTER DROP:

    CREATE TABLE test (k int PRIMARY KEY, prop1 int, prop2 text, prop3 float);
    ALTER TABLE test DROP prop3;

Preparing TIMESTAMP, TTL and LIMIT:

    SELECT * FROM myTable LIMIT ?;
    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
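The semantics of the two lightweight transaction statements (IF NOT EXISTS and IF password='oldpass') can be pictured as compare-and-set operations. The following Java sketch is a single-process analogy using ConcurrentHashMap; the class and method names are hypothetical, and it says nothing about how Cassandra reaches agreement between replicas. Each method returns whether the write was applied, which is the information a conditional statement reports back.

```java
import java.util.concurrent.ConcurrentHashMap;

// Single-process analogy for conditional writes.
// Illustration only: not Cassandra's distributed implementation.
class ConditionalWriteSketch {
    // id -> password, standing in for rows of the "test" table
    private final ConcurrentHashMap<Integer, String> passwords = new ConcurrentHashMap<>();

    // Analogue of INSERT ... IF NOT EXISTS: applied only if no row has this id.
    boolean insertIfNotExists(int id, String password) {
        return passwords.putIfAbsent(id, password) == null;
    }

    // Analogue of UPDATE ... IF password = expected:
    // applied only if the current value matches the expected one.
    boolean updateIfEquals(int id, String expected, String replacement) {
        return passwords.replace(id, expected, replacement);
    }

    String get(int id) {
        return passwords.get(id);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch table = new ConditionalWriteSketch();
        System.out.println(table.insertIfNotExists(42, "oldpass"));         // true
        System.out.println(table.insertIfNotExists(42, "other"));           // false
        System.out.println(table.updateIfEquals(42, "wrong", "newpass"));   // false
        System.out.println(table.updateIfEquals(42, "oldpass", "newpass")); // true
    }
}
```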

New in Cassandra 2.0

Conditional DDL:

    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
    DROP KEYSPACE IF EXISTS ks;

Secondary indexes everywhere (almost):

    CREATE TABLE timeline (
        event_id uuid,
        created_at timeuuid,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );
    CREATE INDEX ON timeline (created_at);

SELECT aliases:

    SELECT event_id, dateOf(created_at) AS creation_date FROM timeline;

Coming in Cassandra 2.0.2

Named bind variables:

    SELECT * FROM timeline WHERE created_at > :tlow AND created_at <= :thigh AND key = :k;

Prepared IN:

    SELECT * FROM users WHERE user_id IN ?;

Limited SELECT DISTINCT:

    CREATE TABLE test (
        event_id int,
        created_at timestamp,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );
    SELECT DISTINCT event_id FROM test;

The native protocol

A binary transport protocol for CQL

Native protocol

· Binary transport protocol for CQL
· Query execution, prepared statements, authentication, compression, ...
· Asynchronous (allows multiple concurrent queries per connection)
· Server notifications (only generic cluster events currently)
· Existing drivers for Java, C#, Python, C++, Golang, ...

Example usage of the Java driver (https://github.com/datastax/java-driver):

    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("myKeyspace");

    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }
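"Multiple concurrent queries per connection" works because each request carries a stream id that the server echoes in the matching response, so responses may arrive in any order. The Java sketch below models only that bookkeeping. The class MultiplexSketch, its methods, and the string "response" are hypothetical stand-ins; no real network traffic or driver code is involved, and a real client would also recycle stream ids.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of request multiplexing on a single connection.
// Illustration only: hypothetical names, no actual protocol frames.
class MultiplexSketch {
    // stream id -> future waiting for the response to that request
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // stream id -> query still waiting for an answer
    private final Map<Integer, String> inFlight = new ConcurrentHashMap<>();
    private final AtomicInteger nextStreamId = new AtomicInteger();

    // "Sends" a query without blocking: tags it with a fresh stream id.
    int send(String query) {
        int streamId = nextStreamId.getAndIncrement();
        pending.put(streamId, new CompletableFuture<>());
        inFlight.put(streamId, query);
        return streamId;
    }

    CompletableFuture<String> responseFor(int streamId) {
        return pending.get(streamId);
    }

    // Simulates a response arriving for one in-flight request, in any order.
    void deliverResponse(int streamId) {
        String query = inFlight.remove(streamId);
        CompletableFuture<String> future = pending.get(streamId);
        if (query != null && future != null) {
            future.complete("rows for: " + query);
        }
    }

    public static void main(String[] args) {
        MultiplexSketch connection = new MultiplexSketch();
        int a = connection.send("SELECT * FROM users");
        int b = connection.send("SELECT * FROM inboxes");
        connection.deliverResponse(b); // the second query is answered first
        System.out.println(connection.responseFor(a).isDone()); // false
        System.out.println(connection.responseFor(b).join());   // rows for: SELECT * FROM inboxes
    }
}
```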

New in Cassandra 2.0: native protocol 2

Cursors:

    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }

Batching prepared statements:

    PreparedStatement ps = session.prepare("INSERT INTO myTable (p1, p2) VALUES (?, ?)");

    BatchStatement bs = new BatchStatement();
    bs.add(ps.bind(0, "v1"));
    bs.add(ps.bind(1, "v2"));
    bs.add(ps.bind(2, "v3"));
    session.execute(bs);

One-shot prepare and execute:

    session.execute("INSERT INTO users (id, photo) VALUES (?, ?)", someId, photoBytes);

SASL for authentication
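The "Cursors" loop looks the same as before because paging happens behind the iterator: results are fetched one page at a time as the loop advances, instead of all at once. The Java sketch below illustrates that idea with a plain in-memory list. PagingSketch, its fields, and the offset used as paging state are hypothetical; the real protocol returns an opaque paging state rather than an offset.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy iterator that pulls results page by page.
// Illustration only: the list stands in for the server-side result set.
class PagingSketch implements Iterator<Integer> {
    private final List<Integer> table;
    private final int pageSize;
    private int nextOffset = 0;                     // stand-in for the paging state
    private List<Integer> page = new ArrayList<>(); // rows of the current page
    private int posInPage = 0;
    int pagesFetched = 0;                           // counts simulated round trips

    PagingSketch(List<Integer> table, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.table = table;
        this.pageSize = pageSize;
    }

    // Simulates one round trip that returns the next page of rows.
    private void fetchNextPage() {
        int end = Math.min(nextOffset + pageSize, table.size());
        page = table.subList(nextOffset, end);
        nextOffset = end;
        posInPage = 0;
        pagesFetched++;
    }

    @Override
    public boolean hasNext() {
        if (posInPage < page.size()) {
            return true;
        }
        if (nextOffset >= table.size()) {
            return false; // no more pages to fetch
        }
        fetchNextPage();
        return posInPage < page.size();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(posInPage++);
    }

    public static void main(String[] args) {
        PagingSketch rows = new PagingSketch(List.of(0, 1, 2, 3, 4, 5, 6), 3);
        int sum = 0;
        while (rows.hasNext()) {
            sum += rows.next();
        }
        System.out.println(sum + " in " + rows.pagesFetched + " pages"); // 21 in 3 pages
    }
}
```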

What's next?

Cassandra 2.1 and beyond

CQL: some ideas

· Storage engine optimizations for CQL
· Secondary index for collections
· Server side functions
· User defined types
· ...

User defined types

    CREATE TYPE address (
        street text,
        zip_code int,
        state text,
        phones set<text>
    );

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        addresses map<text, address>
    );

    INSERT INTO users (id, name) VALUES (234-4a-761, 'Sylvain Lebresne');
    UPDATE users
       SET addresses['work'] = {
               street: '777 Mariners Island Blvd #510',
               zip_code: 94404,
               state: 'CA',
               phones: { '650-389-6000' }
           }
     WHERE id = 234-4a-761;

Thank You!

(Questions?)
