C* Summit EU 2013: The State of CQL
The State of CQL
Sylvain Lebresne (DataStax)
A short CQL primer
New in Cassandra 2.0
Native protocol
What's next?
A better API for Cassandra
Cassandra has often been regarded as hard to develop against.
It doesn't have to be that way!
Thrift is not satisfactory:
Not user friendly, hard to use.
Low level, very little abstraction.
Hard to evolve (in a backward-compatible way).
Unreadable without driver abstraction.
Quick historical notes
CQL1 was first introduced in Cassandra 0.8 and became CQL2 in Cassandra 1.0.
"These aren't the CQL you are looking for"
CQL3 (CQL for short hereafter) was introduced in Cassandra 1.2.
Semantically, CQL1/CQL2 are closer to the Thrift API than to CQL3.
CQL3 is the version that's here to stay: no plan for a CQL4 any time soon.
A short CQL primer
The Cassandra Query Language
Syntactically, a subset of SQL (with a few extensions)
INSERT and UPDATE are both upserts
No joins, no sub-queries, no aggregation, ...
Denormalization is the norm: do the work at write time, not read time
CREATE TABLE users (
    user_id uuid,
    name text,
    password text,
    email text,
    picture_profile blob,
    PRIMARY KEY (user_id)
)
CQL
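"INSERT and UPDATE are both upserts" means neither statement checks whether the row already exists: INSERT never fails with a duplicate key, and UPDATE on a missing row simply creates it. The toy Java model below (the class `UpsertTable` is hypothetical, single-node, and is not Cassandra code) makes that explicit by implementing both operations as a merge of the given columns into whatever is stored under the primary key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy, single-node model of CQL upsert semantics (not Cassandra code):
// a table maps a primary key to the columns written for that row.
class UpsertTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // INSERT: no "duplicate key" error; the given columns are merged into
    // any existing row, and columns not mentioned keep their values.
    void insert(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    // UPDATE: no "row not found" error; a missing row is simply created.
    void update(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).put(column, value);
    }

    // Returns the stored columns, or null if nothing was ever written.
    Map<String, String> select(String key) {
        return rows.get(key);
    }
}
```

Because both paths end in the same merge, the client never needs a read-before-write to decide between INSERT and UPDATE.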
Denormalization: Cassandra modeling 101
Efficient queries in Cassandra are based on 2 principles:
the data queried is collocated on one replica set
the data queried is collocated on disk on those replicas
Denormalization is the technique that makes this achievable in practice.
But this means CQL has to expose:
how to collocate data on the same replica set
how to collocate data on disk (for a given replica)
This is done in CQL through the primary key:
CREATE TABLE inboxes (
    user_id uuid,
    email_id timeuuid,
    sender text,
    recipients set<text>,
    subject text,
    is_read boolean,
    PRIMARY KEY (user_id, email_id)
)
CQL
CQL distinguishes 2 sub-parts in the PRIMARY KEY:
partition key (here user_id): decides the node on which the data is stored
clustering columns (here email_id): within the same partition key, (CQL3) rows are physically ordered following the clustering columns
This is important, because CQL only allows queries for which an explicit index exists:
-- Get the last 50 emails in user 51b-23-ab8's inbox
SELECT * FROM inboxes WHERE user_id=51b-23-ab8 ORDER BY email_id DESC LIMIT 50;
CQL
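To make the two roles of the primary key concrete, here is a toy Java model of the inboxes table (the class `InboxModel` and its methods are hypothetical). Assumptions: a plain hash modulo the node count stands in for Cassandra's token ring and replication, and `email_id` is modeled as a `long` instead of a timeuuid. The point it shows is that the "last 50 emails" query is one partition lookup followed by a reverse walk over rows that are already sorted, which is why CQL allows it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the inboxes table (not how Cassandra stores data):
// - the partition key (user_id) picks a node; hash modulo node count is a
//   stand-in for the real token ring and replication strategy;
// - inside a partition, rows are kept sorted by the clustering column
//   (email_id, modeled here as a long rather than a timeuuid).
class InboxModel {
    private final int nodeCount;
    private final Map<String, NavigableMap<Long, String>> partitions = new HashMap<>();

    InboxModel(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // Which node owns this partition (deterministic for a given key).
    int nodeFor(String userId) {
        return Math.floorMod(userId.hashCode(), nodeCount);
    }

    void insert(String userId, long emailId, String subject) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>()).put(emailId, subject);
    }

    // SELECT ... WHERE user_id = ? ORDER BY email_id DESC LIMIT n:
    // one partition lookup, then a reverse walk of already-sorted rows.
    List<String> lastEmails(String userId, int limit) {
        List<String> result = new ArrayList<>();
        NavigableMap<Long, String> partition = partitions.get(userId);
        if (partition == null) {
            return result;
        }
        for (String subject : partition.descendingMap().values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(subject);
        }
        return result;
    }
}
```

A query filtering on, say, `subject` alone has no such shortcut in this model: it would have to visit every partition on every node, which is exactly the kind of query CQL refuses without an index.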
CQL main features
Collections (set, map and list)
Secondary indexes
Convenience functions (timeuuid, type conversions, ...)
...
For more details:
http://cassandra.apache.org/doc/cql3/CQL.html
http://www.datastax.com/documentation/cql/3.1/webhelp/index.html
New in Cassandra 2.0
Lightweight transactions:
INSERT INTO test (id, name) VALUES (42, 'Tom') IF NOT EXISTS;
UPDATE test SET password='newpass' WHERE id=42 IF password='oldpass';
CQL
Triggers:
CREATE TRIGGER myTrigger ON test USING 'my.trigger.Class';
CQL
ALTER DROP:
CREATE TABLE test (k int PRIMARY KEY, prop1 int, prop2 text, prop3 float);
ALTER TABLE test DROP prop3;
CQL
Preparing TIMESTAMP, TTL and LIMIT:
SELECT * FROM myTable LIMIT ?;
UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
CQL
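The two lightweight-transaction statements are compare-and-set operations: the write happens only if the condition holds, and the result reports whether it was applied. The Java sketch below (hypothetical class `ConditionalTable`) models only those semantics on a single node, using `synchronized` for atomicity. In Cassandra itself the condition check and write are agreed upon across replicas with Paxos, which this toy does not attempt to show.

```java
import java.util.HashMap;
import java.util.Map;

// Single-node sketch of the semantics of conditional (IF ...) statements.
// Real lightweight transactions use Paxos among replicas; this toy only
// shows compare-and-set and the boolean "applied" outcome.
class ConditionalTable {
    private final Map<Integer, Map<String, String>> rows = new HashMap<>();

    // INSERT ... IF NOT EXISTS: writes only if no row exists for the key.
    synchronized boolean insertIfNotExists(int id, String column, String value) {
        if (rows.containsKey(id)) {
            return false; // not applied: row already present
        }
        Map<String, String> row = new HashMap<>();
        row.put(column, value);
        rows.put(id, row);
        return true;
    }

    // UPDATE ... SET column = newValue WHERE id = ? IF column = expected
    synchronized boolean updateIf(int id, String column, String expected, String newValue) {
        Map<String, String> row = rows.get(id);
        if (row == null || !expected.equals(row.get(column))) {
            return false; // not applied: missing row or condition false
        }
        row.put(column, newValue);
        return true;
    }

    synchronized String get(int id, String column) {
        Map<String, String> row = rows.get(id);
        return row == null ? null : row.get(column);
    }
}
```

Two clients racing to change the same password with the same `expected` value will see exactly one `true`; the loser can re-read and decide what to do.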
Conditional DDL:
CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
DROP KEYSPACE IF EXISTS ks;
CQL
Secondary indexes everywhere (almost):
CREATE TABLE timeline (
    event_id uuid,
    created_at timeuuid,
    content blob,
    PRIMARY KEY (event_id, created_at)
);
CREATE INDEX ON timeline (created_at);
CQL
SELECT aliases:
SELECT event_id, dateOf(created_at) AS creation_date FROM timeline;
CQL
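In the alias example, `dateOf(created_at)` extracts the timestamp stored inside a timeuuid (a version-1 UUID). The Java sketch below reproduces that conversion with the standard `java.util.UUID.timestamp()` method, which returns the count of 100-nanosecond intervals since 1582-10-15. The class `TimeUuids` is hypothetical, and its helper `timeuuidFor` only fills in the time fields (clock sequence and node bits are zero apart from the variant), so it is a test fixture, not a real timeuuid generator.

```java
import java.util.Date;
import java.util.UUID;

// Sketch of the conversion behind dateOf(): a version-1 UUID stores the
// number of 100-ns intervals since 1582-10-15 (the Gregorian epoch);
// subtracting the offset to 1970-01-01 gives Unix milliseconds.
class TimeUuids {
    // 100-ns intervals between 1582-10-15T00:00Z and 1970-01-01T00:00Z
    static final long GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000L;

    static Date dateOf(UUID timeuuid) {
        long ticks = timeuuid.timestamp(); // throws if not a version-1 UUID
        return new Date((ticks - GREGORIAN_TO_UNIX_100NS) / 10_000);
    }

    // Hypothetical test helper: a version-1 UUID for the given instant.
    // Clock sequence and node are left zero (only the IETF variant is set).
    static UUID timeuuidFor(long unixMillis) {
        long ticks = unixMillis * 10_000 + GREGORIAN_TO_UNIX_100NS;
        long msb = ((ticks & 0xFFFFFFFFL) << 32)      // time_low
                 | (((ticks >>> 32) & 0xFFFFL) << 16) // time_mid
                 | 0x1000L                            // version 1
                 | ((ticks >>> 48) & 0x0FFFL);        // time_hi
        long lsb = 0x8000000000000000L;               // IETF variant bits
        return new UUID(msb, lsb);
    }
}
```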
Coming in Cassandra 2.0.2
Named bind variables:
SELECT * FROM timeline WHERE created_at > :tlow AND created_at <= :thigh AND event_id = :k;
CQL
Prepared IN:
SELECT * FROM users WHERE user_id IN ?;
CQL
Limited SELECT DISTINCT (partition key columns only):
CREATE TABLE test (
    event_id int,
    created_at timestamp,
    content blob,
    PRIMARY KEY (event_id, created_at)
);
SELECT DISTINCT event_id FROM test;
CQL
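SELECT DISTINCT is limited to partition key columns, and the toy model below (hypothetical class `DistinctModel`) suggests why that restriction keeps it cheap: the answer is one value per partition, so the rows inside each partition never need to be read. Assumptions: `created_at` is modeled as a `long`, and partitions are ordered by key value only to make the output deterministic; Cassandra actually returns partitions in token order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the test table: partition key event_id -> rows sorted by
// created_at. Key-ordered TreeMap is for determinism only (Cassandra
// orders partitions by token).
class DistinctModel {
    private final Map<Integer, NavigableMap<Long, String>> partitions = new TreeMap<>();

    void insert(int eventId, long createdAt, String content) {
        partitions.computeIfAbsent(eventId, k -> new TreeMap<>()).put(createdAt, content);
    }

    // SELECT DISTINCT event_id FROM test: one result per partition;
    // the rows inside each partition are never visited.
    List<Integer> distinctEventIds() {
        return new ArrayList<>(partitions.keySet());
    }

    // SELECT event_id FROM test (no DISTINCT): one result per CQL row,
    // which requires walking every row of every partition.
    List<Integer> allEventIds() {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, NavigableMap<Long, String>> e : partitions.entrySet()) {
            for (int i = 0; i < e.getValue().size(); i++) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }
}
```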
The native protocol
A binary transport protocol for CQL
Query execution, prepared statements, authentication, compression, ...
Asynchronous (allows multiple concurrent queries per connection)
Server notifications (only generic cluster events currently)
Existing drivers for Java, C#, Python, C++, Golang, ...
Example usage of the Java driver (https://github.com/datastax/java-driver):
import com.datastax.driver.core.*;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("myKeyspace");
for (Row row : session.execute("SELECT * FROM myTable")) {
    // Do something ...
}
JAVA
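"Asynchronous (allows multiple concurrent queries per connection)" works because every request frame carries a stream id that the server echoes in its response, so a client can match responses that come back in any order. The Java sketch below (hypothetical class `MultiplexedConnection`; frame encoding and socket I/O omitted) shows only the bookkeeping a driver has to do. The limit of 128 ids is assumed to mirror the one-byte stream id of protocol versions 1 and 2, where non-negative ids are used for client requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Toy model of request multiplexing on one connection: each request takes a
// free stream id, the response carries the same id back, and the matching
// future is completed. No real frames or sockets are involved.
class MultiplexedConnection {
    static final int MAX_STREAMS = 128; // assumed: 1-byte stream id, ids 0..127

    static final class Request {
        final int streamId;
        final CompletableFuture<String> result = new CompletableFuture<>();

        Request(int streamId) {
            this.streamId = streamId;
        }
    }

    private final Deque<Integer> freeIds = new ArrayDeque<>();
    private final Map<Integer, CompletableFuture<String>> inFlight = new HashMap<>();

    MultiplexedConnection() {
        for (int id = 0; id < MAX_STREAMS; id++) {
            freeIds.add(id);
        }
    }

    // The query would be encoded into a frame tagged with streamId here.
    synchronized Request send(String query) {
        Integer id = freeIds.poll();
        if (id == null) {
            throw new IllegalStateException("all " + MAX_STREAMS + " streams are busy");
        }
        Request r = new Request(id);
        inFlight.put(id, r.result);
        return r;
    }

    // Called when a response frame for streamId has been read.
    void onResponse(int streamId, String body) {
        CompletableFuture<String> f;
        synchronized (this) {
            f = inFlight.remove(streamId);
            if (f == null) {
                throw new IllegalStateException("unknown stream " + streamId);
            }
            freeIds.add(streamId); // the id can be reused by a new request
        }
        f.complete(body); // complete outside the lock
    }

    synchronized int inFlightCount() {
        return inFlight.size();
    }
}
```

A real driver would queue requests or open another connection instead of throwing when all streams are busy; the exception just keeps the sketch short.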
New in Cassandra 2.0: native protocol 2
Cursors:
for (Row row : session.execute("SELECT * FROM myTable")) {
    // Do something ... (the driver fetches further pages as the loop advances)
}
JAVA
Batching prepared statements:
PreparedStatement ps = session.prepare("INSERT INTO myTable (p1, p2) VALUES (?, ?)");
BatchStatement bs = new BatchStatement();
bs.add(ps.bind(0, "v1"));
bs.add(ps.bind(1, "v2"));
bs.add(ps.bind(2, "v3"));
session.execute(bs);
JAVA
One-shot prepare and execute:
session.execute("INSERT INTO users (id, photo) VALUES (?, ?)", someId, photoBytes);
JAVA
SASL for authentication
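With cursors, the server returns one page of rows plus a paging state, and the client sends that state back to get the next page; the loop in the cursors example therefore never holds the whole result in memory. The Java sketch below (hypothetical class `PagedResult`) models that exchange. Assumptions: an in-memory list stands in for the server, and the paging state is an `Integer` offset, whereas the real protocol uses an opaque byte blob.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of client-side paging: the iterator asks for the next page only
// when the current one is exhausted. The "server" is an in-memory list and
// the paging state is an offset (the real one is opaque bytes).
class PagedResult implements Iterable<String> {
    static final class Page {
        final List<String> rows;
        final Integer pagingState; // null when there are no more pages

        Page(List<String> rows, Integer pagingState) {
            this.rows = rows;
            this.pagingState = pagingState;
        }
    }

    private final List<String> table;
    private final int pageSize;
    int requestsSent = 0; // number of simulated round trips

    PagedResult(List<String> table, int pageSize) {
        this.table = table;
        this.pageSize = pageSize;
    }

    // Stand-in for one request/response exchange with the server.
    Page fetch(Integer pagingState) {
        requestsSent++;
        int from = pagingState == null ? 0 : pagingState;
        int to = Math.min(from + pageSize, table.size());
        Integer next = to < table.size() ? to : null;
        return new Page(new ArrayList<>(table.subList(from, to)), next);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private Page page = fetch(null);
            private int pos = 0;

            @Override
            public boolean hasNext() {
                // Fetch the next page only once the current one is used up.
                while (pos == page.rows.size() && page.pagingState != null) {
                    page = fetch(page.pagingState);
                    pos = 0;
                }
                return pos < page.rows.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.rows.get(pos++);
            }
        };
    }
}
```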
What's next?
Cassandra 2.1 and beyond
CQL: some ideas
Storage engine optimizations for CQL
Secondary index for collections
Server side functions
User defined types
...
User defined types
CREATE TYPE address (
    street text,
    zip_code int,
    state text,
    phones set<text>
);
CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    addresses map<text, address>
);
INSERT INTO users (id, name) VALUES (234-4a-761, 'Sylvain Lebresne');
UPDATE users SET addresses['work'] = {
    street: '777 Mariners Island Blvd #510',
    zip_code: 94404,
    state: 'CA',
    phones: { '650-389-6000' }
} WHERE id = 234-4a-761;
CQL
Thank You! (Questions?)