Introduction to Cassandra

Shimi Kiviti, @shimi_k

Motivation

Scaling

How do you scale your database?
● Reads
● Writes

Influential Papers

● Bigtable: A distributed storage system for structured data, 2006

● Dynamo: Amazon's highly available key-value store, 2007

Cassandra:
● Partitioning and replication - Dynamo
● Log-structured column families - Bigtable

Cassandra Highlights

● Symmetric - all nodes are exactly the same
  ○ No single point of failure
  ○ Linearly scalable
  ○ Ease of administration

● High availability with multiple datacenters
● Consistency vs Latency
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters

DHT - Distributed Hash Table

DHT

● O(1) node lookup
● Explicit replication
● Linear Scalability
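A minimal sketch of how a token ring can map a key to its replicas, assuming three hypothetical nodes A, B and C. `Ring` and `replicasFor` are illustrative names, not Cassandra classes. Every node is assumed to know the whole ring, which is what makes node lookup a single hop; the local search below is a plain linear scan for brevity.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical token ring: token -> node name (not Cassandra's real classes).
final case class Ring(tokens: TreeMap[BigInt, String]) {
  // Start at the first node whose token is >= the key's token, walk clockwise
  // (wrapping around the ring) and collect `replicas` distinct nodes.
  def replicasFor(keyToken: BigInt, replicas: Int): List[String] = {
    val clockwise = tokens.iterator.dropWhile(_._1 < keyToken) ++ tokens.iterator
    clockwise.map(_._2).toList.distinct.take(replicas)
  }
}
```

For example, with tokens 0, 100 and 200 assigned to A, B and C, a key with token 150 and a replication factor of 2 lands on C and then A.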

Consistency

N = Replication factor
R = Number of replicas to block for on a read (R <= N)
W = Number of replicas to block for on a write (W <= N)
Quorum = N/2 + 1 (integer division)

When W + R > N, every read overlaps the latest write, so you get full consistency.
Examples:
● W = 1, R = N
● W = N, R = 1
● W = Quorum, R = Quorum
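The rule above can be checked with a few lines of Scala. This is only a sketch of the arithmetic; `TunableConsistency`, `quorum` and `isFullyConsistent` are hypothetical names, not part of any Cassandra client.

```scala
object TunableConsistency {
  // Quorum for replication factor n, using integer division: N/2 + 1.
  def quorum(n: Int): Int = n / 2 + 1

  // True when every read set must overlap every write set (W + R > N).
  def isFullyConsistent(n: Int, w: Int, r: Int): Boolean = {
    require(w >= 1 && w <= n, "W must be between 1 and N")
    require(r >= 1 && r <= n, "R must be between 1 and N")
    w + r > n
  }
}
```

With N = 3 the quorum is 2, so W = Quorum and R = Quorum gives 2 + 2 > 3, while W = 1 and R = 1 does not satisfy the rule.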

Consistency Level

● Every request defines a consistency level
  ○ Any
  ○ One
  ○ Two
  ○ Three
  ○ Quorum
  ○ Local Quorum
  ○ Each Quorum
  ○ All

Data Model

● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns

Column Family

Key1 Column Column Column

Key2 Column Column

Column Family

ColumnFamily: {
  TOK: { chen: 1, ronen: 7 },
  CityPath: { yuval: 5 }
}
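One way to picture the structure above is as a map of row keys to sorted maps of columns. This is a sketch for intuition only; the type aliases and `columnNames` are assumptions, not a Cassandra or Cascal API.

```scala
import scala.collection.immutable.SortedMap

object ColumnFamilyModel {
  type Row = SortedMap[String, Int]     // column name -> value, sorted by name
  type ColumnFamily = Map[String, Row]  // row key -> row

  // The example from the slide; rows may hold different columns.
  val example: ColumnFamily = Map(
    "TOK" -> SortedMap("ronen" -> 7, "chen" -> 1),
    "CityPath" -> SortedMap("yuval" -> 5)
  )

  // Column names of a row, in sorted order; empty if the row does not exist.
  def columnNames(cf: ColumnFamily, key: String): List[String] =
    cf.get(key).map(_.keys.toList).getOrElse(Nil)
}
```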

Super Column Family

ColumnFamily: {
  Key: {
    super1: { name: value, name: value },
    super2: { name: value }
  }
}

Key  Super1: Column Column Column
     Super2: Column Column Column

Write

● Any node
● Partitioner
● Commit log, memtable
● Wait for W responses

Write

● No reads
● No seeks
● Sequential disk access
● Atomic within a column family
● Fast
● Always writeable (hinted hand-off)
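A rough single-node sketch of why writes need no reads or seeks: each write is appended to a log and then placed in a sorted in-memory table. `NodeSketch` and its fields are hypothetical; the real commit log lives on disk and the memtable is eventually flushed to an SSTable.

```scala
import scala.collection.mutable

// Hypothetical single-node write path; names are illustrative only.
final class NodeSketch {
  // Append-only log of (row key, column, value); stands in for the on-disk commit log.
  val commitLog = mutable.ArrayBuffer.empty[(String, String, String)]
  // In-memory table kept sorted by (row key, column).
  val memtable = mutable.TreeMap.empty[(String, String), String]

  def write(rowKey: String, column: String, value: String): Unit = {
    commitLog += ((rowKey, column, value)) // sequential append, nothing is read first
    memtable((rowKey, column)) = value     // a later write simply replaces the entry
  }
}
```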

Read

● Choose any node
● Partitioner
● Wait for R responses
● Tunable read repair in the background

Read

● A read may have to consult multiple SSTables
● Slower than writes
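A sketch of why a read can touch several SSTables: fragments of the same row may live in the memtable and in more than one SSTable, and they have to be merged. The sketch assumes each column carries a write timestamp and that the newest one wins; `ReadMerge`, `Cell` and `merge` are hypothetical names.

```scala
object ReadMerge {
  final case class Cell(value: String, timestamp: Long)
  type Fragment = Map[String, Cell] // column name -> cell, for one row

  // Merge the row fragments found in the memtable and in each SSTable;
  // for every column, the cell with the highest timestamp wins.
  def merge(fragments: Seq[Fragment]): Map[String, Cell] =
    fragments.flatten.groupBy(_._1).map { case (name, cells) =>
      name -> cells.map(_._2).maxBy(_.timestamp)
    }
}
```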

Cache

● There is no need to use memcached
● There is an internal configurable cache
  ○ Key cache
  ○ Row cache

Sorting

When you perform a get, the result is sorted:
● Rows are sorted according to the partitioner
● Columns in a row are sorted according to the type of the column name

Partitioner

● RandomPartitioner - Uses hash values as tokens. Useful for distributing the load across all nodes. If you use it, set the nodes' tokens manually

● OrderPreservingPartitioner - You can get sorted rows, but at the cost of an evenly balanced cluster
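A sketch of how tokens could be chosen by hand for RandomPartitioner, assuming a token space of 0 to 2^127 and evenly spaced nodes. `Tokens`, `evenTokens` and `md5Token` are hypothetical helpers; `md5Token` only approximates how a key is hashed onto the ring and is not the exact Cassandra code.

```scala
import java.security.MessageDigest

object Tokens {
  // Assumed upper bound of the RandomPartitioner token space (2^127).
  val RingSize: BigInt = BigInt(2).pow(127)

  // Evenly spaced initial tokens for `nodes` nodes: i * 2^127 / nodes.
  def evenTokens(nodes: Int): Seq[BigInt] =
    (0 until nodes).map(i => RingSize * i / nodes)

  // MD5 of the row key folded into [0, 2^127); an approximation for illustration.
  def md5Token(key: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"))
    BigInt(1, digest).mod(RingSize)
  }
}
```

Because the hash spreads keys uniformly, evenly spaced tokens give each node roughly the same share of rows, which is what an order-preserving partitioner cannot promise.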

Column Types

Available types:
● Bytes
● UTF8
● Ascii
● Long
● Date
● UUID
● Composite - <Type1>:<Type2>

Column Types

Examples:

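Sort 1: numeric ordering (e.g. Long) vs text ordering (e.g. UTF8)
8        10
9   vs   8
10       9

Sort 2: composite ordering (text, then number) vs text ordering
dan:8         dan:10
dan:10   vs   dan:8
shimi:1       shimi:1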
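The two orderings in these examples can be reproduced with ordinary Scala sorting. This is a sketch of the comparison rules only; `ColumnOrdering` and its methods are hypothetical and do not use Cassandra's comparator classes.

```scala
object ColumnOrdering {
  // Column names compared as text (UTF8-style).
  def asText(names: List[String]): List[String] = names.sorted

  // Column names compared as numbers (Long-style).
  def asLong(names: List[String]): List[String] = names.sortBy(_.toLong)

  // Composite "<text>:<number>": compare the text part, then the numeric part.
  def asComposite(names: List[String]): List[String] =
    names.sortBy { n =>
      val parts = n.split(":", 2)
      (parts(0), parts(1).toLong)
    }
}
```

Sorting "8", "9", "10" as text puts "10" first because "1" sorts before "8", which is why the type of the column name matters.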

Clients

● Thrift - Cassandra driver-level interface
● CQL - Cassandra Query Language (SQL-like)
● High-level clients:
  ○ Python
  ○ Java
  ○ Scala
  ○ Clojure
  ○ .Net
  ○ Ruby
  ○ PHP
  ○ Perl
  ○ C++
  ○ Haskell

Cascal - Scala client

Insert column:

session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")

val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")

Get column value:

val pass = session.get(key \ "passwd")

Cascal

Get multiple columns:

// Whole row
val row = session.list(key)
// Columns in a name range
val colsInRange = session.list(key, RangePredicate("email", "passwd"))
// Columns by explicit names
val colsByName = session.list(key, ColumnPredicate(List("passwd", "email")))

Cascal

Get multiple rows:

val family = "app" \ "users"
// Rows in a key range
val rowsInRange = session.list(family, RangePredicate("dan", "shimi"))
// Rows by explicit keys
val rowsByKey = session.list(family, KeyPredicate("dan", "shimi"))

Cascal

Remove column:

session.remove("app" \ "users" \ "shimi" \ "passwd")

Remove row:

session.remove("app" \ "users" \ "shimi")

Batch operations:

val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols :: Nil)

Guidelines

● Keep together the data you query together
● Think about your use case and how you should fetch your data
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might consider using different solutions together

The End

Useful links:
● Cassandra, http://cassandra.apache.org/
● Wiki, http://wiki.apache.org/cassandra/
● Cassandra mailing list
● IRC
● Bigtable, http://labs.google.com/papers/bigtable.html
● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● Cascal, https://github.com/shimi/cascal
