Introduction to Cassandra
![Page 1: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/1.jpg)
Introduction to Cassandra
Shimi Kiviti, @shimi_k
![Page 2: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/2.jpg)
Motivation
Scaling
How do you scale your database?
● reads
● writes
![Page 3: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/3.jpg)
![Page 4: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/4.jpg)
Influential Papers
● Bigtable: A distributed storage system for structured data, 2006
● Dynamo: Amazon's Highly Available Key-value Store, 2007
Cassandra:
● Partitioning and replication - from Dynamo
● Log-structured column family storage - from Bigtable
![Page 5: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/5.jpg)
Cassandra Highlights
● Symmetric - all nodes are exactly the same
  ○ No single point of failure
  ○ Linearly scalable
  ○ Ease of administration
● High availability with multiple datacenters
● Consistency vs Latency
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters
![Page 6: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/6.jpg)
DHT - Distributed Hash Table
![Page 7: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/7.jpg)
DHT
● O(1) node lookup
● Explicit replication
● Linear Scalability
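The ring lookup behind these points can be sketched in a few lines of Scala. This is a toy model under stated assumptions, not Cassandra's implementation: `RingSketch`, `Ring`, `replicasFor` and the node names are all hypothetical, and MD5 is used only because it is a convenient way to turn a key into a token. Each node is assumed to hold the full token map, so finding the owner of a key needs no extra network hop.

```scala
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

// Hypothetical toy ring, for illustration only.
object RingSketch {
  // Hash a row key to a non-negative token (MD5 is an assumption of this sketch).
  def token(key: String): BigInt =
    BigInt(1, MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8")))

  // nodes maps each node's token to the node's name.
  final case class Ring(nodes: TreeMap[BigInt, String]) {
    // Start at the first node whose token is >= t, walk clockwise with
    // wrap-around, and keep the first rf distinct nodes as replicas.
    def replicasFor(t: BigInt, rf: Int): List[String] = {
      val clockwise = nodes.iteratorFrom(t) ++ nodes.iterator.takeWhile(_._1 < t)
      clockwise.map(_._2).toList.distinct.take(rf)
    }

    def replicas(key: String, rf: Int): List[String] = replicasFor(token(key), rf)
  }
}
```

Adding a node only inserts one more entry in the map, which is the intuition behind the linear scalability bullet.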
![Page 8: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/8.jpg)
![Page 9: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/9.jpg)
Consistency
N = Replication factor
R = Number of replicas to block on when reading (R <= N)
W = Number of replicas to block on when writing (W <= N)
Quorum = N/2 + 1 (integer division)
When W + R > N, there is full consistency: every read overlaps the latest write in at least one replica.
Examples:
● W = 1, R = N
● W = N, R = 1
● W = Quorum, R = Quorum
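The arithmetic on this slide is small enough to check directly. The following Scala sketch is only an illustration; `ConsistencySketch`, `quorum` and `isFullyConsistent` are hypothetical names, not part of any Cassandra API.

```scala
// Hypothetical helper names, for illustration only.
object ConsistencySketch {
  // Quorum = N/2 + 1, using integer division.
  def quorum(n: Int): Int = n / 2 + 1

  // Reads see the latest write when the read and write replica sets must overlap.
  def isFullyConsistent(n: Int, r: Int, w: Int): Boolean = {
    require(r >= 1 && r <= n && w >= 1 && w <= n, "R and W must be between 1 and N")
    r + w > n
  }
}
```

With N = 3, quorum reads and writes (R = W = 2) overlap in at least one replica, while R = W = 1 does not guarantee an overlap.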
![Page 10: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/10.jpg)
Consistency Level
● Every request defines consistency level
  ○ Any
  ○ One
  ○ Two
  ○ Three
  ○ Quorum
  ○ Local Quorum
  ○ Each Quorum
  ○ All
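As a rough sketch, the simpler levels translate into a number of replicas to wait for. The Scala below is a hypothetical model, not a client API: the names are made up for illustration, and Any, Local Quorum and Each Quorum are deliberately left out because they depend on hinted hand-off and on the datacenter layout.

```scala
// Hypothetical model of the topology-independent levels only.
object ConsistencyLevelSketch {
  sealed trait Level
  case object One extends Level
  case object Two extends Level
  case object Three extends Level
  case object Quorum extends Level
  case object All extends Level

  // Number of replicas a request blocks on, given replication factor n.
  def requiredReplicas(level: Level, n: Int): Int = level match {
    case One    => 1
    case Two    => 2
    case Three  => 3
    case Quorum => n / 2 + 1
    case All    => n
  }
}
```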
![Page 11: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/11.jpg)
Data Model
● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns
![Page 12: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/12.jpg)
Column Family
Key1 Column Column Column
Key2 Column Column
![Page 13: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/13.jpg)
Column Family
ColumnFamily: {
  TOK: { chen: 1, ronen: 7 }
  CityPath: { yuval: 5 }
}
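One way to picture this structure is as a map of row keys to sorted maps of columns. The Scala sketch below is an in-memory analogy only; `ColumnFamilySketch` and its helpers are hypothetical names, and the values are assumed to be numbers as in the example above.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical in-memory analogy of a column family.
object ColumnFamilySketch {
  type Row = TreeMap[String, Long]       // column name -> value, sorted by name
  type ColumnFamily = Map[String, Row]   // row key -> row

  val cf: ColumnFamily = Map(
    "TOK"      -> TreeMap("ronen" -> 7L, "chen" -> 1L),
    "CityPath" -> TreeMap("yuval" -> 5L)
  )

  // Look up a single column value, if both the row and the column exist.
  def get(family: ColumnFamily, key: String, column: String): Option[Long] =
    family.get(key).flatMap(_.get(column))

  // Column names come back sorted, regardless of insertion order.
  def columnNames(family: ColumnFamily, key: String): List[String] =
    family.get(key).map(_.keys.toList).getOrElse(Nil)
}
```

Rows need not share the same columns, which is what the flexible schema highlight refers to.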
![Page 14: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/14.jpg)
Super Column Family
ColumnFamily: {
  Key: {
    super1: { name: value, name: value }
    super2: { name: value }
  }
}
Key  Super1: Column Column Column
     Super2: Column Column Column
![Page 15: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/15.jpg)
Write
● Any node
● Partitioner
● Commit log, memtable
● Wait for W responses
![Page 16: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/16.jpg)
Write
![Page 17: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/17.jpg)
Write
● No reads
● No seeks
● Sequential disk access
● Atomic within a column family
● Fast
● Always writeable (hinted hand-off)
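Why a write needs no reads or seeks can be shown with a toy model of the path: append to the commit log, update the in-memory memtable, and later flush the memtable as an immutable SSTable. The Scala below is a simplified sketch; `WritePathSketch` and `Node` are hypothetical, values are plain strings, and the disk is modelled with in-memory collections.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical, heavily simplified single-node write path.
object WritePathSketch {
  final case class Node(
    commitLog: Vector[(String, String)] = Vector.empty,          // append-only log
    memtable: TreeMap[String, String] = TreeMap.empty[String, String],
    sstables: List[TreeMap[String, String]] = Nil                // newest first, never modified
  ) {
    // A write only appends to the log and updates memory; nothing is read back.
    def write(key: String, value: String): Node =
      copy(commitLog = commitLog :+ (key -> value), memtable = memtable + (key -> value))

    // Flushing writes the sorted memtable out in one sequential pass,
    // after which the logged entries are no longer needed.
    def flush: Node =
      if (memtable.isEmpty) this
      else copy(
        commitLog = Vector.empty,
        memtable = TreeMap.empty[String, String],
        sstables = memtable :: sstables
      )
  }
}
```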
![Page 18: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/18.jpg)
Read
● Choose any node
● Partitioner
● Wait for R responses
● Tunable read repair in the background
![Page 19: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/19.jpg)
Read
A read may have to consult multiple SSTables
Slower than writes
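The extra cost comes from merging: the same key can appear in the memtable and in several SSTables, and the newest version wins. A minimal Scala sketch, assuming each stored value carries a write timestamp (`ReadPathSketch`, `Cell` and `read` are hypothetical names):

```scala
// Hypothetical sketch of merging versions of one key at read time.
object ReadPathSketch {
  final case class Cell(value: String, timestamp: Long)
  type Table = Map[String, Cell]

  // Collect every stored version of the key and return the most recent one.
  def read(key: String, memtable: Table, sstables: Seq[Table]): Option[String] = {
    val candidates = (memtable +: sstables).flatMap(_.get(key))
    if (candidates.isEmpty) None
    else Some(candidates.maxBy(_.timestamp).value)
  }
}
```

Every additional SSTable is one more place to look, whereas a write touches only the commit log and the memtable.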
![Page 20: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/20.jpg)
Cache
● There is no need to use memcached
● There is an internal configurable cache
  ○ Key cache
  ○ Row cache
![Page 21: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/21.jpg)
Sorting
When you perform a get, the result is sorted:
● Rows are sorted according to the partitioner
● Columns in a row are sorted according to the type of the column name
![Page 22: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/22.jpg)
Partitioner
● RandomPartitioner - Uses hash values as tokens. Useful for distributing the load across all nodes. If you use it, set the nodes' tokens manually
● OrderPreservingPartitioner - You can get sorted rows, but at the cost of an evenly balanced cluster
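Setting tokens manually for RandomPartitioner usually means spacing them evenly around the ring. The Scala sketch below assumes the token space runs from 0 to 2^127, which is the range RandomPartitioner uses; `TokenSketch` and `balancedTokens` are hypothetical names.

```scala
// Hypothetical helper for evenly spaced initial tokens.
object TokenSketch {
  // Size of the RandomPartitioner token space assumed by this sketch.
  val RingSize: BigInt = BigInt(2).pow(127)

  // Token for node i of nodeCount is i * 2^127 / nodeCount.
  def balancedTokens(nodeCount: Int): Seq[BigInt] = {
    require(nodeCount > 0, "need at least one node")
    (0 until nodeCount).map(i => RingSize * i / nodeCount)
  }
}
```

For a four-node cluster this yields 0, 2^125, 2^126 and 3 * 2^125, so each node owns a quarter of the ring.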
![Page 23: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/23.jpg)
Column Types
Available types:
● Bytes
● UTF8
● Ascii
● Long
● Date
● UUID
● Composite - <Type1>:<Type2>
![Page 24: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/24.jpg)
Column Types
Examples:
Sort1:
8        10
9   vs   8
10       9

Sort2:
dan:8         dan:10
dan:10   vs   dan:8
shimi:1       shimi:1
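The two orderings in each example are what you get from comparing column names as numbers versus as text. The Scala sketch below reproduces both with ordinary collection sorting; it illustrates the comparison rules only and does not use Cassandra's comparator classes. `ComparatorSketch` is a hypothetical name, and a tuple stands in for a composite of a text part and a Long part.

```scala
// Hypothetical illustration of numeric, text and composite ordering.
object ComparatorSketch {
  val names = List("10", "8", "9")

  // Compared as text, "10" sorts first because '1' < '8'.
  val asText: List[String] = names.sorted
  // Compared as Long, the order is numeric.
  val asLong: List[Long] = names.map(_.toLong).sorted

  val composite = List(("dan", 10L), ("shimi", 1L), ("dan", 8L))

  // Composite ordering: compare the text part first, then the number.
  val byComposite: List[(String, Long)] = composite.sorted
  // The same names flattened to plain text sort character by character.
  val byText: List[String] = composite.map { case (n, i) => s"$n:$i" }.sorted
}
```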
![Page 25: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/25.jpg)
Clients
● Thrift - Cassandra driver-level interface
● CQL - Cassandra Query Language (SQL-like)
● High-level clients:
  ○ Python
  ○ Java
  ○ Scala
  ○ Clojure
  ○ .Net
  ○ Ruby
  ○ PHP
  ○ Perl
  ○ C++
  ○ Haskell
![Page 26: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/26.jpg)
Cascal - Scala client
Insert column:
session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")
val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")
Get column value:
val pass = session.get(key \ "passwd")
![Page 27: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/27.jpg)
Cascal
Get multiple columns:
val row = session.list(key)
val colsInRange = session.list(key, RangePredicate("email", "passwd"))
val colsByName = session.list(key, ColumnPredicate(List("passwd", "email")))
![Page 28: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/28.jpg)
Cascal
Get multiple rows:
val family = "app" \ "users"
val rowsInRange = session.list(family, RangePredicate("dan", "shimi"))
val rowsByKey = session.list(family, KeyPredicate("dan", "shimi"))
![Page 29: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/29.jpg)
Cascal
Remove column:
session.remove("app" \ "users" \ "shimi" \ "passwd")
Remove row:
session.remove("app" \ "users" \ "shimi")
Batch operations:
val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols :: Nil)
![Page 30: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/30.jpg)
Guidelines
● Keep together the data you query together
● Think about your use case and how you should fetch your data
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might consider using different solutions together
![Page 31: Introduction to Cassandra](https://reader033.fdocuments.in/reader033/viewer/2022050920/54b7a20c4a79591c048b45e6/html5/thumbnails/31.jpg)
The End
Useful links:
● Cassandra, http://cassandra.apache.org/
● Wiki, http://wiki.apache.org/cassandra/
● Cassandra mailing list
● IRC
● Bigtable, http://labs.google.com/papers/bigtable.html
● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● Cascal, https://github.com/shimi/cascal