The Myth of Cassandra
I’ve had it with these crazed oracles
Cameron Kilgore | @thrillgore
NoSQL Series
Cas·san·dra [kəˈsændrə], noun
1) [Classical Greek Mythology.] A daughter of Priam and Hecuba, a prophet cursed by Apollo so that her prophecies, though true, were fated never to be believed.
2) [fml. “Apache Cassandra”] An open-source distributed, non-relational (NoSQL) database developed at Facebook, written in Java, and maintained as an Apache Software Foundation product
What Cassandra does
Nonrelational associative array (key-value) data storage
Distributed
▪ One-hop DHT (akin to Amazon Dynamo)
Eventually consistent
Column-based storage
Queries faster than MySQL
Based on white papers and real-world use cases
Fault tolerant
▪ Provides no single point of failure
Load balancing
What Cassandra Does Not
Revision history
Relational data
▪ There’s this thing called “MySQL” that might be just up your alley
Provide an admin app
▪ Chiton is an in-development desktop app
▪ http://github.com/driftx/chiton
Store individual data fields greater than 2^31-1 (2,147,483,647) bytes
Provide any interfaces outside of Thrift or high-level interfaces
She who entangles companies
Already in use at Facebook
Also being used at:
▪ Digg
▪ Reddit
▪ Twitter
▪ Rackspace
▪ Cisco
▪ IBM
▪ Cloudkick
▪ OpenX
▪ And more…
Introducing Cassandra
Understanding the concepts of data in Cassandra, scalability
Columns and Data
Data is stored in columns, which are grouped under keys and organized into keyspaces
Each column stores data and can be looked up by its name, akin to an associative array
[Diagram: Key1 holds one column, Key2 holds two columns, Key3 holds four columns. Each column has the fields +name: byte[], +value: byte[], +timestamp: long]
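The key-to-columns layout above can be sketched in a few lines of Python. This is only an illustration of the data model, not Cassandra's API or storage format: the `Column` class, `make_column`, `insert`, and `get` are hypothetical names, and the microsecond timestamp is an assumption made for the sketch.

```python
import time
from dataclasses import dataclass


@dataclass
class Column:
    # Mirrors the three fields on the slide: name, value, timestamp.
    name: bytes
    value: bytes
    timestamp: int


def make_column(name: str, value: str) -> Column:
    # Names and values are stored as bytes; the timestamp is an integer.
    # Microseconds since the epoch is an assumption for this sketch.
    return Column(name.encode("utf-8"), value.encode("utf-8"),
                  int(time.time() * 1_000_000))


def insert(rows: dict, key: bytes, column: Column) -> None:
    # Each key maps to its own collection of columns, indexed by column name.
    rows.setdefault(key, {})[column.name] = column


def get(rows: dict, key: bytes, name: bytes) -> Column:
    # Look a column up by key and then by name, like a nested associative array.
    return rows[key][name]


rows = {}
insert(rows, b"Key1", make_column("height", "24"))
insert(rows, b"Key2", make_column("height", "30"))
insert(rows, b"Key2", make_column("width", "36"))
```

Looking up `get(rows, b"Key2", b"width")` returns the column stored under that key and name.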
Supercolumns
What happens when Xzibit uses Cassandra
Supercolumns allow you to nest n number of columns in another column
And in return in a key you can nest n number of supercolumns. (not shown here due to Office fail)
[Diagram: Key2 holds a supercolumn containing four columns; Key1 holds a supercolumn containing three columns]
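As a rough picture of the nesting, a supercolumn adds one more level to the associative array: key, then supercolumn, then column. The sketch below uses plain Python dictionaries; `insert_super` and `count_columns` are hypothetical helper names, not part of any Cassandra client.

```python
def insert_super(rows: dict, key: str, supercolumn: str,
                 name: str, value: bytes) -> None:
    # key -> supercolumn name -> column name -> value
    rows.setdefault(key, {}).setdefault(supercolumn, {})[name] = value


def count_columns(rows: dict, key: str, supercolumn: str) -> int:
    # Number of columns nested inside one supercolumn of one key.
    return len(rows[key][supercolumn])


rows = {}
# Key1 holds one supercolumn with three columns, as in the diagram.
for column_name in ("Column1", "Column2", "Column3"):
    insert_super(rows, "Key1", "Supercolumn", column_name, b"")
# Key2 holds one supercolumn with four columns.
for column_name in ("Column1", "Column2", "Column3", "Column4"):
    insert_super(rows, "Key2", "Supercolumn", column_name, b"")
```

A key can hold any number of supercolumns this way, and each supercolumn any number of columns.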
Anatomy of a Column
Cassandra is written in Java, so we abide by the rules of its types
Most fields will be bytestrings (byte[]), set in Unicode
+timestamp being the only value not stored as a bytestring, instead as a long
▪ Cassandra compares the +timestamp across nodes to reconcile data across nodes
▪ Is NOT used for revision history
Each column represented by an unseen UUID
Anatomy of a Column (cont.)
Columns are found by their +name value, not their UUID
You cannot have multiple columns of the same name (assigning one with the same name rewrites an existing one in that given keyspace)
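The two rules above (one column per name, and timestamps used to reconcile copies) can be sketched as a last-write-wins merge. This is a simplified illustration, not Cassandra's actual reconciliation code; `reconcile` and `write` are hypothetical names, and the tie-breaking choice (the incoming write wins on equal timestamps) is an assumption.

```python
def reconcile(current: tuple, incoming: tuple) -> tuple:
    # Columns are (name, value, timestamp) tuples.
    # Keep whichever copy carries the newer timestamp; on a tie this
    # sketch keeps the incoming copy (an assumption, not Cassandra's rule).
    return incoming if incoming[2] >= current[2] else current


def write(row: dict, name: bytes, value: bytes, timestamp: int) -> None:
    # A row holds at most one column per name, so writing an existing
    # name replaces the old column instead of adding a second one.
    incoming = (name, value, timestamp)
    current = row.get(name)
    row[name] = incoming if current is None else reconcile(current, incoming)


row = {}
write(row, b"paint", b"white", 100)
write(row, b"paint", b"grey", 200)   # newer timestamp: overwrites
write(row, b"paint", b"black", 150)  # older timestamp: ignored
```

After these three writes the row still holds a single `paint` column, with no history of the earlier values kept.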
Accessing the Data
Data accessed through the Apache Incubator™ Thrift API
Thrift can be accessed with any programming language or application
High-level implementations for languages exist
For our demos we’re going to use the cassandra-cli client, which gives us the ability to insert, remove, and edit data
<INSERT CALL TO DEMO HERE>
OH GOD HOW DID I GET HERE I AM NOT GOOD WITH COMPUTER
Security in Cassandra
Cassandra does have user authentication through a SimpleAuthenticator module that is configured in conf files
▪ Very rudimentary
Ran out of time and suitable documentation to demonstrate it
Cassandra is not ACID-compliant
Load Balancing
Cassandra 0.6 has load balancing capabilities
▪ Not automatic, must be configured per node
Load is shared in a token-ring fashion across the nodes in a multi-node configuration
Covered in the documentation for Cassandra
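To make the token-ring idea concrete, here is a minimal sketch of how a key might be mapped to a node. It assumes keys are hashed with MD5 into a 128-bit token space and that each node is assigned a token by hand; the `Ring` class, `token_for`, and the node names are hypothetical, and replication is left out entirely.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    # Hash the key into the 128-bit token space (MD5 is an assumption here).
    return int(hashlib.md5(key).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens: dict):
        # node_tokens maps a node name to the token configured for that node.
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def node_for(self, key: bytes) -> str:
        # A node owns every token up to and including its own, so pick the
        # first node whose token is >= the key's token and wrap around
        # to the first node when the key's token is past the last one.
        index = bisect.bisect_left(self._tokens, token_for(key))
        return self._ring[index % len(self._ring)][1]


# Two nodes with evenly spaced, manually chosen tokens.
ring = Ring({"node-a": 0, "node-b": 2 ** 127})
owner = ring.node_for(b"louver")
```

With evenly spaced tokens, keys spread roughly evenly across the nodes; poorly chosen tokens would skew the load, which is one reason the per-node configuration matters.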
Monitoring Cassandra
Cassandra exposes metrics as JMX data, so any JMX monitoring app should be sufficient.
▪ Nagios
▪ Munin
▪ OpenNMS
▪ Any official Oracle™ Java monitoring and administration software
▪ What? I can’t be bothered to not search for the name of the software?
Cassandra also has software for monitoring node activity, check the docs
Use Case Example
And a very simple one at that
Product Ordering Application
An ordering application implemented using a SQL database could span hundreds of tables and require constant iterations over its lifespan
What if the attributes of these products (in this case, HVAC components) were stored in Cassandra, and we kept pricing, users, and sessions data in an RDBMS?
Benefits to Cassandra
The data for these products that might need to be added won’t require new RDBMS fields – we can just add them in new columns and write our code statements to ignore them if they aren’t there
We aren’t limited to bottlenecks in the RDBMS if we choose to go multinode in our Cassandra setup
No single point of failure if we choose to go multinode
If we get a lot of users (unlikely), the nodes will equally distribute the load
Less time spent on queries
▪ Depends on how effectively our data is stored and the performance of our application
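The first benefit, adding new columns and having the code ignore them when they are absent, might look like this in application code. The sketch treats a product's columns as a plain dictionary; `describe_louver` is a hypothetical function, the column names are borrowed from the talk's louver example, and the values are made up for illustration.

```python
def describe_louver(columns: dict) -> str:
    # Height and width are assumed to exist on every product.
    parts = [f"{columns['height']} x {columns['width']}"]
    # Paint is an optional column: older products may not have it,
    # so the code simply skips it when it is missing.
    paint = columns.get("paint")
    if paint is not None:
        parts.append(f"paint: {paint}")
    return ", ".join(parts)


old_product = {"height": "24", "width": "36"}
new_product = {"height": "24", "width": "36", "paint": "white"}
```

Both products are handled by the same function, and no schema change is needed when `paint` is introduced.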
Downsides to Cassandra
We may not have the funding needed to procure a multinode configuration
No guarantee that existing data can be restructured over time to meet the changing demands of sales, engineering, executives, etc.
Data is collected and given some form of relation inside the application itself, with no schema
Cassandra lacks a vetted security framework that could put us at risk
Cassandra also lacks a complete administration application
▪ Chiton is barely functional as-is
Might not make sense when some RDBMS can scale across machines
A (crude) data map showing our data in practice
[Diagram: Louvers keyspace containing a Louver supercolumn, with the columns Price, Product Multiplier, Height, Width, Actuator Options, Misc. Options, Paint]
Cassandra and PHP
This is a PHP User group after all.
Talking to Cassandra
Low-level framework, Thrift, is the actual client API for Cassandra
In PHP we have two such frameworks that work through Thrift
▪ phpcassa
▪ Pandra
Ran out of time to prepare a demo
▪ There’s always another time for a demo. Stay tuned.
Any Questions?
You will be baked, and there will be cake