Cassandra at Arkivum
About Arkivum
• We offer a safe, secure archive service for digital data
• We use data archiving expertise to keep data for the long term: for years, decades or forever
• Our service allows our customers to meet their compliance needs and asset retention goals whilst focusing on their core business
© Arkivum Limited, 2012
Our architecture
• A gateway appliance, installed at the customer site, runs our software and talks across the WAN over a secure VPN to our software in our data centres (DCs)
• File data is encrypted and stored on a variety of storage media, including SSD, hard disk and tape
• The focus is on maintaining long-term data integrity, not on low latency or high availability
Legacy design
• Our original code used an SQL database
• Our knowledge was biased towards RDBMS
  • Normalization, JDBC, ACID, mature platform
• The software design assumed SQL
• Indexes and ad-hoc queries gave basic search functionality for relatively little extra effort
Relational model of a file system
CREATE TABLE files (
  file_id       VARCHAR NOT NULL PRIMARY KEY,
  parent_id     VARCHAR NOT NULL,
  name          VARCHAR NOT NULL,
  size          BIGINT DEFAULT 0,
  created_date  DATETIME DEFAULT CURRENT_TIMESTAMP,
  modified_date DATETIME DEFAULT CURRENT_TIMESTAMP,
  owner_uid     INT DEFAULT 0,
  owner_gid     INT DEFAULT 0,
  file_mode     INT DEFAULT 493,  -- 493 = 0755 octal (rwxr-xr-x)
  file_attr     INT DEFAULT 0,
  UNIQUE(parent_id, name)
);
Relational model of a file system
Get a file by id:
  SELECT * FROM files WHERE file_id = 'f90b3e92-0e96-482f-b4e5-f1ca071f26d6';
List all files in a particular directory:
  SELECT * FROM files WHERE parent_id = 'e98eaaaa-07a6-4ffa-bd21-f3975529718b';
List all files modified in April 2010 and sort by size:
  SELECT * FROM files
  WHERE modified_date >= '2010-04-01'
    AND modified_date < '2010-05-01'
  ORDER BY size DESC;
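The relational model can be tried end to end with Python's built-in `sqlite3` module. This is a minimal sketch, not Arkivum's original code: SQLite is used only because it needs no server (the slides don't say which RDBMS the original code ran on), and the sample rows and the helpers `get_file`, `list_directory`, `modified_in_april_2010` and `try_insert` are made up for illustration. The April filter uses a half-open range starting at `'2010-04-01'` so that late-March timestamps are not picked up.

```python
import sqlite3

# In-memory database; SQLite accepts the slide's column types unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE files (
        file_id       VARCHAR NOT NULL PRIMARY KEY,
        parent_id     VARCHAR NOT NULL,
        name          VARCHAR NOT NULL,
        size          BIGINT DEFAULT 0,
        created_date  DATETIME DEFAULT CURRENT_TIMESTAMP,
        modified_date DATETIME DEFAULT CURRENT_TIMESTAMP,
        owner_uid     INT DEFAULT 0,
        owner_gid     INT DEFAULT 0,
        file_mode     INT DEFAULT 493,
        file_attr     INT DEFAULT 0,
        UNIQUE(parent_id, name)
    )
""")

DIR = "e98eaaaa-07a6-4ffa-bd21-f3975529718b"
# Made-up sample rows: (file_id, parent_id, name, size, modified_date)
conn.executemany(
    "INSERT INTO files (file_id, parent_id, name, size, modified_date) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("f90b3e92-0e96-482f-b4e5-f1ca071f26d6", DIR, "report.pdf", 2048, "2010-04-15 09:30:00"),
        ("file-2", DIR, "notes.txt", 512, "2010-03-31 18:00:00"),
        ("file-3", DIR, "photo.jpg", 4096, "2010-04-30 23:59:59"),
        ("file-4", "some-other-dir", "data.csv", 1024, "2010-04-02 12:00:00"),
    ],
)
conn.commit()

def get_file(file_id):
    """Get a file by id: a primary-key lookup."""
    return conn.execute("SELECT * FROM files WHERE file_id = ?", (file_id,)).fetchone()

def list_directory(parent_id):
    """Names of all files in one directory, in name order."""
    rows = conn.execute(
        "SELECT name FROM files WHERE parent_id = ? ORDER BY name", (parent_id,))
    return [name for (name,) in rows]

def modified_in_april_2010():
    """Files modified in April 2010, largest first (half-open date range)."""
    return conn.execute(
        "SELECT name, size FROM files "
        "WHERE modified_date >= '2010-04-01' AND modified_date < '2010-05-01' "
        "ORDER BY size DESC"
    ).fetchall()

def try_insert(file_id, parent_id, name):
    """Return False if a constraint such as UNIQUE(parent_id, name) rejects the row."""
    try:
        conn.execute(
            "INSERT INTO files (file_id, parent_id, name) VALUES (?, ?, ?)",
            (file_id, parent_id, name))
        return True
    except sqlite3.IntegrityError:
        return False
```

With a bound of `> '2010-03-31'` instead, `notes.txt` (modified at 18:00 on 31 March) would also match, because a timestamp later on 31 March still compares greater than the bare date.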
Why Cassandra?
• Scalability
  • Meets our need to scale to billions of records
  • Designed for high-availability, high-throughput environments
• Replication
  • Data safety is paramount to us
  • Cassandra's replication is a particularly strong feature
• Stability
  • Well supported and used worldwide in high-profile, high-end production systems
Cassandra model of a file system
Approach 1: Pretend we're using a relational database
• Use column families as if they're tables
• Use CQL because it's like SQL
• Create secondary indexes for everything in case we want to query on it later
Files CF
  row key | "parent_id" | "name" | "size" | "modified" | "accessed" | "gid" | "uid" | "mode"
  file_id | UUID        | UTF8   | Long   | Long       | Long       | Long  | Long  | Long
Cassandra model of a file system
Approach 1 doesn't work
• Column families are not tables
• CQL looks like SQL, but isn't:
    SELECT * FROM Files WHERE modified > '2010-03-31';
• Secondary indexes aren't cheap
• Can't sort based on column values, only on column names
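A toy model helps show why a column family isn't a table. The sketch below is a made-up, in-memory stand-in (the class `ToyColumnFamily` is hypothetical, not the Cassandra or pycassa API): each row is found by its key and keeps its columns ordered by column name, so a lookup by row key touches one row, while a filter on a column value with no row key to go on has to read every row. Secondary indexes are not modelled.

```python
class ToyColumnFamily:
    """Toy stand-in for a column family: row key -> {column name: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Rows need not share a schema: any row can hold any set of columns.
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # One row, with its columns in column-name order (the only order kept).
        return sorted(self.rows[key].items())

    def scan_where(self, column, predicate):
        """Filter on a column value with no index: every row must be read.

        Returns (matching row keys, number of rows touched)."""
        matches, touched = [], 0
        for key, columns in self.rows.items():
            touched += 1
            if column in columns and predicate(columns[column]):
                matches.append(key)
        return matches, touched

files = ToyColumnFamily()
files.insert("f1", {"name": "a.txt", "size": 10, "modified": "2010-04-15"})
files.insert("f2", {"name": "b.txt", "size": 99, "modified": "2010-02-01"})
files.insert("f3", {"name": "c.txt", "size": 5, "modified": "2010-04-20"})
```

Ordering the matches by `size` would then be a client-side sort: nothing in the toy column family keeps rows or columns in value order.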
Cassandra model of a file system
Approach 2: Use composite types and blobs
• Serialize the file record and store it as a single object instead of multiple values
• Use the actual values as part of a composite column name, so we can search and sort based on them
Files CF
  row key: (parent_id, file_id)
  column name: (name, size, mtime, atime, gid, uid, mode)
  column value: file_blob
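One way to picture Approach 2: serialize the whole record into one blob, and copy the searchable fields into a composite column name. In this minimal sketch JSON stands in for the blob format and Python tuples stand in for composites; both are assumptions (the slides don't say how `file_blob` was encoded), and `FileRecord`, `pack_file`, `unpack_file` and `approach2_cell` are hypothetical names.

```python
import json
from typing import NamedTuple

class FileRecord(NamedTuple):
    file_id: str
    parent_id: str
    name: str
    size: int
    mtime: int
    atime: int
    gid: int
    uid: int
    mode: int

def pack_file(record: FileRecord) -> bytes:
    """Serialize a whole file record into a single blob (JSON, as an assumption)."""
    return json.dumps(record._asdict(), sort_keys=True).encode("utf-8")

def unpack_file(blob: bytes) -> FileRecord:
    """Inverse of pack_file."""
    return FileRecord(**json.loads(blob.decode("utf-8")))

def approach2_cell(record: FileRecord):
    """Return (row key, column name, value) in the Approach 2 layout."""
    row_key = (record.parent_id, record.file_id)
    column_name = (record.name, record.size, record.mtime, record.atime,
                   record.gid, record.uid, record.mode)
    return row_key, column_name, pack_file(record)
```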
Cassandra model of a file system
Approach 2 doesn't work either
• Need to know all the values for a composite to query based on it; otherwise it means a range query, which is expensive:
    file_exists = len(list(files_cf.get_range(
        start=CompositeType(MIN_UUID, file_id),
        finish=CompositeType(MAX_UUID, file_id),
        row_count=1))) == 1
• Sorting compares the entire composite, not each field:
    [CompositeType('apples', 6), CompositeType('bananas', 2),
     CompositeType('oranges', 5), CompositeType('pears', 4)]
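Both problems follow from how composites are ordered. The sketch below models them with Python tuples, which compare component by component as a `CompositeType` comparator does, and uses shortened, made-up ids; `composite_slice` is a hypothetical helper that returns everything between two bounds, as a range query would. Sorting the fruit puts the names in order but not the sizes, and in this ordering model a range bounded by `(MIN_ID, file_id)` and `(MAX_ID, file_id)` does not isolate that `file_id`: it spans every key, so finding one file this way means scanning past all the others.

```python
from bisect import bisect_left, bisect_right

# Composites compare component by component, like tuples: the second field
# only breaks ties in the first.
fruit = [("pears", 4), ("apples", 6), ("oranges", 5), ("bananas", 2)]
by_composite = sorted(fruit)  # ordered by name; the sizes come out unsorted

# Shortened, made-up ids standing in for UUIDs.
MIN_ID, MAX_ID = "00000000", "ffffffff"
keys = sorted([
    ("11111111", "aaaaaaaa"),  # (parent_id, file_id)
    ("11111111", "bbbbbbbb"),
    ("22222222", "cccccccc"),
])

def composite_slice(sorted_keys, start, finish):
    """All keys k with start <= k <= finish, as a range query would return."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_right(sorted_keys, finish)
    return sorted_keys[lo:hi]

# Trying to find file "bbbbbbbb" with MIN/MAX bounds on parent_id:
in_range = composite_slice(keys, (MIN_ID, "bbbbbbbb"), (MAX_ID, "bbbbbbbb"))
# in_range holds every key, so the filtering is left to the client:
wanted = [k for k in in_range if k[1] == "bbbbbbbb"]
```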
Cassandra model of a file system
Approach 3: De-normalize
• Look at the most common queries and optimize for those
• Most lookups should require just a single get or slice query
• Speed vs. space: do we really care if a record is stored twice?
Files CF
  row key: file_id
  column "file": file_blob
Directories CF
  row key: parent_id
  one column per file, named by the file's name: file_blob
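Here is the de-normalized layout as a minimal in-memory sketch, with plain dicts standing in for the two column families and JSON standing in for the blob encoding (both assumptions; `save_file`, `get_file`, `list_directory` and `rename_file` are hypothetical helpers). Each write stores the same blob twice, once per column family, so each common read is a single lookup; the price is that a rename has to update both copies.

```python
import json

# Toy column families: row key -> {column name: value}.
files_cf = {}        # file_id   -> {"file": blob}
directories_cf = {}  # parent_id -> {file name: blob}

def pack_file(record):
    return json.dumps(record, sort_keys=True).encode("utf-8")

def save_file(record):
    """De-normalized write: the same blob goes into both column families."""
    blob = pack_file(record)
    files_cf.setdefault(record["file_id"], {})["file"] = blob
    directories_cf.setdefault(record["parent_id"], {})[record["name"]] = blob

def get_file(file_id):
    """Get a file by id: one row, one column."""
    return json.loads(files_cf[file_id]["file"])

def list_directory(parent_id):
    """All files in a directory: one row, returned in file-name order."""
    row = directories_cf.get(parent_id, {})
    return [json.loads(row[name]) for name in sorted(row)]

def rename_file(file_id, new_name):
    """Both copies must change: drop the old directory column, then rewrite."""
    record = get_file(file_id)
    del directories_cf[record["parent_id"]][record["name"]]
    record["name"] = new_name
    save_file(record)
```

The two inserts in `save_file` are separate writes in this sketch, so keeping the copies consistent is the application's job.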
Cassandra model of a file system
Approach 3 works
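Get a file by id:
  file = unpackFile(
      files_cf.get(key=file_id, columns=['file'])['file'])
List all files in a particular directory:
  # get() returns an OrderedDict of column name -> value,
  # holding at most column_count columns (100 by default)
  files = unpackFiles(
      directories_cf.get(key=directory_id).values())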
Lessons learned
• CQL isn't necessarily the easiest or best interface
• Break the golden rule of normalization: it's fine to store the same data more than once
• Composites are useful, but only in limited circumstances
• Avoid wide rows; they can lead to pain
• Focus on the queries that are most important
• Use post-processing or Map/Reduce to meet the needs of less common queries
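For the less common queries, a batch pass over the data can do what the column families don't. A minimal sketch, assuming records arrive as dicts from some full scan or export (the field names and the helpers `modified_between` and `bytes_per_owner` are hypothetical): the first function answers the relational "modified in April 2010, sorted by size" query client-side, and the second has the map-then-reduce shape of a Map/Reduce aggregate.

```python
from collections import Counter
from datetime import datetime

def modified_between(records, start, end):
    """Filter a full scan client-side, then sort by size, largest first."""
    hits = [r for r in records if start <= r["modified"] < end]
    return sorted(hits, key=lambda r: r["size"], reverse=True)

def bytes_per_owner(records):
    """Map each record to (uid, size), then reduce by summing per uid."""
    totals = Counter()
    for uid, size in ((r["uid"], r["size"]) for r in records):
        totals[uid] += size
    return dict(totals)

records = [
    {"name": "a.txt", "uid": 1, "size": 10, "modified": datetime(2010, 4, 15)},
    {"name": "b.txt", "uid": 2, "size": 99, "modified": datetime(2010, 3, 31, 18)},
    {"name": "c.txt", "uid": 1, "size": 50, "modified": datetime(2010, 4, 30)},
]
april = modified_between(records, datetime(2010, 4, 1), datetime(2010, 5, 1))
```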
Cassandra and network usage
[Chart: network usage on a 10 Mbit/s connection, replicating to 2 nodes]
Cassandra and network usage
So how can it be used on a slow WAN?
• Tune down the message and packet size:
    rpc_send_buff_size_in_bytes
    rpc_recv_buff_size_in_bytes
    thrift_framed_transport_size_in_mb
    thrift_max_message_length_in_mb
• Be prepared for higher failure rates when things get busy:
    rpc_timeout_in_ms
• Use an additional cache layer to reduce network I/O
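The last two points are client-side concerns. This sketch shows one generic way to handle them, not what Arkivum built: `with_retries` retries a call with exponential backoff when it times out, and `ReadThroughCache` is a small LRU cache in front of a slow fetch. `TimedOut` is a hypothetical placeholder for whatever timeout error the client library actually raises, and the delays and capacity are arbitrary.

```python
import time
from collections import OrderedDict

class TimedOut(Exception):
    """Hypothetical placeholder for the client library's timeout error."""

def with_retries(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on TimedOut wait base_delay, 2*base_delay, ... and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimedOut:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)

class ReadThroughCache:
    """Small LRU cache in front of a slow fetch, to avoid repeat reads over the WAN."""

    def __init__(self, fetch, capacity=1000):
        self.fetch = fetch
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = self.fetch(key)  # cache miss: one network round trip
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```

A cache like this suits records that rarely change once archived; anything that can be updated elsewhere needs an invalidation or expiry policy.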
Cassandra and network usage
[Chart: network usage on a 10 Mbit/s connection, replicating to 2 nodes, after tuning]
Cassandra and network usage
Cassandra replication is better than the DIY alternative
Configuring Cassandra is key
Cassandra has lots of configuration options. Taking the time to understand and tweak them is worth the effort; leaving them at their defaults probably won't give the best results.
Determine custom policies for how often to compact, repair, scrub, etc., as these depend on the profile of the data being stored.
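As one example of such a policy, a common rule of thumb is that every node should be repaired at least once within `gc_grace_seconds` (864000 seconds, i.e. 10 days, by default), so that deletes reach all replicas before their tombstones are purged. The sketch below only decides which nodes are due; the safety `margin`, the node names and the helpers `repair_due` and `nodes_due_for_repair` are hypothetical, and actually running the repair (e.g. with `nodetool repair`) is left out.

```python
from datetime import datetime, timedelta

GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)

def repair_due(last_repair, now, gc_grace_seconds=GC_GRACE_SECONDS, margin=0.5):
    """True once a node has gone margin * gc_grace_seconds without a repair.

    margin < 1 leaves slack so a late or failed repair can still be rerun
    inside the grace period (0.5 is an arbitrary choice for this sketch)."""
    return now - last_repair >= timedelta(seconds=gc_grace_seconds * margin)

def nodes_due_for_repair(last_repairs, now, **kwargs):
    """Names of the nodes whose last repair is too old, in name order."""
    return sorted(node for node, last in last_repairs.items()
                  if repair_due(last, now, **kwargs))
```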
Future work
Continuing to scale our systems to cope with growing load and data volumes
Adding more search capabilities
Applying analytics to better understand how people are using our service to store petabytes of data
Summary
• Arkivum provides a guaranteed service for long-term data archiving
• We've transitioned our data model from an RDBMS to Cassandra
• Our Cassandra deployment is multi-DC and multi-site, across a WAN
• Future tasks include improving search and using analytics
Cassandra cheat sheet