Cassandra – A Decentralized Structured Storage System

Cassandra – A Decentralized Structured Storage System. A. Lakshman, P. Malik (Facebook). SIGOPS ‘10. 2011. 03. 18. Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University


Page 1: Cassandra – A Decentralized  Structured Storage System

Cassandra – A Decentralized Structured Storage System

A. Lakshman¹, P. Malik¹

¹Facebook

SIGOPS ‘10

2011. 03. 18.

Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University

Page 2: Cassandra – A Decentralized  Structured Storage System

Copyright 2010 by CEBT

The Rise of NoSQL


Refer to http://www.google.com/trends?q=nosql

Eric Evans, a Rackspace employee, reintroduced the term NoSQL in early 2009 when Johan Oskarsson of Last.fm wanted to organize an event to discuss open-source distributed databases.

The name attempted to label the growing number of distributed data stores that often did not attempt to provide ACID guarantees.

Page 3: Cassandra – A Decentralized  Structured Storage System


NoSQL Database

Based on Key-value

memcached, Dynamo, Voldemort, Tokyo Cabinet

Based on Column

Google BigTable, Cloudata, HBase, Hypertable, Cassandra

Based on Document

MongoDB, CouchDB

Based on Graph

Neo4j, FlockDB, InfiniteGraph

Page 4: Cassandra – A Decentralized  Structured Storage System

NoSQL BigData Database

Based on Key-Value

memcached, Dynamo, Voldemort, Tokyo Cabinet

Based on Column

Google BigTable, Cloudata, HBase, Hypertable, Cassandra

Based on Document

MongoDB, CouchDB

Based on Graph

Neo4j, FlockDB, InfiniteGraph

Page 5: Cassandra – A Decentralized  Structured Storage System

Refer to http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 6: Cassandra – A Decentralized  Structured Storage System


Contents

Introduction

Recap: Dynamo

Cassandra

Data Model

System Architecture

Partitioning

Replication

Membership

Bootstrapping


Operations

WRITE

READ

Consistency level

Performance Benchmark

Case Study

Conclusion

Page 7: Cassandra – A Decentralized  Structured Storage System


Recap: Dynamo

Distributed Hash Table

BASE

Basically Available

Soft-state

Eventually Consistent

Client Tunable consistency/availability


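NRW Configuration (N = number of replicas, R = replicas that must respond to a read, W = replicas that must acknowledge a write)

W=N, R=1: Read-optimized strong consistency

W=1, R=N: Write-optimized strong consistency

W+R ≤ N: Weak eventual consistency

W+R > N: Strong consistency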
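
The NRW rules above can be checked mechanically. The following is a minimal sketch, not code from Dynamo or Cassandra; the function name `classify_nrw` and the returned labels are made up for illustration.

```python
def classify_nrw(n: int, r: int, w: int) -> str:
    """Classify an (N, R, W) configuration using the rules on this slide."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # When W + R > N, every read set shares at least one replica with
    # every write set, so a read can see the latest acknowledged write.
    if w + r > n:
        return "strong"
    # Otherwise a read may touch only replicas the write has not reached yet.
    return "weak (eventual)"


print(classify_nrw(3, 1, 3))  # W=N, R=1: read-optimized
print(classify_nrw(3, 3, 1))  # W=1, R=N: write-optimized
print(classify_nrw(3, 2, 2))  # quorum reads and writes
print(classify_nrw(3, 1, 1))  # fastest, but only eventually consistent
```

With N=3, both the read-optimized and write-optimized settings satisfy W+R > N, which is why the slide lists them as strong consistency.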

Page 8: Cassandra – A Decentralized  Structured Storage System


Cassandra

Dynamo-Bigtable lovechild

Column-based data model

Distributed Hash Table

Tunable tradeoff

– Consistency vs. Latency

Properties

No single point of Failure

Linearly scalable

Flexible partitioning, replica placement

High availability (eventual consistency)

Page 9: Cassandra – A Decentralized  Structured Storage System

Data Model

Cluster

A keyspace corresponds to a database or tablespace

A column family corresponds to a table

A column is the unit of data stored in Cassandra

Row Key: “userid1”

Column Family “User”: (name: Username, value: uname1), (name: Email, value: [email protected]), (name: Tel, value: 123-4567)

Row Key: “userid2”

Column Family “User”: (name: Username, value: uname2), (name: Email, value: [email protected]), (name: Tel, value: 123-4568)

Column Family “Article”: (name: ArticleId, value: userid2-1), (name: ArticleId, value: userid2-2), (name: ArticleId, value: userid2-3)

Row Key: “userid3”

Column Family “User”: (name: Username, value: uname3), (name: Email, value: [email protected]), (name: Tel, value: 123-4569)
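
The example rows can be mirrored with nested Python containers to show how keyspace, column family, row key, and column relate. This is only an illustrative sketch: `Column`, `keyspace`, and `get_columns` are hypothetical names, not Cassandra APIs, and the Email columns are left out because their values are obscured in the source.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """The unit of data: a name and a value."""
    name: str
    value: str


# keyspace -> column family -> row key -> list of columns
# (Email columns omitted: their values are not recoverable from the slide.)
keyspace = {
    "User": {
        "userid1": [Column("Username", "uname1"), Column("Tel", "123-4567")],
        "userid2": [Column("Username", "uname2"), Column("Tel", "123-4568")],
        "userid3": [Column("Username", "uname3"), Column("Tel", "123-4569")],
    },
    "Article": {
        "userid2": [
            Column("ArticleId", "userid2-1"),
            Column("ArticleId", "userid2-2"),
            Column("ArticleId", "userid2-3"),
        ],
    },
}


def get_columns(ks: dict, column_family: str, row_key: str) -> list:
    """Return the columns stored for a row key, or an empty list."""
    return ks.get(column_family, {}).get(row_key, [])


print(get_columns(keyspace, "User", "userid1"))
print(get_columns(keyspace, "Article", "userid1"))  # this row has no articles
```

Rows in the same column family do not need the same columns, and a row key may be absent from a column family entirely, as with “userid1” in “Article”.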

Page 10: Cassandra – A Decentralized  Structured Storage System


System Architecture

Partitioning

Replication

Membership

Bootstrapping

Page 11: Cassandra – A Decentralized  Structured Storage System

Partitioning Algorithm

Distributed Hash Table

Data and servers are placed in the same address space

Consistent Hashing

Key Space Partition: arrangement of the key

Overlay Networking: Routing Mechanism

[Figure: consistent hashing ring with nodes N1, N2, and N3; hash(key1) is placed on the ring, whose positions run from low to high hash values]

N2 is deemed the coordinator of key 1
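
A minimal sketch of the coordinator lookup on a consistent hashing ring follows. The ring size, the use of MD5, the token values, and the names `Ring`, `ring_position`, and `coordinator_at` are all assumptions made for illustration; they are not taken from Cassandra's code.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the hash space


def ring_position(key: str) -> int:
    """Map a key onto the ring (MD5 is an assumption of this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    def __init__(self, tokens: dict[str, int]):
        # tokens maps a node name to its position on the ring.
        self._entries = sorted((pos, node) for node, pos in tokens.items())
        self._positions = [pos for pos, _ in self._entries]

    def coordinator_at(self, position: int) -> str:
        """First node with a position larger than `position`, walking
        clockwise and wrapping around past the highest token."""
        i = bisect.bisect_right(self._positions, position)
        return self._entries[i % len(self._entries)][1]

    def coordinator(self, key: str) -> str:
        return self.coordinator_at(ring_position(key))


ring = Ring({"N1": 1_000, "N2": 2_000, "N3": 3_000})
print(ring.coordinator_at(1_500))  # falls between N1 and N2
print(ring.coordinator_at(3_500))  # past the last token, wraps around
print(ring.coordinator("key1"))
```

Because nodes and keys share one address space, adding or removing a node only changes ownership of the key range next to that node's token.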

Page 12: Cassandra – A Decentralized  Structured Storage System


Partitioning Algorithm (cont’d)

Challenges

Non-uniform data and load distribution

Oblivious to the heterogeneity in the performance of nodes

Solutions

Nodes get assigned to multiple positions on the ring (like Dynamo)

Analyze load information on the ring and have lightly loaded nodes move on the ring to alleviate heavily loaded nodes (like Cassandra)

[Figure: ring diagrams with nodes N1, N2, and N3 illustrating the two solutions]

Page 13: Cassandra – A Decentralized  Structured Storage System


Replication

Rack Unaware

Rack Aware

DataCenter-Shared

[Figure: ring of nodes A through J, with data1 placed on the ring]

Coordinator of data 1
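
As a rough illustration of the Rack Unaware policy, the sketch below assumes the replicas are the coordinator plus its N-1 successors on the ring. The function name `rack_unaware_replicas` is hypothetical, and the coordinators used in the example are arbitrary, since the figure does not survive in this transcript.

```python
def rack_unaware_replicas(ring_order: list[str], coordinator: str, n: int) -> list[str]:
    """Return the coordinator followed by its n-1 clockwise successors."""
    if not 1 <= n <= len(ring_order):
        raise ValueError("replication factor must be between 1 and the ring size")
    start = ring_order.index(coordinator)
    # Walk clockwise from the coordinator, wrapping at the end of the ring.
    return [ring_order[(start + i) % len(ring_order)] for i in range(n)]


nodes = list("ABCDEFGHIJ")  # ring order as labelled on the slide
print(rack_unaware_replicas(nodes, "A", 3))
print(rack_unaware_replicas(nodes, "I", 3))  # wraps past J back to A
```

Rack-aware and datacenter-aware placement would additionally need rack or datacenter membership for each node, which this sketch does not model.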

Page 14: Cassandra – A Decentralized  Structured Storage System


Cluster Membership

Gossip Protocol is used for cluster membership

Extremely lightweight, with mathematically provable properties

State disseminated in O(log N) rounds

Every T seconds, each member increments its heartbeat counter and selects one other member to send its list to

A member merges the list with its own list
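
The gossip steps above can be sketched as follows. This is a simplified model, not Cassandra's gossiper: the `Member` class and `gossip_round` function are hypothetical, and the merge rule (keep the highest heartbeat seen for each member) is an assumption about how the lists are combined.

```python
import random


class Member:
    def __init__(self, name: str):
        self.name = name
        # Membership list: member name -> highest heartbeat seen so far.
        self.heartbeats = {name: 0}

    def tick(self) -> None:
        """Increment this member's own heartbeat counter."""
        self.heartbeats[self.name] += 1

    def merge(self, remote: dict) -> None:
        """Merge a received list, keeping the newer heartbeat per member."""
        for member, beat in remote.items():
            if beat > self.heartbeats.get(member, -1):
                self.heartbeats[member] = beat


def gossip_round(members: list, rng: random.Random) -> None:
    """One period of T seconds: each member ticks and gossips to one peer."""
    for sender in members:
        sender.tick()
        receiver = rng.choice([m for m in members if m is not sender])
        receiver.merge(dict(sender.heartbeats))


cluster = [Member(f"server{i}") for i in range(1, 9)]
rng = random.Random(42)
for _ in range(20):
    gossip_round(cluster, rng)
print({m.name: len(m.heartbeats) for m in cluster})  # members known to each server
```

Each member only contacts one peer per round, yet information spreads through the merged lists, which is where the O(log N) dissemination bound comes from.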

Page 15: Cassandra – A Decentralized  Structured Storage System

Gossip Protocol

[Figure: membership lists held by each server over time]

t1: server 1 [server1: t1]

t2: server 1 [server1: t1]; server 2 [server2: t2]

t3: server 1 [server1: t1, server2: t2]; server 2 [server2: t2]

t4: server 1 [server1: t4, server2: t2]; server 2 [server1: t4, server2: t2]

t5: server 1 [server1: t4, server2: t2, server3: t5]; server 2 [server1: t4, server2: t2]; server 3 [server3: t5]

t6: server 1 [server1: t6, server2: t2, server3: t5]; server 2 [server1: t6, server2: t6, server3: t5]; server 3 [server1: t6, server2: t6, server3: t5]

Page 16: Cassandra – A Decentralized  Structured Storage System


Accrual Failure Detector

Valuable for system management, replication, load balancing

Designed to adapt to changing network conditions

The value output, PHI, represents a suspicion level

Applications set an appropriate threshold, trigger suspicions, and perform appropriate actions

In Cassandra the average time taken to detect a failure is 10-15 seconds with the PHI threshold set at 5
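
The slide's formula did not survive in this transcript, so the sketch below is a stand-in rather than a reconstruction. It uses a definition common in the accrual failure detector literature, PHI = -log10(P_later(t)), and assumes heartbeat inter-arrival times follow an exponential distribution, so that P_later(t) = exp(-t / mean). The function names and the example numbers are hypothetical.

```python
import math


def phi(time_since_last_heartbeat: float, mean_interval: float) -> float:
    """Suspicion level under an assumed exponential inter-arrival model.

    P_later(t) = exp(-t / mean), so -log10(P_later(t)) = t / (mean * ln 10).
    """
    if mean_interval <= 0:
        raise ValueError("mean_interval must be positive")
    return time_since_last_heartbeat / (mean_interval * math.log(10))


def is_suspected(time_since_last_heartbeat: float,
                 mean_interval: float,
                 threshold: float = 5.0) -> bool:
    """Compare the suspicion level against an application-chosen threshold."""
    return phi(time_since_last_heartbeat, mean_interval) >= threshold


# With heartbeats arriving about once per second on average:
print(phi(10.0, 1.0), is_suspected(10.0, 1.0))
print(phi(12.0, 1.0), is_suspected(12.0, 1.0))
```

Because PHI grows with the silence interval relative to the observed mean, the same threshold adapts automatically when the network gets slower or faster.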


Page 17: Cassandra – A Decentralized  Structured Storage System


Bootstrapping

A new node gets assigned a token such that it can alleviate a heavily loaded node

[Figure: ring with nodes N1 and N2, then with N1, N3, and N2]

Page 18: Cassandra – A Decentralized  Structured Storage System


WRITE

Interface

Simple: put(key,col,value)

Complex: put(key,[col:val,…,col:val])

Batch

WRITE Operation

Commit log for durability

– Configurable fsync

– Sequential writes only

MemTable

– No disk access (no reads or seeks)

SSTables are final

– Read-only

– Indexes

Always Writable
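
The write path above can be sketched with in-memory stand-ins. Everything here is a simplification for illustration: `Node` and `memtable_limit` are hypothetical names, the commit log is a Python list rather than a file, and the flush trigger is an entry count chosen only to keep the example small.

```python
class Node:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only stand-in for the on-disk log
        self.memtable = {}     # in-memory: (key, column) -> value
        self.sstables = []     # flushed, immutable snapshots, oldest first
        self.memtable_limit = memtable_limit  # assumed flush threshold

    def put(self, key: str, col: str, value: str) -> None:
        # 1. Sequential append to the commit log for durability.
        self.commit_log.append((key, col, value))
        # 2. Update the memtable; no disk read or seek is needed.
        self.memtable[(key, col)] = value
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # An SSTable is written once, sorted, and never modified afterwards;
        # a tuple models that read-only property.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = Node(memtable_limit=2)
node.put("userid1", "Username", "uname1")
node.put("userid1", "Tel", "123-4567")  # reaches the limit and triggers a flush
print(len(node.commit_log), len(node.memtable), len(node.sstables))
```

A write never has to read existing data first, which is what keeps the node always writable.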

Page 19: Cassandra – A Decentralized  Structured Storage System

READ

Interface

get(key,column)

get_slice(key,SlicePredicate)

get_range_slices(keyRange,SlicePredicate)

READ

Practically lock-free

SSTable proliferation

Row cache

Key cache

Page 20: Cassandra – A Decentralized  Structured Storage System

Consistency Level

Write Operation

Level     Description
ZERO      Hail Mary
ANY       1 replica
ONE       1 replica
QUORUM    (N/2)+1
ALL       All replicas

Read Operation

Level     Description
ZERO      N/A
ANY       N/A
ONE       1 replica
QUORUM    (N/2)+1
ALL       All replicas

Tuning the consistency level for each WRITE/READ operation
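
The two tables can be turned into a small lookup that says how many replicas an operation waits for. This is a sketch under stated assumptions: `required_replicas` is a hypothetical helper, "Hail Mary" is interpreted as waiting for no acknowledgement, and (N/2)+1 is taken to use integer division.

```python
def required_replicas(level: str, n: int, operation: str) -> int:
    """Number of replicas an operation waits for at a given consistency level."""
    quorum = n // 2 + 1  # (N/2)+1 with integer division (assumption)
    if operation == "write":
        # ZERO: assumed to mean fire-and-forget (wait for nothing).
        table = {"ZERO": 0, "ANY": 1, "ONE": 1, "QUORUM": quorum, "ALL": n}
    elif operation == "read":
        # ZERO and ANY are listed as N/A for reads.
        table = {"ONE": 1, "QUORUM": quorum, "ALL": n}
    else:
        raise ValueError("operation must be 'read' or 'write'")
    if level not in table:
        raise ValueError(f"{level} is not available for {operation} operations")
    return table[level]


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, 3, "write"), required_replicas(level, 3, "read"))
```

With N=3, QUORUM writes and QUORUM reads each wait for 2 replicas, so W+R = 4 > N and the pair behaves as a strongly consistent configuration.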

Page 21: Cassandra – A Decentralized  Structured Storage System


Performance Benchmark

Random and Sequential Writes

Limited by bandwidth

Facebook Inbox Search

Two kinds of Search

– Term Search

– Interactions

50+TB on 150 node cluster

Latency Stat    Search Interactions    Term Search
Min             7.69ms                 7.78ms
Median          15.69ms                18.27ms
Max             26.13ms                44.41ms

Page 22: Cassandra – A Decentralized  Structured Storage System


vs MySQL with 50GB Data

MySQL

~300ms write

~350ms read

Cassandra

~0.12ms write

~15ms read

Page 23: Cassandra – A Decentralized  Structured Storage System

Case Study

Cassandra as primary data store

Datacenter and rack-aware replication

~1,000,000 ops/s

high sharding and low replication

Inbox Search

100TB

5,000,000,000 writes per day

Page 24: Cassandra – A Decentralized  Structured Storage System

Conclusions

Cassandra

Scalability

High Performance

Wide Applicability

Future work

Compression

Atomicity

Secondary Index
