Case study: CASSANDRA
Spring 2013
Jordi Torres, UPC - BSC
www.JordiTorres.eu
Cloud Computing – MIRI (CLC-MIRI)
UPC Master in Innovation & Research in Informatics
Course Notes in Transparency Format


Cassandra: main features

§  Cassandra does not support relationships between column families (“tables”): there are no foreign keys and no join operations.

§  Knowing this, the best practice when designing a data model is to keep related data in the same column family.

In this section we review only the main features of Cassandra, as an example.


Architecture

§  The architecture of Cassandra is completely decentralized and peer-to-peer, meaning all nodes in a Cassandra cluster are equivalent and provide the same functionality: receive read and write requests, or forward them to other nodes.

–  Peer-to-peer, distributed system
–  All nodes the same
–  Data partitioned
–  Custom data replication
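The "receive or forward" behaviour described above can be sketched as follows. This is a minimal toy model, not Cassandra code: the `Node` class, its method names, and the placement rule (MD5 of the key modulo the cluster size) are all assumptions made for illustration.

```python
import hashlib


class Node:
    """One peer in a toy cluster; every node runs exactly the same code."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster  # shared list of all peers (hypothetical)
        self.store = {}         # rows held locally by this node
        cluster.append(self)

    def owner_of(self, key):
        # Stand-in placement rule (an assumption for this sketch):
        # MD5 of the key, modulo the number of nodes.
        digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
        return self.cluster[digest % len(self.cluster)]

    def write(self, key, value):
        owner = self.owner_of(key)
        if owner is self:
            self.store[key] = value   # serve the request locally
        else:
            owner.write(key, value)   # forward it to the responsible peer
        return owner.name

    def read(self, key):
        owner = self.owner_of(key)
        if owner is self:
            return self.store.get(key)
        return owner.read(key)


cluster = []
nodes = [Node(name, cluster) for name in ("A", "B", "C", "D")]

# A client may contact any node: write through A, read back through D.
nodes[0].write("raiser", {"name": "john"})
row = nodes[3].read("raiser")
```

Because every node applies the same rule, the client never needs to know which node actually holds the row.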


Partitioning

Cassandra implements automatic partitioning and replication mechanisms to decide which nodes are in charge of each replica.

§  How? With the PARTITIONER, which:

–  Divides the data across the nodes in the cluster
–  Makes each node responsible for a range of the overall data

Source: Juan Luis Pérez – researcher at BSC (EEDC 2012 master course)


Partitioning

[Figure: cluster of four nodes: Node A, Node B, Node C, Node D]

Source: Juan Luis Pérez – researcher at BSC (EEDC 2012 master course)


Partitioning

raiser name: john pass: **** url: icann.org

trucker name: james pass: **** url: w3.org

dumper name: maria pass: ****

biker name: linda pass: ****

Row Key determines node placement


Partitioning

[000..1    400..0]  

[400..1    800..0]  

[800..1    c00..0]  

[c00..1    000..0]  

Range of MD5 hash

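Partitioning

[Figure: the ring of nodes with its MD5 hash ranges [000..1  400..0], [400..1  800..0], [800..1  c00..0] and [c00..1  000..0]. The row keys raiser, trucker, dumper and biker are hashed with MD5 (hashes shown: 65236c..., a113f4..., d4ab26..., 864058...), and each row is placed on the node whose range contains its hash.]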
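The placement rule illustrated on these slides (hash the row key with MD5, then find the node whose range contains the hash) can be sketched as below. The node names `node-1` to `node-4` and the function names are hypothetical; the slides do not say which node owns which range, so the assignment here is only an assumption.

```python
import bisect
import hashlib

# Upper bounds of the first three ranges from the slides, in the 128-bit
# MD5 space: 400..0, 800..0 and c00..0. The fourth range wraps around to 000..0.
UPPER_BOUNDS = [0x4 << 124, 0x8 << 124, 0xC << 124]

# Hypothetical node names, one per range, in ring order.
NODES = ["node-1", "node-2", "node-3", "node-4"]


def md5_token(row_key):
    """Return the MD5 hash of the row key as a 128-bit integer."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def node_for_token(token):
    """Return the node whose range contains the given hash value."""
    if token == 0:
        # 000..0 is the closing bound of the wrap-around range [c00..1, 000..0].
        return NODES[3]
    # bisect_left returns the index of the first upper bound >= token,
    # so every range includes its upper bound, as on the slides.
    return NODES[bisect.bisect_left(UPPER_BOUNDS, token)]


def node_for(row_key):
    """Row key determines node placement."""
    return node_for_token(md5_token(row_key))


placement = {key: node_for(key) for key in ("raiser", "trucker", "dumper", "biker")}
```

Since MD5 spreads keys roughly uniformly over the hash space, equally sized ranges give each node a similar share of the rows.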


Replication

§  Remember: “Cassandra implements automatic partitioning and replication mechanisms to decide which nodes are in charge of each replica”

→ The user only needs to configure the number of replicas, and the system assigns each replica to a node in the cluster.


Replication

Cassandra stores multiple copies of rows on multiple nodes

§  Replication factor = number of replicas

§  Replica Placement Strategy

•  DEFAULT: SimpleStrategy
•  NetworkTopologyStrategy
•  …

§  Both are configurable:

–  Replication factor
–  Placement Strategy


Replication

§  SimpleStrategy

–  The first replica is determined by the partitioner
–  Additional replica rows are placed on the next nodes clockwise in the ring

[Figure: the ring with the original row "raiser" and its copy row]
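The SimpleStrategy rule above can be sketched in a few lines. The function name, the ring of node labels and the example values are assumptions for illustration; the sketch only reproduces the "next nodes clockwise" rule stated on the slide.

```python
def simple_strategy(ring, primary_index, replication_factor):
    """Return the nodes holding a row: the primary node chosen by the
    partitioner, followed by the next nodes clockwise in the ring."""
    count = min(replication_factor, len(ring))  # cannot exceed the cluster size
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]


ring = ["A", "B", "C", "D"]  # nodes listed in clockwise order (hypothetical)

# The partitioner picked node D (index 3); with 3 replicas the walk wraps
# around the ring and continues with A and B.
replicas = simple_strategy(ring, primary_index=3, replication_factor=3)
```

The modulo on the index is what makes the walk wrap from the last node back to the first.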


Replication

§  NetworkTopologyStrategy

–  Allows replication between different racks
–  Racks in a data center or in multiple data centers
–  Reliability & Performance
–  …

§  Others …


Consistency

§  The goal of current distributed key-value stores such as Cassandra is to serve read and write operations on data, exactly as any database system does

–  However, while traditional databases … provide strong consistency guarantees of replicated data by controlling the concurrent execution of transactions,

–  Cassandra … provides tunable consistency in order to favour scalability and availability.


Consistency

§  Data consistency is tunable by the user when queries are performed, so depending on the desired level of consistency, operations can either return as soon as possible or wait until a majority or all of the nodes respond

–  Tunable data consistency: choose between strong and eventual consistency
–  Consistency per operation (reads & writes)
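As a rough illustration of "return as soon as possible, or wait for a majority, or wait for all", the sketch below maps three of the consistency levels to the number of replica responses an operation waits for. The function names are hypothetical, and only the levels One, Quorum and All are modelled; a quorum is taken as a strict majority of the replicas.

```python
def required_acks(level, replication_factor):
    """Number of replica responses to wait for at a given consistency level."""
    if level == "ONE":
        return 1                            # return as soon as one replica answers
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return replication_factor           # every replica must answer
    raise ValueError(f"level not modelled in this sketch: {level}")


def operation_done(level, replication_factor, acks_received):
    """True once enough replicas have responded for the chosen level."""
    return acks_received >= required_acks(level, replication_factor)


# With 3 replicas, a QUORUM operation completes after the second response.
done_after_one = operation_done("QUORUM", 3, acks_received=1)
done_after_two = operation_done("QUORUM", 3, acks_received=2)
```

Because the level is an argument of each operation, one application can mix fast, weakly consistent reads with stricter writes.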


Strategy for Read


Strategy for Writes


Strong/Weak consistency?

§  As can be derived from their descriptions, strong consistency can only be achieved when using the (Quorum and) All consistency levels.

§  Operations that use weaker consistency levels, such as Zero, Any and One, aren’t guaranteed to read the most recent data.

§  However, this weaker consistency gives flexibility to applications that benefit from better performance and don’t need strong consistency. Imagine your Facebook wall!
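One common way to reason about this is the replica-overlap rule: a read is guaranteed to see the latest write when the replicas contacted by the read and by the write must share at least one node, that is, when R + W > N for N replicas. The sketch below is an illustration of that rule only; the function name and the example values are assumptions, not part of the slides.

```python
def is_strongly_consistent(read_acks, write_acks, replication_factor):
    """Overlap rule: the read set and the write set share at least one
    replica whenever R + W > N."""
    return read_acks + write_acks > replication_factor


N = 3               # replication factor (example value)
quorum = N // 2 + 1  # 2 replicas out of 3

quorum_both = is_strongly_consistent(quorum, quorum, N)  # Quorum reads and writes
one_both = is_strongly_consistent(1, 1, N)               # One reads and writes
all_write_one_read = is_strongly_consistent(1, N, N)     # All writes, One reads
```

With One for both reads and writes the two sets may not overlap, which is why such reads can return stale data.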


Caching

§  Data is first written to a commit log for durability

–  Local to the node (for disaster recovery purposes)

§  Then it is written to an in-memory structure (memtable)

–  On the node that stores the row

§  And then to disk (SSTable) once the memtable is full

Data durability is assured

[Figure: commit log, memtable, SSTable]

Source: Juan Luis Pérez – researcher at BSC (EEDC 2012 master course)
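The three write steps above can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `ToyNode` class, the `memtable_limit` parameter, and the use of Python lists and dicts in place of real files on disk.

```python
class ToyNode:
    """Toy model of the write path: commit log, then memtable, then SSTable."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log, stands in for a file on disk
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # flushed tables, stand in for files on disk
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. write to disk once full

    def flush(self):
        # Store the rows sorted by key, then start a fresh memtable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


node = ToyNode(memtable_limit=2)
node.write("raiser", "john")
node.write("trucker", "james")  # memtable is full here, so it is flushed
node.write("biker", "linda")    # lands in the new, empty memtable
```

If the node crashed before a flush, the rows still in the memtable could be rebuilt by replaying the commit log, which is why the log is written first.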