The Apache Cassandra storage engine

47
©2012 DataStax The Apache Cassandra storage engine Sylvain Lebresne 1 NoSQL matters 2012

Transcript of The Apache Cassandra storage engine

Page 1: The Apache Cassandra storage engine

©2012 DataStax

The Apache Cassandra storage engine

Sylvain Lebresne

1

NoSQL matters 2012

Page 2: The Apache Cassandra storage engine

©2012 DataStax

• Sylvain Lebresne

[email protected]

• @pcmanus

About me

2

Page 3: The Apache Cassandra storage engine

©2012 DataStax3

Page 4: The Apache Cassandra storage engine

©2012 DataStax3

1. What is Apache Cassandra

2. Data Model

3. The storage engine

Page 5: The Apache Cassandra storage engine

©2012 DataStax

1. What is Apache Cassandra

2. Data Model

3. The storage engine

3

Page 6: The Apache Cassandra storage engine

©2012 DataStax

about:project• Distributed data store aimed at big data.

• Apache project since 2010.

• Version 1.1 released last month.• Proven in production (Netflix, Twitter, Reddit,

Cisco, ...). Largest know cluster has over 300TB in over 400 machines.

4

Page 7: The Apache Cassandra storage engine

©2012 DataStax

Apache Cassandra

5

Page 8: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:

5

Page 9: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized

5

Page 10: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable

5

Page 11: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable• scalable / elastic

5

Page 12: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable• scalable / elastic

5

Page 13: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable• scalable / elastic• fault-tolerant / no SPOF

5

Page 14: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable• scalable / elastic• fault-tolerant / no SPOF• highly available

5

Page 15: The Apache Cassandra storage engine

©2012 DataStax

Apache CassandraA database:• distributed / decentralized• replicated & durable• scalable / elastic• fault-tolerant / no SPOF• highly available• data center aware

US Europe

6

Page 16: The Apache Cassandra storage engine

©2012 DataStax7

1. What is Apache Cassandra

2. Data Model

3. The storage engine

Page 17: The Apache Cassandra storage engine

©2012 DataStax

• Not SQL (no transaction, nor joins) but more than Key/Value.

• Inspired by Google BigTable

• Column families based.

Data Model

8

Page 18: The Apache Cassandra storage engine

©2012 DataStaxUsers

Ex: user profiles

birth_year 1994

50e8-e29b

fname Justin

lname Bieber

“For each user, holds profile infos”

9

Page 19: The Apache Cassandra storage engine

©2012 DataStaxUsers

Ex: user profiles

birth_year 1994

50e8-e29b

fname Justin

lname Bieber

birth_year 1978

2ab1-f1b7

email [email protected]

fname Ashton

lname Kutcher

“For each user, holds profile infos”

10

Page 20: The Apache Cassandra storage engine

©2012 DataStaxTimeline

Ex: user’s Tweets

50e8-e29b

“For each user, tweets he has made”

11

Page 21: The Apache Cassandra storage engine

©2012 DataStaxTimeline

Ex: user’s Tweets

50e8-e29b

0 @LiveLoveKary glad you had a good birthday #muchlove

“For each user, tweets he has made”

11

Page 22: The Apache Cassandra storage engine

©2012 DataStaxTimeline

Ex: user’s Tweets

50e8-e29b

0 @LiveLoveKary glad you had a good birthday #muchlove

1 @NickDeMoura happy bday my dude.

“For each user, tweets he has made”

11

Page 23: The Apache Cassandra storage engine

©2012 DataStaxTimeline

Ex: user’s Tweets

50e8-e29b

0 @LiveLoveKary glad you had a good birthday #muchlove

1 @NickDeMoura happy bday my dude.

2 @MickyArison @miamiHEAT thanks for the gam tonight

“For each user, tweets he has made”

11

Page 24: The Apache Cassandra storage engine

©2012 DataStaxTimeline

Ex: user’s Tweets

50e8-e29b

0 @LiveLoveKary glad you had a good birthday #muchlove

1 @NickDeMoura happy bday my dude.

2 @MickyArison @miamiHEAT thanks for the gam tonight

3 still a little tired. back in the studio today with Timbaland

“For each user, tweets he has made”

11

Page 25: The Apache Cassandra storage engine

©2012 DataStax

There’s more• Secondary indexes

• Distributed counters

• Composite columns

12

Page 26: The Apache Cassandra storage engine

©2012 DataStax13

Page 27: The Apache Cassandra storage engine

©2012 DataStax13

1. What is Apache Cassandra

2. Data Model

3. The storage engine

Page 28: The Apache Cassandra storage engine

©2012 DataStax

• Writes are harder than reads to scale

• Spinning disks aren’t good with random I/O

• Goal: minimize random I/O

Goal

14

Page 29: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

Memtable

Commit log

15

write( , )k1 c1:v1

Page 30: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

Memtable

write( , )k1 c1:v1

Commit log

k1 c1:v1

k1 c1:v1

16

Page 31: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

k1 c1:v1

k1 c1:v1

ack

17

Page 32: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

k1 c1:v1

k1 c1:v1

write( , )k2 c1:v1 c2:v2

k2 c1:v1 c2:v2

k2 c1:v1 c2:v2

18

Page 33: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

k1 c1:v1

k1 c1:v4 c2:v2

write( , )k1 c1:v4 c3:v3

k2 c1:v1 c2:v2

c3:v3

c2:v2

k2 c1:v1 c2:v2

k1 c1:v4 c3:v3

c2:v2

19

Page 34: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

SSTable

flush

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

index

cleanup

20

Page 35: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

k2 c1:v2 c3:v3

k1 c1:v5 c4:v4

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

index

more updates

k1 c1:v5 c4:v4

k2 c1:v2 c3:v3

21

Page 36: The Apache Cassandra storage engine

©2012 DataStax

A write’s journey

Memory

Hard drive

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

indexk1 c1:v5 c4:v4

k2 c1:v2 c3:v3

index

flush

22

Page 37: The Apache Cassandra storage engine

©2012 DataStax

Writes properties• No reads or seeks

• Only sequential I/O

• Immutable SSTables: easy snapshots

23

Page 38: The Apache Cassandra storage engine

©2012 DataStax

A read’s journey

Memory

Hard drive

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

indexk1 c1:v5 c4:v4

k2 c1:v2 c3:v3

index

read( )k1

?

24

Page 39: The Apache Cassandra storage engine

A read’s journey

Memory

Hard drive

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

index

k1 c1:v5 c4:v4

k2 c1:v2 c3:v3

index

k1

merge

c1:v5 c2:v2 c3:v3 c4:v4

Page 40: The Apache Cassandra storage engine

©2012 DataStax

Compaction

• Goal: keep the number of SSTables low

• Merge sort against multiple sstables

• Sequential I/O

26

Page 41: The Apache Cassandra storage engine

©2012 DataStax

Compaction

• Goal: keep the number of SSTables low

• Merge sort against multiple sstables

• Sequential I/O

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

index

k1 c1:v5 c4:v4

k2 c1:v2 c3:v3

index

26

Page 42: The Apache Cassandra storage engine

©2012 DataStax

Compaction

• Goal: keep the number of SSTables low

• Merge sort against multiple sstables

• Sequential I/O

k1 c1:v4 c2:v2

k2 c1:v1 c2:v2

c3:v3

index

k1 c1:v5 c4:v4

k2 c1:v2 c3:v3

indexk1 c1:v5 c2:v2

k2 c1:v2 c2:v2

c3:v3

indexc4:v4

c3:v3

26

Page 43: The Apache Cassandra storage engine

©2012 DataStax

SSTables

27

BF Index SummaryMemoryDisk

k1 k2 k3

k1

312 0 ...

Index

Data

Col. BF Col. Index c1:v4 c2:v2 c3:v3 ... k2 Col. BF ...

Page 44: The Apache Cassandra storage engine

©2012 DataStax

Optimizations• Row Cache

• Bloom filters: eliminates whole SSTable

• Key Cache• Rows & Columns Indexes

• ...

28

Page 45: The Apache Cassandra storage engine

©2012 DataStax

Other features

• Compression

• Checksums

• Time to live

29

Page 46: The Apache Cassandra storage engine

©2012 DataStax30

QUESTIONS?

Page 47: The Apache Cassandra storage engine

©2012 DataStax

• http://cassandra.apache.org/

• http://wiki.apache.org/cassandra/

• http://www.datastax.com/docs/1.0

31