Ceph Day NYC: Ceph Fundamentals

Posted on 16-Jan-2015


Ross Turk provides an overview of the Ceph architecture at Ceph Day NYC.

Transcript of Ceph Day NYC: Ceph Fundamentals

Ceph Fundamentals
Ross Turk, VP Community, Inktank

2

ME ME ME ME ME ME.

Ross Turk, VP Community, Inktank

ross@inktank.com | @rossturk

inktank.com | ceph.com

3

Ceph Architectural Overview
Ah! Finally, 32 slides in and he gets to the nerdy stuff.

4

[Diagram: the Ceph stack. Apps use LIBRADOS or RADOSGW, a host/VM uses RBD, a client uses CEPH FS, and everything sits on RADOS.]

RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift

5

[The full stack diagram again, zooming in on RADOS.]

6

[Diagram: disks, each with a filesystem (btrfs, xfs, or ext4) and an OSD on top; three monitors (M) stand alongside.]

7

[Diagram: three monitors (M) and a human administrator.]

8

Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients

OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
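To make the monitors' consensus role concrete, here is a minimal sketch that asks them for their quorum status through the python-rados bindings; the config path is an assumption, and mon_command speaks the same JSON command interface the ceph CLI uses.

```python
import json
import rados

# A minimal sketch, assuming the usual config path and a reachable cluster.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# Ask the monitors who is currently in quorum: the small, odd-numbered
# group that provides consensus for distributed decision-making.
ret, outbuf, outs = cluster.mon_command(
    json.dumps({'prefix': 'quorum_status', 'format': 'json'}), b'')
if ret == 0:
    print('monitors in quorum:', json.loads(outbuf)['quorum_names'])

cluster.shutdown()
```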

9

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowingapps to directlyaccess RADOS,with support forC, C++, Java,Python, Ruby,and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

10

[Diagram: an APP linking LIBRADOS talks over a native socket directly to the monitors (M) and OSDs.]

LIBRADOS:
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
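To show what that direct access looks like, here is a minimal write/read round-trip with the Python librados bindings; the pool name 'data' and the config path are assumptions for this sketch.

```python
import rados

# A minimal sketch, assuming a running cluster and a pool named 'data'.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('data')           # I/O context for one pool
ioctx.write_full('greeting', b'hello ceph')  # store an object by name
print(ioctx.read('greeting'))                # read it straight back

ioctx.close()
cluster.shutdown()
```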

12

[The full stack diagram again, zooming in on RADOSGW.]

13

[Diagram: an APP speaks REST to RADOSGW, which uses LIBRADOS over a native socket to reach the monitors (M) and OSDs.]

14

RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications
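Because the gateway is S3-compatible, ordinary S3 client code works against it unchanged. A minimal sketch with the classic boto library; the endpoint and credentials are placeholders for your own RADOS Gateway.

```python
import boto
import boto.s3.connection

# A minimal sketch; host and keys are placeholders for your own
# RADOS Gateway endpoint and the credentials it issued.
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('demo-bucket')          # bucket-based API
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello from RADOSGW')  # object lands in RADOS
```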

15

16

[The full stack diagram again, zooming in on RBD.]

17

[Diagram: a VM on a HYPERVISOR; the hypervisor links LIBRBD on top of LIBRADOS to reach the monitors (M) and OSDs.]

18

[Diagram: two hypervisors, each with LIBRBD on LIBRADOS, can reach the same image, so the VM is not tied to any one host.]

19

[Diagram: a HOST maps an image with KRBD, the kernel module, talking directly to the monitors (M) and OSDs.]

20

RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux kernel (2.6.39+)
  • QEMU/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
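Creating and writing an image is a few lines with the Python rbd bindings. A minimal sketch; the pool name 'rbd', the image name, and the size are assumptions.

```python
import rados
import rbd

# A minimal sketch, assuming a pool named 'rbd' exists.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024**3)  # a 10 GiB image

image = rbd.Image(ioctx, 'vm-disk-1')
image.write(b'\x00' * 4096, 0)   # write the first 4 KiB of the device
image.close()

ioctx.close()
cluster.shutdown()
```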

21

[The full stack diagram again, zooming in on CEPH FS.]

22

[Diagram: a CLIENT sends metadata operations down one path and file data (0110) down another, with three monitors (M) overseeing the cluster.]

23

Metadata Server:
• Manages metadata for a POSIX-compliant shared filesystem
• Directory hierarchy
• File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem
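For completeness, a sketch of touching the shared filesystem from userspace through the libcephfs Python bindings. The binding surface has shifted between releases, so treat the exact calls below as assumptions to verify against your installed version.

```python
import os
import cephfs

# A hedged sketch of the libcephfs Python bindings; method names and
# signatures are assumptions to check against your installed version.
fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()                                    # attach to the filesystem

fd = fs.open('/hello.txt', os.O_CREAT | os.O_WRONLY, 0o644)
fs.write(fd, b'hello cephfs', 0)              # file data goes to the OSDs;
fs.close(fd)                                  # the MDS handled the metadata

fs.shutdown()
```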

24

What Makes Ceph Unique?
Part one: it never, ever remembers where it puts stuff.

25

[Diagram: an APP facing a wall of identical storage nodes (DC), with no idea where its data should go (??).]

26

How Long Did It Take You To Find Your Keys This Morning?
(photo: azmeen, Flickr / CC BY 2.0)

27

[Diagram: the APP and the same wall of storage nodes (DC); this time every placement could be written down in a lookup table.]

28

Dear Diary: Today I Put My Keys on the Kitchen Counter
(photo: Barnaby, Flickr / CC BY 2.0)

29

[Diagram: the APP hashes names into fixed ranges (A-G, H-N, O-T, U-Z) across the storage nodes, so a request for F* goes straight to the right node.]

30

I Always Put My Keys on the Hook By the Door
(photo: vitamindave, Flickr / CC BY 2.0)

31

HOW DO YOU FIND YOUR KEYS
WHEN YOUR HOUSE IS
INFINITELY BIG AND
ALWAYS CHANGING?

32

The Answer: CRUSH!!!!!
(photo: pasukaru76, Flickr / CC SA 2.0)

33

[Diagram: an OBJECT (10 10 01 01 10 10 01 11 01 10) is placed in two steps:]

hash(object name) % num pg → placement group
CRUSH(pg, cluster state, rule set) → OSDs
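In code, the first step is plain arithmetic that any client can do locally. A sketch of the idea; Ceph actually uses the rjenkins hash and a "stable mod" rather than crc32, and real CRUSH is far more involved, so both stand-ins below are illustrative only.

```python
import zlib

def object_to_pg(object_name, num_pgs):
    """Step 1: hash the object name onto a placement group (PG).

    Illustrative only: Ceph uses the rjenkins hash and a stable mod,
    not crc32, but the idea is the same: pure calculation, no lookup.
    """
    return zlib.crc32(object_name.encode('utf-8')) % num_pgs

# Step 2 would be CRUSH(pg, cluster_state, rule_set) -> [osd, osd, ...]:
# again a deterministic function of the cluster map, not a table lookup.
print(object_to_pg('greeting', 128))
```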

34


35

CRUSH:
• Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
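To see why "repeatable, no lookup" matters, here is a toy stand-in for CRUSH built on rendezvous (highest-random-weight) hashing. It is explicitly not the real algorithm, but it shares the headline properties: every client computes the same answer from the same cluster state, and changing the OSD list moves only a bounded share of the mapping.

```python
import hashlib

def place_pg(pg, osds, replicas=3):
    """Toy stand-in for CRUSH: rendezvous hashing over the OSD list.

    Not the real algorithm, but like CRUSH it is deterministic and
    lookup-free: any client with the same view of the cluster
    computes the same placement, with no central table to consult.
    """
    def score(osd):
        h = hashlib.sha1(('%s:%s' % (pg, osd)).encode()).hexdigest()
        return int(h, 16)
    return sorted(osds, key=score, reverse=True)[:replicas]

osds = ['osd.0', 'osd.1', 'osd.2', 'osd.3', 'osd.4']
print(place_pg(42, osds))   # same inputs give the same OSDs, every time
```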

36

[Diagram: a CLIENT asking where its data lives (??).]

37

38

39

40

[Diagram: the CLIENT and its question again; after the cluster changes, the same calculation still finds the data.]

41

What Makes Ceph Unique?
Part two: it has smart block devices for all those impatient, selfish VMs.

42

[Diagram: the VM / HYPERVISOR / LIBRBD / LIBRADOS stack again, in front of the monitors (M).]

43

HOW DO YOU SPIN UP
THOUSANDS OF VMs
INSTANTLY AND
EFFICIENTLY?

44

[Diagram: an image of size 144 is "instant copied" four times, each copy taking 0 additional space; total = 144.]

45

[Diagram: a CLIENT writes to one of the copies; only the 4 changed units are stored alongside the shared 144; total = 148.]

46

[Diagram: the CLIENT reads from a copy; unchanged data is read through from the original, the 4 changed units from the copy; total is still 148.]
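That copy-on-write arithmetic is exactly what RBD clones provide. A hedged sketch of the snapshot-protect-clone workflow with the Python rbd bindings; pool and image names are placeholders, and features=1 (layering) is required for cloning.

```python
import rados
import rbd

# A minimal sketch; the 'rbd' pool, image names, and size are placeholders.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# The parent ("golden") image; features=1 enables layering, which
# copy-on-write clones require.
rbd.RBD().create(ioctx, 'golden', 10 * 1024**3, old_format=False, features=1)

parent = rbd.Image(ioctx, 'golden')
parent.create_snap('base')       # freeze a point-in-time view
parent.protect_snap('base')      # clones may only hang off protected snaps
parent.close()

# The "instant copy": the clone stores only blocks written after this
# point; everything else reads through from the parent snapshot.
rbd.RBD().clone(ioctx, 'golden', 'base', ioctx, 'vm-0001', features=1)

ioctx.close()
cluster.shutdown()
```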

47

What Makes Ceph Unique?
Part three: it has powerful friends, ones you probably already know.

48

[Diagram: Apache CloudStack with a hypervisor backed by a Ceph primary storage pool, plus a secondary storage pool holding snapshots, templates, and images; three monitors (M) oversee the cluster.]

49

[Diagram: OpenStack on Ceph; the Keystone, Swift, Cinder, Glance, and Nova APIs sit in front, with RADOSGW serving the Swift API and the hypervisor attached to the cluster; three monitors (M).]

50

What Makes Ceph Unique?
Part four: clustered metadata.

51

[Diagram: a CLIENT again splitting metadata from file data (0110), with three monitors (M).]

52

[Diagram: three metadata servers.]

53

one tree

three metadata servers

??

54

55

56

57

58

DYNAMIC SUBTREE PARTITIONING

59

Now What?

60

8 years & 26,000 commits

61

Getting Started With Ceph

Read about the latest version of Ceph.
• The latest stuff is always at http://ceph.com/get

Deploy a test cluster using ceph-deploy.
• Read the quick-start guide at http://ceph.com/qsg

Deploy a test cluster on the AWS free-tier using Juju.
• Read the guide at http://ceph.com/juju

Read the rest of the docs!
• Find docs for the latest release at http://ceph.com/docs

Have a working cluster up quickly.

62

Getting Involved With Ceph

Most project discussion happens on the mailing list.
• Join or view archives at http://ceph.com/list

IRC is a great place to get help (or help others!)
• Find details and historical logs at http://ceph.com/irc

The tracker manages our bugs and feature requests.
• Register and start looking around at http://ceph.com/tracker

Doc updates and suggestions are always welcome.
• Learn how to contribute docs at http://ceph.com/docwriting

Help build the best storage system around!

63

Questions?

Ross Turk, VP Community, Inktank

ross@inktank.com | @rossturk

inktank.com | ceph.com