Storing VMs with Cinder and Ceph RBD.pdf

Post on 04-Dec-2014

4.736 views 4 download

description

true

Transcript of Storing VMs with Cinder and Ceph RBD.pdf

Storing VMs with Cinder and

Ceph RBD

Growing With Hardware Appliances

First PB

•  Proprietary storage hardware

• Well-known storage vendor

$14 b’zillion

Second PB

•  Proprietary storage hardware

•  Same storage vendor

Another

$14 b’zillion

47

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

52

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++

53

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++ X

54

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

HUMAN [DEVELOPER]

!!

Hard Drives Are Tiny Record Players and They Fail Often jon_a_ross, Flickr / CC BY 2.0 71

72

D

55 times / day

= D

D D

x 1 MILLION

D D

D D

73

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

SELF-MANAGING

philosophy design

79

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing,

intelligent storage nodes

LIBRADOS

A library allowing

apps to directly

access RADOS, with support for

C, C++, Java,

Python, Ruby,

and PHP

RBD A reliable and fully-

distributed block device, with a Linux

kernel client and a

QEMU/KVM driver

CEPH FS A POSIX-compliant

distributed file system, with a Linux

kernel client and

support for FUSE

RADOSGW A bucket-based REST

gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

81

DISK

FS

DISK DISK

OSD

DISK DISK

OSD OSD OSD OSD

FS FS FS FS btrfs xfs

ext4

M M M

82

M

M

M

HUMAN

83

Monitors:

• Maintain cluster map

•  Provide consensus for distributed decision-making

• Must have an odd number

•  These do not serve stored objects to clients

M

OSDs: • One per disk (recommended)

•  At least three in a cluster

•  Serve stored objects to clients

•  Intelligently peer to perform replication tasks

•  Supports object classes

APP??

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

APP

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

APP

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

A-G

H-N

O-T

U-Z

F*

107

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

hash(object name) % num pg

CRUSH(pg, cluster state, rule set)

108

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

109

CRUSH

•  Pseudo-random placement algorithm

•  Ensures even distribution

•  Repeatable, deterministic

•  Rule-based configuration

•  Replica count

•  Infrastructure topology

•  Weighting

110

CLIENT

??

112

113

CLIENT

??

111

84

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing,

intelligent storage nodes

LIBRADOS

A library allowing

apps to directly

access RADOS, with support for

C, C++, Java,

Python, Ruby,

and PHP

RBD A reliable and fully-

distributed block device, with a Linux

kernel client and a

QEMU/KVM driver

CEPH FS A POSIX-compliant

distributed file system, with a Linux

kernel client and

support for FUSE

RADOSGW A bucket-based REST

gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

LIBRADOS

M

M

M

85

APP

native

L

LIBRADOS

•  Provides direct access to RADOS for applications

•  C, C++, Python, PHP, Java

• No HTTP overhead

87

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing,

intelligent storage nodes

LIBRADOS

A library allowing

apps to directly

access RADOS, with support for

C, C++, Java,

Python, Ruby,

and PHP

RBD A reliable and fully-

distributed block device, with a Linux

kernel client and a

QEMU/KVM driver

CEPH FS A POSIX-compliant

distributed file system, with a Linux

kernel client and

support for FUSE

RADOSGW A bucket-based REST

gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

88

M

M

M

LIBRADOS

RADOSGW

APP

native

REST

LIBRADOS

RADOSGW

APP

89

RADOS Gateway:

•  REST-based interface to RADOS

•  Supports buckets, accounting

•  Compatible with S3 and Swift applications

90

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing,

intelligent storage nodes

LIBRADOS

A library allowing

apps to directly

access RADOS, with support for

C, C++, Java,

Python, Ruby,

and PHP

CEPH FS A POSIX-compliant

distributed file system, with a Linux

kernel client and

support for FUSE

RADOSGW A bucket-based REST

gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

RBD A reliable and fully-

distributed block device, with a Linux

kernel client and a

QEMU/KVM driver

91

M

M

M

VM

LIBRADOS LIBRBD

VIRTUALIZATION CONTAINER

LIBRADOS

92

M

M

M

LIBRBD

CONTAINER

LIBRADOS LIBRBD

CONTAINER VM

LIBRADOS

93

M

M

M

KRBD (KERNEL MODULE)

HOST

RADOS Block Device:

• Storage of virtual disks in RADOS

• Allows decoupling of VMs and

containers

• Live migration!

• Images are striped across the

cluster

• Thin-provisioning

• Snapshots and cloning

LIBRADOS

115

M

M

M

VM

LIBRBD

VIRTUALIZATION CONTAINER

HOW DO YOU

SPIN UP

THOUSANDS OF VMs

INSTANTLY

AND

EFFICIENTLY?

116

144

117

0 0 0 0

instant copy

= 144

4 144

118

CLIENT

write

write

write

= 148

write

4 144

119

CLIENT read

read

read

= 148

29

local disk(VM images)

Novacompute

Glance(templates)

read X

X

X'

old-style VM image creation

● ephemeral

● expensive to create

Why use block storage?

• Persistent• More familiar to users

• Not tied to a single host• Decouples compute and storage• Enables Live migration

• Extra capabilities of storage system• Efficient snapshots• Different types of storage available• Cloning for fast restore or scaling

31

CinderAPI

Cindervolume

create image from X

X

Cinder volume creation

Glance(templates)

volume driver

locate X

location of X

read X

X'

reference to X'

flexibility in where VM images are stored

32

CinderAPI

Cindervolume

create image from X

X

Efficient volume creation

Glance(templates)

volume driver

locate X

location of X

clone X to X'

X'

reference to X'

fast CoW clone

X' complete

Questions?

Josh Durgin

josh.durgin@inktank.com

jdurgin on freenode

inktank.com | ceph.com