Storing VMs with Cinder and Ceph RBD
Growing With Hardware Appliances
First PB
• Proprietary storage hardware
• Well-known storage vendor
$14 b’zillion
Second PB
• Proprietary storage hardware
• Same storage vendor
Another $14 b’zillion
[Diagram sequence: a grid of nodes labeled "DC"; a "C++" label appears beside the grid, is then crossed out ("C++ X"), and is finally replaced by a "HUMAN [DEVELOPER]" with "!!"]
Hard Drives Are Tiny Record Players and They Fail Often
(Photo: jon_a_ross, Flickr / CC BY 2.0)
[Diagram: a drive (D) multiplied "x 1 MILLION", annotated "55 times / day"]
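The arithmetic behind that figure can be checked with a short sketch. It assumes that "55 times / day" means drive failures per day across a fleet of one million drives; the slide does not say so explicitly, and the function names are made up for illustration.

```python
# Assumption: "55 times / day" is the number of drive failures per day
# across a fleet of one million drives (the slide does not spell this out).
DRIVES = 1_000_000
FAILURES_PER_DAY = 55


def annualized_failure_rate(failures_per_day, drives):
    """Fraction of the fleet expected to fail in one year."""
    return failures_per_day * 365 / drives


def minutes_between_failures(failures_per_day):
    """Average gap between consecutive failures, in minutes."""
    return 24 * 60 / failures_per_day


afr = annualized_failure_rate(FAILURES_PER_DAY, DRIVES)
gap = minutes_between_failures(FAILURES_PER_DAY)
print(f"implied annual failure rate: {afr:.2%}")   # 2.01%
print(f"one failure roughly every {gap:.0f} minutes")  # 26 minutes
```

Under that reading, a per-drive failure rate of about 2% a year is enough to make a disk failure a routine, several-times-an-hour event at this scale.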
Philosophy and design:
• Open source
• Community-focused
• Scalable
• No single point of failure
• Software based
• Self-managing
RADOS: A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes
LIBRADOS: A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP
RBD: A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver
CEPH FS: A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE
RADOSGW: A bucket-based REST gateway, compatible with S3 and Swift
[Diagram labels for the consumers of these components: APP, APP, HOST/VM, CLIENT]
[Diagram: each OSD runs on top of a file system (FS: btrfs, xfs, or ext4) on its own DISK; three monitors (M) sit alongside the OSDs]
[Diagram: a HUMAN interacting with the three monitors (M)]
Monitors:
• Maintain cluster map
• Provide consensus for distributed decision-making
• Should be deployed in odd numbers, since decisions need a majority
• Do not serve stored objects to clients
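The preference for an odd number of monitors follows from majority voting. The sketch below is a simplified model rather than anything from Ceph itself; `quorum_size` and `tolerated_failures` are names invented for this example.

```python
def quorum_size(monitors):
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors can be lost while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for n in range(1, 8):
    print(f"{n} monitors: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 monitors raises the quorum from 2 to 3 without tolerating any additional failure, which is why the even sizes add cost but no resilience.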
OSDs:
• One per disk (recommended)
• At least three in a cluster
• Serve stored objects to clients
• Intelligently peer to perform replication tasks
• Support object classes
[Diagram sequence: an APP facing a grid of nodes labeled "DC" ("APP??"); the nodes are then divided into name ranges A-G, H-N, O-T, U-Z, and an object named "F*" has to be placed]
hash(object name) % num pg
CRUSH(pg, cluster state, rule set)
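The first of these two steps can be sketched in a few lines. This is an illustration only: `zlib.crc32` is a stand-in hash picked for the example (not the hash Ceph uses), and `NUM_PGS` and the object names are hypothetical.

```python
import zlib

NUM_PGS = 64  # hypothetical placement-group count for one pool


def object_to_pg(object_name, num_pgs=NUM_PGS):
    """Map an object name to a placement group: hash(object name) % num pg.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not the
    hash function Ceph uses.
    """
    return zlib.crc32(object_name.encode("utf-8")) % num_pgs


names = ["vm-disk-0001", "vm-disk-0002", "template-X"]
for name in names:
    print(name, "-> pg", object_to_pg(name))
```

Because the mapping is a pure function of the name, every client computes the same placement group without consulting a lookup table or a central directory.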
CRUSH:
• Pseudo-random placement algorithm
• Ensures even distribution
• Repeatable, deterministic
• Rule-based configuration: replica count, infrastructure topology, weighting
[Diagram: a CLIENT facing the cluster ("??")]
[Diagram: an APP linked with LIBRADOS talks to the cluster (monitors M) over the "native" protocol]
LIBRADOS:
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java
• No HTTP overhead
[Diagram: an APP speaks REST to RADOSGW, which uses LIBRADOS to reach the cluster (monitors M) over the "native" protocol; a second APP uses LIBRADOS directly]
RADOS Gateway:
• REST-based interface to RADOS
• Supports buckets, accounting
• Compatible with S3 and Swift applications
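[Diagram: a VM inside a virtualization container reaches the cluster (monitors M) through LIBRBD and LIBRADOS]
[Diagram: two containers, each with LIBRBD and LIBRADOS, and a VM shown between them]
[Diagram: a HOST reaching the cluster (monitors M) through KRBD (kernel module)]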
RADOS Block Device:
• Storage of virtual disks in RADOS
• Allows decoupling of VMs and containers
• Live migration!
• Images are striped across the cluster
• Thin-provisioning
• Snapshots and cloning
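Striping can be pictured as cutting the virtual disk into fixed-size objects. The sketch below assumes 4 MiB objects and simple one-object-at-a-time striping; the slides do not state a size, and the helper names are invented for illustration.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB objects; the slides give no size


def locate(offset, object_size=OBJECT_SIZE):
    """Return (object number, offset inside that object) for an image byte offset."""
    return divmod(offset, object_size)


def objects_touched(offset, length, object_size=OBJECT_SIZE):
    """Object numbers covered by an I/O of `length` (> 0) bytes at `offset`."""
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return range(first, last + 1)


print(locate(10 * 1024 * 1024))                                   # (2, 2097152)
print(list(objects_touched(3 * 1024 * 1024, 2 * 1024 * 1024)))    # [0, 1]
```

Since each object is placed independently, a large image ends up spread over many disks, and an object that has never been written does not need to exist at all, which is one way to think about thin provisioning.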
HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?
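[Diagram sequence: an image holding 144; an "instant copy" that adds 0 0 0 0, so the total stays = 144; a CLIENT writes 4 to the copy, giving 4 + 144 = 148; CLIENT reads are served from the copy and the original, total still = 148]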
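The numbers in that sequence can be reproduced with a toy copy-on-write model. It treats "144" as a count of blocks, which is an assumption, and the `Image` class is invented for this sketch; it is not the RBD API.

```python
class Image:
    """A block image that stores only the blocks written to it."""

    def __init__(self, size_blocks, parent=None):
        self.size_blocks = size_blocks
        self.parent = parent   # clone source, or None for a base image
        self.blocks = {}       # block index -> data owned by this image

    def write(self, index, data):
        self.blocks[index] = data  # the parent is never modified

    def read(self, index):
        if index in self.blocks:
            return self.blocks[index]
        if self.parent is not None:
            return self.parent.read(index)  # fall through to the parent
        return b"\0"                        # never-written block reads as zeros

    def clone(self):
        return Image(self.size_blocks, parent=self)  # copies no data

    def stored_blocks(self):
        return len(self.blocks)


base = Image(144)
for i in range(144):
    base.write(i, b"golden")

vm_disk = base.clone()
print(vm_disk.stored_blocks())  # 0: the copy is instant
for i in range(4):
    vm_disk.write(i, b"custom")

total = base.stored_blocks() + vm_disk.stored_blocks()
print(total)  # 148
```

Reads of untouched blocks fall through to the original, so a thousand clones of one template cost little more than the template plus whatever each VM changes.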
Old-style VM image creation
[Diagram: Nova compute does "read X" against Glance (templates) and stores a full copy X' on local disk (VM images)]
● ephemeral
● expensive to create
Why use block storage?
• Persistent
  • More familiar to users
• Not tied to a single host
  • Decouples compute and storage
  • Enables live migration
• Extra capabilities of storage system
  • Efficient snapshots
  • Different types of storage available
  • Cloning for fast restore or scaling
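Cinder volume creation
[Diagram: the Cinder API passes "create image from X" to Cinder volume; the volume driver sends "locate X" to Glance (templates) and receives the "location of X"; it then does "read X" and writes a full copy X'; a "reference to X'" is returned]
flexibility in where VM images are stored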
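Efficient volume creation
[Diagram: the Cinder API passes "create image from X" to Cinder volume; the volume driver sends "locate X" to Glance (templates) and receives the "location of X"; instead of reading X it issues "clone X to X'" (fast CoW clone); "X' complete" and a "reference to X'" are returned]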
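The difference between the two volume-creation flows comes down to one decision in the volume driver: can template X be cloned in place, or does it have to be read and copied? A minimal sketch of that decision follows. The `rbd://fsid/pool/image/snapshot` location layout, the function names, and the example identifiers are assumptions made for this illustration, not the Cinder driver's actual code.

```python
from urllib.parse import urlparse


def parse_rbd_location(location):
    """Split an rbd:// image location into (fsid, pool, image, snapshot).

    The four-part URL layout is an assumption of this sketch.
    Returns None for any other kind of location.
    """
    parsed = urlparse(location)
    if parsed.scheme != "rbd":
        return None
    parts = [parsed.netloc] + parsed.path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        return None
    return tuple(parts)


def plan_volume_creation(location, local_fsid):
    """Return "clone" when image X already lives in this cluster, else "copy"."""
    parsed = parse_rbd_location(location)
    if parsed is not None and parsed[0] == local_fsid:
        return "clone"  # fast CoW clone of X to X'
    return "copy"       # read X and write a full copy X'


print(plan_volume_creation("rbd://cluster-1/images/X/snap", "cluster-1"))   # clone
print(plan_volume_creation("http://glance.example/images/X", "cluster-1"))  # copy
```

Falling back to a full copy keeps the first flow's flexibility about where templates are stored, while the clone path is taken only when template and volume share a cluster.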
Questions?
Josh Durgin
jdurgin on freenode
inktank.com | ceph.com