Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinder in Production, OpenStack Israel 2015

Deploying Cinder in Production

Avishay Traeger, PhD
Shimshon Zimmerman

Copyright 2015

Goals

Deploy Cinder in a way that is scalable and resilient to failures

❖ Limited downtime

❖ Reduce admin intervention

Available resources

Good documentation exists:

❖ Guides for deploying OpenStack

❖ Many OpenStack HA guides, for example: http://docs.openstack.org/high-availability-guide

What we’ll cover today

❖ What is Cinder?

❖ How is it designed?

❖ What issues in Cinder impact deploying with HA?

❖ How can I best deploy Cinder today?

❖ HA Cinder - live demo

❖ Cinder and Neutron spun off of Nova
    ➢ Similar architectures
    ➢ Some lessons from Cinder may be generalized

Who are we?

❖ Stratoscale R&D

❖ Former Cinder core member

A few words on what we do at Stratoscale

❖ Hyper-converged infrastructure:
    ➢ Compute: customized KVM & Docker
    ➢ Storage: high-performance scale-out block storage
    ➢ Network: full-featured SDN

❖ Management plane based on OpenStack

❖ Easy install, easy upgrade, no maintenance

Sound interesting? We’re hiring!

What is Cinder?

❖ Abstraction that enables uniform management of block storage
    ➢ Exposes northbound API (e.g., create, list, delete volumes and snapshots)
    ➢ Storage “drivers” implement southbound API

❖ Enables connections between Nova and storage
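
As a rough illustration of the northbound/southbound split described above, here is a minimal Python sketch. Every name in it (`VolumeDriver`, `InMemoryDriver`, `VolumeManager`) is hypothetical and chosen for illustration; these are not Cinder's actual classes.

```python
import uuid
from abc import ABC, abstractmethod


class VolumeDriver(ABC):
    """Hypothetical southbound interface that each storage backend implements."""

    @abstractmethod
    def create_volume(self, volume_id, size_gb):
        ...

    @abstractmethod
    def delete_volume(self, volume_id):
        ...


class InMemoryDriver(VolumeDriver):
    """Toy backend that records volumes in a dict (stands in for real storage)."""

    def __init__(self):
        self.volumes = {}

    def create_volume(self, volume_id, size_gb):
        self.volumes[volume_id] = size_gb

    def delete_volume(self, volume_id):
        del self.volumes[volume_id]


class VolumeManager:
    """Hypothetical northbound API: the same calls work with any driver."""

    def __init__(self, driver):
        self.driver = driver
        self.catalog = {}  # volume id -> metadata record

    def create(self, size_gb):
        volume_id = str(uuid.uuid4())
        self.driver.create_volume(volume_id, size_gb)
        self.catalog[volume_id] = {"size_gb": size_gb, "status": "available"}
        return volume_id

    def list_volumes(self):
        return sorted(self.catalog)

    def delete(self, volume_id):
        self.driver.delete_volume(volume_id)
        del self.catalog[volume_id]
```

Swapping `InMemoryDriver` for another `VolumeDriver` subclass would leave the caller's code unchanged, which is the point of the abstraction.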

High-level architecture

[Diagram labels: cinder client, REST, cinder-api, cinder-scheduler, cinder-volume, cinder-backup, driver, storage, SQL DB]

Components to make HA:
● Storage
● Database
● RPC messaging
● Cinder services

HA storage

❖ Almost all storage supported by Cinder is HA

❖ These two may not be, because by default they use local disks (SPOF):
    ➢ LVM: sets up an iSCSI target over LVM on a device
    ➢ NFS: sets up an NFS server on a file system

❖ Make sure you have redundant network paths from compute to the storage

HA database

❖ An SQL DB is used to store OpenStack metadata
    ➢ For example, in Cinder: information about volumes, snapshots, quotas, volume types, etc.

❖ This DB must be replicated
    ➢ Galera + MySQL/MariaDB/Percona
    ➢ PostgreSQL replication
    ➢ Store DB on DRBD backend

❖ Scale-out architectures for SQL exist, but have not been tried with OpenStack as far as we know:
    ➢ https://github.com/youtube/vitess

Inter-service messaging

OpenStack projects use oslo.messaging for RPCs, which wraps:

❖ RabbitMQ (AMQP)
    ➢ By default queues are located on a single node (SPOF)
    ➢ Configure mirroring: https://www.rabbitmq.com/ha.html

❖ Qpid (AMQP)
    ➢ Similar to RabbitMQ (configure queue replicas)

❖ ZeroMQ
    ➢ Allows broker-based reliability like the others
    ➢ Also allows brokerless peer-to-peer model

HA Cinder services

Great open source project: https://www.consul.io

❖ Service discovery: DNS or HTTP interface

❖ Health checking for services and hosts

❖ Key-value store

❖ Scalable, multi-datacenter

Other solutions exist (etcd, ZooKeeper, Pacemaker).
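
To make the service discovery and health checking points concrete, the sketch below builds a registration body for Consul's agent HTTP API (`/v1/agent/service/register`) with an HTTP health check. The node address `10.0.0.11` and the health URL are placeholders we made up for illustration; substitute a URL that returns a 2xx status when your service is healthy. Port 8776 is the default cinder-api port. This is an illustrative sketch, not the configuration used in our deployment.

```python
import json
import urllib.request


def build_registration(name, address, port, health_url, interval="10s"):
    """Build the JSON body for Consul's agent service-registration endpoint."""
    return {
        "ID": f"{name}-{address}",
        "Name": name,
        "Address": address,
        "Port": port,
        # HTTP check: Consul polls this URL and treats a 2xx response as passing.
        "Check": {"HTTP": health_url, "Interval": interval},
    }


def register(payload, agent="http://127.0.0.1:8500"):
    """Send the registration to a local Consul agent (needs a running agent)."""
    req = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Placeholder address and health URL (assumptions, not real endpoints).
payload = build_registration(
    "cinder-api", "10.0.0.11", 8776, "http://10.0.0.11:8776/healthz"
)
print(json.dumps(payload, indent=2))
```

Only `build_registration` runs here; `register` is defined but not called, since it needs a live agent. Once a service is registered, healthy instances can be looked up through Consul's DNS or HTTP interface.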

HA management

Our Solution

[Diagram: OpenStack services, RabbitMQ, and Galera & MariaDB, with Consul for service discovery]

Command flow example 1: create

cinder-api: create DB record
→ call cinder-scheduler (RPC)
→ scheduler chooses backend
→ call cinder-volume (RPC)
→ cinder-volume works
→ driver creates volume on storage
→ DB update → available
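
The create flow can be traced with a toy Python simulation. Plain dicts stand in for the SQL DB and the storage, direct function calls stand in for RPCs, and the backend names, capacities, and "most free space" scheduling rule are all assumptions made for illustration rather than Cinder's real logic.

```python
import uuid

DB = {}  # stands in for the SQL DB: volume id -> record
BACKENDS = {"backend-a": 100, "backend-b": 500}  # hypothetical free capacity in GB
STORAGE = {name: {} for name in BACKENDS}  # stands in for the storage systems


def api_create(size_gb):
    """cinder-api: create the DB record, then hand off to the scheduler."""
    volume_id = str(uuid.uuid4())
    DB[volume_id] = {"size_gb": size_gb, "status": "creating", "host": None}
    scheduler_create(volume_id)  # an RPC in the real system; a direct call here
    return volume_id


def scheduler_create(volume_id):
    """cinder-scheduler: choose a backend (toy rule: most free space)."""
    size_gb = DB[volume_id]["size_gb"]
    host = max(BACKENDS, key=BACKENDS.get)
    if BACKENDS[host] < size_gb:
        DB[volume_id]["status"] = "error"
        return
    DB[volume_id]["host"] = host
    volume_create(host, volume_id)  # also an RPC in the real system


def volume_create(host, volume_id):
    """cinder-volume: have the driver create the volume, then update the DB."""
    size_gb = DB[volume_id]["size_gb"]
    driver_create(host, volume_id, size_gb)
    BACKENDS[host] -= size_gb
    DB[volume_id]["status"] = "available"


def driver_create(host, volume_id, size_gb):
    """driver: create the volume on the (simulated) storage."""
    STORAGE[host][volume_id] = size_gb


vid = api_create(10)
print(DB[vid]["status"], DB[vid]["host"])
```

Note that the record sits in the `creating` state from the first step until the last. If a service fails in between, nothing in this flow moves the record out of that transient state.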

Command flow example 2: extend

cinder-api: check volume state and update
→ call cinder-volume (RPC)
→ cinder-volume works
→ driver extends volume on storage
→ DB update → available

Best practices with Juno

❖ Run cinder-api in active/active mode with a load balancer in front
    ➢ If you are worried about two processes modifying the same volume simultaneously, you can work around it with UUID-based routing and local file locks

❖ Run cinder-scheduler in active/active mode

❖ One cinder-volume per backend in active/passive mode
    ➢ Cannot run active/active because of local file locks

❖ Make sure you clean up the DB: objects in transient states and deleted objects that are not purged
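
The UUID-based routing workaround mentioned above can be sketched as a deterministic mapping from a volume UUID to one cinder-api node, so that every request for a given volume lands on the same node and that node's local file locks are sufficient. The function name and node names are hypothetical, and in practice this logic would live in the load balancer rather than in Python.

```python
import hashlib


def pick_api_node(volume_uuid, nodes):
    """Map a volume UUID to one node so all requests for it go to the same place."""
    digest = hashlib.sha256(volume_uuid.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical node names.
nodes = ["api-1", "api-2", "api-3"]
volume = "3f2b8c1e-5d4a-4c6b-9e7f-0a1b2c3d4e5f"
print(pick_api_node(volume, nodes))
```

A plain modulo like this remaps most volumes whenever the node list changes; a consistent-hashing scheme would reduce that churn, at the cost of more machinery.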

Fresh in Kilo

Kilo was just released

❖ Lots of work on driver stability, including CI

❖ Multi-attach for volumes has been merged into Cinder, but unfortunately the Nova bits haven’t gone in, so it will have to wait for Liberty

❖ Support for incremental backups, additional consistency group APIs

Liberty and beyond

❖ The community is aware of the issues raised here

❖ Some more localized issues like atomic state transitions should be addressed in Liberty

❖ Recovery and maintaining consistency is a problem with no clear roadmap at this point
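
One common way to make a state transition atomic is a conditional UPDATE (compare-and-swap): the status changes only if it still holds the expected value, so two concurrent requests cannot both win. The sketch below shows the generic technique with SQLite and a made-up `volumes` table; it is not how Cinder implements it, nor a statement of what Liberty will contain.

```python
import sqlite3


def try_transition(conn, volume_id, expected, new):
    """Move a volume from `expected` to `new` only if it is still in `expected`."""
    cur = conn.execute(
        "UPDATE volumes SET status = ? WHERE id = ? AND status = ?",
        (new, volume_id, expected),
    )
    conn.commit()
    # rowcount is 1 if this caller won the transition, 0 if the state had changed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")
conn.commit()

first = try_transition(conn, "vol-1", "available", "extending")
second = try_transition(conn, "vol-1", "available", "deleting")
print(first, second)  # True False
```

The second caller sees that the volume is no longer `available` and can fail fast, instead of both callers reading `available` and then both writing.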

Thank You

Avishay Traeger, PhD
Shimshon Zimmerman