Considerations for Operating An OpenStack Cloud

© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1 Mark T. Voelker, Technical Leader @ Cisco OpenStack ATC/StackForge Puppet Core/Foundation Member #54 All Things Open 2014

description

My talk from All Things Open 2014. Over the past four years, OpenStack has become a widely adopted cloud operating system. Cloud computing has made tasks like creating new servers and networks easy for end users by creating abstractions above the infrastructure. However, cloud operators need to maintain not only the cloud operating system itself, but also all of the systems beneath it. The challenge of managing a set of distributed systems isn’t small, but with proper tooling it is well within reach. This talk will discuss considerations for cloud operators such as logging, storage, monitoring, high availability, and configuration management, with a focus on open source solutions for common issues encountered when operating an OpenStack cloud. We’ll consider data gathered from the community and discuss “day 1” and “day 2” concerns as well as established patterns and technology choices among OpenStack deployers today.

Transcript of Considerations for Operating An OpenStack Cloud

Page 1: Considerations for Operating An OpenStack Cloud

© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1

Mark T. Voelker, Technical Leader @ Cisco

OpenStack ATC/StackForge Puppet Core/Foundation Member #54

All Things Open 2014

Page 2: Considerations for Operating An OpenStack Cloud

@marktvoelker

• Tech Lead at Cisco, StackForge Puppet core developer, OS Foundation Member #54

• Fact: can be bribed with doughnuts

• Currently works in Cisco’s Cloud & Virtualization Group

• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively Scalable Data Centers, Devops, making sawdust with extreme prejudice

Page 3: Considerations for Operating An OpenStack Cloud

• Tech lead, manager, software developer, architect

• Started in OpenStack in 2011 at the Diablo Design Summit

Page 4: Considerations for Operating An OpenStack Cloud

The great thing about my job is that I get to have fun exploring a lot of new things…

Page 5: Considerations for Operating An OpenStack Cloud

….and I get to help build a LOT of clouds.

Page 6: Considerations for Operating An OpenStack Cloud

Today’s talk won’t be overly formal….

Page 7: Considerations for Operating An OpenStack Cloud

…because I tend to get excited by this stuff.

Page 8: Considerations for Operating An OpenStack Cloud

Page 9: Considerations for Operating An OpenStack Cloud

Page 10: Considerations for Operating An OpenStack Cloud

……then you know how to get to Day 1.

Now let’s talk about getting to Day 30…

Page 11: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 12: Considerations for Operating An OpenStack Cloud

High Availability? Sounds great--I’ll take two!

Page 13: Considerations for Operating An OpenStack Cloud

• Consider whether you want active/active or active/passive

• Setup and tooling differ a bit, but I generally like active/active

• Note that docs.openstack.org has an HA Guide

• A bit dated…patches welcome!

• Prioritize HA for the control plane

• That also means thinking about your database, network, and RPC bus

• Instance-level HA: there be dragons

• But yes, it’s being looked at

• Pets vs cattle

• Note: HA == more hardware

• Some components need at least 3 nodes
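
The "at least 3 nodes" point follows from majority-quorum arithmetic: a cluster that needs a strict majority of its members to keep operating (Galera is one example) cannot survive any failure until it has three members. The following is a minimal sketch of that arithmetic, assuming simple one-node-one-vote majority voting; the function names are illustrative and do not come from any OpenStack tool.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority still remains."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum={quorum_size(n)}, "
              f"can lose={tolerated_failures(n)}")
```

Under this model two nodes tolerate no more failures than one, which is why three is the usual minimum and why HA means more hardware.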

Page 14: Considerations for Operating An OpenStack Cloud

• Stuff OpenStack needs to run: message brokers

• Check out RabbitMQ clustering and mirrored queues

• Check out Galera for MySQL/MariaDB

• I usually see Percona XtraDB

• Frontend with an HAProxy/Keepalived pair

Page 15: Considerations for Operating An OpenStack Cloud

• Don’t do rabbit clustering over a WAN

• Be aware of the SELECT…FOR UPDATE issue

Page 16: Considerations for Operating An OpenStack Cloud

• Long story short: Neutron and some parts of Nova invoke an SQL pattern known as “SELECT…FOR UPDATE”, which Galera doesn’t support due to issues with cross-node locking.

• Can cause deadlock symptoms.

• Neutron/Nova code is being refactored to remove it, but that work will likely not be done until at least Kilo.

• Meanwhile: use HAProxy to send writes to a single Galera node and you should be fine

• With the obvious scalability bottleneck

• More info here.

• Thank Jay Pipes & Peter Boros for the find!
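
One way to picture the "send writes to a single Galera node" workaround is an HAProxy listener in which only the first server is active and the others carry the `backup` keyword, so they are used only if the active node fails its health check. The sketch below renders such a section from Python. The listener name, node names, addresses (from the 192.0.2.0/24 documentation range), and port are all assumptions for illustration and are not taken from the talk.

```python
from typing import List, Tuple


def render_galera_listener(
    nodes: List[Tuple[str, str]],
    bind: str = "0.0.0.0:3306",
) -> str:
    """Render an HAProxy 'listen' section with one active Galera node.

    The first node receives all traffic; every other node is marked
    'backup', so HAProxy only uses it if the active node fails its check.
    """
    if not nodes:
        raise ValueError("at least one Galera node is required")
    lines = [
        "listen galera",
        f"    bind {bind}",
        "    mode tcp",
        "    option tcpka",
    ]
    for index, (name, address) in enumerate(nodes):
        suffix = "" if index == 0 else " backup"
        lines.append(f"    server {name} {address}:3306 check{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical three-node cluster.
    print(render_galera_listener([
        ("db1", "192.0.2.11"),
        ("db2", "192.0.2.12"),
        ("db3", "192.0.2.13"),
    ]))
```

Because every connection lands on one node, this trades write scalability for safety, which is the bottleneck the slide mentions.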

Page 17: Considerations for Operating An OpenStack Cloud

• Use Swift, Ceph, or other highly available storage to back Glance

• Pick a highly available storage backend for Cinder too

• Use Keepalived/HAProxy to front-end multiple API servers

• Or another load balancer technology of your choice

• Can be deployed as dedicated nodes for scale, or cohabitate

• Network: DVR vs Provider Network Extensions

• Distributed Virtual Routers are a new experimental feature in Juno (not yet ready for production)

• Please go test it and report/fix bugs!

• Provider networks essentially punt the availability issue to your physical network

• Allows you to use standard tools like virtual port channels and VRRP

• Also highly performant

Page 18: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 19: Considerations for Operating An OpenStack Cloud

We start with bare metal.

Page 20: Considerations for Operating An OpenStack Cloud

• For a cloud of any real size, you don’t want to be installing operating systems by hand

• Remember that baremetal bringup actually isn’t something that just happens once…often recurs for upgrades, capacity expansion, etc.

• Baremetal bringup tools can also have other uses, like inventory or bootstrapping configuration management agents.

Page 21: Considerations for Operating An OpenStack Cloud

• A simple (~15k lines of Python code) tool for managing baremetal deployments

• Flexible usage (API, CLI, GUI)

• Allows you to define systems (actual machines) and profiles (what you want to do with them)

• Provides hooks for Puppet so you can then do further automation once the OS is up and running

• Provides control for power (via IPMI or other means), DHCP/PXE (for netbooting machines), and more.
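
The systems-and-profiles split described above can be sketched as a small data model: a profile says what to install, a system is a physical machine that points at a profile, and the two are resolved at boot time. This is an illustrative model only, not the tool's actual API; every class, field, and value below is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Profile:
    """What to do with a machine: OS to install plus a role to hand off."""
    name: str
    distro: str
    puppet_role: str


@dataclass(frozen=True)
class System:
    """An actual machine, identified by the MAC address it netboots from."""
    hostname: str
    mac: str
    profile: str  # name of the Profile this machine should receive


def boot_plan(system: System, profiles: Dict[str, Profile]) -> Dict[str, str]:
    """Resolve a system to the settings its installer would need."""
    profile = profiles.get(system.profile)
    if profile is None:
        raise ValueError(
            f"{system.hostname}: unknown profile {system.profile!r}"
        )
    return {
        "hostname": system.hostname,
        "mac": system.mac.lower(),
        "distro": profile.distro,
        "puppet_role": profile.puppet_role,
    }


if __name__ == "__main__":
    profiles = {"compute": Profile("compute", "ubuntu-14.04", "nova_compute")}
    node = System("compute01", "AA:BB:CC:00:11:22", "compute")
    print(boot_plan(node, profiles))
```

Keeping machines and intent separate is what makes re-provisioning for upgrades or capacity expansion a matter of pointing a system at a different profile.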

Page 22: Considerations for Operating An OpenStack Cloud

Page 23: Considerations for Operating An OpenStack Cloud

Page 24: Considerations for Operating An OpenStack Cloud

• Razor

• Developed by EMC, managed by Puppet Labs (occasionally used with Chef too)

• Initial release in 2012

• Uses a “microkernel” loaded onto the machine to gather facts before provisioning

• Tag + Policy model

• Crowbar

• Originally written by Dell, now a community project

• Originally designed to deploy OpenStack all the way from baremetal

• Now deploys other stuff too (namely, Hadoop)

• Uses Chef to handle everything after the OS install

• Foreman

• Used by Red Hat among others

• Does baremetal bringup and serves as a Puppet ENC

Page 25: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 26: Considerations for Operating An OpenStack Cloud

Page 27: Considerations for Operating An OpenStack Cloud

“Cloud isn’t just an infrastructure technology….it’s a new operations model. And with OpenStack in particular, it’s one that’s very well suited to a DevOps style of management. Many companies aren’t just adopting cloud, they’re changing how they operate.”

“Besides, logging into servers to mess with config files makes me sad.”

--That ranty guy in Raleigh again

Page 28: Considerations for Operating An OpenStack Cloud

• Remember, OpenStack is a set of interoperating distributed systems

• That means you’re going to have a lot of software to configure on a lot of machines

• You’re probably going to want to make changes over time

• You’re probably going to have more than one person touching your cloud

• CM tools help you treat configuration as code, so you can collaborate more easily

Page 29: Considerations for Operating An OpenStack Cloud

Pile of Bash Scripts

Page 30: Considerations for Operating An OpenStack Cloud

Page 31: Considerations for Operating An OpenStack Cloud

• An increasingly common pattern:

• Puppet or Chef for configuration management, PLUS

• Ansible or Salt for cross-node orchestration

• Recommendation: use the tools that work for you!

• But remember: you don’t have to do it alone.

• Several CM tools have thriving collaborators in the OpenStack community

• Links for later:

• Puppet for OpenStack

• Chef for OpenStack

• Ansible for OpenStack

• SaltStack for OpenStack

• Pile of bash scripts for OpenStack

Page 32: Considerations for Operating An OpenStack Cloud

• Unit tests for your deployment code are a good idea

• ServerSpec tests to make sure your config management system did what it was supposed to are great
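
ServerSpec itself is written in Ruby, but the idea of checking that config management produced the intended result can be sketched in a few lines of Python. The example below is not ServerSpec; it is a stand-in that compares rendered INI-style configuration text against expected values. The section names, option names, and values in the sample are assumptions chosen for illustration.

```python
import configparser
from typing import Dict, List


def check_ini(text: str, expected: Dict[str, Dict[str, str]]) -> List[str]:
    """Compare INI-style config text against expected option values.

    Returns a list of human-readable failures; an empty list means pass.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    failures = []
    for section, options in expected.items():
        for option, want in options.items():
            # fallback=None covers both a missing section and a missing option.
            got = parser.get(section, option, fallback=None)
            if got != want:
                failures.append(
                    f"[{section}] {option}: expected {want!r}, got {got!r}"
                )
    return failures


if __name__ == "__main__":
    # Hypothetical rendered config; in practice this would be read from disk.
    rendered = (
        "[DEFAULT]\n"
        "rpc_backend = rabbit\n"
        "[database]\n"
        "connection = mysql://nova@192.0.2.10/nova\n"
    )
    problems = check_ini(rendered, {"DEFAULT": {"rpc_backend": "rabbit"}})
    print("PASS" if not problems else "\n".join(problems))
```

A check like this can run as a CI job right after a deployment step, so configuration drift shows up as a failed build rather than as an outage.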

Page 33: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 34: Considerations for Operating An OpenStack Cloud

…well, haven’t you always wanted a butler?

Page 35: Considerations for Operating An OpenStack Cloud

• DevOps: actually pretty handy

• OpenStack change velocity (community’s and yours)

• Anecdote: the majority of deployments I work with have some customizations or backports from future releases

• It’s not just OpenStack, it’s all the underpinning components and your CM code too!

Page 36: Considerations for Operating An OpenStack Cloud

• OpenStack itself uses CI/CD tools in its development process…you should consider using them in your cloud buildout too!

• The OpenStack Infra team has created some awesome tools: JJB, Zuul, etc

• They’re all open source and you can even see how OpenStack’s own CI is set up (check out Elizabeth Joseph’s slides from yesterday for more!).

• The basics:

• An integration server (Jenkins, Go, Travis, etc)

• A code review and repository tool (Gerrit, Cgit, GitHub, etc)

• A battery of automated tests (lint checks, rspec-puppet, Tempest, Rally, etc)

• Some form of packaging (rpmbuild/mock, sbuilder/pbuilder, etc)

• An artifact repository (Artifactory, yum/apt repos, etc)

• Optionally, some deployment jobs (usually powered by your CM tool)
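
The basics listed above chain together as ordered stages in which a failure stops everything downstream. The sketch below models only that control flow; it is not how Jenkins, Zuul, or any other tool named here is implemented, and the stage names in the usage example are placeholders.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order and stop doing work at the first failure.

    Returns (stage name, status) pairs, where status is "passed",
    "failed", or "skipped".
    """
    results = []
    failed = False
    for name, action in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Placeholder stages; real ones would shell out to lint, test,
    # packaging, and publishing tools.
    pipeline = [
        ("lint", lambda: True),
        ("unit tests", lambda: True),
        ("build package", lambda: False),
        ("publish to repo", lambda: True),
    ]
    for stage, status in run_pipeline(pipeline):
        print(f"{stage}: {status}")
```

Skipping later stages after a failure is what keeps a broken package from ever reaching the artifact repository or a deployment job.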

Page 37: Considerations for Operating An OpenStack Cloud

• …you never intend to change the code yourself

• …building your own packages would violate a support contract with your distribution

• …you’ve never used a CI/CD pipeline before (but really: you should start learning)

• …you have a static environment that absolutely will not change, will never need added capacity, etc.

Page 38: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 39: Considerations for Operating An OpenStack Cloud

• Now that you have a cloud, you’ll probably want to know that all its parts stay in good working order.

Page 40: Considerations for Operating An OpenStack Cloud

Page 41: Considerations for Operating An OpenStack Cloud

Page 42: Considerations for Operating An OpenStack Cloud

• I’ve worked on a lot of OpenStack clouds and almost everyone has their own preferred monitoring toolset.

• One possible exception: almost everybody seems to love Graphite.

• The golden rule is: use the tools that work for you!

• Very often this will be whatever you’re using in the rest of your infrastructure.

• Break it down into at least two buckets:

• Up/down and alerting (ex: Nagios or its derivatives…yes, there are OpenStack plugins out there on NagiosExchange)

• Trending data collection/plotting (ex: collectd/statsd feeding graphite)

• Also: use your peers!

• Check out Tong Li’s Monitoring as a Service talk later today!

• Operators are often willing to share, so ask on the openstack-operators list.
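
The two buckets map onto two small conventions. Nagios-compatible checks report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and Graphite's plaintext listener (port 2003 by default) accepts lines of the form `path value timestamp`. The sketch below shows both; the thresholds and the metric path are assumptions for illustration, and nothing here is taken from an existing OpenStack plugin.

```python
import time
from typing import Optional

# Exit codes that Nagios-compatible plugins use to report state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_threshold(value: Optional[float], warn: float, crit: float) -> int:
    """Map a measurement to a Nagios-style state (higher is worse)."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def graphite_line(path: str, value: float,
                  timestamp: Optional[int] = None) -> str:
    """Format one sample for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


if __name__ == "__main__":
    latency_ms = 120.0  # hypothetical measurement of an API call
    print("state:", check_threshold(latency_ms, warn=100, crit=500))
    print(graphite_line("cloud.api.latency_ms", latency_ms), end="")
```

The same measurement feeds both buckets: the state drives alerting now, while the Graphite line accumulates into a trend that shows when the warning threshold is being approached.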

Page 43: Considerations for Operating An OpenStack Cloud

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

Page 44: Considerations for Operating An OpenStack Cloud

Page 45: Considerations for Operating An OpenStack Cloud

• Distributed systems generate logs…..all over the place.

• Finding the root of problems may mean correlating logs from different machines…but which?

• OpenStack in particular *can* be pretty verbose

• You may also be dealing with logs from other distributed tools in your cloud (RabbitMQ, databases, etc)

• Generally you want to get logs together, be able to search them, and be able to visualize them.
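
Correlating logs across machines usually comes down to finding a shared key. OpenStack services tag log lines with request IDs of the form `req-<uuid>`, so once logs are collected in one place, grouping by that ID reconstructs the path of a single request. The sketch below assumes the lines have already been gathered as (host, line) pairs; the sample lines in the usage example are invented for illustration and are not real log output.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Request IDs look like req-<uuid>, e.g. req-0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0
REQUEST_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def group_by_request(
    lines: Iterable[Tuple[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (host, log line) pairs by the request ID found in each line.

    Lines without a request ID are ignored.
    """
    grouped = defaultdict(list)
    for host, line in lines:
        match = REQUEST_ID.search(line)
        if match:
            grouped[match.group(0)].append((host, line))
    return dict(grouped)


if __name__ == "__main__":
    # Invented sample lines from two hypothetical hosts.
    rid = "req-0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"
    collected = [
        ("api01", f"INFO nova.api [{rid}] POST /servers"),
        ("compute07", f"ERROR nova.compute [{rid}] spawn failed"),
        ("api01", "INFO nova.api heartbeat"),
    ]
    for request_id, entries in group_by_request(collected).items():
        print(request_id)
        for host, line in entries:
            print(f"  {host}: {line}")
```

A log search tool does the same thing at scale: index the request ID as a field, then query on it to answer the "but which?" question.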

Page 46: Considerations for Operating An OpenStack Cloud

Unlike monitoring tools, there seems to be pretty broad consensus on good tools here in deployments I’ve worked with….

Page 47: Considerations for Operating An OpenStack Cloud

http://www.elasticsearch.org/blog/openstack-elastic-recheck-powered-elk-stack/

• Kibana (visualization)

• Logstash (collection)

• Elasticsearch (search/analytics)

Page 48: Considerations for Operating An OpenStack Cloud

Questions?

@marktvoelker

http://openstack.org/

http://cisco.com/go/openstack/

(yes, we’re hiring!)

Page 49: Considerations for Operating An OpenStack Cloud