Using Ansible at Scale to Manage a Public Cloud

Jesse Keating, Linux Systems Engineer IV, Cloud Servers

@iamjkeating

Using Ansible at Scale to Manage a Public Cloud

06/13/2013 AnsibleFest

I'm Jesse Keating. I work at Rackspace. I'm going to talk about what Rackspace does with Ansible.

Rackspace cares about scale

Scale of server systems

Scale of environments

Scale of engineers

At Rackspace we care about scale.

Scale of number of server systems

Scale of product environments

Scale of engineers doing awesome things at Rackspace.

I'm going to cover three scale challenges, with three case studies that highlight the key Ansible features that have made it my go-to tool in the box.

Scale of Server Systems

Rackspace Public Cloud

4 Production regions

1 to 8 cells per region

250 to 500 nodes per cell

Nearly 15K systems in production

Another 500~ in CI/pre-production

Mixed use of copy-pasta pssh scripts, pre-configured agent actions, Jenkins automation, and host-based config management

Managed by admins, engineers, developers

First is the scale of servers.

I work in the Rackspace Public Cloud product group.

We have...

It is a lot to handle.

Have existing inventory files for use with pssh/etc.

Admins worry about what's there, engineers work on growing capacity and automation, developers work on new code and new tools to deploy code. We all work together, DevOps.

Case study: Hotpatch One Production Environment

3900~ compute nodes

Spread across 8 cells

Out of 6000~ total hosts

Alerting will flood admins

Output is hard to parse

A real world example from a couple days ago

Needed to copy one file out to the nova-compute VMs and restart the nova-compute service

Want to avoid flooding the admins with alerts

Want easy to read output to know what happened.

Before, this would have been manual actions on Nagios hosts, a bash script around pssh, lots of output noise, and repeated delays on inactive hosts

Ansible Key Features

Inventory plugin

Simple process flow

Reusable playbooks with variable adjustments

Avoids repeated actions on downed hosts

Cleaner output

Key things Ansible brings to the party

Need to change

Example of existing inventory contents. Regions with cells with groups

.. and

More

to...

JSON output that Ansible can use. Groups of groups, group_vars, addresses.

A fairly simple Python script to hand to Ansible (but it can be anything, so long as it hands back JSON)
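
As a rough idea of what such a script might look like, here is a minimal sketch. It is not the script from the talk: the region, cell, and host names are made up, and the hard-coded dictionary stands in for whatever would read the existing inventory files. It only assumes the usual contract for Ansible inventory scripts, where `--list` returns all groups as JSON and `--host <name>` returns variables for one host. The `_meta` key is honored by later Ansible releases, so treat its use here as an assumption about the version in play.

```python
import json
import sys

# Hypothetical stand-in for the existing inventory files:
# region -> cell -> group -> host names. All names are made up.
EXISTING_INVENTORY = {
    "region1": {
        "cell01": {"compute": ["r1c01-compute-001", "r1c01-compute-002"]},
        "cell02": {"compute": ["r1c02-compute-001"]},
    },
}


def build_inventory(source):
    """Turn region/cell/group data into the JSON layout Ansible reads."""
    inventory = {"_meta": {"hostvars": {}}}
    for region, cells in source.items():
        inventory[region] = {"children": []}
        for cell, groups in cells.items():
            cell_group = f"{region}-{cell}"
            inventory[region]["children"].append(cell_group)
            # Group vars apply to every host underneath this cell.
            inventory[cell_group] = {"children": [], "vars": {"cell": cell}}
            for group, hosts in groups.items():
                leaf = f"{cell_group}-{group}"
                inventory[cell_group]["children"].append(leaf)
                inventory[leaf] = {"hosts": list(hosts)}
                # Cross-cutting group, e.g. every compute host everywhere.
                inventory.setdefault(group, {"hosts": []})["hosts"].extend(hosts)
    return inventory


def main(argv):
    """Answer the two calls Ansible makes: --list and --host <name>."""
    if argv[1:] == ["--list"]:
        print(json.dumps(build_inventory(EXISTING_INVENTORY), indent=2))
        return 0
    if len(argv) == 3 and argv[1] == "--host":
        # No per-host variables in this sketch.
        print(json.dumps({}))
        return 0
    print("usage: inventory.py --list | --host <name>", file=sys.stderr)
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Saved as an executable file (the name `inventory.py` is illustrative), it could be passed to `ansible` or `ansible-playbook` with `-i inventory.py`, and a pattern such as `region1-cell01` or `compute` would then select the matching hosts.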

So we can do...

Silly example of a one-off task

Or this

Actual playbook used to hot-patch production

Ansible Use

Replacing use of pssh for Random Tasks

Replacing use of pssh for Expected Tasks (outside config management)

Reuse existing inventory content

Easily bolt together processes such as disabling Nagios alerts prior to execution

This is how we're using Ansible RIGHT NOW with our production environment

Building up a toolbox as we go

Scale of Environments

Rackspace OpenStack Development

At least 7 major software projects

Different feature schedules within each

One Continuous Integration environment

One Pre-production environment

One branch of code that can easily be deployed

New code deploys every two weeks

Next I want to talk about the scale of our environments. Again I'll be focusing on our public cloud, which is powered by OpenStack. Stop me when you spot the problem.

Servers, block storage, object storage, networks, auth, usage, etc...

CI is really just for automated tests to gauge health

Way too many moving parts for one pre-production environment, which puts deploying code in a timely manner at risk.

Not easy to deploy from personal branch/fork

Case Study: Create a production-like environment to test a disruptive product code change

30~ virtual instances

DB servers

Rabbit servers

Service providers

40~ capacity nodes

Hypervisor + nova-compute VM

Mixed use of fabric, shell scripts, copy-pasta

No self service

What we want to do is build out preproduction environments for each group or individual developer. Big task

Before, it could be days or weeks before an environment was created, and it could then sit unused for long periods of time. Devs couldn't do it; engineers had to find time to fit it in.

Ansible Key Features

Intermix local actions and remote actions

External inventory plugin

Start from nothing

API to use directly within another application

Why we went with Ansible to back this service

Start with localhost prep

Apologies for the Puppet/mco stuff here, but that is what is pre-existing

Localhost actions to prepare files for new hosts

Local actions to boot instances

Use the host loop to parallelize host boot up in one of our internal Nova environments
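
To make the idea concrete outside of a playbook, here is a small Python sketch of the same pattern: one local action per inventory host, several running at once. This is a swapped-in illustration, not how Ansible implements its host loop. `boot_instance` is a hypothetical placeholder that makes no API call, and the host names and `forks` value are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def boot_instance(hostname):
    """Hypothetical placeholder for the local action that boots one instance."""
    # A real version would call the Nova API here; this stub only
    # reports what it would have requested.
    return {"host": hostname, "status": "boot requested"}


def boot_all(hostnames, forks=5):
    """Run boot_instance locally once per inventory host, `forks` at a time."""
    with ThreadPoolExecutor(max_workers=forks) as pool:
        # map() returns results in the same order as the input hosts.
        return list(pool.map(boot_instance, hostnames))


results = boot_all(["db01", "rabbit01", "api01"])
for result in results:
    print(result["host"], result["status"])
```

The point is only that the work happens on the control machine while the list of hosts drives how many boots get requested in parallel.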

Eventually this will use the rax module, which could do the DNS step for us

Remote actions on hosts

Now do some actions on the remote hosts.

Not showing everything

Still in development

Existing yaml for host vars

Inventory files look a little different here, more details per host. Making use of some yaml syntax to have defaults that can be overloaded.

Plugin to read the files, and use --host
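
A sketch of the "defaults that can be overloaded" idea, as a `--host` lookup might apply it. The YAML files are replaced here by plain dictionaries so the example stays self-contained, and every key, value, and host name is hypothetical.

```python
import json

# Hypothetical defaults shared by every host in the environment.
DEFAULTS = {"flavor": "2GB", "image": "base-image", "network": "private"}

# Hypothetical per-host details; anything listed here overrides a default.
HOSTS = {
    "db01": {"flavor": "8GB"},
    "rabbit01": {},
}


def host_vars(name):
    """Return the variables a `--host <name>` call would hand back."""
    if name not in HOSTS:
        return {}
    merged = dict(DEFAULTS)      # start from the shared defaults
    merged.update(HOSTS[name])   # per-host values win
    return merged


print(json.dumps(host_vars("db01"), sort_keys=True))
```

With this shape, a host entry only needs to mention what makes it different from the defaults.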

Ansible Use

Replacing use of fabric, pssh, copy-pasta

Bootstrapping an environment to the point where existing config management can take over

Freeing up Engineer time by making it self-service

Freeing up resources by tearing down environments after use

Working toward using same process to build out production environments

What could take days/weeks to get done can now take minutes.

Automating the part that isn't already automated, filling the gap. Will hook it into a web service where developers can make a reservation and provide input as to what they want deployed.

Significant overlap with process to roll out new production environments, obvious next step

Scale of Engineers

Rackspace Engineering

Between 4K and 6K employees/contractors

Between 500 and 1K Engineer/Developer types

Many dozens of summer interns

Countless groups

Countless projects

Rapid team creation / shifting of resources

Mixed use of Mac OS X and Linux

Mixed use of automation, configuration, et al tools

Disjoint ownership of engineering onboarding

Finally, let's talk about the scale of our Engineering organization(s)

No hard rules about what tech must be used. Best practices bubble up

A real challenge to bring on new employees, worse to bring on interns and make the most of their time

Case study: Ozone Onboard

30+ git repos

5+ utilities w/ configuration

Permissions to a plethora of services

Configuration for CI/preprod/prod environments

Details scattered throughout wiki pages and tribal knowledge

Once more talking about our cloud group, ozone. Not the full story, but some idea of what has to happen.

Took me weeks to get fully set up, and I think I'm still missing some stuff. This was exacerbated by being remote and sometimes off-hours from the main group.

Ansible Key Features

Modular Roles

Minimal dependencies

OS agnostic

Idempotent

Fast

Easy to use and extend

How can Ansible help here?

Overview of Ansibox

Ansibox is a project I'm working on personally to help with onboarding. It takes inspiration from GitHub's Boxen project.

Roles are where the magic happens.

User edited file

Engineers should only have to give limited input for Ansibox to be able to perform the setup.

These could be prompted for in the future.

Engineer names a role and provides a location to find that role.

Top level playbook

The top level playbook fetches all the roles, and can optionally update them. It generates another playbook to actually go through and apply the roles to the host.

Generated playbook comes from a template and is very simple.

Generated Playbook

Here is a look at it after it gets generated. Sudo is set to no at this level; each task in each role can decide to use sudo if the author wants it.
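
For a feel of how small the generation step can be, here is a sketch in Python. It is not Ansibox's actual template: it uses `string.Template` from the standard library instead of Ansible's own templating, the role names are made up, and targeting `localhost` is an assumption. The `sudo: no` line follows the playbook syntax of the time.

```python
from string import Template

# Hypothetical template for the generated playbook. Sudo is off at the
# play level; individual role tasks can still ask for it.
PLAYBOOK_TEMPLATE = Template(
    "---\n"
    "- hosts: localhost\n"
    "  sudo: no\n"
    "  roles:\n"
    "$role_lines"
)


def render_playbook(roles):
    """Render a playbook that applies each named role in order."""
    role_lines = "".join(f"    - {role}\n" for role in roles)
    return PLAYBOOK_TEMPLATE.substitute(role_lines=role_lines)


print(render_playbook(["ozone", "personal"]))
```

The role list would come from the user edited file, so adding a role means adding one entry there and regenerating.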

Making it go

A very simple start to an ansibox executable. Two playbooks are necessary due to Ansible design

Prompt is there for second play in case any role wants sudo

Ozone Tasks

This is the start of a task list for the ozone role. Repos get cloned, tools get installed, configuration files get put into place.

Here we could also check for permissions to services and prompt the engineer on what to do to gain access

Ansible Use

Developer bootstraps their own system by selecting roles and providing details

Teams own role definitions within a shared framework

Repeatable process

Ansible playbook to clone/update roles

Second playbook to process roles

With this system it becomes easy for an engineer to bootstrap a system, and easy for a group to own that process for the group.

Engineers can also add their own roles for personal setups, and be unafraid to refresh devices.

Engineers can also contribute to the system as gaps are found

Conclusion

Ansible solves many problems Rackspace faces

Chip away at edges with Ansible, perhaps one day replace existing config management systems with Ansible

Continue to assist in development of Ansible modules, plugins, and scale testing

Launch Ansibox soon!

RACKSPACE HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
RACKSPACE HOSTING | RACKSPACE US, INC. | RACKSPACE AND FANATICAL SUPPORT ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM
