Self service build and deployment at Netflix (Agile 2013)

Posted on 15-Jan-2015

How Netflix lets all of our engineers build and deploy their own code to production.

Transcript of Self service build and deployment at Netflix (Agile 2013)

Tweet @garethbowles with feedback!

Self-Service Build & Deployment @Netflix

Monday, August 5, 13

• How would your organization be different if all of your engineers could build, test and deploy their own code ...

• ... and were responsible for fixing what they broke at 3am?

Gareth Bowles

Netflix is the world’s leading Internet television network with more than 36 million members in 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series.

Source: http://ir.netflix.com

The Challenge

• We need to innovate rapidly, driven by:

• Global competition

• New connected devices

• Continuous customer feedback

• And we need fast rollback

The Challenge

• We need to scale to cope with:

• Growing customer base

• Peaks in demand:

• Special events: holidays, Oscars

• Daily fluctuations (weekdays vs. weekends, daytime vs. evening)

Things That Help

• We can push out updates whenever we like

• Company culture

Things That Got in Our Way

A Few Short Years Ago ...

• Monolithic web app

• Single points of failure

• Releases were done by following runbooks

• Data center (DC) based infrastructure

• Different teams used different tools

Meeting the Challenge

http://www.slideshare.net/reed2001/culture-1798664

Freedom and Responsibility

• Hire mature people who work well with others

• Give them the context for company success

• Then get out of their way

• But hold them responsible for results

Context, not Control

• Be transparent about what the company needs to succeed

• Minimize the processes people need to go through to achieve success

• Value results, not planning and process

Highly Aligned, Loosely Coupled

• Clear strategy and goals

• Team interactions focus on strategy, not tactics

• Minimal cross-functional meetings

• Occasional post-mortems to increase alignment

What This Helped Us Achieve

• DVD to Streaming

• DC to cloud

• US-only to 40-plus countries

Architecture

Credit: Steve Somers

Key Changes

• Service oriented architecture

• Many small teams, each providing its own interconnected service

• Deploy on Amazon Web Services

• Increased reliance on open source

Highly Aligned, Loosely Coupled

• Services are built by different teams who work together to figure out what each service will provide.

• The service owner publishes an API that anyone can use.

What AWS Provides

• Machine Images (AMI)

• Instances (EC2)

• Elastic IPs

• Load Balancers

• Security groups / Autoscaling groups

Freedom and Responsibility

• Developers deploy when they want

• They also manage their own capacity and autoscaling

• And fix anything that breaks at 3am!

[Diagram: the Netflix API and its dependencies: Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine]

• 2B requests per day into the Netflix API

• 12B outbound requests per day to API dependencies
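The two daily figures on this slide imply an average fan-out: each request into the API triggers about six calls to its dependencies. The arithmetic is sketched below in Python, assuming (unrealistically, given the demand peaks noted earlier) that traffic is spread evenly over 24 hours, so the per-second rates are averages, not peaks.

```python
# Daily figures quoted on the slide.
INBOUND_PER_DAY = 2_000_000_000    # requests into the Netflix API
OUTBOUND_PER_DAY = 12_000_000_000  # requests from the API to its dependencies
SECONDS_PER_DAY = 24 * 60 * 60


def fan_out(inbound: int, outbound: int) -> float:
    """Average number of dependency calls made per inbound API request."""
    return outbound / inbound


def average_rate(per_day: int) -> float:
    """Average requests per second, assuming load is spread evenly over the day."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"fan-out: {fan_out(INBOUND_PER_DAY, OUTBOUND_PER_DAY):.0f}x")  # fan-out: 6x
    print(f"inbound: {average_rate(INBOUND_PER_DAY):,.0f} req/s")        # inbound: 23,148 req/s
    print(f"outbound: {average_rate(OUTBOUND_PER_DAY):,.0f} req/s")      # outbound: 138,889 req/s
```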

Build and Deployment

The Audience

• ~700 engineers

• Large majority are developers

• Test engineers

• Delivery teams

• Operations & reliability engineering

Our Goal

• Lower the barriers to build, test and deployment until the entire process is accessible to every developer.

The Team

• 11 engineers and 1 director (but we’re hiring!)

• Developers, build / release engineers, DevOps

• Specialize, but understand the full stack

• Service oriented

Self-Service Build & Deployment

• Channel best practices

• Promote, don’t dictate

• Make adoption easy

• Make tools flexible

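Building and Deploying

[Diagram: the build and deployment pipeline. Jenkins syncs source and libraries from Perforce / Git and runs Ant targets (resolve, compile, build, test, report, publish), with Ivy handling dependencies and "Groovy all over". Snapshot and release libraries and apps are published to Artifactory and yum as RPMs; Aminator bakes the RPMs into AMIs, which Asgard deploys.]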

Is That Really Self-Service?

Common Build Framework

• Define a build with just a few lines of Ant code

• Templates for libraries and webapps

• Override standard targets if you need to
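The framework itself is written in Ant, and its internals aren't shown here. As a language-neutral illustration of the same template-with-overrides idea, here is a minimal Python sketch; the template contents, target names and the `define_build` and `standard_target` helpers are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A target takes a project name and returns a description of what it did.
Target = Callable[[str], str]

# Hypothetical templates: the ordered targets each kind of project runs by default.
TEMPLATES: Dict[str, List[str]] = {
    "library": ["resolve", "compile", "test", "publish"],
    "webapp": ["resolve", "compile", "test", "war", "publish"],
}


def standard_target(step: str) -> Target:
    """Default behaviour for a step: here it just reports that the step ran."""
    return lambda project: f"{project}: standard {step}"


def define_build(project: str, template: str,
                 overrides: Optional[Dict[str, Target]] = None) -> List[str]:
    """Run every target in the template, swapping in an override where one is given."""
    overrides = overrides or {}
    log = []
    for step in TEMPLATES[template]:
        target = overrides.get(step) or standard_target(step)
        log.append(target(project))
    return log


if __name__ == "__main__":
    # A webapp build that keeps every standard target except "test".
    for line in define_build("my-service", "webapp",
                             overrides={"test": lambda p: f"{p}: custom integration tests"}):
        print(line)
```

A project that needs nothing special names only its template; a project with unusual needs overrides just the targets that differ and inherits the rest.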

Jenkins Job DSL

• Define Jenkins build jobs using a domain-specific language (based on Groovy)

• Loop to create multiple jobs (e.g. for building different branches)

• Make one change and rerun to update all jobs

• The code is the configuration

• https://wiki.jenkins-ci.org/display/JENKINS/Job
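The Job DSL plugin itself runs Groovy scripts. The sketch below shows only the underlying idea, in Python: one definition, looped over branches, generates every job, and rerunning it after a change updates them all consistently. `JobSpec`, `job_for_branch`, `generate_jobs` and the `ant build` command are hypothetical; nothing here talks to Jenkins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, simplified description of one Jenkins build job."""
    name: str
    branch: str
    command: str


def job_for_branch(project: str, branch: str, command: str) -> JobSpec:
    """Define one job; its name is derived from the project and branch."""
    safe_branch = branch.replace("/", "-")  # e.g. "release/1.2" -> "release-1.2"
    return JobSpec(name=f"{project}-{safe_branch}", branch=branch, command=command)


def generate_jobs(project: str, branches: List[str], command: str) -> Dict[str, JobSpec]:
    """Loop over branches to define every job from a single definition.

    Changing `command` (or anything in job_for_branch) and rerunning
    regenerates all of the jobs the same way.
    """
    jobs = (job_for_branch(project, branch, command) for branch in branches)
    return {job.name: job for job in jobs}


if __name__ == "__main__":
    for name, job in generate_jobs("myapp", ["master", "release/1.2"], "ant build").items():
        print(name, job.branch, job.command)
```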

Jenkins Dynaslaves

• Create build slaves in AWS

• Dedicated slave pools for teams

• Scale slave pools up and down on demand

• https://github.com/Netflix-Skunkworks/dynaslave-plugin
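The slide doesn't say how the plugin decides pool sizes. One plausible policy, assumed here purely for illustration, sizes each team's pool to its current build demand within fixed bounds. `SlavePool` and its fields are hypothetical, not the dynaslave plugin's API.

```python
import math
from dataclasses import dataclass


@dataclass
class SlavePool:
    """Hypothetical per-team pool of build slaves running in AWS."""
    team: str
    min_size: int
    max_size: int
    executors_per_slave: int

    def desired_size(self, queued_builds: int, running_builds: int) -> int:
        """Slaves needed for all queued and running builds, clamped to the pool limits."""
        demand = queued_builds + running_builds
        needed = math.ceil(demand / self.executors_per_slave)
        return max(self.min_size, min(self.max_size, needed))
```

For example, a pool with two executors per slave and five builds queued or running would ask for three slaves; when the queue drains it shrinks back to `min_size` rather than to zero, so a team's next build doesn't wait for a cold start.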

From Build to Deployment

Aminator

• Create (“bake”) AMIs

• Image contains a service and everything needed to run it

• Can be automatically triggered as a build step

• https://github.com/Netflix/aminator

How Baking is Different

Traditional: Generic AMI -> Instance

• Launch OS

• Install packages

• Install app

Netflix: App AMI -> Instance

• Launch OS + app

https://github.com/Netflix/aminator
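Here is a toy Python model of the comparison above, not Aminator's API: an AMI is treated as a set of installed components. Both paths end with the same instance contents, but baking moves the install work from every instance launch to a single build-time step. All names here are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical model: an AMI is just the set of components installed on it.
AMI = FrozenSet[str]

GENERIC_AMI: AMI = frozenset({"os"})


def launch_traditional(base: AMI, packages: List[str], app: str) -> Tuple[AMI, int]:
    """Boot a generic AMI, then install packages and the app on the running instance.

    Returns the instance contents and the number of install steps done at launch.
    """
    contents = set(base)
    steps = 0
    for item in packages + [app]:
        contents.add(item)
        steps += 1
    return frozenset(contents), steps


def bake(base: AMI, packages: List[str], app: str) -> AMI:
    """Install everything once, at build time, producing an application AMI."""
    return base | frozenset(packages) | {app}


def launch_baked(app_ami: AMI) -> Tuple[AMI, int]:
    """Boot the baked AMI: OS and app come up together, with no installs at launch."""
    return app_ami, 0
```

Launching ten instances the traditional way repeats every install ten times (and each repetition can fail or pick up a different package version); launching ten from a baked AMI repeats none.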

[Diagram: what goes into a baked application AMI]

• Linux base AMI (CentOS or Ubuntu)

• Java (JDK 6 or 7)

• Tomcat

• Optional Apache

• Monitoring

• Log rotation to S3

• AppDynamics machine agent

• AppDynamics app agent monitoring

• Application war file, base servlet, platform, interface jars for dependent services

• GC and thread dump logging

• Healthcheck, status servlets, JMX interface, Servo autoscale

At Netflix, the AMI is the unit of deployment.

Asgard

• Web UI and REST API for service deployment and management

• Manage ASGs, ELBs, security groups, ...

• Application -> cluster -> ASG

• Rapid deployment and rollback

• Available to all engineers

• https://github.com/Netflix/asgard
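Asgard derives resource names from this hierarchy. The sketch below loosely follows the convention Asgard uses (a cluster named app-stack-detail, and each ASG in that cluster suffixed with a three-digit push number such as `-v001`). The helper functions are hypothetical and simplified: numbering starts at `v000` and wraps after `v999` without checking for collisions.

```python
import re
from typing import List


def cluster_name(app: str, stack: str = "", detail: str = "") -> str:
    """Join application, stack and detail into a cluster name."""
    if detail:
        return f"{app}-{stack}-{detail}"  # e.g. "api--canary" when stack is empty
    if stack:
        return f"{app}-{stack}"           # e.g. "api-prod"
    return app                            # e.g. "api"


def next_asg_name(cluster: str, existing: List[str]) -> str:
    """Name the next ASG in a cluster by incrementing the highest push number."""
    pattern = re.compile(rf"^{re.escape(cluster)}-v(\d{{3}})$")
    pushes = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    # Start at v000; wrap around after v999 (collisions are not checked in this sketch).
    nxt = (max(pushes) + 1) % 1000 if pushes else 0
    return f"{cluster}-v{nxt:03d}"
```

For example, with `api-prod-v000` and `api-prod-v001` already present, the next deployment of the `api-prod` cluster would be named `api-prod-v002`; ASGs belonging to other clusters of the same application are ignored.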

Red / Black Deployment

Netflix has moved the granularity from the instance to the cluster.
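Red/black is Netflix's name for what is often called blue/green deployment. Here is a minimal in-memory sketch of the cluster-level flow, using hypothetical `ASG` and `Cluster` classes rather than Asgard's API: a deploy adds a new ASG running the new AMI and takes the old ones out of traffic without deleting them, so rolling back means re-enabling the previous group.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ASG:
    """Hypothetical auto scaling group: a name, the AMI it runs, and whether it takes traffic."""
    name: str
    ami: str
    in_service: bool = True


@dataclass
class Cluster:
    """Hypothetical cluster: the ASGs for one application, oldest first."""
    name: str
    groups: List[ASG] = field(default_factory=list)

    def active(self) -> Optional[ASG]:
        """Return the newest ASG that is currently taking traffic, if any."""
        live = [g for g in self.groups if g.in_service]
        return live[-1] if live else None

    def red_black_deploy(self, new_ami: str) -> ASG:
        """Add a new ASG running new_ami, then take every older ASG out of traffic.

        Older ASGs are disabled, not deleted, so they remain available for rollback.
        """
        new = ASG(name=f"{self.name}-v{len(self.groups):03d}", ami=new_ami)
        self.groups.append(new)
        for old in self.groups[:-1]:
            old.in_service = False
        return new

    def rollback(self) -> ASG:
        """Put the previous ASG back into traffic and take the newest one out."""
        if len(self.groups) < 2:
            raise RuntimeError("no previous ASG to roll back to")
        newest, previous = self.groups[-1], self.groups[-2]
        previous.in_service = True
        newest.in_service = False
        return previous
```

A real deployment would size the new ASG to match the old one and wait for health checks before moving traffic; the sketch leaves that out to keep the cluster-level switch visible.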

Simple Service Setup Effort

• Write the code (variable :-))

• 15 minutes to write a build file and define dependencies

• 15 minutes to create a Jenkins build, 2 to 10 minutes to run it

• 5 minutes to bake an AMI

• 10 minutes to deploy in test, another 10 for prod
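Adding up the slide's estimates gives a rough time from a new project to production, excluding the time to write the code itself. The step labels below paraphrase the slide; the (min, max) ranges are the slide's own numbers.

```python
from typing import Dict, Tuple

# (min, max) minutes per step, taken from the slide; writing the code is excluded.
STEPS: Dict[str, Tuple[int, int]] = {
    "write build file and define dependencies": (15, 15),
    "create Jenkins build": (15, 15),
    "run Jenkins build": (2, 10),
    "bake AMI": (5, 5),
    "deploy to test": (10, 10),
    "deploy to prod": (10, 10),
}


def total_minutes(steps: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the best-case and worst-case minutes across all steps."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


if __name__ == "__main__":
    low, high = total_minutes(STEPS)
    print(f"{low} to {high} minutes, plus time to write the code")  # 57 to 65 minutes, ...
```

In other words, roughly an hour of setup and pipeline time separates working code from a production deployment.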

Just a quick reminder...

(Some of) Netflix is open source:

https://github.com/netflix

Why We Open Source

• Give back to the Apache-licensed OSS community

• Motivate, retain, hire top engineers

• Benefit from a shared ecosystem

• Make Netflix solutions into common standards

The Netflix Platform

[Diagram of platform components; the slide's legend reads "Open Source"]

• Discovery (Eureka)

• Entrypoints (Edda)

• Configuration (Archaius)

• Zookeeper (Exhibitor)

• Logging (Blitz4j & Honu)

• NIWS (Ribbon)

• GeoBase

• Circuit Breakers (Hystrix)

• Cassandra (Priam & Astyanax & CassJMeter)

• Cryptex

• AKMS

• EvCache

• Zuul

• i18n / L10n

https://github.com/Netflix/Cloud-Prize/wiki

Thank You!

Email: gbowles@{gmail,netflix}.com

Twitter: @garethbowles

Linkedin: www.linkedin.com/in/garethbowles
