Scaling Continuous Deployment at IMVU

Scaling with Continuous Deployment. Web 2.0 Expo, New York, NY, September 29, 2010. Brett G. Durrett (@bdurrett), Vice President Engineering & Operations, IMVU, Inc.

Scaling with Continuous Deployment as presented at the Web 2.0 Expo, New York, September 29, 2010

Transcript of Scaling Continuous Deployment at IMVU

Page 1: Scaling Continuous Deployment at IMVU

Scaling with Continuous Deployment

Web 2.0 Expo

New York, NY, September 29, 2010

Brett G. Durrett (@bdurrett)

Vice President Engineering & Operations, IMVU, Inc.

Page 2: Scaling Continuous Deployment at IMVU

An online community where members use 3D avatars to meet new people, chat, create and have fun with their friends

Page 5: Scaling Continuous Deployment at IMVU

Who is my audience?

Mix of engineering / product?

How many from a startup?

How many believe iterating on your product is critical to the success of your business?

Page 6: Scaling Continuous Deployment at IMVU

How quickly can your business iterate?

Page 7: Scaling Continuous Deployment at IMVU

Can I interest you in some Continuous Deployment?

Page 8: Scaling Continuous Deployment at IMVU

In a Nutshell

What is Continuous Deployment?

• Engineer commits code

• 20 minutes later it is live in production

• Repeat for about 50 commits per day

Page 9: Scaling Continuous Deployment at IMVU

Does This Really Work?

“Maybe this is just viable for a single developer … your site will be down. A lot.”

“It seems like the author either has no customers or very understanding customers”

Responses to February 2009 blog posting about Continuous Deployment at IMVU (at the time IMVU had a $12 million run rate)

Page 10: Scaling Continuous Deployment at IMVU

Benefits

• Regressions easy to find, correct

• Releases have zero overhead

• Rapid iteration using real customer metrics

Page 11: Scaling Continuous Deployment at IMVU

Finding and Fixing Problems

• Each release has few changes, 1-3 commits

• Production issues correlate with check-in timestamp

• No overhead to producing a new release to correct issue

Identifying cause takes minutes
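
Because each release carries only a few commits, matching an incident's start time against deploy timestamps narrows the search quickly. The sketch below illustrates that lookup. The function name, the 30-minute window, and the `(commit, timestamp)` log format are assumptions for illustration, not details from the talk.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def suspect_commits(
    deploys: List[Tuple[str, datetime]],
    incident_start: datetime,
    window: timedelta = timedelta(minutes=30),  # assumed look-back window
) -> List[str]:
    """Return commits deployed within `window` before the incident, newest first."""
    hits = [
        (deployed_at, commit)
        for commit, deployed_at in deploys
        if incident_start - window <= deployed_at <= incident_start
    ]
    return [commit for deployed_at, commit in sorted(hits, reverse=True)]
```

With 1-3 commits per release, the returned list is usually short enough to inspect by hand.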

Page 12: Scaling Continuous Deployment at IMVU

CD at IMVU: Simple Overview

1. Local tests pass, engineer commits code

2. Lots and lots of tests run

3. All tests pass?

– No: Revert commit (Blocks)

– Yes: Code deployed to % of servers

4. Metrics good?

– No: Rollback (Blocks)

– Yes: Code deployed to all servers

5. Metrics still good?

– No: Rollback (Blocks)

– Yes: Win!
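
One reading of the flow on this slide, written as code: a test gate, a partial deploy with a metrics check, then a full deploy with a second metrics check. This is a minimal sketch, not IMVU's implementation. The `Pipeline` class, its callables, and the 5% partial-deploy fraction are hypothetical stand-ins (the slide only says "% of servers").

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical hooks standing in for real test, deploy, and metrics systems."""
    run_all_tests: Callable[[str], bool]
    deploy: Callable[[str, float], None]   # (commit, fraction of servers)
    metrics_good: Callable[[], bool]
    rollback: Callable[[str], None]
    revert_commit: Callable[[str], None]
    partial_fraction: float = 0.05         # assumed value, not from the talk

    def process(self, commit: str) -> str:
        # Gate 1: the full test suite must pass, otherwise revert the commit.
        if not self.run_all_tests(commit):
            self.revert_commit(commit)
            return "reverted"
        # Gate 2: deploy to a fraction of servers, then check metrics.
        self.deploy(commit, self.partial_fraction)
        if not self.metrics_good():
            self.rollback(commit)
            return "rolled back (partial)"
        # Gate 3: deploy to all servers, then check metrics again.
        self.deploy(commit, 1.0)
        if not self.metrics_good():
            self.rollback(commit)
            return "rolled back (full)"
        return "win"
```

The "(Blocks)" labels on the slide suggest that a revert or rollback also halts further pushes until resolved; that part is left out of the sketch.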

Page 13: Scaling Continuous Deployment at IMVU

CD at IMVU: Detailed Overview

Page 14: Scaling Continuous Deployment at IMVU

Getting Started – Extreme Basics

1. Continuous integration system

2. Production monitoring and alerting

– System performance

– Business metrics

– Trending is nice too

3. Simple deploy / roll-back system

Page 15: Scaling Continuous Deployment at IMVU

Commit to Making Forward Progress

• Require coverage for all new code

• Add coverage for bugs / regressions

• Understand and fix root cause of failures

Page 16: Scaling Continuous Deployment at IMVU

Expect Some Hurdles

• Production outages

• New overhead

– Tests

– Build systems

• Production outages

• Frustration

• Production outages

(but well worth it)

Page 17: Scaling Continuous Deployment at IMVU

Dealing with SQL

Problems

• Difficult to roll-back schema

• Alter statements lock / impact customers

Solutions

• New schema has formal review process

• No alter on large tables, create new table

– Copy on read

– Complete migration with background job

Page 18: Scaling Continuous Deployment at IMVU

Big Features

• Developed on trunk, not branch

– “hidden” from customers by A/B experiment

– 100% control, add QA to experiment

• Deployed daily during development

• Slow roll-out by increasing experiment %

– Experiment closed = fully launched
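
Hiding an unfinished feature behind an experiment and then raising the percentage can be sketched with deterministic hash bucketing. This is an assumed mechanism chosen for illustration; the talk does not describe how IMVU assigns users to experiments. The names `bucket`, `in_experiment`, and `allowlist` are hypothetical.

```python
import hashlib
from typing import FrozenSet


def bucket(experiment: str, user_id: int) -> float:
    """Map (experiment, user) to a stable value in [0, 100)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000 * 100


def in_experiment(
    experiment: str,
    user_id: int,
    percent: float,
    allowlist: FrozenSet[int] = frozenset(),
) -> bool:
    """True if the user should see the new feature."""
    # QA accounts see the feature even while everyone else is in control.
    if user_id in allowlist:
        return True
    # A user's bucket never changes, so raising `percent` only adds users.
    return bucket(experiment, user_id) < percent
```

During development the feature would run with `percent=0` and QA accounts in `allowlist`, matching "100% control, add QA to experiment". Roll-out then means raising `percent` step by step until it reaches 100.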

Page 19: Scaling Continuous Deployment at IMVU

Test Speed

Slow tests are a burden to scaling

• Can’t run all tests in sandbox

• Faster to debug on build cluster

If possible…

• Keep tests fast

• Keep tests specific

Page 20: Scaling Continuous Deployment at IMVU

The cost of failing tests

As the team grows…

• More likely to have test failures

• More people blocked as a result

Intermittent failures very bad

Eliminate the root cause
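
A rough calculation shows why intermittent failures are so costly at scale. The sketch assumes every test fails spuriously with the same small probability and that failures are independent; both are simplifications, and the flake rate used in the example is made up for illustration.

```python
def p_red_build(flake_rate: float, num_tests: int) -> float:
    """Probability that at least one of `num_tests` independent tests flakes."""
    return 1 - (1 - flake_rate) ** num_tests


# Hypothetical: 15,000 tests, each flaking once per 100,000 runs.
example = p_red_build(1e-5, 15_000)
```

Under these assumptions, even a one-in-100,000 flake rate per test turns roughly one build in seven red for no real reason, and every red build blocks more people as the team grows.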

Page 21: Scaling Continuous Deployment at IMVU

Other Issues

• Won’t catch issues that fail slowly

– SELECT * FROM growing_table WHERE 1

• Some critical areas cause hard lock-ups

– MySQL

– Memcached

• Lack of test coverage of older code

– Not an issue if you start with test coverage

Page 22: Scaling Continuous Deployment at IMVU

Does Continuous Deployment Scale?

• Technical staff ~50 people

• 10 million monthly unique visitors

• Peak ~130K concurrent IM client logins

• It’s a real business!

– $40 million run rate

– Profitable and doubled revenue in 2009

Page 23: Scaling Continuous Deployment at IMVU

Newer Scaling Challenges

Biggest challenges come with growth of the engineering organization

Page 25: Scaling Continuous Deployment at IMVU

SLA for Build Systems

Build systems are a critical service

Run them that way

Page 26: Scaling Continuous Deployment at IMVU

Build and Push Times

Page 27: Scaling Continuous Deployment at IMVU

Overall Availability

Page 28: Scaling Continuous Deployment at IMVU

http://www.flickr.com/photos/onebigchickenman/4869442019/

Page 29: Scaling Continuous Deployment at IMVU

Build Throughput

• Initial implementation sequential builds

– Scaled okay to ~20 engineers

– Like trains running every 20 minutes

– One “red” blocks all following builds

• Solution: build isolation

– Enable testing single build without deploy

– “Red” build pulled, allow other builds to pass
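
The difference between the sequential "train" model and build isolation can be illustrated with a toy model. Both function names are hypothetical, and the sketch ignores the case where a later commit depends on one that was pulled.

```python
from typing import Callable, List, Tuple


def sequential_builds(
    commits: List[str], is_green: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Train model: the first red build blocks itself and everything behind it."""
    deployed: List[str] = []
    blocked: List[str] = []
    red_seen = False
    for commit in commits:
        if red_seen or not is_green(commit):
            red_seen = True
            blocked.append(commit)
        else:
            deployed.append(commit)
    return deployed, blocked


def isolated_builds(
    commits: List[str], is_green: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Isolation model: each build is tested on its own; red builds are pulled."""
    deployed: List[str] = []
    pulled: List[str] = []
    for commit in commits:
        (deployed if is_green(commit) else pulled).append(commit)
    return deployed, pulled
```

With one red commit early in a queue, the sequential model blocks every engineer behind it, while the isolated model sets aside only the failing commit.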

Page 30: Scaling Continuous Deployment at IMVU

Web Build Software

• Custom test-file runner with JS GUI

• PHP SimpleTest

• Python's built-in unittest

• Selenium Core with in-house API wrapper

• YUITest for browser JS unit tests

• Erlang Eunit

• Buildbot

Page 31: Scaling Continuous Deployment at IMVU

Current Systems

• > 15,000 tests

• 86 web build servers

– 62 Linux

– 24 Windows

• ~ 10 minutes on build servers

• Deploy to cluster of ~700 servers

Page 32: Scaling Continuous Deployment at IMVU

Conclusion

• Continuous Deployment is possible!

• Starting earlier is easier - baby steps

• The value of being able to iterate outweighs the challenges

Page 33: Scaling Continuous Deployment at IMVU

Questions?

Page 34: Scaling Continuous Deployment at IMVU

Thank You!

Brett G. Durrett

[email protected]

Twitter: @bdurrett

IMVU recognized as:

Inc. 500

http://bit.ly/dv52wK

Red Herring 100:

http://bit.ly/bbz5Ex

Best Place to Work:

http://bit.ly/aAVdp8

(and we're hiring)

http://www.imvu.com/jobs