Riding The N Train: How we dismantled Groupon's Ruby on Rails Monolith

Riding the N(ode) Train: Dismantling the Monoliths Tuesday, December 3, 2013 Sean McCullough – Engineer at Groupon @mcculloughsean

description

This is a story about how Groupon's business was changing and our technology couldn't keep up. We rewrote the website using Node.js and changed the way our company and its culture work.

Transcript of Riding The N Train: How we dismantled Groupon's Ruby on Rails Monolith

Page 1: Riding The N Train: How we dismantled Groupon's Ruby on Rails Monolith

Riding the N(ode) Train: Dismantling the Monoliths

Tuesday, December 3, 2013

Sean McCullough – Engineer at Groupon @mcculloughsean

Page 2

Part I

Broken Architecture and a Changing Business

Page 3

Business in Early 2012

Page 4

Architecture in 2012

Page 5

Leading the Mobile Commerce Revolution

Mobile Transaction Mix Monthly, January 2011 to September 2013 (% of transactions)

[Chart: y-axis 0% to 100%; x-axis January ’11 to September ’13]

Page 6

Product Engineering was Stuck

We couldn’t build features fast enough

We wanted to build features world-wide

Mobile and Web weren’t at feature parity

Page 7

Part II

The Rewrite

Page 8

The Rewrite

Page 9

The Rewrite

Should ...

• be built on APIs for a consistent contract with mobile

• make it easy to hire developers

• allow teams to work at their own pace

• allow teams to deploy their own code

• allow for global design changes

• have out-of-the-box I18n/L10n support

• be optimized for our read-heavy traffic pattern

• be small

Page 10

How do we…?

• Deploy

• Authorize Users

• Share Sessions

• Route to different applications

• Manage distributed ops

• QA the whole site

Page 11

We Tried This Before and Failed

• Rolled out a new site design in our monolith

• Too many things changed all at once

• Hard to evaluate performance of each feature

Page 12

New Platform Evaluation

We evaluated:

• Node

• MRI Ruby/Rails, MRI Ruby/Sinatra

• JRuby/Rails, JRuby/Sinatra

• MRI Ruby + Sinatra + EM (EventMachine)

• Java/Play, Java/Vert.x

• Python+Twisted

• PHP

Page 13

Why Node?

• Vibrant community

• NPM!

• Easy to hire JavaScript developers

• Had the minimum viable performance characteristics

• Easy scaling (process model)

Page 14

The First App

Page 15

Growing Pains

Page 16

Poking Holes in our Infrastructure

• Longevity Test over two days

• Try to root out memory leaks

• Talking only to non-production systems

Page 17

Poking Holes in our Infrastructure

Within 2 hours we had a major site outage

Page 18

Poking Holes in our Infrastructure

• SSL termination on our hardware load balancer (LB) drove its CPU to 100%

• Production systems were using the same LB as test and development systems

Page 19

Lessons Learned

• You will run into problems with Node

• You will find problems with your infrastructure

• Don’t panic!

Page 20

The Second App

• Looking for the next page

• Chose the “Browse” page

• Recently Built

• Built using mostly Backbone

• Experienced team of JS developers

Page 21

The Second App

Page 22

The Second App

New Problems:

• User authentication

• More service calls

• Complicated routing

• More traffic

• Needed to share look and feel

Page 23

The Second App

• Cultural problems

• Change of workflow

• Feedback loop fell apart

3 rewrites

6 months to launch

Page 24

Shared Layout

Maintain consistent look and feel across site:

• Distribute layout as library

• Use ESIs (Edge Side Includes) for top/bottom of page

• Apps are called through a “chrome service”

• Fetch templates from service
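
The ESI option in the list above can be sketched roughly as follows: each app renders only its own markup and wraps it in `<esi:include>` tags, which an ESI-capable edge proxy expands into the shared header and footer. This is a minimal sketch under assumptions, not Groupon's implementation: the layout host, the paths, and the `locale` parameter are all hypothetical.

```typescript
// Sketch only: the layout URLs below are hypothetical, not real endpoints.
const LAYOUT_BASE = "http://layout-service.example/v1";

// Escape characters that could break out of a double-quoted attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap an app's own markup with ESI includes for the shared top and bottom.
// An ESI-capable proxy replaces each <esi:include> with the fetched fragment.
function wrapWithSharedLayout(appBodyHtml: string, locale: string): string {
  const header = LAYOUT_BASE + "/header?locale=" + encodeURIComponent(locale);
  const footer = LAYOUT_BASE + "/footer?locale=" + encodeURIComponent(locale);
  return [
    '<esi:include src="' + escapeAttr(header) + '"/>',
    appBodyHtml,
    '<esi:include src="' + escapeAttr(footer) + '"/>',
  ].join("\n");
}
```

With this shape, a global design change only touches the fragments behind the two URLs; individual apps do not need to redeploy.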

Page 25

Groupon Interface Guidelines

Page 26

Layout Service

• Uses semantic versioning

• Roll forward with bug fixes

• Stay locked on a specific version

• Enable Site-Wide Experiments
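
"Roll forward with bug fixes" and "stay locked on a specific version" can be illustrated with a small resolver. The request format here is an assumption made for the sketch: a full version such as `1.4.2` stays locked, while a `major.minor` request such as `1.4` rolls forward to the highest published patch. The function names are hypothetical.

```typescript
type Version = [number, number, number];

// Parse "major.minor.patch" into numbers so patches compare numerically.
function parseVersion(text: string): Version {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error("not a full semantic version: " + text);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Resolve a layout version request against the published versions.
//   "1.4.2" -> exactly that version if published (locked)
//   "1.4"   -> highest published 1.4.x (rolls forward with bug fixes)
// Returns null when nothing published satisfies the request.
function resolveLayoutVersion(requested: string, published: string[]): string | null {
  if (/^\d+\.\d+\.\d+$/.test(requested)) {
    return published.indexOf(requested) >= 0 ? requested : null;
  }
  const prefix = /^(\d+)\.(\d+)$/.exec(requested);
  if (prefix === null) {
    throw new Error("unsupported version request: " + requested);
  }
  const major = Number(prefix[1]);
  const minor = Number(prefix[2]);
  let best: Version | null = null;
  for (const candidate of published) {
    const v = parseVersion(candidate);
    if (v[0] === major && v[1] === minor && (best === null || v[2] > best[2])) {
      best = v;
    }
  }
  return best === null ? null : best.join(".");
}
```

Comparing parsed numbers rather than strings matters: as text, `1.4.10` would sort before `1.4.2`.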

Page 27

Layout Service

Page 28

Layout Service

Page 29

Routing Service
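
The slide gives no detail on how the routing service works. One plausible shape, shown here purely as an assumption, is a longest-prefix match from the request path to the application that owns it, with everything unmatched falling through to a default. The route table, the host names, and the `routeFor` function are hypothetical.

```typescript
interface Route {
  prefix: string;   // path prefix owned by one application
  upstream: string; // where requests for that prefix are sent
}

// Hypothetical route table: two extracted apps plus a catch-all.
const ROUTES: Route[] = [
  { prefix: "/browse", upstream: "http://browse-app.example" },
  { prefix: "/deals", upstream: "http://deal-app.example" },
  { prefix: "/", upstream: "http://legacy-monolith.example" },
];

// Pick the route with the longest matching prefix. A prefix matches only on
// a path-segment boundary, so "/browse" does not capture "/browsers".
// `path` is expected without a query string.
function routeFor(path: string, routes: Route[]): Route | null {
  let best: Route | null = null;
  for (const route of routes) {
    const p = route.prefix;
    const matches = p === "/" || path === p || path.indexOf(p + "/") === 0;
    if (matches && (best === null || p.length > best.prefix.length)) {
      best = route;
    }
  }
  return best;
}
```

A catch-all entry like the last one is what would let pages move to new apps one at a time while the rest keep being served as before.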

Page 30

The Big Push… or There’s No Going Back

• Decided to get the whole company to move at once

• Supporting two platforms is hard – Rip off the band aid!

• End of June 2012 - move to I-Tier by September 1st

Page 31

The Big Push… or There’s No Going Back

• ~150 developers

• Global effort

• Feature freeze – A/B testing against mostly the same features
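
A/B testing one platform against another needs a stable split of visitors, so the same person always lands on the same side. A common way to do that, sketched here as an assumption rather than as the method actually used, is to hash a visitor identifier into one of 100 buckets. The visitor ID, the experiment name, and the percentage parameter are hypothetical; the hash is 32-bit FNV-1a.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units (enough for bucketing).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 0x93) using shifts,
    // then reduce to an unsigned 32-bit value.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

// Deterministically assign a visitor to the new or old platform.
// percentOnNew is the share of visitors (0 to 100) sent to the new platform.
function assignPlatform(visitorId: string, experiment: string, percentOnNew: number): "new" | "old" {
  const bucket = fnv1a(experiment + ":" + visitorId) % 100; // 0..99
  return bucket < percentOnNew ? "new" : "old";
}
```

Including the experiment name in the hashed string keeps buckets independent across experiments, so a visitor in the "new" group of one test is not automatically in the "new" group of every other.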

Page 32

Part III

It Worked!

Page 33

95% Consumer Traffic On Node

Page 34

Sustained US Traffic Over 120k RPM

Page 35

Our Pages Got Faster

Page 36

It Worked!

Page 37

Success?

• Moving to a new platform is not a straight line

• Solving for old problems

• Solving for new problems

• Culture shift

Page 38

Next Steps

• Streaming responses for better performance

• Better resiliency to outages… circuit breakers, brownouts

• Distributed Tracing

• International

• Open Source

New I-Tier apps as we build new teams, products, ideas.

Latest technologies to help us drive our business.
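
For readers unfamiliar with the circuit breaker pattern named on this slide, here is a minimal sketch. It is an illustration under assumptions, not a description of what was built: it is synchronous for brevity (real service calls would be asynchronous), the thresholds are arbitrary, and the class and parameter names are hypothetical. After `maxFailures` consecutive failures the breaker opens and returns the fallback immediately; once `resetAfterMs` has passed it lets one trial call through.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private maxFailures: number;
  private resetAfterMs: number;
  private now: () => number;

  // `now` is injectable so the timing can be controlled in tests.
  constructor(maxFailures: number, resetAfterMs: number, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // An open breaker becomes half-open once the reset interval has elapsed.
  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Run `call` unless the breaker is open; on failure or when open, return `fallback`.
  run<T>(call: () => T, fallback: T): T {
    const stateBefore = this.currentState();
    if (stateBefore === "open") {
      return fallback; // fail fast without touching the struggling service
    }
    try {
      const result = call();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (_err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the breaker.
      if (stateBefore === "half-open" || this.failures >= this.maxFailures) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback;
    }
  }
}
```

The fallback is where a degraded response would go, for example rendering a page without one optional module instead of failing the whole request.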

Page 39

Q&A