Rent The Runway: Transitioning to Operations Driven Webservices

Operations Driven Web Services: A Case Study of Service Evolution at Rent the Runway. Camille Fournier, Head of Engineering (@skamille); Carlo Barbara, Senior Engineer (@CarloBarbara).


Page 1: Rent The Runway: Transitioning to Operations Driven Webservices

Operations Driven Web Services: A Case Study of Service Evolution at Rent the Runway

Camille Fournier, Head of Engineering @skamille

Carlo Barbara, Senior Engineer @CarloBarbara

Page 2: Rent The Runway: Transitioning to Operations Driven Webservices

In The Beginning, There Was Drupal

Product Details

Filtering

View

Users

Product Creation

Order Management

Reservations

Login

Page 3: Rent The Runway: Transitioning to Operations Driven Webservices

There were also all of these folks…

Page 4: Rent The Runway: Transitioning to Operations Driven Webservices
Page 5: Rent The Runway: Transitioning to Operations Driven Webservices

View

Product Creation

Order Management

Reservations

Filtering

Product Details

Users

Login

Can’t Just Burn the World Down

Page 6: Rent The Runway: Transitioning to Operations Driven Webservices

View

Product Creation

Order Management

Reservations

Filtering

Product Details

Users

Login

Hollow It Out!

Page 7: Rent The Runway: Transitioning to Operations Driven Webservices

View

Product Creation

Order Management

Filtering

Product Details

Users

Login

Hollow It Out!

Page 8: Rent The Runway: Transitioning to Operations Driven Webservices

View

Product Creation

Order Management

Filtering

Users

Login

Hollow It Out!

Page 9: Rent The Runway: Transitioning to Operations Driven Webservices

View

Product Creation

Order Management

Users

Login

Hollow It Out!

Page 10: Rent The Runway: Transitioning to Operations Driven Webservices

Complexity

[Chart: Number of Services in Production, plotted from Dec-11 through Jun-13 in two-month steps, on a y-axis from 0 to 14]

Page 11: Rent The Runway: Transitioning to Operations Driven Webservices

Operations first…

Availability and performance of our services are critical to running our business

The software we develop has to make delivering on our SLAs possible

How (besides sane design):

Healthchecks + Nagios

Measurements

Historical Data with Graphs

Page 12: Rent The Runway: Transitioning to Operations Driven Webservices

Metrics

Gauges – instantaneous value

Counters – a value that can be incremented and decremented

Meters – rate over time (mean rate, plus 1-, 5-, & 15-minute moving averages)

Histograms – distribution of data (mean, median, max, std. dev., 75th, 90th, 95th, 98th, 99th, & 99.9th percentiles)

Timers – Meter of requests & Histogram of duration (frequency & latency)
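To make the five types concrete, here is a minimal sketch in Python rather than the Metrics library's Java API. Every class and method name below is hypothetical. The histogram keeps all values and uses a nearest-rank percentile, and the meter reports only a mean rate, so the moving averages are left out.

```python
import math


class Gauge:
    """An instantaneous value, read on demand from a callable."""

    def __init__(self, read):
        self.read = read

    def value(self):
        return self.read()


class Counter:
    """A value that can be incremented and decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Meter:
    """Counts events and reports their mean rate per second."""

    def __init__(self, start):
        self.start = start  # time (seconds) at which measuring began
        self.count = 0

    def mark(self, n=1):
        self.count += n

    def mean_rate(self, now):
        elapsed = now - self.start
        return self.count / elapsed if elapsed > 0 else 0.0


class Histogram:
    """Records values and reports their distribution."""

    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def percentile(self, p):
        # Nearest rank: the smallest recorded value such that at least
        # p percent of all recorded values are less than or equal to it.
        if not self.values:
            raise ValueError("no values recorded")
        ordered = sorted(self.values)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


class Timer:
    """A Meter of calls plus a Histogram of their durations."""

    def __init__(self, start):
        self.meter = Meter(start)
        self.histogram = Histogram()

    def update(self, seconds):
        self.meter.mark()
        self.histogram.update(seconds)
```

A timer updated with durations of 0.2 s and 0.4 s then answers both "how often?" (through `meter`) and "how slow?" (through `histogram`).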

Page 13: Rent The Runway: Transitioning to Operations Driven Webservices

Metrics - Reporting

HTTP

JMX

Graphite
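Of the three reporters, Graphite is the easiest to illustrate from the outside. The following is a minimal sketch, assuming Carbon's plaintext listener (conventionally on port 2003), which accepts one `path value timestamp` line per metric. The host name, metric paths, and both function names are hypothetical, and this is not the Metrics library's own reporter.

```python
import socket
import time


def format_lines(prefix, values, timestamp):
    """Render metrics as Graphite plaintext lines: 'path value timestamp'."""
    return "".join(
        f"{prefix}.{name} {value} {int(timestamp)}\n"
        for name, value in sorted(values.items())
    )


def send_to_graphite(host, port, prefix, values, timestamp=None):
    """Open a TCP connection and push one batch of metrics."""
    ts = time.time() if timestamp is None else timestamp
    payload = format_lines(prefix, values, ts).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


# Example call (hypothetical host and metric path):
# send_to_graphite("graphite.example.com", 2003,
#                  "services.reservations", {"requests.count": 42})
```

Using the same prefix scheme for every service keeps the resulting metric tree easy to browse and to graph side by side.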

Page 14: Rent The Runway: Transitioning to Operations Driven Webservices

Dropwizard: What is it?

Quality open source Java webservice components glued together in a modular way

Eliminates the need to pick a platform stack; it’s all there

It’s opinionated. If you don’t like a Dropwizard core component, that’s too bad, don’t use Dropwizard

Developers focus on business logic, not framework

It’s easy, maintainable, and it works!

Page 15: Rent The Runway: Transitioning to Operations Driven Webservices

A Few Words from Coda…

“I had no one I had to toss a WAR to. I had no one to stand up a Tomcat server and fiddle with it until their eyes bled. I had no one who didn't trust me to spin up my own threads or connection pools. So I wrote something which worked as simply and in as straight-forward a manner as possible because my own ass was on the line if it didn't work.”

Page 16: Rent The Runway: Transitioning to Operations Driven Webservices

Dropwizard: The Ingredients

Jersey for REST

Jackson for JSON

Jetty for a webserver

Metrics for measuring

YAML for configuring

Dropwizard for weaving everything together

Page 17: Rent The Runway: Transitioning to Operations Driven Webservices

Dropwizard – Healthchecks

Register hooks that check the health of your app

An HTTP endpoint that iterates over all the hooks

“The meaning of healthy” is decided by you (e.g. database connections, client connections, deadlock count)
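The hook-and-endpoint idea can be sketched in a few lines. This is a Python illustration of the pattern, not Dropwizard's Java `HealthCheck` API. The function name, the check names, and the choice of 200 versus 500 as the summary status are assumptions made for the sketch.

```python
def run_health_checks(checks):
    """Run every registered check and summarize the results.

    `checks` maps a name to a zero-argument callable that returns None
    when healthy, or a message string describing the problem.
    """
    results = {}
    for name, check in checks.items():
        try:
            message = check()
        except Exception as exc:  # a crashing check counts as unhealthy
            message = f"check raised {type(exc).__name__}: {exc}"
        results[name] = {"healthy": message is None, "message": message}
    # Sketch convention: 200 if everything passed, 500 otherwise.
    status = 200 if all(r["healthy"] for r in results.values()) else 500
    return status, results


# Hypothetical hooks; real ones would query a pool, a client, or the JVM.
CHECKS = {
    "database": lambda: None,
    "deadlocks": lambda: "2 deadlocked threads",
}
```

A single status code for the whole set is what makes the endpoint convenient for a monitor such as Nagios: it only needs to alert on a non-200 response and can show the per-check messages as detail.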

Page 18: Rent The Runway: Transitioning to Operations Driven Webservices

Dropwizard + Metrics

Dropwizard has lots of platform instrumentation baked in using Metrics, and it comes for free (e.g. Jetty, JVM, log counts)

Ability to add Timers to your endpoints with @Timed

Ability to add arbitrary metrics as you see fit
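`@Timed` is a Java annotation. As a rough analogue, the sketch below shows the same idea as a Python decorator: wrap the endpoint once and every call is measured. The `timed` decorator, the `TIMINGS` registry, and the `get_product` function are all hypothetical stand-ins, not part of Dropwizard.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry: metric name -> list of durations (seconds).
TIMINGS = defaultdict(list)


def timed(name, clock=time.perf_counter):
    """Record how long each call to the wrapped function takes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even when the call raises, so errors are timed too.
                TIMINGS[name].append(clock() - start)
        return wrapper
    return decorator


@timed("products.get")
def get_product(product_id):
    # Stand-in for real endpoint logic.
    return {"id": product_id}
```

The endpoint body stays free of measurement code, which is the point of the annotation approach: instrumentation is declared, not hand-written in each method.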

Page 19: Rent The Runway: Transitioning to Operations Driven Webservices

Other Frameworks

Play: 1.X was abandonware in favor of Play 2.X, which was still beta; magic

Glassfish: OSGI hell; “standards”

Spring: everything and the kitchen sink; also I hate XML

Page 20: Rent The Runway: Transitioning to Operations Driven Webservices

What do I get out of it? Dev agenda

Story telling: causation & correlation

Integral piece of the operational excellence puzzle

State of the world – Dashboards

Developers focus on features; operations is mostly a free lunch

Code review & demo

Disclaimer: You need Graphite to really harness the value

Page 21: Rent The Runway: Transitioning to Operations Driven Webservices

Story telling

The grid is slow. Why? Is it load? Is it dependent service latency? How does that compare to yesterday?

The JVM throws an out-of-memory error. What’s the problem? What does the GC jigsaw look like? When did it change? Is it correlated with increased load?

How is that new ‘performance’ tweak doing? If you never measured, then you didn’t tune. True story! What does my 5XX graph look like?

Page 22: Rent The Runway: Transitioning to Operations Driven Webservices

Operational Excellence: The ingredients

Application Instrumentation (Dropwizard)

Time Series Data & Graphing (Graphite, D3)

Centralized logging & log parsing (Rsyslog, Logstash, Nagios)

Automated alerting & escalation (PagerDuty)

DW & Graphite will get you very far, but if you want total control & visibility you need the rest. This is the stack that RTR is moving towards, rather than relying on basic Java logging SMTP appenders.

Page 23: Rent The Runway: Transitioning to Operations Driven Webservices

OMG, we are on GMA, are we OK?

10+ services

Each service runs in a cluster behind an LB

‘OK’ is somewhat service specific

Basically you need a lot of info at your fingertips. Pictures are worth a thousand words. Get yourself some dashboards!

Page 24: Rent The Runway: Transitioning to Operations Driven Webservices

Graphite Dashboard

Page 25: Rent The Runway: Transitioning to Operations Driven Webservices

Tasseo dashboard (D3)

• Red, Yellow, & Green Lights
• Realtime
• Endless cool things: Graphite + D3

If we see yellow or red, start diagnosing
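The red/yellow/green logic reduces to comparing the latest datapoint against two thresholds. The following is a minimal sketch, assuming higher values are worse. The function names, metric names, and threshold values are hypothetical and are not Tasseo's configuration format.

```python
def light(value, warning, critical):
    """Map a metric value to a traffic-light color.

    Assumes higher is worse and that warning < critical.
    """
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


def dashboard(latest, thresholds):
    """Color every metric that has both a threshold and a datapoint."""
    return {
        name: light(latest[name], *thresholds[name])
        for name in thresholds
        if name in latest
    }


# Hypothetical thresholds: metric name -> (warning, critical)
THRESHOLDS = {
    "reservations.p99_latency_ms": (250, 1000),
    "reservations.5xx_per_min": (1, 10),
}
```

Choosing the thresholds is the real work; they are part of what makes "OK" service specific.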

Page 26: Rent The Runway: Transitioning to Operations Driven Webservices

Free Lunch? Really

DB connection pool monitoring

HTTP client connection pool monitoring

JVM Heap & GC info

HTTP Server response counts

HTTP Server connection info

Endpoint duration & throughput stats

Page 27: Rent The Runway: Transitioning to Operations Driven Webservices

Where do I sign up?

You install Graphite: a one-time hit plus some TLC. Medium Difficulty

You annotate your endpoints and maybe add finer telemetry. Easy

You configure your service to feed into Graphite, ideally in the same way across all services via a ‘Bundle’. Easy

Page 28: Rent The Runway: Transitioning to Operations Driven Webservices

Demo

Show a simple Dropwizard codebase

0.6.2 Slim Example: https://github.com/cab222/choco

0.7.0-SNAPSHOT Complete: https://github.com/dropwizard/dropwizard/tree/master/dropwizard-example

Do some curls

Show the admin endpoints
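For readers following along without the live demo, the curl step can be approximated with a short script. This sketch assumes a Dropwizard service whose admin connector is on the default port 8081 and exposes the usual `healthcheck`, `metrics`, `ping`, and `threads` paths. The helper names and the `localhost` target are hypothetical.

```python
import urllib.error
import urllib.request

# Admin paths assumed to be exposed by the service under test.
ADMIN_PATHS = ("healthcheck", "metrics", "ping", "threads")


def admin_urls(host, admin_port=8081):
    """Build the admin URLs for one service instance."""
    return {path: f"http://{host}:{admin_port}/{path}" for path in ADMIN_PATHS}


def fetch(url, timeout=5):
    """Return (status, body), including for error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A failing healthcheck still carries a useful body.
        return err.code, err.read().decode("utf-8", "replace")


# Example (requires a running service):
# for name, url in admin_urls("localhost").items():
#     status, body = fetch(url)
#     print(name, status, body[:200])
```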

Page 29: Rent The Runway: Transitioning to Operations Driven Webservices

References

dropwizard.codahale.com

metrics.codahale.com

graphite.wikidot.com

Page 30: Rent The Runway: Transitioning to Operations Driven Webservices

Presenters

@CarloBarbara (www.cabkata.com)

@Skamille (whilefalse.blogspot.com)

Rent The Runway is hiring! (renttherunway.com/careers)