JUST EAT: Tools we use to enable our culture

Posted on 22-Apr-2015


DevOps is not about tools, but good ones definitely help you run fast.


JUST EAT: embracing DevOps
Or: How we make a Windows-based ecommerce platform work (with AWS)
@petemounce & @justeat_tech

JUST EAT: Who are we?

● In business since 2001 in DK, 2005 in UK
● Tech team is ~50 people in UK, ~20 people in Ukraine
● Cloud native in AWS
  ○ Except for the bits that aren’t (yet)

● Very predictable load
● ~1000 orders/minute at peak in UK

● We’re recruiting!
  ○ http://tech.just-eat.com/jobs/
  ○ http://tech.just-eat.com/jobs/senior-software-engineer-platform-services/
  ○ Lots of other roles

JUST EAT: Who are we?

Oh, yeah - we do online takeaway.

We’re an extra sales channel for our restaurant partners.

We do the online part.

Challenging!

We make this work.

What are we?

We do high-volume ecommerce.

Windows platform.

Most production code is C#, .NET 4 or 4.5.

Most automation is Ruby 1.9.x, with some PowerShell.

Ongoing legacy transformation; no big rewrites.

Splitting up a monolithic system into SOA/APIs, incrementally.

Tech culture

“You ship it, you operate it”

Each team owns their own features, infrastructure-up.

Minimise dependencies between teams.

Each team has autonomy to work on what they want within some constraints.

Rules:
● Don’t break backwards compatibility
● Use what you want, but operate it yourself
● Other teams must be able to launch & verify your stuff in their environments
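The "don’t break backwards compatibility" rule can be checked mechanically when one team changes an API that another team consumes. A minimal sketch, assuming a response contract can be modelled as a flat mapping of field name to type name (this representation, and the field names in the example, are hypothetical and not JUST EAT’s actual tooling): adding fields is allowed, removing or retyping one is not.

```python
from typing import Dict, List

# Hypothetical contract model: field name -> type name.
Contract = Dict[str, str]


def breaking_changes(old: Contract, new: Contract) -> List[str]:
    """Return reasons why `new` would break consumers written against `old`.

    New fields are fine (consumers should ignore what they don't recognise);
    a removed field or a changed type is a breaking change.
    """
    problems = []
    for field, old_type in old.items():
        if field not in new:
            problems.append(f"removed field '{field}'")
        elif new[field] != old_type:
            problems.append(f"field '{field}' changed type {old_type} -> {new[field]}")
    return problems


if __name__ == "__main__":
    v1 = {"orderId": "string", "total": "decimal"}
    v2 = {"orderId": "string", "total": "decimal", "deliveryEta": "datetime"}
    v3 = {"orderId": "int", "total": "decimal"}
    print(breaking_changes(v1, v2))  # []
    print(breaking_changes(v1, v3))  # ["field 'orderId' changed type string -> int"]
```

Running a check like this in a team’s build is one way to keep the rule enforceable without a central gatekeeper.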

But how?

Table-stakes for this to work (well):

1. Persistent group chat

2. Real-time monitoring

3. Real-time alerting

4. Centralised logging

Make it easier to debug in production without a debugger.

Persistent group chat

We use HipChat.

You could use IRC / Campfire / Hangouts.

● Persistent - jump in, read up

● Searchable history

● Integrate other tools to it

● hubot for fun and profit
  ○ @jebot trg pd emergency with msg “we’re out of champagne in the office fridge”
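A chat command like the one above is just text that the bot has to parse before it can trigger anything. The sketch below assumes a hypothetical grammar, `trg pd <service> with msg "<message>"`, read off the example; it is not jebot’s real implementation, and it stops at producing a structured request rather than calling PagerDuty. Straight and curly quotes are both accepted because chat clients often auto-convert them.

```python
import re
from typing import Dict, Optional

# Hypothetical grammar, inferred from the example command:
#   trg pd emergency with msg "we're out of champagne in the office fridge"
# \u201c / \u201d are the curly opening / closing double quotes.
COMMAND = re.compile(
    r'^trg\s+pd\s+(?P<service>[\w-]+)\s+with\s+msg\s+["\u201c](?P<message>.+?)["\u201d]\s*$'
)


def parse_trigger(text: str) -> Optional[Dict[str, str]]:
    """Turn a chat message (bot mention already stripped) into a trigger request, or None."""
    match = COMMAND.match(text.strip())
    if match is None:
        return None  # not a command this handler understands
    return {"service": match.group("service"), "message": match.group("message")}


if __name__ == "__main__":
    print(parse_trigger('trg pd emergency with msg "disk full on web01"'))
    print(parse_trigger("hello bot"))
```

hubot scripts themselves are normally written in CoffeeScript or JavaScript; Python is used here only to keep all the examples in one language.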

Real-time monitoring

Microsoft’s SCOM requires an Active Directory (AD), which we don’t have.

Publish OS-level performance counters with perftap, a Windows analogue of collectd that we found and customised.

Receive metrics into statsd
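statsd accepts metrics as plain-text lines of the form `<name>:<value>|<type>`, usually over UDP, which is what makes it easy to feed from an agent like perftap. A minimal sketch of an emitter, assuming perftap speaks the standard statsd line protocol; the host, the port (8125 is statsd’s conventional default) and the metric names are illustrative assumptions, not JUST EAT’s configuration.

```python
import socket

# Types supported by this sketch: g = gauge, c = counter, ms = timer.
SUPPORTED_TYPES = ("g", "c", "ms")


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Encode one metric as '<name>:<value>|<type>' in the statsd line protocol."""
    if metric_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported statsd type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value: float, metric_type: str,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP, so a slow or dead statsd never blocks the sender."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, metric_type), (host, port))


if __name__ == "__main__":
    # Hypothetical OS-level counter for one web server, sent as a gauge.
    send_metric("web01.processor.percent_time", 42.5, "g")
```

UDP is the usual design choice here: losing the odd datapoint is acceptable, but an application stalling because its metrics sink is down is not.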

Visualise time-series data with graphite
  ○ 10s granularity retained for 13 months
  ○ For comparison, AWS’ CloudWatch gives you 1-minute granularity for 2 weeks

Addictive!
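The retention comparison on this slide is easier to appreciate as numbers. A back-of-envelope sketch, assuming 13 months is taken as 395 days and that Graphite stores data in its whisper format at 12 bytes per datapoint (4-byte timestamp plus 8-byte double), ignoring the small per-file header:

```python
SECONDS_PER_DAY = 24 * 60 * 60
WHISPER_POINT_BYTES = 12  # whisper datapoint: 4-byte timestamp + 8-byte double


def points_retained(step_seconds: int, retention_days: int) -> int:
    """Datapoints one series holds at a fixed step over the retention period."""
    return retention_days * SECONDS_PER_DAY // step_seconds


graphite_points = points_retained(10, 395)   # 10s for ~13 months  -> 3,412,800
cloudwatch_points = points_retained(60, 14)  # 1min for 2 weeks    -> 20,160
graphite_bytes = graphite_points * WHISPER_POINT_BYTES  # -> 40,953,600 (~41 MB)

if __name__ == "__main__":
    print(graphite_points, cloudwatch_points, graphite_bytes)
```

That is 6x finer granularity kept roughly 28x longer, at a cost of about 41 MB of disk per metric series.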

Real-time alerting

This is the 21st century; emailing someone their server is down doesn’t cut it.

seyren runs our checks.

Publishes to
● HipChat
● PagerDuty
● SMS
● statsd event metrics (coming soon, hopefully)
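A threshold checker of this kind classifies each metric value as OK, WARN or ERROR and notifies every channel when the state changes. The sketch below is loosely in the style of seyren’s per-check warn/error thresholds but is not its implementation; the notifier callables stand in for the HipChat, PagerDuty and SMS integrations and are hypothetical. When `warn < error`, higher values are worse (CPU %); otherwise lower values are worse (orders per minute dropping towards zero).

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    OK = "OK"
    WARN = "WARN"
    ERROR = "ERROR"


def evaluate(value: float, warn: float, error: float) -> State:
    """Classify a value; the ordering of warn vs error decides which direction is bad."""
    if warn < error:  # higher is worse
        if value >= error:
            return State.ERROR
        if value >= warn:
            return State.WARN
    else:  # lower is worse
        if value <= error:
            return State.ERROR
        if value <= warn:
            return State.WARN
    return State.OK


# Hypothetical notifier signature: (check name, new state, triggering value).
Notifier = Callable[[str, State, float], None]


def run_check(name: str, value: float, warn: float, error: float,
              previous: State, notifiers: List[Notifier]) -> State:
    """Evaluate a check and notify only on state transitions, to avoid alert spam."""
    current = evaluate(value, warn, error)
    if current != previous:
        for notify in notifiers:
            notify(name, current, value)
    return current


if __name__ == "__main__":
    def print_to_chat(name: str, state: State, value: float) -> None:
        print(f"[{state.value}] {name} = {value}")

    state = run_check("orders.per_minute", 40, warn=200, error=50,
                      previous=State.OK, notifiers=[print_to_chat])
```

Alerting on transitions rather than on every evaluation keeps a chat room readable during an incident.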

Centralised logging

Windows doesn’t have syslog.

Out of the box, EventLog isn’t quite it.

Publish logs via nxlog agent.

Receive logs into logstash cluster.

Filter, transform and enrich into elasticsearch cluster.

Query, visualise and dashboard via kibana.
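The "filter, transform and enrich" step is where a raw text line becomes a structured, searchable document. In the pipeline described here that work is done by logstash filters; the Python below only illustrates the shape of it. The log line format, the field names and the host-to-team lookup table are all hypothetical.

```python
import json
import re
from datetime import datetime
from typing import Dict, Optional

# Hypothetical application log format: "<date> <time> <LEVEL> <logger> - <message>"
LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<logger>\S+) - (?P<message>.*)$"
)

# Hypothetical enrichment table: which team owns which host.
HOST_OWNERS = {"web01": "orders-team", "api03": "platform-services"}


def to_document(raw: str, host: str) -> Optional[Dict[str, str]]:
    """Parse one raw line into a structured document ready for indexing, or None."""
    match = LINE.match(raw)
    if match is None:
        return None  # filter: drop lines that don't fit the expected format
    doc = match.groupdict()
    # transform: normalise the timestamp to ISO 8601 so the index can sort on it
    doc["timestamp"] = datetime.strptime(doc["timestamp"], "%Y-%m-%d %H:%M:%S").isoformat()
    # enrich: add context the application itself never logged
    doc["host"] = host
    doc["team"] = HOST_OWNERS.get(host, "unknown")
    return doc


if __name__ == "__main__":
    line = "2015-04-22 12:00:01 ERROR OrderApi - Payment provider timed out"
    print(json.dumps(to_document(line, "api03")))
```

Enriching with ownership data is what lets a team filter the shared log store down to "our stuff" when they are on call for it.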

Without these things, operating a distributed system on Windows is hard.

Windows at scale assumes that you have an Active Directory. We don’t.

● No Windows network load-balancing.
● No centrally trusted authentication.
● No central monitoring (SCOM) to harvest performance counters.
● No easy remote command execution (WinRM wants an AD, too).
● Other stuff; these are the highlights.

The most important things

● Culture

● Principles that everyone lives by

● Devolve autonomy down to people on the ground

● (Tools)

Did we mention we’re hiring?

We’re pragmatic.

We’re successful.

We support each other.

We use sharp tools that we pick ourselves based on merit.

Join us!
  ○ http://tech.just-eat.com/jobs/
  ○ http://tech.just-eat.com/jobs/senior-software-engineer-platform-services/
  ○ Lots of other roles

Any questions?