Post on 22-Apr-2015
JUST EAT: embracing DevOps
Or: How we make a Windows-based ecommerce platform work (with AWS)
@petemounce & @justeat_tech
JUST EAT: Who are we?
● In business since 2001 in DK, 2005 in UK
● Tech team is ~50 people in UK, ~20 people in Ukraine
● Cloud native in AWS
○ Except for the bits that aren’t (yet)
● Very predictable load
● ~1000 orders/minute at peak in UK
● We’re recruiting!
○ http://tech.just-eat.com/jobs/
○ http://tech.just-eat.com/jobs/senior-software-engineer-platform-services/
○ Lots of other roles
JUST EAT: Who are we?
Oh, yeah - we do online takeaway.
We’re an extra sales channel for our restaurant partners.
We do the online part.
Challenging!
We make this work.
What are we?
We do high-volume ecommerce.
Windows platform.
Most production code is C#, .NET 4 or 4.5.
Most automation is Ruby 1.9.x. Some PowerShell.
Ongoing legacy transformation; no big rewrites.
Splitting up a monolithic system into SOA/APIs, incrementally.
Tech culture
“You ship it, you operate it”
Each team owns their own features, infrastructure-up.
Minimise dependencies between teams.
Each team has autonomy to work on what they want within some constraints.
Rules:
● don’t break backwards compatibility
● use what you want - but operate it yourself
● other teams must be able to launch & verify your stuff in their environments
But how?
Table-stakes for this to work (well):
1. Persistent group chat
2. Real-time monitoring
3. Real-time alerting
4. Centralised logging
Make it easier to debug in production without a debugger.
Persistent group chat
We use HipChat.
You could use IRC / Campfire / Hangouts.
● Persistent - jump in, read up
● Searchable history
● Integrate other tools into it
● hubot for fun and profit
○ @jebot trg pd emergency with msg “we’re out of champagne in the office fridge”
Real-time monitoring
Microsoft’s SCOM requires an AD
Publish OS-level performance counters with perftap - a Windows analogue of collectd that we found and customised
Receive metrics into statsd
Visualise time-series data with graphite
○ 10s granularity retained for 13 months
○ AWS’ CloudWatch gives you 1min / 2 weeks
Addictive!
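To make the "receive metrics into statsd" step concrete, here is a minimal Python sketch of the plain-text statsd line format (`name:value|type`) sent over UDP. It is an illustration only, not how perftap or our services are implemented; the metric names, the host and the port (8125, the usual statsd default) are assumptions.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> str:
    """Build one statsd datagram: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP, so a slow or absent collector never blocks the caller."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
counter = format_metric("orders.placed", 1, "c")
timer = format_metric("api.checkout.response_ms", 320, "ms")
gauge = format_metric("web01.cpu.percent", 42.5, "g")

if __name__ == "__main__":
    for line in (counter, timer, gauge):
        send_metric(line)
```

UDP suits this job because publishing a metric should never be able to take down the thing being measured.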
Real-time alerting
This is the 21st century; emailing someone their server is down doesn’t cut it.
seyren runs our checks.
Publishes to
● HipChat
● PagerDuty
● SMS
● statsd event metrics (coming soon, hopefully)
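The idea behind a threshold check can be sketched in a few lines of Python. This is not seyren's code: the warn/error levels, the rule of notifying only when the state changes, and the routing of severities to channels are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    WARN = "WARN"
    ERROR = "ERROR"


def evaluate(value: float, warn: float, error: float) -> State:
    """Classify the latest datapoint, assuming higher values are worse."""
    if value >= error:
        return State.ERROR
    if value >= warn:
        return State.WARN
    return State.OK


# Hypothetical routing: chat hears everything, people get paged only on ERROR.
ROUTES = {
    State.OK: ["hipchat"],
    State.WARN: ["hipchat"],
    State.ERROR: ["hipchat", "pagerduty", "sms"],
}


def notifications(previous: State, current: State) -> list:
    """Notify only on a state change, so a sustained outage does not page repeatedly."""
    if previous == current:
        return []
    return ROUTES[current]


# Walk a made-up series of response times (ms) through the check.
readings = [120, 480, 900, 900, 150]
state = State.OK
sent = []
for value in readings:
    new_state = evaluate(value, warn=400, error=800)
    sent.append(notifications(state, new_state))
    state = new_state
```

In this sketch the second 900 ms reading sends nothing, and the recovery to 150 ms posts to chat only.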
Centralised logging
Windows doesn’t have syslog.
Out of the box, EventLog isn’t quite it.
Publish logs via nxlog agent.
Receive logs into logstash cluster.
Filter, transform and enrich into elasticsearch cluster.
Query, visualise and dashboard via kibana.
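The "filter, transform and enrich" step is easier to picture with an example. The Python sketch below mimics what such a pipeline stage does to one JSON log line; it is not logstash configuration, and the field names (`level`, `host`, `environment`) and the drop-debug rule are assumptions.

```python
import json


def enrich(raw_line: str, host: str, environment: str) -> dict:
    """Parse one JSON log line, normalise it, and attach context for later querying."""
    event = json.loads(raw_line)
    # Transform: normalise the level so queries do not have to handle mixed case.
    event["level"] = str(event.get("level", "info")).upper()
    # Enrich: record where the event came from.
    event["host"] = host
    event["environment"] = environment
    return event


def keep(event: dict) -> bool:
    """Filter: drop debug noise before it reaches the search cluster."""
    return event["level"] != "DEBUG"


# Hypothetical log line and host name, for illustration only.
raw = '{"level": "error", "message": "payment gateway timeout"}'
event = enrich(raw, host="web01", environment="production")
indexed = [event] if keep(event) else []
```

Attaching host and environment at this stage is what lets one dashboard cut across every team's services.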
Without these things, operating a distributed system on Windows is hard.
Windows at scale assumes that you have an Active Directory.
We don’t.
● No Windows network load-balancing.
● No centrally trusted authentication.
● No central monitoring (SCOM) to harvest performance counters.
● No easy remote command execution (WinRM wants an AD, too)
● Other stuff; these are the highlights.
The most important things
● Culture
● Principles that everyone lives by
● Devolve autonomy down to people on the ground
● (Tools)
Did we mention we’re hiring?
We’re pragmatic.
We’re successful.
We support each other.
We use sharp tools that we pick ourselves based on merit.
Join us!
○ http://tech.just-eat.com/jobs/
○ http://tech.just-eat.com/jobs/senior-software-engineer-platform-services/
○ Lots of other roles
Any questions?