Containerizing the largest Dutch e-commerce site: The bol.com story
Content
• About me
• About bol.com
• Containers... in production
• Mayfly: the original container use case
• Choices, choices...
• Lessons learned
• Next steps
About me
• Maarten Dirkse (@mdirkse)
• Developer with a history degree, 9+ years of experience (mostly Java)
• Work on the bol.com tools team. We provide the platform for the organisation to build software: Jenkins, SCM, Mayfly (more on that later)
• Have been running containers in production* for almost 2 years (bol.com has been running containers in real production, no asterisk, for a little over a year, but in earnest only for the past 5 months)
* production internally, for devs, not for customers
About bol.com
• Over 6.5 million active customers
• Virtual footprint of almost 1 million visitors per day
• Over 14.5 million products
• Moved to our own DC two years ago
• VM-based architecture: 1 node per app instance
• Everything is puppetized but was derived from a static config source (Racktables)
• We’re hiring! http://banen.bol.com
Brand awareness: > 95% > 75%
Containers... in production
• Several mission-critical apps running in containers... in VMs
• Mesos + Marathon cluster that runs the backend GUI for the webshop
• Home-grown spidering solution that runs on Google Container Engine (also Mesos on GCE)
• Mesos + Marathon cluster that runs Mayfly...
Mayfly: the original use case
the shop for everyone
^^ http://mayflycd.github.io/mayfly-talks/
What is Mayfly?
• Team had an idea: let teams develop every service feature in isolation, to remove the bottleneck of a shared test environment
• Needed isolated runtime environment for every feature branch (that’s a lot of environments)
• VM infrastructure was too static, too resource heavy, too slow
Containers to the rescue!
• Instead of having every feature branch deploy as a VM, deploy it as a container
• Use of containers meant we could spin up environments in seconds and pack more of them onto the hardware
• And so it was that containers were introduced at bol.com. But...
DockerCon 2014: docker + ?
Towards “peak container confusion”
• Mesos
• Marathon (or Aurora?)
• Kubernetes
• Synapse & Nerve
• Paasta
• AWS EC2 CS
• CoreOS + Fleet
• RancherOS
• Spotify Helios
wut?
The stack
• After trying Fleet (buggy) and Kubernetes (5 min old), we settled on Mesos + Marathon running on CoreOS RHEL7 on bare metal
• Consul for service discovery, Kevlar for KV store.
• Choices made for Mayfly became the prototype for the bol.com container infrastructure
Dynamic infrastructure is the future!
As the limitations of our VM-based infrastructure became clear, the platform team became convinced that the move to dynamic infrastructure was a necessary step in order to keep scaling the IT architecture.
But wait, we’re not finished!
• After you’re done installing your new, mind-blowing tech you realize a lot of loose ends still need to be tied up.
• Deploying Docker to your machines (and which version)? --> Docker puppet module (https://github.com/garethr/garethr-docker)
• What about logs? --> Logspout (https://github.com/gliderlabs/logspout)
• Zombie processes, SD registration? --> ContainerPilot (https://github.com/joyent/containerpilot)
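To make "SD registration" a bit more concrete: with Consul as the service-discovery backend, registering a container boils down to sending a small JSON document to the local Consul agent (PUT /v1/agent/service/register). ContainerPilot takes care of this for you; the sketch below only illustrates what such a registration body can look like. The service name, container id, address, port and the /health path are all made-up values, not taken from the bol.com setup.

```python
import json


def consul_registration(name, container_id, address, port, health_path="/health"):
    """Build a JSON body for Consul's agent service-registration endpoint.

    All argument values are supplied by the caller; nothing here is
    bol.com-specific.
    """
    return {
        "ID": f"{name}-{container_id}",  # must be unique per instance
        "Name": name,                    # logical service name used for lookups
        "Address": address,
        "Port": port,
        "Check": {
            # Consul polls this URL; failing checks take the instance out of rotation
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
            # clean up instances whose container has disappeared
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


if __name__ == "__main__":
    # Hypothetical instance of a hypothetical "catalog" service
    body = consul_registration("catalog", "3f2a9c", "10.0.0.12", 31005)
    print(json.dumps(body, indent=2))
```

The per-instance ID matters in a dynamic setup: several containers of the same service can land on one host, so the service name alone is not enough to tell them apart.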
But wait, we’re not finished!
• How do you actually tell Marathon what to deploy? --> Marathon terraform provider (https://github.com/Banno/terraform-provider-marathon)
• Install a (properly secured) Docker registry. We went with the stock Docker registry behind a secured Nginx reverse proxy
• Base images? We chose to use the RHEL7 base image as the root of everything (known quantity in terms of ops support and security vetting)
• And mind how you create images...
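Whatever tool sits in front of it, what Marathon ultimately receives is a JSON app definition (posted to its /v2/apps endpoint). A minimal sketch of such a definition for a Docker container follows. The app id, image name, resource numbers and health-check path are illustrative assumptions, and the exact field layout differs between Marathon versions, so treat this as a shape to recognise rather than something to copy.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem_mb=512, instances=2, container_port=8080):
    """Build a Marathon app definition for a Docker container.

    Field names follow the older Marathon layout (container.docker.portMappings);
    newer versions moved some of these fields around.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,          # memory in MB
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the agent
                "portMappings": [{"containerPort": container_port, "hostPort": 0}],
            },
        },
        "healthChecks": [{"protocol": "HTTP", "path": "/health", "portIndex": 0}],
    }


if __name__ == "__main__":
    # Hypothetical app and registry names
    app = marathon_app("/biz/orders", "registry.example.com/biz/orders:1.4.2")
    print(json.dumps(app, indent=2))
```

Keeping definitions like this in version control (as Terraform config or otherwise) is what makes "what is deployed" reviewable and repeatable.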
BOB
• Needed a way to audit and vet images that would be run in our landscape
• Created BOB, a wrapper tool for docker build and docker push
• BOB checks your Dockerfiles and images, ensuring that they meet company standards, before they’re pushed to the registry
• Nothing gets pushed to the registry if it hasn’t been built by BOB
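The slides don't say which rules BOB enforces, so the following is only a sketch of the general idea of a pre-push Dockerfile check. The two rules used here (approved base image, pinned tag) and the registry name are assumptions chosen for illustration, not BOB's actual checks.

```python
# Hypothetical allow-list; the real standards are not described in the talk
APPROVED_BASE_PREFIXES = ("registry.example.com/rhel7",)


def check_dockerfile(text):
    """Return a list of policy violations found in a Dockerfile's text."""
    problems = []
    bases = []
    for line in text.splitlines():
        parts = line.strip().split()
        # Collect the image named by every FROM instruction
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            bases.append(parts[1])
    if not bases:
        problems.append("no FROM instruction found")
    for base in bases:
        if not base.startswith(APPROVED_BASE_PREFIXES):
            problems.append(f"base image not approved: {base}")
        # Look for a tag only in the last path segment, so a registry port is not mistaken for one
        if ":" not in base.rsplit("/", 1)[-1] or base.endswith(":latest"):
            problems.append(f"base image must be pinned to a tag: {base}")
    return problems


if __name__ == "__main__":
    good = "FROM registry.example.com/rhel7:7.2\nRUN yum -y install java\n"
    bad = "FROM ubuntu\n"
    print(check_dockerfile(good))
    print(check_dockerfile(bad))
```

A wrapper would run checks like these first and only call docker build and docker push when the list of problems comes back empty.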
Use cases
• Mayfly (see above)
• BIZ: lots of small, independently deployable modules with back office functionality. Stateless, ideal for containerization.
• Spidering: horizontally scalable stateless processes that run in the cloud.
Lessons learned
^^ nothing funny about this, most of ‘em were learned the hard way
Lessons learned 1/2
• Most of this stuff is relatively new or brand new, expect growing pains
• Don’t run your container orchestration software (Mesos, Marathon) in containers, so that if Docker dies, your platform doesn’t degrade with it.
• Running your apps in a container can sometimes lead to interesting issues that don’t exist outside of containers (JVM memory issues, for instance) --> See https://www.youtube.com/watch?v=6ePUiQuaUos for an example
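The slide doesn't spell out which JVM memory issues are meant. One commonly reported variant is that older JVMs size their default max heap from the memory of the host (roughly a quarter of it) rather than from the container's limit, so the process gets killed once it grows past what the container allows. The arithmetic, as a sketch with made-up numbers (the 0.75 heap fraction is an assumption, not a recommendation from the talk):

```python
def default_max_heap_mb(visible_memory_mb):
    """Approximate default max heap of an older JVM: a quarter of the memory it sees."""
    return visible_memory_mb // 4


def jvm_heap_flag(container_limit_mb, heap_fraction=0.75):
    """Return an explicit -Xmx flag sized from the container limit.

    The remainder is left for non-heap memory (metaspace, thread stacks, buffers).
    """
    heap_mb = int(container_limit_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    host_mb = 64 * 1024   # hypothetical 64 GB host
    limit_mb = 512        # hypothetical container memory limit
    print("default heap the JVM would pick:", default_max_heap_mb(host_mb), "MB")
    print("container limit:", limit_mb, "MB")
    print("explicit flag:", jvm_heap_flag(limit_mb))
```

With these numbers the JVM would happily aim for a heap many times larger than the container is allowed to use, which is why setting the heap explicitly per container is a common workaround.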
Lessons learned 2/2
• Graphite-style metrics become problematic in a container world. Prometheus exists, but we can’t just switch from one day to the next
• HAProxy & consul-template combo is pretty brittle; we now use Fabio --> https://github.com/eBay/fabio
• Keep it simple, make small changes. Static to dynamic is a sea change that is incredibly hard to grasp in full. Take small steps that deliver value immediately
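On the Graphite point above, one plausible reading (an interpretation, not something the slides state) is that Graphite's hierarchical names usually bake the host into the metric path, so every short-lived container creates a brand-new series, whereas Prometheus keeps the metric name stable and moves the instance identity into labels. A small sketch of the two naming styles, with hypothetical app, container and metric names:

```python
def graphite_path(app, host, metric):
    """Build a dotted Graphite path; the host is part of the metric name itself."""
    safe_host = host.replace(".", "_")  # dots would otherwise add hierarchy levels
    return f"apps.{app}.{safe_host}.{metric}"


def prometheus_sample(metric, value, **labels):
    """Render one sample in Prometheus' text exposition format."""
    rendered = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{rendered}}} {value}"


if __name__ == "__main__":
    # Two hypothetical container ids for the same app
    for container in ("c-3f2a9c", "c-91bb04"):
        print(graphite_path("orders", container, "requests"))
        print(prometheus_sample("http_requests_total", 1027, app="orders", instance=container))
```

In the label-based form, aggregating over all instances of an app is a query over one metric name instead of a wildcard over an ever-growing tree of paths.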
The cultural shift
• Beware the mindset transition that dev teams will have to experience
• Devs: “what do you mean I can’t ssh into the container?”
• It takes time for ops people to adjust to the idea of dynamic infrastructure. People tend to think from within their own constraints --> ops control over the app runtime will no longer be absolute
Next steps
• IP-per-container (needed for per-container firewalls, aka to get security off our back)
• Per-app service descriptor that drives app infra and config (to replace hiera data and feed Terraform)
• Migrating ever more apps to the dynamic infrastructure