Roy Bauweraerts & Erwin de Keijzer · behold the distributed monolith 3 dedicated aws ec2 instances...

Post on 14-Jul-2020


Running docker in production

Roy Bauweraerts & Erwin de Keijzer

Hello! Mijndomein

● Webhosting company founded in 2003
● 572,870 Domains
● 194,870 Customers

md3: Behold the monolith

● Own iron (our own hardware)
● 1 release every 4 weeks (+ hotfixes)
○ mostly night releases
● (mostly) manual process
● Releases by Operations

µd3: Behold the distributed monolith

● 3 dedicated AWS EC2 instances per microservice
● Multiple releases every week
● Releases by Developers
● Service discovery with Consul

Consul: Service discovery

Consul

● Service discovery

● Failure detection

● Multi datacenter

● Key/Value storage
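As a rough illustration of the service discovery and failure detection bullets, the sketch below builds the JSON body that a service could send to a local Consul agent (`PUT /v1/agent/service/register`) and derives the DNS name other services would then look up. The service name, address, port and `/health` path are made-up example values, not taken from the talk.

```python
import json


def service_definition(name, address, port, health_path="/health", interval="10s"):
    """Build a JSON body for Consul's agent service registration endpoint.

    All argument values used below are hypothetical examples.
    """
    return {
        "Name": name,
        "ID": f"{name}-{address}-{port}",  # one ID per running instance
        "Address": address,
        "Port": port,
        # An HTTP check lets Consul drop failing instances from DNS answers.
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": interval,
        },
    }


def dns_name(name, domain="consul"):
    """Name under which healthy instances of a service are resolvable."""
    return f"{name}.service.{domain}"


if __name__ == "__main__":
    definition = service_definition("webshop", "10.0.0.1", 32001)
    print(json.dumps(definition, indent=2))
    print(dns_name("webshop"))
```

The block only constructs the payload; actually sending it requires a reachable Consul agent.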

µd3: Behold the distributed monolith

● Tightly coupled
● Expensive
● Complicated to introduce new services
○ write playbooks
○ add instances
○ add “service” to deploy server
○ create healthchecks
● High overhead

The goal: But why?!

To migrate to a platform that allows us to quickly add, change or remove functionality with high confidence, without compromising the user experience or availability.

Docker

“Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.”

Their own words

Challenges: Do you accept?

● Running docker containers

● Environment consistency & configuration

● Service discovery

● Logging

● Request routing

● Monitoring

● Updates without downtime

Challenge #1: Running docker containers

What is the easiest and most reliable method of managing your containers (CRUD & scale) with minimal effort and without affecting your customers?

Challenge #1: Running docker containers

● Kubernetes
● Nomad
● Docker Swarm
● Amazon ECS

Challenge #1: Running docker containers

DC/OS

“DC/OS is an enterprise grade datacenter-scale operating system, providing a single platform for running containers, big data, and distributed apps in production.”

Challenge #1: Running docker containers (DC/OS)

Pro:
● It keeps the containers running rather well
● Easy bootstrap
● CRUD web interface
● Logging possibilities
● Rolling updates based on health checks

Con:
● Lots of moving parts
● Distributed state
● No native Consul integration
● Web UI has flaws
● No internal namespacing
● No way of running services on all agents
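To make "rolling updates based on health checks" a little more concrete: DC/OS runs long-lived services through Marathon, which accepts a JSON app definition. The sketch below assembles such a definition in Python. The app id, image name, resource sizes and `/health` path are hypothetical, and the exact field layout differs between Marathon versions, so treat it as an illustration rather than the configuration used in the talk.

```python
import json


def marathon_app(app_id, image, instances, health_path="/health"):
    """Return a Marathon-style app definition (all values are example values)."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": 0.25,
        "mem": 256,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks the scheduler to pick a free port on the agent.
                "portMappings": [{"containerPort": 80, "hostPort": 0}],
            },
        },
        # Instances failing this check are replaced, and a deployment only
        # proceeds once new instances pass it.
        "healthChecks": [{
            "protocol": "HTTP",
            "path": health_path,
            "portIndex": 0,
            "gracePeriodSeconds": 30,
            "intervalSeconds": 10,
            "timeoutSeconds": 5,
            "maxConsecutiveFailures": 3,
        }],
        # Keep full capacity while deploying, allow 50% extra instances.
        "upgradeStrategy": {
            "minimumHealthCapacity": 1.0,
            "maximumOverCapacity": 0.5,
        },
    }


if __name__ == "__main__":
    print(json.dumps(marathon_app("/webshop", "example/webshop:1.2.3", 3), indent=2))
```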


Challenge #2: Environment

How can you guarantee that your code behaves the same here, and here?

Challenge #2: Environment

● Use the same artifact in all your environments (docker)
● Artifact combines all resources needed for running your code (docker):
○ os
○ libraries
○ plugins
○ tooling
● Configuration is injected during runtime (consul-template)

Challenge #2: Environment consistency

➭ cat parameters.yml.ctmpl
---
{{ tree "config/mijndomein" | explode | toYAML }}

➭ consul-template -consul consul.service.consul:8500 -once -template "parameters.yml.ctmpl:parameters.yml"

➭ cat parameters.yml
---
example: data
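The template above reads every key below `config/mijndomein`, turns the flat key paths into a nested structure (`explode`) and prints it as YAML (`toYAML`). A minimal Python sketch of that transformation follows, to show what ends up in `parameters.yml`. The `database/host` key is a made-up example, and the tiny YAML writer only handles plain string values; it is not how consul-template itself is implemented.

```python
def explode(pairs):
    """Turn flat 'a/b/c' keys into nested dictionaries."""
    nested = {}
    for key, value in pairs.items():
        node = nested
        parts = key.split("/")
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[parts[-1]] = value
    return nested


def to_yaml(node, indent=0):
    """Render nested dictionaries of plain strings as YAML lines."""
    lines = []
    pad = "  " * indent
    for key in sorted(node):
        value = node[key]
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    # Keys as they might be stored below the config prefix (hypothetical).
    flat = {"example": "data", "database/host": "db.internal"}
    print("---")
    print("\n".join(to_yaml(explode(flat))))
```

Because the same image is used everywhere, only the values stored in Consul differ per environment.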

Challenge #2: Environment consistency

[Diagram: consul-template renders parameters.yml]


Challenge #3: Service discovery

How do you let your containers discover other containers in a continuously changing environment?

[Diagram in three steps (13:00, 13:10, 13:20): containers A to F run on hosts 10.0.0.1, 10.0.0.2 and 10.0.0.3, and their placement is different at every step.]

Challenge #3: Service discovery

Mesos DNS: Containers need to communicate with services outside of DC/OS.

DC/OS service ports: Outside services also need to know the IP addresses.

Consul DNS: DC/OS cannot communicate with Consul.

Challenge #3: Service discovery

mesos-consul

“Mesos to Consul bridge for service discovery.”

Challenge #3: Service discovery (mesos-consul)

● Watches Mesos
● Registers tasks as applicationid.service.consul
○ (Marathon labels can be used to define your own service name)
● Registers Consul (HTTP) health checks based on Marathon labels
● Updates on a predefined interval
○ Not ideal: a compromise between consistency and performance
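The interval-based update can be pictured as a periodic reconciliation between what the scheduler is running and what is registered in Consul. The sketch below shows only that comparison step. The task ids and the data shapes are assumptions made for this illustration; it is not the actual mesos-consul code.

```python
def reconcile(running_tasks, registered):
    """Compare scheduler state with the service registry.

    Both arguments map a task id to a (service_name, address, port) tuple.
    Returns the entries to (re)register and the ids to deregister.
    """
    to_register = {
        task_id: task
        for task_id, task in running_tasks.items()
        if registered.get(task_id) != task  # new task, or moved to another host/port
    }
    to_deregister = sorted(
        task_id for task_id in registered if task_id not in running_tasks
    )
    return to_register, to_deregister


if __name__ == "__main__":
    running = {
        "webshop.1": ("webshop", "10.0.0.1", 32001),
        "auth.1": ("auth", "10.0.0.3", 32005),
    }
    in_consul = {
        "webshop.1": ("webshop", "10.0.0.2", 32003),  # stale: task was rescheduled
        "mail.1": ("mail", "10.0.0.2", 32010),        # stale: task no longer runs
    }
    print(reconcile(running, in_consul))
```

Between two runs the registry can be out of date, which is the consistency versus performance trade-off the slide mentions: a shorter interval means fresher data but more load.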


Challenge #4: Logging

How do you determine what is happening
● at the application level
● at the domain level
with minimal effort?

Challenge #4: Logging

Application

Stdout & stderr available through web interface for realtime insights.

Also logged to Elasticsearch with rich metadata for statistics and historical insights.
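One common way to get "rich metadata" into Elasticsearch is to have the application write one JSON document per log line to stdout, so whatever collects the container output can index it without parsing. The sketch below does this with Python's standard `logging` module. The field names (`service`, `environment`) and their values are assumptions for the example, and the log shipper itself is out of scope.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each record as a single JSON document with fixed metadata."""

    def __init__(self, static_fields):
        super().__init__()
        self.static_fields = static_fields  # e.g. service name, environment

    def format(self, record):
        document = {
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        document.update(self.static_fields)
        return json.dumps(document)


def build_logger(name, static_fields, stream=sys.stdout):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(static_fields))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("webshop", {"service": "webshop", "environment": "accept"})
    log.info("order %s placed", 1234)
```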

Challenge #4: Logging

Domain

Events that are sent through RabbitMQ also get stored in Elasticsearch.


Challenge #5: Request routing

How do you make sure your requests reach the correct containers?

[Diagram: from AWS ELB / ALB to the DC/OS agents]

Challenge #5: Request routing

[Diagram in three steps: AWS ELB / ALB in front of the DC/OS agents; every request has to end up at a different agent and port]

● GET / HTTP/1.1, Host: www.mijndomein.nl → 10.0.0.1:32001
● GET /producten HTTP/1.1, Host: www.mijndomein.nl → 10.0.0.2:32003
● GET /login HTTP/1.1, Host: auth.mijndomein.nl → 10.0.0.3:32005

Challenge #5: Request routing

Register your containers in the AWS ALB: Complex and error-prone

Static proxy (NGINX, Apache2, HAProxy): Large feature set but a lot of manual labour

Dynamic proxy (Fabio/Traefik): Easy but limited

Challenge #5: Request routing

[Diagram: AWS ELB / ALB → nginx proxy → DC/OS agents. The nginx proxy routes on the Host header (www.mijndomein.nl, auth.mijndomein.nl) and on the request path (GET /, GET /producten).]
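The routing decision the proxy has to make can be sketched as a lookup on host name plus longest matching path prefix. The table below reuses the three example requests from the routing slides. In the real setup the backend addresses change as containers move, so a hard-coded table like this is purely illustrative, and the function name is made up for the example.

```python
# Host -> list of (path prefix, backend address) taken from the example requests.
ROUTES = {
    "www.mijndomein.nl": [
        ("/producten", "10.0.0.2:32003"),
        ("/", "10.0.0.1:32001"),
    ],
    "auth.mijndomein.nl": [
        ("/login", "10.0.0.3:32005"),
    ],
}


def resolve(host, path, routes=ROUTES):
    """Return the backend for a request, or None when nothing matches."""
    matches = [
        (prefix, backend)
        for prefix, backend in routes.get(host, [])
        if path.startswith(prefix)
    ]
    if not matches:
        return None
    # The most specific (longest) prefix wins, so "/producten" beats "/".
    return max(matches, key=lambda match: len(match[0]))[1]


if __name__ == "__main__":
    print(resolve("www.mijndomein.nl", "/"))
    print(resolve("www.mijndomein.nl", "/producten"))
    print(resolve("auth.mijndomein.nl", "/login"))
```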


Challenge #6: Monitoring

How do you automatically check (and fix) the health of your containers?

● Marathon checks
● Consul health checks
● Alerting with Datadog

Challenge #6: Monitoring

Marathon

Challenge #6: Monitoring

Datadog

● Visualisation of all tracked metrics
● Alerting on predefined limits
○ hard thresholds (request rate == 0)
○ dynamic thresholds (disk usage suddenly grows faster than before)
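The difference between the two kinds of thresholds can be shown with a small sketch. The hard threshold compares a metric with a fixed value, while the dynamic one compares the latest change with how the metric behaved before. The `factor` and `min_step` parameters are arbitrary choices for this example and say nothing about how Datadog evaluates its monitors.

```python
def request_rate_alert(requests_per_second):
    """Hard threshold: alert as soon as no requests arrive at all."""
    return requests_per_second == 0


def growth_alert(samples, factor=3.0, min_step=1.0):
    """Dynamic threshold on equally spaced disk usage samples.

    Alerts when the latest increase is more than `factor` times the
    average increase seen in the earlier samples.
    """
    if len(samples) < 3:
        return False  # not enough history to know what "before" looked like
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    baseline = sum(deltas[:-1]) / len(deltas[:-1])
    # min_step avoids alerting on tiny changes when usage was flat before.
    return deltas[-1] > factor * max(baseline, min_step)


if __name__ == "__main__":
    print(request_rate_alert(0))
    print(growth_alert([10, 11, 12, 13, 20]))
    print(growth_alert([10, 11, 12, 13, 14]))
```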


Challenge #7: Rolling updates

How do you update applications and servers without affecting your customers?

Challenge #7: Rolling updates

Applications
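The idea behind a health-check-gated rolling update can be simulated in a few lines: replace one instance at a time, and stop as soon as a replacement does not become healthy, so the remaining old instances keep serving. This is a toy model under that assumption. The `is_healthy` callback stands in for a real health check, and the sketch is not the scheduler's actual deployment algorithm.

```python
def rolling_update(instances, new_version, is_healthy):
    """Replace instances one by one, gated on a health check.

    `is_healthy(version, index)` is a hypothetical callback that reports
    whether the replacement for slot `index` came up correctly.
    Returns the list of intermediate states and a success flag.
    """
    current = list(instances)
    snapshots = [tuple(current)]
    for index in range(len(current)):
        if not is_healthy(new_version, index):
            # Abort: slots that were not touched still run the old version.
            return snapshots, False
        current[index] = new_version
        snapshots.append(tuple(current))
    return snapshots, True


if __name__ == "__main__":
    states, ok = rolling_update(["v1", "v1", "v1"], "v2", lambda version, index: True)
    for state in states:
        print(state)
    print("completed:", ok)
```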

Challenge #7: Rolling updates

Servers

[Diagram in three steps: DC/OS agents labelled 1a, 1b and 1c]


Afterthoughts: Would we do it again?

○ Entire environment has become more complex than before.
○ DC/OS schedules single containers, which made us create multi-process containers.
○ Lack of namespacing forces us to separate acceptance and production environments and also allows more internal communication than necessary.
○ Secrets and ACL are not part of the free DC/OS.

Afterthoughts: Would we do it again?

○ Since setting up DC/OS we have had a 200% increase in microservices.
○ Because Dev, Accept and Prod are so similar, we have had nearly no bugs introduced by the environment.
○ Introducing new microservices to production can now be achieved in a few hours.
○ We now run over 40 unique microservices (about 75 containers) on 12 instances.

Afterthoughts: Would we do it again?

Components

● DC/OS Masters
● DC/OS Agents
● AWS ELB / ALB
● MySQL
● Redis
● RabbitMQ
● Elasticsearch
● mesos-consul
● Consul

Bye bye! That’s all folks

https://github.com/mijndomein/docker-in-production-talk