Running & Monitoring Docker at Scale

November 12th, 2014 | Las Vegas

Monitoring and Running Docker Containers at Scale Alexis Lê-Quôc, Datadog

@alq — CTO at Datadog

Datadog

•  Monitoring service
•  Made for the cloud
•  Aggregates everything
•  Support for Docker (since 1.0)

Goals
1. Present key Docker metrics
2. Explain operational complexity
3. Rethink monitoring of Docker containers

Agenda
•  A (very) brief history of containers
•  Docker containers on AWS
•  Key Docker metrics
•  Operational complexity
•  Monitoring Docker effectively
•  Demo

A brief history of containers

Containers in a nutshell
•  Been around for a long time
   –  jails, zones, cgroups
•  No full-virtualization overhead
•  Used for runtime isolation (e.g. jails)
•  Docker: escape from dependency hell

Escape from dependency hell
•  a.out
•  shared libs
•  packages
•  omnibus

Docker ~

Container ~ single static binary

Process     Container          Host
Source      Dockerfile         Chef/Puppet, Kickstart
.TEXT       /var/lib/docker    Full distro
PID         Name/ID            Hostname

Docker on AWS: some numbers

(Some) Docker use cases
•  Continuous integration
   –  eliminate dependency variance
   –  same code from dev laptop to production
   –  git-like workflow
•  Continuous delivery
   –  (quasi) stateless components
   –  web workers, video encoders, etc.
   –  not for data stores (Amazon RDS is a better fit)

Instance types

[Chart: instance types c3.2xl, m3.medium, m3.large, m3.xlarge, m1.large, and the rest; share labels 20%, 20%, 19%]

Source: Datadog, October 2014

Containers per instance
•  Average: 5 (October 2014)
•  Highly dependent on the workload
•  This is just the beginning…
•  Expect higher container density going forward

Source: Datadog, October 2014

Key Docker metrics

Monitoring fundamentals

Work
•  Measures the amount of value created
•  What your customers care about
•  Database: queries answered
•  Web server: requests served
•  Queue: wait time distribution

Resource consumption
•  Measures the amount of resources consumed to create value
•  What your customers don’t care about
•  Database: I/O throughput
•  Web server: active connections
•  OS: CPU utilization
•  Container: memory footprint

Docker containers consume…
•  Memory
•  CPU
•  I/O
•  Network

Memory

Name                       Why it matters
pgmajfault                 Paging to/from disk is slow
pgfault                    Context switches hurt application performance
resident set size (rss)    Too much RSS causes paging and swapping
swap                       Swapping in/out is slow
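The counters in this table are exposed by the memory cgroup. As a minimal sketch, assuming the cgroup v1 `memory.stat` layout (one `name value` pair per line), they could be extracted as follows. The sample text is made up, and the sketch parses a string rather than a file because the file's location depends on how the host is set up.

```python
# Sketch only: SAMPLE_MEMORY_STAT is invented data in the cgroup v1
# memory.stat layout ("name value" per line).
SAMPLE_MEMORY_STAT = """\
cache 11492564992
rss 1930993664
swap 0
pgfault 728281223
pgmajfault 1724
"""

# The four names listed on the slide.
KEY_MEMORY_METRICS = ("pgmajfault", "pgfault", "rss", "swap")


def parse_memory_stat(text):
    """Return the key memory counters found in memory.stat-style text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        if name in KEY_MEMORY_METRICS:
            stats[name] = int(value)
    return stats


memory = parse_memory_stat(SAMPLE_MEMORY_STAT)
```

`pgfault` and `pgmajfault` are cumulative counters, so a monitor would typically report their rate of change between two samples, while `rss` and `swap` are point-in-time byte counts.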

CPU

Name      Why it matters
user      Measures work being done
system    System calls, a necessary evil
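A rough sketch of turning these two counters into utilization, assuming they are read from a cgroup v1 `cpuacct.stat` file, which reports cumulative `user` and `system` time in clock ticks. The tick rate of 100 per second is an assumption (a common Linux default), and the sample values are invented.

```python
USER_HZ = 100  # assumption: ticks per second; verify with `getconf CLK_TCK`


def parse_cpuacct_stat(text):
    """Parse 'user N' / 'system N' lines into a dict of tick counts."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            ticks[parts[0]] = int(parts[1])
    return ticks


def cpu_percent(before, after, interval_seconds):
    """Percent of one core spent in each mode between two samples."""
    return {
        mode: 100.0 * (after[mode] - before[mode]) / USER_HZ / interval_seconds
        for mode in ("user", "system")
    }


# Two hypothetical samples taken 10 seconds apart.
first = parse_cpuacct_stat("user 46000\nsystem 22400\n")
second = parse_cpuacct_stat("user 46500\nsystem 22450\n")
usage = cpu_percent(first, second, interval_seconds=10)
```

Because the counters only ever increase, a single reading says little; the difference between two readings divided by the elapsed time is what shows how busy the container was.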

Block I/O

Name                      Why it matters
blkio.io_service_bytes    I/O is (often) the bottleneck
blkio.io_queued           Measures saturation
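Both files share the same layout in cgroup v1: one `major:minor operation value` line per device and operation, followed by a grand-total line. The sketch below parses that layout from a made-up sample; it is an illustration under that assumption, not a complete collector.

```python
# Invented sample in the cgroup v1 blkio.io_service_bytes layout.
SAMPLE_BLKIO = """\
8:0 Read 4096
8:0 Write 8192
8:0 Sync 8192
8:0 Async 4096
8:0 Total 12288
Total 12288
"""


def parse_blkio(text):
    """Return {device: {operation: value}} from blkio-style text."""
    per_device = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips the trailing grand-total line ("Total N")
        device, operation, value = parts
        per_device.setdefault(device, {})[operation] = int(value)
    return per_device


blkio = parse_blkio(SAMPLE_BLKIO)
```

Keeping reads and writes separate per device makes it easier to tell which disk, and which direction, is the bottleneck.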

Network

Name             Why it matters
tx/rx_errors     Because… errors are bad.
tx/rx_dropped    Measures contention
tx/rx_bytes      Measures traffic
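One place these counters can be read on Linux is `/proc/net/dev`, which lists eight receive and eight transmit counters per interface. The sketch below assumes that layout and parses an invented sample; how to reach a given container's network namespace is outside its scope.

```python
# Hypothetical sample in the /proc/net/dev layout: two header lines, then
# one line per interface with 8 receive and 8 transmit counters.
SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000 10 1 2 0 0 0 0 2000 20 3 4 0 0 0 0
    lo: 500 5 0 0 0 0 0 0 500 5 0 0 0 0 0 0
"""


def parse_net_dev(text):
    """Return the bytes/errors/dropped counters for each interface."""
    interfaces = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = [int(field) for field in counters.split()]
        if len(fields) < 16:
            continue  # not a full counter line
        interfaces[name.strip()] = {
            "rx_bytes": fields[0],
            "rx_errors": fields[2],
            "rx_dropped": fields[3],
            "tx_bytes": fields[8],
            "tx_errors": fields[10],
            "tx_dropped": fields[11],
        }
    return interfaces


network = parse_net_dev(SAMPLE_NET_DEV)
```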

How to collect metrics
•  https://github.com/google/cadvisor

Operational complexity

Combinatorial multiplication

[Diagram: three stacks side by side. Stack 1: Hardware, Off-the-shelf, Your Application. Stack 2: Hardware, Hypervisor, Off-the-shelf. Stack 3: Hardware, Hypervisor, Containers (four labeled A and four labeled O).]

Operational complexity
•  Average containers per instance: N (N=5, 10/2014)
•  N times as many “hosts” to manage
•  Affects
   –  provisioning: prepping & building containers
   –  configuration: passing config to containers
   –  orchestration: deciding where/when containers run
   –  monitoring: making sure containers run properly

Monitoring: metric counts on Amazon EC2

•  1 Amazon EC2 instance
   –  10 CloudWatch metrics
•  1 operating system (e.g. Linux)
   –  100 metrics
•  1 container
   –  50 metrics
•  1 off-the-shelf application
   –  ~50 metrics

100 instances, 500 containers
(assuming only 5 containers per instance)

From 160 metrics per instance to 410 metrics per instance
(assuming only 5 containers per instance)
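The 160 and 410 figures can be reproduced from the per-layer counts above. The breakdown below is an assumption that happens to match the slide's numbers (one off-the-shelf application per instance, plus 50 metrics for each container); the fleet-wide total is an extrapolation that does not appear on the slides.

```python
# Per-layer metric counts from the slide.
CLOUDWATCH_METRICS = 10
OS_METRICS = 100
CONTAINER_METRICS = 50
APP_METRICS = 50  # "~50" on the slide


def metrics_per_instance(containers):
    """Metrics emitted by one instance running the given number of containers."""
    return (
        CLOUDWATCH_METRICS
        + OS_METRICS
        + APP_METRICS
        + containers * CONTAINER_METRICS
    )


instances = 100
containers_per_instance = 5

total_containers = instances * containers_per_instance       # 500
without_containers = metrics_per_instance(0)                 # 160
with_containers = metrics_per_instance(containers_per_instance)  # 410
fleet_metrics = instances * with_containers                  # 41000
```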

Velocity

EC2 instance half-life: hours, days, months
Container half-life: minutes, hours, days

Aggravating factors
•  Hub-based provisioning
   –  new images every day
•  Autonomic orchestration
   –  from imperative to declarative
   –  automated
   –  individual containers don’t matter
   –  e.g. Kubernetes, Mesos

A lot more, A lot faster.

If your monitoring is still centered on individual hosts or instances…

Host-centric monitoring

[Diagram: a single Monitor attached to the Hypervisor, which hosts Containers (four labeled A and four labeled O)]

A lot more pain, A lot faster.

Monitoring containers effectively

A new approach to container monitoring

Layers + Tags

Layers of monitoring

[Diagram, built up over three slides: the Hypervisor hosts Containers (four labeled A and four labeled O). A single Monitor is replaced by CloudWatch plus Infrastructure Monitoring, collecting cpu/net/io, filesystem, docker mem, docker cpu, db queries, web requests, and app throughput.]

Layers of monitoring
•  Access to metrics from all the layers
•  Amazon CloudWatch, OS metrics, Docker metrics, app metrics in 1 place
•  Shared timeline

If your monitoring does not cover all layers, pain.

You use them already

Tags
•  Monitoring is like Auto-Scaling Groups
•  Monitoring is like Docker orchestration
•  From imperative to declarative
•  Query-based
•  Queries operate on tags

Monitoring with tags and queries

“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… and make sure resident set size < 1GB on c3.xl”

“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… that use more than 1.5x the average on c3.xl”
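The two queries above can be sketched as filters over tagged containers. Everything here is hypothetical: the inventory, the tag names, and the helper functions are illustrations, not any monitoring product's query language. The instance type is spelled `c3.xlarge` where the slide abbreviates it to c3.xl.

```python
from statistics import mean

MIB = 1024 ** 2
GIB = 1024 ** 3


def container(cid, image, region, az, rss_mib):
    """Build one hypothetical inventory entry: tags plus a current RSS reading."""
    tags = {"image": image, "region": region, "az": az,
            "instance_type": "c3.xlarge"}
    return {"id": cid, "tags": tags, "rss": rss_mib * MIB}


inventory = [
    container("c1", "web", "us-west-2", "us-west-2a", 512),
    container("c2", "web", "us-west-2", "us-west-2b", 600),
    container("c3", "web", "us-west-2", "us-west-2c", 2048),
    container("c4", "web", "us-east-1", "us-east-1a", 3000),
    container("c5", "db", "us-west-2", "us-west-2a", 4000),
]


def select(containers, **tags):
    """Keep containers whose tags match every given key/value pair."""
    return [c for c in containers
            if all(c["tags"].get(k) == v for k, v in tags.items())]


def over_limit(containers, limit_bytes):
    """Containers whose resident set size is not below the limit."""
    return [c for c in containers if c["rss"] >= limit_bytes]


def above_average(containers, factor):
    """Containers using more than `factor` times the group's average RSS."""
    average = mean(c["rss"] for c in containers)
    return [c for c in containers if c["rss"] > factor * average]


# No availability-zone filter: the query spans every zone in the region.
web = select(inventory, image="web", region="us-west-2",
             instance_type="c3.xlarge")
too_big = [c["id"] for c in over_limit(web, 1 * GIB)]
outliers = [c["id"] for c in above_average(web, 1.5)]
```

Neither query names a host or a container ID, so containers that come and go are picked up automatically as long as they carry the right tags.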

“Dude, where’s my server?”

“Dude, where’s my container?”

If your monitoring is not tag-based, pain.

Take-aways
1. Docker increases operational complexity by an order of magnitude unless…
2. You have layered monitoring, from the instance to the container and to the application, and…
3. You monitor using tags and queries

Please give us your feedback on this presentation

Join the conversation on Twitter with #reinvent
