CoreOS, or How I Learned to Stop Worrying and Love Systemd
Uploaded by richard-lister. Category: Software.
CoreOS
or, How I Learned to Stop Worrying and Love Systemd
or, some pragmatic patterns for running docker in production
Hello!
I am Ric Lister, director of devops at Spree Commerce
@bnzmnzhnz
github.com/rlister
open-source
Spree: complete open-source e-commerce for Rails
github.com/spree/spree
599 contributors, 6181 stars
e-commerce platform
Wombat: connect any store to any service
wombat.co
systemd
Resistance is futile.
Docker frees us from the operating system
No more dependency hell.
Since the OS no longer needs to support our app, we can go minimalist.
A minimal OS is easier to patch, and more secure.
What do we need?
Some way to run containers:
◦ docker pull, start, stop, rm
◦ set environment variables
◦ restart policies
◦ capture output
And an OS that can update itself in a sane way.
And some orchestration …
CoreOS
Originally based on ChromiumOS.
Which is based on Gentoo.
No packaging system.
Well ... there is: docker.
orchestration
Atomic updates (Omaha)
System running off read-only /usr on A
OS update downloads to B, system reboots when ready *
In the event of boot failure, roll back to A
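The A/B scheme can be modeled as two slots, each with a priority, a count of remaining boot attempts and a "successful" flag, roughly the way the Chromium OS lineage marks its partitions. The sketch below is a simplified toy under that assumption; the class and method names are hypothetical, and it is not update_engine's or the bootloader's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    priority: int = 0
    tries: int = 0
    successful: bool = False


class ABUpdater:
    """Toy model of A/B OS updates with automatic rollback (hypothetical API)."""

    def __init__(self) -> None:
        # Slot A holds the running, known-good /usr; slot B is empty.
        self.slots = {"A": Slot("A", priority=1, successful=True), "B": Slot("B")}

    def install_update(self, target: str) -> None:
        # Write the new image to the inactive slot, give it the highest
        # priority and a single boot attempt; it is not yet known to be good.
        other = self.slots["B" if target == "A" else "A"]
        slot = self.slots[target]
        slot.priority = other.priority + 1
        slot.tries = 1
        slot.successful = False

    def choose_boot_slot(self) -> Slot:
        # Boot the highest-priority slot that is known good or still has
        # attempts left; booting an untested slot uses up one attempt.
        bootable = [s for s in self.slots.values() if s.successful or s.tries > 0]
        slot = max(bootable, key=lambda s: s.priority)
        if not slot.successful:
            slot.tries -= 1
        return slot

    def boot(self, new_image_boots: bool) -> str:
        """Simulate one reboot and return the name of the slot left running."""
        slot = self.choose_boot_slot()
        if slot.successful:
            return slot.name
        if new_image_boots:
            slot.successful = True  # a post-boot check marks the slot good
            return slot.name
        # The untested image failed and has no attempts left, so the next
        # pass through the bootloader falls back to the known-good slot.
        return self.choose_boot_slot().name
```

After `install_update("B")`, a failing `boot(False)` leaves the host on A, while `boot(True)` moves it to B and B stays the default from then on.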
Update strategies
Before rebooting, a host requests a global lock using magic.*
By default one host per cluster can hold a reboot lock.
Can turn off reboots.
Define strategy in cloud-config:
#cloud-config
coreos:
  update:
    group: stable
    reboot-strategy: off
* not actual magic
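As far as I know, the lock behind the etcd-lock reboot strategy is a counting semaphore that locksmith stores in etcd and changes atomically, allowing at most a configured number of holders (one by default). Below is an in-memory model of that rule; the class is hypothetical and a plain Python object stands in for the etcd key, so it shows the idea rather than locksmith's code.

```python
class RebootSemaphore:
    """In-memory model of a cluster-wide reboot lock (hypothetical).

    In a real cluster this state lives in etcd and is changed atomically;
    here a plain object stands in for it.
    """

    def __init__(self, max_holders: int = 1) -> None:
        self.max_holders = max_holders
        self.holders = set()  # machine IDs currently allowed to reboot

    def lock(self, machine_id: str) -> bool:
        """Try to take the lock before rebooting; False means wait and retry."""
        if machine_id in self.holders:
            return True  # already held, e.g. after a retry
        if len(self.holders) >= self.max_holders:
            return False  # another host is rebooting right now
        self.holders.add(machine_id)
        return True

    def unlock(self, machine_id: str) -> None:
        """Release the lock once the host is back up after its reboot."""
        self.holders.discard(machine_id)
```

With the default of one holder, a second host that wants to reboot gets `False` and keeps retrying until the first host calls `unlock`.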
Release channels: choose your pain tolerance
Stable
Production clusters, all software tested in alpha and beta first.
Beta
Promoted alpha releases. Run a few beta hosts to catch problems early.
Alpha
Tracks dev and gets newest docker, etcd and fleet. Frequent releases.
https://coreos.com/releases/
etcd
Open-source distributed key-value store. Uses the Raft consensus protocol.
Provides shared configuration and service discovery.
Features of etcd
Useful features like TTLs and locks.
Simple HTTP API. Read and write values with curl or etcdctl.
Keys and values are stored in directories, like a filesystem.
Watch a key or directory for changes.
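TTLs are what make etcd useful for liveness: a key written with a TTL disappears unless it is refreshed before the TTL runs out. The toy store below models only that behaviour, with the clock passed in explicitly so it is deterministic; the class and the key names are hypothetical, and real etcd expires keys on the server and notifies watchers.

```python
from typing import Dict, Optional, Tuple


class TTLStore:
    """Toy key-value store where keys set with a TTL expire (hypothetical)."""

    def __init__(self) -> None:
        # key -> (value, expiry time, or None for keys that never expire)
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, now: float, ttl: Optional[float] = None) -> None:
        # Setting a key again with a fresh TTL is how a client "refreshes" it.
        expires = None if ttl is None else now + ttl
        self._data[key] = (value, expires)

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and now >= expires:
            del self._data[key]  # expired: behave as if it was never there
            return None
        return value
```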
Setting up an etcd cluster
Get a discovery token:
$ curl https://discovery.etcd.io/new
https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94
Boot 3 machines with cloud-config:
#cloud-config
coreos:
  etcd:
    discovery: https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94
    addr: $private_ip4:4001
    peer-addr: $private_ip4:7001
  units:
    - name: etcd.service
      command: start
Using etcd keys
set a key
$ ssh 10.10.1.1
CoreOS stable (607.0.0)
$ etcdctl set /foo "Hello world"
Hello world
$ curl -L -X PUT http://127.0.0.1:4001/v2/keys/bar -d value="Hello world"
{"action":"set","node":{"key":"/bar","value":"Hello world","modifiedIndex":42103694,"createdIndex":42103694}}
Using etcd keys
get a key
$ ssh 10.10.1.1
CoreOS stable (607.0.0)
$ etcdctl get /foo
Hello world
$ curl -L http://127.0.0.1:4001/v2/keys/bar
{"action":"get","node":{"key":"/bar","value":"Hello world","modifiedIndex":40004310,"createdIndex":40004310}}
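The same two calls can be made from any language with an HTTP client. A minimal sketch using only the Python standard library, assuming the etcd 0.4-era v2 keys API on 127.0.0.1:4001 as in the sessions above; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

ETCD = "http://127.0.0.1:4001"  # client address used in the sessions above


def set_request(key: str, value: str, ttl: Optional[int] = None) -> urllib.request.Request:
    """Build the same PUT as `curl -X PUT .../v2/keys/<key> -d value=...`."""
    fields = {"value": value}
    if ttl is not None:
        fields["ttl"] = str(ttl)  # optional: the key expires after ttl seconds
    return urllib.request.Request(
        f"{ETCD}/v2/keys/{key.lstrip('/')}",
        data=urllib.parse.urlencode(fields).encode(),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="PUT",
    )


def node_value(body: str) -> str:
    """Pull node.value out of a v2 keys API JSON response."""
    return json.loads(body)["node"]["value"]


def get_value(key: str) -> str:
    """GET a key; needs a reachable etcd. urllib follows GET redirects, like curl -L."""
    with urllib.request.urlopen(f"{ETCD}/v2/keys/{key.lstrip('/')}") as resp:
        return node_value(resp.read().decode())
```

Sending the write is `urllib.request.urlopen(set_request("/bar", "Hello world"))`. Unlike `curl -L`, urllib will not follow a redirect on a PUT, so this assumes the member you talk to accepts the write directly.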
etcd gotchas
Use an odd number of hosts.
• Adding one to make an even number does not increase redundancy.
Use Elastic IPs.
• If an instance reboots with a new IP, it may fail to rejoin the cluster.
If you lose quorum, the cluster can no longer elect a leader or accept writes.
• This cluster is finished. You must create a new one.
• This is not cool.
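The odd-number rule falls out of majority arithmetic: Raft needs floor(n/2) + 1 members to agree, so a cluster of n members survives n - (floor(n/2) + 1) failures. A quick check in Python (the function names are mine):

```python
def quorum(members: int) -> int:
    """Number of members that must agree for a Raft majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Three and four members both tolerate one failure, and five and six both tolerate two, so the even-numbered host only adds one more machine that has to stay healthy.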
… however, earlier today ...
fleet
Open-source distributed init system based on etcd.
Think of it as cluster-wide systemd.
Setting up a fleet cluster
Add fleet to the cloud-config:
#cloud-config
coreos:
  etcd:
    discovery: https://discovery.etcd.io/d88814387d940b36dbc2b4393c3d3a94
    addr: $private_ip4:4001
    peer-addr: $private_ip4:7001
  fleet:
    metadata: role=web,region=us-east-1,type=m3.medium
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
Using fleetctl
List machines in cluster
$ brew install fleetctl
$ fleetctl -tunnel 10.10.1.1 list-machines
MACHINE                               IP         METADATA
148a18ff-6e95-4cd8-92da-c9de9bb90d5a  10.10.1.1  -
491586a6-508f-4583-a71d-bfc4d146e996  10.10.1.2  -
c9de9451-6a6f-1d80-b7e6-46e996bfc4d1  10.10.1.3  -
Launching containers with fleet
Fleet submits systemd unit files to the cluster, using etcd as its backing store.
Fleet-specific metadata controls scheduling of units.
If a host goes down, fleet will reschedule its units.
Example unit
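[Unit]
Description=Hello world
After=docker.service
Requires=docker.service
[Service]
# Disable the start timeout, since docker pull can be slow
TimeoutStartSec=0
# A leading "-" tells systemd to ignore failure: clean up any stale container
ExecStartPre=-/usr/bin/docker kill hello
ExecStartPre=-/usr/bin/docker rm hello
ExecStartPre=/usr/bin/docker pull busybox
# The trap lets the shell exit promptly when docker stop sends SIGTERM
ExecStart=/usr/bin/docker run \
  --name hello \
  busybox /bin/sh -c "trap 'exit 0' TERM; while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello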
Running our example unit
Load and start the unit
$ fleetctl -tunnel 10.10.1.1 start hello
$ fleetctl -tunnel 10.10.1.1 list-units
UNIT           MACHINE                ACTIVE  SUB
hello.service  c9de9451.../10.10.1.3  active  running
$ fleetctl -tunnel 10.10.1.1 journal hello
hello
hello
$ fleetctl -tunnel 10.10.1.1 destroy hello
Example global unit
[Unit]
Description=Hello world
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill hello
ExecStartPre=-/usr/bin/docker rm hello
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name hello busybox \
  /bin/sh -c "trap 'exit 0' TERM; while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello
[X-Fleet]
# Run on all instances with this fleet metadata
MachineMetadata=region=us-east-1
Global=true
Running a global unit
Load and start the unit
$ fleetctl -tunnel 10.10.1.1 start hello
$ fleetctl -tunnel 10.10.1.1 list-units
UNIT MACHINE ACTIVE SUB
hello.service 148a18ff.../10.10.1.1 active running
hello.service 491586a6.../10.10.1.2 active running
hello.service c9de9451.../10.10.1.3 active running
$ fleetctl -tunnel 10.10.1.1 destroy hello
Fleet metadata
Option           Description
Global           Schedule on all machines in the cluster
MachineID        Schedule to one specific machine
MachineOf        Limit to machines that are running the specified unit
MachineMetadata  Limit to machines with specific metadata
Conflicts        Prevent from running on the same machine as matching units
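To make the table concrete, here is a toy filter that decides which machines a unit may land on under MachineID, MachineOf, MachineMetadata and Conflicts, plus the Global rule. It is a simplified model, not fleet's scheduler: it treats each metadata key as a single required value, ignores fleet's other options, and where fleet would choose one machine by its own policy it just takes the first candidate. All function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def parse_metadata(spec: str) -> Dict[str, str]:
    """Parse 'role=web,region=us-east-1' into {'role': 'web', 'region': 'us-east-1'}."""
    return dict(pair.split("=", 1) for pair in spec.split(",") if pair)


def eligible_machines(unit: str, x_fleet: Dict[str, str],
                      machines: Dict[str, Dict[str, str]],
                      placements: Dict[str, str]) -> List[str]:
    """Machines that satisfy the unit's [X-Fleet] options.

    machines:   machine ID -> metadata
    placements: name of an already-scheduled unit -> machine ID
    """
    wanted = parse_metadata(x_fleet.get("MachineMetadata", ""))
    candidates = []
    for machine_id, metadata in machines.items():
        if "MachineID" in x_fleet and machine_id != x_fleet["MachineID"]:
            continue
        if "MachineOf" in x_fleet and placements.get(x_fleet["MachineOf"]) != machine_id:
            continue
        if any(metadata.get(key) != value for key, value in wanted.items()):
            continue
        pattern = x_fleet.get("Conflicts")
        if pattern and any(
            where == machine_id and other != unit and fnmatchcase(other, pattern)
            for other, where in placements.items()
        ):
            continue
        candidates.append(machine_id)
    return candidates


def schedule(unit: str, x_fleet: Dict[str, str],
             machines: Dict[str, Dict[str, str]],
             placements: Dict[str, str]) -> List[str]:
    """Global units go everywhere they fit; others go to a single machine."""
    candidates = eligible_machines(unit, x_fleet, machines, placements)
    if x_fleet.get("Global", "").lower() == "true":
        return candidates
    return candidates[:1]  # stand-in for fleet's real placement choice
```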
Start a specific number of units
Define them with systemd template units.
Create a unit file like: hello@.service
Start specific instances named like: hello@1.service, hello@2.service
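Instantiating a template means putting the instance string after the "@", and inside the unit systemd can substitute specifiers such as %p (prefix) and %i (instance). The sketch below models only %n, %p, %i and %%, without systemd's escaping of instance names; the function names are hypothetical.

```python
import re


def instance_name(template: str, instance: str) -> str:
    """hello@.service + "1" -> hello@1.service"""
    prefix, at, suffix = template.partition("@")
    if not at or not suffix.startswith("."):
        raise ValueError(f"not a template unit name: {template!r}")
    return f"{prefix}@{instance}{suffix}"


def expand_specifiers(text: str, unit_name: str) -> str:
    """Replace %n, %p, %i and %% in unit text, as systemd does for these four."""
    stem = unit_name.rpartition(".")[0]          # "hello@1"
    prefix, _, instance = stem.partition("@")    # "hello", "1"
    values = {"n": unit_name, "p": prefix, "i": instance, "%": "%"}
    return re.sub(r"%([npi%])", lambda m: values[m.group(1)], text)


# The shell expands hello@{1..2} before fleetctl sees it; the equivalent here:
names = [instance_name("hello@.service", str(i)) for i in range(1, 3)]
```

The template unit in this talk hard-codes `--name hello`, which is safe only because `Conflicts` keeps one instance per host; a container name built from `%p-%i` would let instances share a machine.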
Example template unit
[Unit]
Description=Hello world
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill hello
ExecStartPre=-/usr/bin/docker rm hello
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name hello busybox \
  /bin/sh -c "trap 'exit 0' TERM; while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello
[X-Fleet]
# Ensure there is only one of these on each instance,
# which is also why the fixed container name "hello" is safe here
Conflicts=hello@*
Running template units
Start 2 instances
$ fleetctl -tunnel 10.10.1.1 start hello@{1..2}
$ fleetctl -tunnel 10.10.1.1 list-units
UNIT             MACHINE                ACTIVE  SUB
hello@1.service  c9de9451.../10.10.1.3  active  running
hello@2.service  148a18ff.../10.10.1.1  active  running
$ fleetctl -tunnel 10.10.1.1 journal hello@1
hello
hello
fleet gotchas
To change a unit definition, you must destroy and restart it.
• For global units this means the whole cluster.
• That means downtime.
Fleet does not do resource-based scheduling.
• It is intended as a low-level system on which to build more advanced systems.
When units move around, you must use service discovery to route traffic to them.
• For example, sidekick patterns and etcd-aware proxies.
puppy break
Any questions so far?
Patterns
How can I use CoreOS for real?
Here are three patterns I use in production today ...
Pattern 1: Simple homogeneous ops cluster
This is the most textbook “toy” cluster you will see in the CoreOS docs.
It is suitable for all those random little internal tools that can tolerate brief downtime.
Small cluster
Long-lived hosts run etcd.
Submit app to cluster, sidekick announces app.
Reverse proxy discovers app host from etcd.
Sidekick units
A sidekick unit sets an etcd key with the app container's host:port when the app starts. Write your own, calling etcdctl, or use something like github.com/gliderlabs/registrator
When the app goes down, the sidekick removes the key from etcd.
A reverse proxy or load-balancer container watches for changes to these etcd keys and reconfigures itself to proxy to the app's host:port.
Write config files with github.com/kelseyhightower/confd, or use an etcd-aware proxy like github.com/mailgun/vulcand
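The proxy side boils down to "read the directory, render a config, reload if it changed". Below is a sketch of the render step for nginx, in the spirit of what a confd template does; the /services/hello key layout the sidekick writes to is an assumption, and the function names are hypothetical.

```python
import json
from typing import Iterable, List


def backends(listing: str) -> List[str]:
    """host:port values from an etcd v2 directory listing (GET /v2/keys/<dir>)."""
    node = json.loads(listing)["node"]
    # Subdirectories have no "value"; sort so unchanged sets render identically.
    return sorted(child["value"] for child in node.get("nodes", []) if "value" in child)


def render_upstream(name: str, servers: Iterable[str]) -> str:
    """Render an nginx upstream block pointing at the announced app instances."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {server};" for server in servers]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Example response shape for GET /v2/keys/services/hello (hypothetical keys).
SAMPLE = json.dumps({
    "action": "get",
    "node": {
        "key": "/services/hello",
        "dir": True,
        "nodes": [
            {"key": "/services/hello/10.10.1.3", "value": "10.10.1.3:8080"},
            {"key": "/services/hello/10.10.1.1", "value": "10.10.1.1:8080"},
        ],
    },
})
```

Sorting keeps the rendered file byte-for-byte stable when membership has not changed, so the watcher can skip needless reloads. One caveat: nginx rejects an upstream block with no servers, so a real template has to handle the case where every instance is gone.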
Pattern 2: etcd + workers
Great for low-traffic websites that need a couple of instances behind a load balancer.
Works well with autoscaling.
etcd + workers
Elastic workers connect to the etcd cluster and discover their units based on fleet metadata.
Works well with autoscaling + ELB.
Pattern 3: Immutable servers with no etcd
We use this for a high-traffic cluster of micro-services that demands very high availability and strict change control.
Systemd units are hard-coded into the cloud-config passed as user-data.
This requires some external orchestration, such as autoscaling groups.
Immutable servers with no etcd
No etcd, no cluster.
Workers are spun up by autoscaling.
Hard-code systemd units in the launch config.
No in-place OS updates.
Deploy code or OS updates by changing the launch config and replacing all hosts.
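Replacing every host can be planned as a sequence of launch, wait-for-health and terminate steps, so capacity never dips below the current size. A sketch of such a plan, assuming each host is tagged with the launch config it booted from; the host IDs, config labels and step tuples are hypothetical, and actually executing the steps (for example through the autoscaling API) is left out.

```python
from typing import Dict, List, Tuple


def rollout_plan(hosts: Dict[str, str], new_config: str, batch_size: int = 1) -> List[Tuple]:
    """Ordered steps to replace every host not yet running new_config.

    hosts: host ID -> name of the launch config it was booted from
    Replacements are launched and checked before old hosts are terminated.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    stale = sorted(host for host, config in hosts.items() if config != new_config)
    steps: List[Tuple] = []
    for start in range(0, len(stale), batch_size):
        batch = tuple(stale[start:start + batch_size])
        steps.append(("launch", new_config, len(batch)))
        steps.append(("wait_healthy",))
        steps.append(("terminate", batch))
    return steps
```

The batch size trades deploy speed against how much extra capacity you are willing to run during the rollout.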
Logs
Get ’em off the host ASAP.
github.com/gliderlabs/logspout is a tiny docker container that ships the output of all other containers to syslog on udp/514.
Send to logstash/splunk/papertrail ...
Monitoring
◦ AWS cloudwatch
◦ newrelic for apps
◦ newrelic-sysmond for instances
◦ … but it doesn’t understand cgroups
◦ datadog has better container support
◦ cadvisor presents container stats over http
Alternative operating systems
RancherOS: no systemd. System docker runs as PID 1 and runs a user docker container, which contains the app containers.
Red Hat Project Atomic: rpm-ostree for atomic updates, with a read-only /usr; /etc is merged and /var is preserved across updates.
Ubuntu Snappy Core: transactional updates with snappy packages.
Schedulers
Fleet is intentionally simple. Build on it for more sophistication:
◦ Google’s Kubernetes
◦ Apache Mesos/Marathon
◦ paz.sh: a PaaS based on CoreOS
◦ Deis: a private Heroku-like PaaS on CoreOS
It seems like something new pops up every day at the moment ...
ok, I’m done
Any questions?
We’re hiring
DevOps
Ruby dev
UI/UX design
Product