Immutable Infrastructure With Docker and EC2 Docker Conf 2014 Michael Bryzek
Test Driven Infrastructure
Jess My pronouns are he/him/his;
I identify as 'Ops'.
As Ops I make users happy by keeping the service up
As Ops I
• make sure the site is available
• ensure resource efficiency
• protect uptime
• troubleshoot issues
• keep an eye on service latency
by
• validating responses
• monitoring the network
• providing canary environments
• measuring resource consumption
• practicing chaos engineering
• studying post-mortems
• aggregating logs
user pages only display partial content -- what's wrong?
curl
curl
curl
mysql
aws s3 ls
Make a controlled change:
1) Observe the current state (curl)
2) Ask yourself how your change will manifest
3) Make your change
4) Validate change; observe new state (curl)
5) Did your change impact all of, and only, what you wanted to change?
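The five steps above can be sketched in a few lines of Python. This is only an illustration, not something from the talk: the `controlled_change` helper, the in-memory `env` dictionary, and the `deploy` function are all hypothetical stand-ins for a real environment and a real change.

```python
from typing import Callable, Dict, Set, Tuple

State = Dict[str, str]


def controlled_change(
    observe: Callable[[], State],
    apply_change: Callable[[], None],
    expected_keys: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Apply a change and return (missing, unexpected) changed keys."""
    before = dict(observe())   # 1) observe the current state (copied)
    # 2) expected_keys is our prediction of how the change will manifest
    apply_change()             # 3) make the change
    after = dict(observe())    # 4) observe the new state
    changed = {
        key
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }
    # 5) did we change all of, and only, what we wanted to change?
    missing = expected_keys - changed
    unexpected = changed - expected_keys
    return missing, unexpected


# Hypothetical in-memory "environment" standing in for real infrastructure.
env: State = {"nginx_version": "1.24", "status": "404"}


def deploy() -> None:
    env["status"] = "200"


missing, unexpected = controlled_change(lambda: env, deploy, {"status"})
print("missing:", missing, "unexpected:", unexpected)
```

Both sets come back empty when the change touched exactly what was predicted. A non-empty `unexpected` set is the interesting case: the change reached further than intended.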
OpsDev
photo by: Marcel Quinan
What we've learned
• software development lifecycle
  • central, controlled environment for repeatable infrastructure builds
• immutable infrastructure
  • treat your running environment as a black box: easier to version, easier to replace
  • a unit of composition that's easier to reason about because its composition is known and certain
• version control
  • time travel!
  • a shared repository allows everyone to see (and declare) what the state of the environment should be
  • can compare intended configuration to the running environment to find discrepancies
• infrastructure-as-code
  • repeatable infrastructure
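Comparing intended configuration to the running environment can be as simple as a dictionary diff. The sketch below is an assumption-laden illustration: `find_discrepancies` and the `intended` / `running` data are made up for the example, and a real check would read them from the repository and from the hosts.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def find_discrepancies(intended: Config, running: Config) -> Drift:
    """Map each drifting key to its (intended, running) pair."""
    drift: Drift = {}
    for key in intended.keys() | running.keys():
        want = intended.get(key)   # None: not declared in the repository
        have = running.get(key)    # None: not present in the environment
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical data: what the repository declares vs. what was observed.
intended = {"nginx": "present", "sshd": "running", "port_22": "listening"}
running = {"nginx": "present", "sshd": "stopped", "port_8080": "listening"}

drift = find_discrepancies(intended, running)
for key in sorted(drift):
    print(key, drift[key])
```

Keys that appear only on one side show up with `None` on the other, which separates "declared but missing" from "running but never declared".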
Test Driven Development
1) Write a failing test
2) Write enough code to make it pass
3) Refactor
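import unittest
from mycode import *

class MyFirstTests(unittest.TestCase):
    def test_hello(self):
        self.assertEqual(hello_world(), 'hello world')

# mycode, first pass: a stub, so the test fails
def hello_world():
    pass

# mycode, second pass: enough code to make the test pass
def hello_world():
    return 'hello world'

# mycode, refactored: same behavior, with room for more languages
def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'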
import unittest
from mycode import *

class MyFirstTests(unittest.TestCase):
    def test_hello_en(self):
        self.assertEqual(hello_world(), 'hello world')

    def test_hello_es(self, lang='es'):
        # pass lang through, otherwise the English default is what gets tested
        self.assertEqual(hello_world(lang), 'hola mundo')

# mycode, before: test_hello_es fails because 'es' falls through and returns None
def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'

# mycode, after: both tests pass
def hello_world(lang='en'):
    if (lang == 'en'):
        return 'hello world'
    if (lang == 'es'):
        return 'hola mundo'
Make a controlled change:
1) Observe the current state (curl)
2) Ask yourself how your change will manifest
3) Make your change
4) Observe new state (curl)
5) Did your change impact all of, and only, what you wanted to change?
1) Write a failing test
2) Write enough code to make it pass
3) Refactor
$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '404', 'time_total': '0.005953'}

$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '200', 'time_total': '0.030772'}

$ curl -sko /dev/null -w @status_format \
    https://example.com/path/to/microservice
{'http_code': '200', 'time_total': '0.012402'}
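The same red/green check can be sketched in Python. This is a self-contained illustration under stated assumptions: a throwaway local HTTP server stands in for the microservice, `Handler.deployed` simulates the deployment, and `check` is a hypothetical helper that mimics the `http_code` / `time_total` fields shown in the curl output above.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    deployed = False  # flipped to True to simulate "write enough code"

    def do_GET(self):
        self.send_response(200 if Handler.deployed else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any proxy settings so the request stays on localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str) -> dict:
    """Return status code and elapsed time, like the curl status format."""
    start = time.monotonic()
    try:
        with opener.open(url, timeout=5) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:
        code = err.code  # 4xx/5xx responses arrive as exceptions
    elapsed = time.monotonic() - start
    return {"http_code": str(code), "time_total": f"{elapsed:.6f}"}


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/path/to/microservice"

red = check(url)          # failing test: nothing deployed yet
Handler.deployed = True   # the "change"
green = check(url)        # passing test

server.shutdown()
server.server_close()
print(red["http_code"], green["http_code"])
```

Against a real endpoint only `check` would be needed; the local server exists so the sketch runs without network access.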
rspec
# https://puppet.com/blog/unit-testing-rspec-puppet-for-beginners
require 'spec_helper'

describe 'nginx' do
  let(:title) { 'nginx' }
  let(:node) { 'example.com' }

  it { is_expected.to contain_package('nginx').with(ensure: 'present') }

  it {
    is_expected.to contain_file('/var/www/index.html')
      .with(
        :ensure  => 'file',
        :require => 'Package[nginx]',
      )
  }

  # Puppet's service resource uses the `enable` attribute
  it {
    is_expected.to contain_service('nginx')
      .with(
        :ensure => 'running',
        :enable => true,
      )
  }
end
inspec
# https://github.com/inspec/inspec/blob/master/examples/kitchen-chef/test/integration/default/web_spec.rb
describe package('nginx') do
  it { should be_installed }
end

# extend tests with metadata
control '01' do
  impact 0.7
  title 'Verify nginx service'
  desc 'Ensures nginx service is up and running'
  describe service('nginx') do
    it { should be_enabled }
    it { should be_installed }
    it { should be_running }
  end
end

# implement os dependent tests
web_user = 'www-data'
web_user = 'nginx' if os[:family] == 'centos'

describe user(web_user) do
  it { should exist }
end
goss
# https://github.com/aelsabbahy/goss
# `goss validate` for a one-time check
# `goss serve` for a local http endpoint
port:
  tcp:22:
    listening: true
    ip:
    - 0.0.0.0
service:
  sshd:
    enabled: true
    running: true
process:
  sshd:
    running: true
We have tools to unit test configuration management
kubernetes liveness probes
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
kubernetes readiness probes
# https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 5
      periodSeconds: 5
kubernetes liveness probes w/goss
---
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: goss-liveness  # container names must be DNS labels, so no underscores
    image: goss_liveness
    livenessProbe:
      exec:
        command: ["goss", "validate", "-g", "goss.yaml"]
      initialDelaySeconds: 3
      periodSeconds: 3
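An exec probe only looks at the command's exit status: zero means healthy, anything else counts as a failure. The sketch below shows that contract in Python. It is not how goss is implemented; `port_listening` and `probe_exit_code` are hypothetical helpers, and the temporary listener exists only so the example can run anywhere.

```python
import socket
from typing import Callable, Dict


def port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_exit_code(checks: Dict[str, Callable[[], bool]]) -> int:
    """Run every check; return 0 if all pass, 1 otherwise."""
    failed = [name for name, check in checks.items() if not check()]
    for name in failed:
        print(f"FAILED: {name}")
    return 1 if failed else 0


# Demo: a temporary listener on a free port stands in for a real service.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

checks = {"tcp port listening": lambda: port_listening("127.0.0.1", port)}
healthy = probe_exit_code(checks)     # service up
listener.close()
unhealthy = probe_exit_code(checks)   # service gone
print(healthy, unhealthy)
```

Used as a real probe command, the script would end with `sys.exit(probe_exit_code(checks))` so the kubelet sees the result.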
We have tools for functional testing
Smoke Test Deployments
$ cat Jenkinsfile
...
stage("Validate") {
  steps {
    container('inspec') {
      // Initiate validation tests
      script {
        env.TARGET = "app-${params.CLUSTER}.example"
        sh """
          inspec exec validate/endpoints.rb
        """
      }
    }
  }
...
Inspec HTTP Resource
tests = {
  'test site response' => {
    'host' => 'validate.example.com',
    'path' => '/'
  },
}

nginx_proxy = ENV['TARGET'] || 'localhost' # point at deployed location

tests.each do |testname, testdata|
  control testname.to_s do
    impact 1.0
    host = testdata['host']
    path = testdata['path']
    title "curl -k https://#{nginx_proxy}#{path} -H 'host: #{host}'"
    desc "#{path} with #{host} should work."
    describe http("http://#{nginx_proxy}#{path}",
                  headers: { 'host' => host, 'User-Agent' => "jenkins" }) do
      its('status') { should cmp 200 }
    end
  end
end
user pages only display partial content -- what's wrong?
curl
curl
curl
mysql
aws s3 ls
1) Write a failing test
2) Write enough code to make it pass
3) Refactor
{
  "query": "avg(last_5m):avg:http.2xx_responses{endpoint:location-service} by {host} / avg(last_5m):avg:http.total_responses{endpoint:location-service} < 0.95",
  "message": "Healthy response volume dropped\n@slack-demo-monitors-nonprod",
  "name": "Demo: response volume",
  "type": "metric alert"
}
Infrastructure
$ # https://github.com/DataDog/datadogpy
$ dog monitor show <monitor_id> # dumps json
$ dog monitor show <monitor_id> > monitor_id.json
$ dog monitor fupdate monitor_id.json
Implementation:
monitorfile
$ ls -1aF
./
../
.git/
.gitignore
Dockerfile
Jenkinsfile
README.md
app/
monitorfile
{
  "tags": [
    "app:demo",
    "endpoint:location-service",
    "environment:non-prod"
  ],
  "query": "avg(last_5m):avg:http.2xx_responses{endpoint:location-service} by {host} / avg(last_5m):avg:http.total_responses{endpoint:location-service} < 0.95",
  "message": "Healthy response volume dropped\n@slack-demo-monitors-nonprod",
  "name": "Demo: response volume",
  "type": "metric alert",
  "options": {
    "thresholds": {
      "critical": 0.95,
      "warning": 0.9
    }
  }
}
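To reason about what this monitor alerts on, the query can be approximated locally: per host, divide 2xx responses by total responses and compare against the critical threshold. The sketch below is only that approximation, not Datadog's evaluation logic. The `healthy_ratio` and `evaluate` helpers and the per-host counts are hypothetical, and treating a host with no traffic as healthy is an assumption.

```python
from typing import Dict, Tuple

CRITICAL = 0.95  # the "critical" threshold from the monitorfile


def healthy_ratio(ok_2xx: float, total: float) -> float:
    """Share of responses that were 2xx."""
    if total == 0:
        return 1.0  # assumption: no traffic is treated as healthy
    return ok_2xx / total


def evaluate(per_host: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Return 'ALERT' or 'OK' for each host, mirroring `by {host}`."""
    return {
        host: "ALERT" if healthy_ratio(ok, total) < CRITICAL else "OK"
        for host, (ok, total) in per_host.items()
    }


# Hypothetical 5-minute averages: (2xx responses, total responses) per host.
samples = {
    "host-a": (990.0, 1000.0),
    "host-b": (900.0, 1000.0),
    "host-c": (0.0, 0.0),
}

states = evaluate(samples)
for host in sorted(states):
    print(host, states[host])
```

Running a handful of made-up samples through the threshold before pushing the monitor is a cheap way to confirm it fails when expected, which is the "failing test" half of the cycle.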
jenkinsfile
$ ls -1aF
./
../
.git/
.gitignore
Dockerfile
Jenkinsfile
README.md
app/
monitorfile

...
stage("configure monitor") {
  steps {
    container('datadog') {
      script {
        sh """
          dog monitor fupdate monitorfile
        """
      }
    }
  }
}
...
We monitor services to understand their performance.
Picking appropriate monitors that represent the service well creates a Service Level Indicator (SLI).
A Service Level Objective (SLO) simply states the target level we want for that SLI.
Service Level Agreements (SLAs) are published to users; they describe intentions for the service, and recourse when those intentions are missed.
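The relationship between an SLI, an SLO, and the resulting error budget is simple arithmetic. The sketch below assumes a time-based SLI (good minutes over total minutes), a 98% target over a 7-day window, and a made-up figure of 100 bad minutes; the function names are hypothetical.

```python
WINDOW_MINUTES = 7 * 24 * 60   # 7-day window, in minutes
SLO_TARGET = 0.98              # objective: healthy 98% of the time


def sli(good_minutes: float, window_minutes: float) -> float:
    """Time-based SLI: fraction of the window the monitor was healthy."""
    return good_minutes / window_minutes


def error_budget_minutes(slo_target: float, window_minutes: float) -> float:
    """Minutes of unhealthiness the SLO tolerates over the window."""
    return (1.0 - slo_target) * window_minutes


bad_minutes = 100.0  # hypothetical: minutes the monitor spent alerting

measured = sli(WINDOW_MINUTES - bad_minutes, WINDOW_MINUTES)
budget = error_budget_minutes(SLO_TARGET, WINDOW_MINUTES)
remaining = budget - bad_minutes
meets_slo = measured >= SLO_TARGET

print(f"SLI: {measured:.4f}  meets SLO: {meets_slo}")
print(f"error budget: {budget:.1f} min, remaining: {remaining:.1f} min")
```

A negative `remaining` value means the objective was missed for the window; an SLA would then describe what users are owed.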
SLOs with Datadog
$ dog screenboard show k9m-b2s-df3
{
  "widgets": [
    {
      "title_text": "Demo: order service error rate (non-2xx)",
      "source": "single_monitor",
      "type": "uptime",
      "showErrorBudget": true,
      "sliType": "time",
      "monitorIds": [
        7117003
      ],
      "timeframes": [
        "7 days"
      ],
      "rules": {
        "0": {
          "threshold": 98,
          "color": "red",
          "timeframe": "7 days"
        }
      },
      "scaleFactor": 1,
      ...
Automated Visibility
T E L S
New Dev
• Immediately sees application architecture
• Immediately understands critical functionality for microservices
• Immediately starts building a sense of scale/performance/load
Operability should be a design consideration: build testing in from the start