Terraform at Scale
Transcript of Terraform at Scale
Terraform at Scale
HashiConf
Calvin French-Owen, Co-Founder of Segment
@calvinfo
September 7, 2016
💖
Scaling vectors
[Diagram: People and Complexity plotted as scaling vectors; one trajectory is marked ❌ and another ✅]
How do we move nimbly while adding people?
This talk
- Terraform at Segment
- What makes “good” Terraform
- What’s next
Terraform at Segment
By the numbers
- 16 developers working with Terraform
- 94 microservices
- thousands of AWS resources
A year with Terraform
- December 2012: Launch day
- April 2015: Terraform first attempt (v1)
- November 2015: Terraform “redux” (v2)
Before Terraform
😱
Terraform
Migrating to Terraform
April 2015
1. AWS accounts per environment
[Diagram: AWS accounts dev, stage, prod, and old prod, connected by VPC peering. Label: managed by Terraform]
Separate accounts
- confidence to apply ‘at will’
- test the waters without screwing up the old account
- any sort of ‘global’ configs are okay
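One way the account-per-environment split might show up in configuration is sketched below. It uses current (0.12+) Terraform syntax rather than the 2016-era syntax on the slides. The variable names, the default region, and the idea of one named AWS CLI profile per account are assumptions for illustration, not Segment's actual setup.

```hcl
# Hypothetical sketch: each environment directory targets its own AWS account.
variable "environment" {
  description = "Environment name; assumed to match a local AWS CLI profile"
  type        = string
}

variable "region" {
  description = "AWS region for this environment (assumed default)"
  type        = string
  default     = "us-west-2"
}

provider "aws" {
  region  = var.region
  profile = var.environment # credentials for the matching account
}
```

Because each account has its own credentials, an apply in one environment cannot touch resources in another.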
Migrating to Terraform
1. AWS accounts per environment
2. Docker and ECS
Terraform: First Attempt
Terraform (our first attempt)
├── Makefile
├── README.md
└── environments
    ├── dev
    ├── production
    └── stage
Terraform (our first attempt)
environments/stage
├── api.tf
├── bastion.tf
├── dns.tf
├── elasticache.tf
├── elbs.tf
├── iam.tf
├── outputs.tf
├── redis.tf
├── s3.tf
├── terraform.tfstate
├── terraform.tfvars
└── vpc.tf
Terraform (our first attempt)
resource "aws_ecs_task_definition" "app" {
  family = "app"

  container_definitions = <<EOF
[
  {
    "cpu": 1024,
    "memory": 768,
    "environment": [
      { "name": "NODE_ENV", "value": "stage" }
    ],
    "image": "segment/app:1.54.14",
    "name": "app",
    "portMappings": [
      { "containerPort": 8000, "hostPort": 8000 }
    ]
  }
]
EOF
}
Life was better!
Life was better… but not good.
1. environment drift
Terraform first attempt
├── Makefile
├── README.md
└── environments
    ├── ops
    ├── production
    └── stage
stage:
resource "aws_ecs_task_definition" "app" {
  family = "app"

  container_definitions = <<EOF
[
  {
    "cpu": 1024,
    "memory": 768,
    "environment": [
      { "name": "NODE_ENV", "value": "stage" }
    ],
    "image": "segment/app:1.54.14",
    "name": "app",
    "portMappings": [
      { "containerPort": 8000, "hostPort": 8000 }
    ]
  }
]
EOF
}
prod:
resource "aws_ecs_task_definition" "app" {
  family = "app"

  container_definitions = <<EOF
[
  {
    "cpu": 1024,
    "memory": 3072,
    "environment": [
      { "name": "NODE_ENV", "value": "production" }
    ],
    "image": "segment/app:1.54.17",
    "name": "app",
    "portMappings": [
      { "containerPort": 8000, "hostPort": 3000 }
    ]
  }
]
EOF
}
2. one massive local state
3. production drift
$ terraform plan -target=aws_elb.feels_so_easy
$ terraform plan -target=aws_elb.oh_no_what_have_we_done
Terraform Redux (v2)
Terraform v1 Problems
1. massive shared state: split states
2. locally stored state: remote state
3. drift between environments: modules
v2: state management
[Diagram: a core state (vpc, networking, security groups, asgs) beneath per-service states (auth, api, site, db, cdn); services read the core state, read only]
/**
 * Remote state.
 */
# read only!
resource "terraform_remote_state" "state" {
  backend = "s3"
  config {
    bucket = "segment-ops"
    key    = "terraform/${var.environment}/terraform.tfstate"
  }
}

data "template_file" "test" {
  template = "${file("${path.module}/init.tpl")}"

  vars {
    # reference
    zone_id = "${terraform_remote_state.state.zone_id}"
  }
}
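Terraform 0.7 and later also expose remote state as a data source. A minimal sketch of the same read-only lookup in 0.12+ syntax follows, where outputs are read through the `outputs` attribute. The bucket and key come from the slide; the `core` name and the `region` value are assumptions.

```hcl
variable "environment" {
  type = string
}

# Read-only view of the core state; nothing here can modify it.
data "terraform_remote_state" "core" {
  backend = "s3"

  config = {
    bucket = "segment-ops"
    key    = "terraform/${var.environment}/terraform.tfstate"
    region = "us-west-2" # assumed region
  }
}

# Values exported by the core state are available under `.outputs`.
output "zone_id" {
  value = data.terraform_remote_state.core.outputs.zone_id
}
```

Only values that the core configuration declares as outputs are visible this way, which keeps the contract between core and services explicit.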
v2: modules
Modules enforce configuration parity.
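To illustrate the parity point, the sketch below calls one hypothetical `app` module from both stage and production, so only the values that are meant to differ are passed in. The module path and the variable names (`environment`, `memory`, `image_tag`) are assumptions; the values are the ones from the drift example earlier in the talk. Syntax is 0.12+.

```hcl
# environments/stage/main.tf (hypothetical layout)
module "app" {
  source      = "../../modules/app"
  environment = "stage"
  memory      = 768
  image_tag   = "1.54.14"
}

# environments/production/main.tf (hypothetical layout)
module "app" {
  source      = "../../modules/app"
  environment = "production"
  memory      = 3072
  image_tag   = "1.54.17"
}
```

Anything the module does not expose as a variable, such as the host port, is fixed inside the module and cannot drift between environments.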
What makes good* Terraform?
*for some definitions of good
Docker AMIs by Packer
Service Config by Terraform
1. Variables
2. Composition
3. State
4. Versioning
1. Variables
- anything a user might want to override should be a variable
- use defaults liberally
1. Variables
resource "aws_instance" "bastion" {
  ami                    = "${module.ami.ami_id}"
  source_dest_check      = false                    # non-configurable
  instance_type          = "${var.instance_type}"   # configurable
  subnet_id              = "${var.subnet_id}"       # configurable
  key_name               = "${var.key_name}"        # configurable
  vpc_security_group_ids = ["${split(",",var.security_groups)}"]  # configurable
  monitoring             = true                     # non-configurable

  tags {
    Name        = "bastion"                         # non-configurable
    Environment = "${var.environment}"              # configurable
  }
}
1. Variables
resource "aws_instance" "bastion" {
  ami                    = "${module.ami.ami_id}"
  source_dest_check      = "${var.source_dest_check}"
  instance_type          = "${var.instance_type}"
  subnet_id              = "${var.subnet_id}"
  key_name               = "${var.key_name}"
  vpc_security_group_ids = ["${split(",",var.security_groups)}"]
  monitoring             = "${var.monitoring}"

  tags {
    Name        = "bastion"
    Environment = "${var.environment}"
  }
}
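The variables referenced in the bastion resource need declarations. A minimal sketch of what they might look like follows, in 0.12+ syntax, with defaults set to the values that were previously hard-coded. The descriptions, types, and the `instance_type` default are assumptions.

```hcl
variable "source_dest_check" {
  description = "Whether the instance checks source/destination on traffic"
  type        = bool
  default     = false # previously hard-coded
}

variable "monitoring" {
  description = "Whether detailed monitoring is enabled"
  type        = bool
  default     = true # previously hard-coded
}

variable "instance_type" {
  description = "EC2 instance type for the bastion"
  type        = string
  default     = "t2.micro" # assumed default, not from the talk
}
```

With defaults in place, callers who are happy with the old behavior pass nothing, and callers who are not can override a single value.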
2. Composition
- build modules as you need them
- it’s okay if not everything fits the abstraction
2. Composition – “full stack”
module "stack" {
  source      = "github.com/segmentio/stack"
  name        = "my-stack"
  environment = "production"
}
2. Composition – inside stack
module "vpc" {
  source = "./vpc"
  …
}

module "security_groups" {
  source = "./security-groups"
  …
}

module "bastion" {
  source = "./bastion"
  …
}

module "dhcp" {
  source = "./dhcp"
  …
}
2. Composition – byo edition
module "cluster" {
  source = "github.com/segmentio/stack//ecs-cluster"

  environment = "prod"
  name        = "cdn"
  vpc_id      = "vpc-eff2eada"
  image_id    = "ami-204faaf3"
}
3. State management
- separate core from services
- states per service
- use atlas or s3
- use binary plans
[Diagram: core state (vpc, networking, security groups, asgs) beneath per-service states (auth, api, site, db, cdn); services read core, read only]
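A per-service state stored in S3 could be declared with a backend block like the sketch below. At the time of the talk remote state was configured from the command line; the `backend` block shown here is the form available from Terraform 0.9 onward. Only the bucket name comes from the slides; the key layout, the service name, and the region are assumptions.

```hcl
terraform {
  backend "s3" {
    bucket = "segment-ops"
    key    = "terraform/production/services/auth/terraform.tfstate" # hypothetical key layout
    region = "us-west-2"                                            # assumed region
  }
}
```

Backend blocks cannot interpolate variables, so each service and environment carries its own literal key, which fits the one-state-per-service layout.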
4. Versioning
module "stack" {
  source = "github.com/segmentio/stack?ref=v1.x"
}
What’s next
- Applying in CI
- Atlas
- Data sources
- Terraform generation
[Diagram: People and Complexity scaling vectors, marked ✅]
Fin
Prior Art
- Stack: github.com/segmentio/stack
- Atlas Examples: github.com/hashicorp/atlas-examples