Case Study: Lucidchart's Migration to VPC

www.lucidchart.com/jobs Case Study: Lucidchart's Migration to VPC by Matthew Barlocker


Originally presented at CloudConnect 2013 in Chicago, IL.

Transcript of Case Study: Lucidchart's Migration to VPC

Case Study: Lucidchart's Migration to VPC

by Matthew Barlocker

“The Barlocker”

• Chief Architect at Lucid Software Inc. since 2011

• Bachelor's in CS from BYU

• Managed data center, Rackspace, and AWS deployments

• Love to play board games, go 4-wheeling, wrestle my sons, and fly airplanes

• nineofclouds.blogspot.com

Why Lucid Chose VPC

• Same price as EC2 Classic

• Interoperability with existing AWS services (S3, Route 53, etc.)

• New features like internal ELBs and on-the-fly security group changes

• Heightened security using only private IPs

Other Benefits

• All ELBs have security groups

• Additional security layer with Network ACLs

• Elastic IPs stay associated with stopped instances

• VPN support for common hardware

• Reserved instances can be transferred between EC2 Classic and VPC

Drawbacks

• Cost & maintenance of NAT instance(s)

• Setup time

• New terminology

• VPN or SSH tunnel is required to access instances on private subnets

• Internal DNS names are disabled by default
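The SSH tunnel option mentioned above can be sketched as a small helper that builds a command hopping through a host in a public subnet. This is only an illustration: the addresses, the `ec2-user` login, and the helper name are hypothetical, and it assumes a current OpenSSH client that supports the `-J` (ProxyJump) option.

```python
def ssh_via_bastion(bastion_host, private_ip, user="ec2-user"):
    """Build an ssh command that reaches a private-subnet instance
    by jumping through a bastion host in a public subnet.

    Relies on OpenSSH's -J (ProxyJump) option, available in OpenSSH 7.3+.
    """
    return ["ssh", "-J", f"{user}@{bastion_host}", f"{user}@{private_ip}"]


# Hypothetical addresses: a bastion with an Elastic IP and a private instance.
command = ssh_via_bastion("203.0.113.10", "10.0.1.25")
print(" ".join(command))
```

The command is only built and printed here, not executed, so the sketch can be adapted to whatever tunnel or VPN setup is actually in place.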

Things You Should Know

• Instances in the public subnets must have an Elastic IP to communicate with the internet

• NAT instances are just normal instances that are configured to be routers

• NAT instances must be in a public subnet

• Public & private subnets are defined by their route tables, network ACLs, and DHCP options
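As a rough sketch of the route-table part of that last point, a subnet can be classified by where its default route points. The route tables below are hypothetical dictionaries (destination CIDR to target ID), not real API output, and network ACLs and DHCP options are ignored for simplicity.

```python
def classify_subnet(routes):
    """Classify a subnet by the target of its default route (0.0.0.0/0).

    `routes` maps destination CIDR -> target ID, mimicking a VPC route table.
    """
    target = routes.get("0.0.0.0/0")
    if target is None:
        return "isolated"   # no route to the internet at all
    if target.startswith("igw-"):
        return "public"     # default route goes to an internet gateway
    return "private"        # default route goes elsewhere, e.g. a NAT instance


# Hypothetical route tables; the IDs are made up for illustration.
public_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-11111111"}
private_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "i-22222222"}  # NAT instance

print(classify_subnet(public_routes))
print(classify_subnet(private_routes))
```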

Migration Plan

Migration Constraints

• EC2 cannot connect to private VPC servers

• Private VPC server connections must go through the NAT instances

• EC2 & VPC have different security groups, load balancers, autoscale groups

• EC2 & VPC share EBS volumes, snapshots, instance sizes, zones, regions

Migration Plan

• Move top layer first

• Move one layer at a time

• Meticulously manage security groups

• Move monitoring/utility servers last

• http://nineofclouds.blogspot.com/search/label/VPC
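The reasoning behind "top layer first" can be sketched as a check against the migration constraints: a layer still in EC2 must never need to call a layer that has already moved into the private VPC. The layer names and the call graph below are assumptions for illustration (including that servers push to monitoring rather than being polled), and the model treats every moved layer as private.

```python
def blocked_connections(calls, in_vpc):
    """Return (caller, callee) pairs where an EC2 caller needs a private VPC callee."""
    return [(a, b) for a, b in calls if a not in in_vpc and b in in_vpc]


def order_is_safe(calls, order):
    """Check that moving layers one at a time in `order` never blocks a connection."""
    in_vpc = set()
    for layer in order:
        in_vpc.add(layer)
        if blocked_connections(calls, in_vpc):
            return False
    return True


# Hypothetical call graph: each pair is (caller, callee).
calls = [
    ("web", "app"),
    ("app", "db"),
    ("web", "monitoring"),
    ("app", "monitoring"),
    ("db", "monitoring"),
]

print(order_is_safe(calls, ["web", "app", "db", "monitoring"]))  # top layer first
print(order_is_safe(calls, ["db", "app", "web", "monitoring"]))  # bottom layer first
```

Under these assumptions the top-first order never strands a caller in EC2, while moving the databases first immediately cuts off the application layer.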

Starting Layout

Move Webservers First

Move Next Layer

Move Databases Next

Top 5 Pain Points

5. Setup & Terminology

• Took time to determine which VPC configuration we wanted

• Took time to troubleshoot network ACL and security group issues

• It took us 3 days with 1 person

• We have not had to revisit the configuration since we got it working

• Unavoidable

4. Security Groups

• Private VPC instances communicate through the NAT instances

• EC2 instances only see traffic from the NAT

• EC2 security groups were open to the entire VPC

• Avoidable by doing 2 moves: one to public VPC, one to private VPC
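A minimal sketch of why the EC2 security groups ended up open to the whole VPC: once traffic passes through the NAT, every private instance shows up with the same source address. The IP addresses and function names here are hypothetical, and the NAT is modeled as a simple source rewrite.

```python
import ipaddress


def source_seen_by_ec2(private_ip, nat_public_ip):
    """Source address an EC2 instance sees for traffic from a private VPC instance.

    The NAT rewrites the source, so the original private address is lost.
    """
    return nat_public_ip


def rule_allows(rule_cidr, source_ip):
    """True if a security-group style CIDR rule admits the given source address."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule_cidr)


NAT_IP = "203.0.113.5"  # hypothetical Elastic IP of the NAT instance
web_seen = source_seen_by_ec2("10.0.1.10", NAT_IP)
db_seen = source_seen_by_ec2("10.0.1.20", NAT_IP)

# Both private instances look identical to EC2, so one rule admits both.
print(web_seen == db_seen)
print(rule_allows("203.0.113.5/32", web_seen), rule_allows("203.0.113.5/32", db_seen))
```

Because the rule can only name the NAT's address, it cannot distinguish one private instance from another, which is the exposure the slide describes.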

3. VPN

• Highly available configuration supported for some hardware

• We chose OpenVPN, which took 3 days to configure and test properly

• Avoidable in a number of different ways

2. MongoDB Election = Downtime

• MongoDB has an election process to determine primary and secondaries

• To elect a primary, a majority of servers must vote

• Because EC2 cannot speak to VPC, we had to move each server to the public subnet, and then to the private subnet afterward

• During the move from public to private, MongoDB died for 15 minutes

• Avoidable by not using MongoDB
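The majority rule behind the election can be sketched in a few lines. The member counts are illustrative only and are not taken from Lucidchart's actual replica set.

```python
def can_elect_primary(total_members, reachable_members):
    """A primary can be elected only if a strict majority of members can vote."""
    return reachable_members > total_members // 2


# Illustrative 3-member replica set.
print(can_elect_primary(3, 2))  # one member unreachable: majority still holds
print(can_elect_primary(3, 1))  # two members unreachable: no primary
```

If a move between subnets leaves too few members able to reach each other, even briefly, the set has no primary until connectivity is restored.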

1. NAT Bandwidth

• The traffic between private VPC and EC2 exceeded the capacity of our NAT instances

• Requests timed out as throughput maxed out

• Downtime of 30 minutes on some services

• Completely avoidable! During the migration, increase the size of the NAT instances. Decrease it after the migration is done.
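A rough sizing sketch for the advice above: estimate the peak traffic crossing the NAT during the migration and pick the smallest instance size that covers it with some headroom. The size names and capacity figures are entirely hypothetical placeholders, not AWS numbers; measure real throughput before relying on anything like this.

```python
# Hypothetical capacities in Mbps; these are NOT real AWS figures.
HYPOTHETICAL_CAPACITY_MBPS = {"small": 100, "medium": 300, "large": 700}


def smallest_sufficient_size(peak_mbps, capacities, headroom=0.7):
    """Pick the smallest size whose capacity, scaled by `headroom`, covers the peak.

    Returns None if no listed size is large enough.
    """
    for size, capacity in sorted(capacities.items(), key=lambda kv: kv[1]):
        if peak_mbps <= capacity * headroom:
            return size
    return None


print(smallest_sufficient_size(60, HYPOTHETICAL_CAPACITY_MBPS))   # normal traffic
print(smallest_sufficient_size(250, HYPOTHETICAL_CAPACITY_MBPS))  # migration traffic
```

The same function can be run again with post-migration traffic to decide when it is safe to scale the NAT back down.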

Thank You
