Anynines - Running Cloud Foundry for 12 months - An experience report
Anynines - Cloud Foundry on OpenStack - An Experience Report
Cloud Foundry on OpenStack An Experience Report
Introduction
about.me/fischerjulian
The anynines Stack
Hardware
OpenStack
Cloud Foundry
VMware
We migrated from a rented VMware environment to a
self-hosted OpenStack.
For more details on this: http://rh.gd/a9vmw2sos
Things we had to think about
OpenStack Upgrades
Before Grizzly, OpenStack was
not ready for production
• The upgrade process included a lot of manual work
• No script driven upgrades
• Manual DB schema migrations
• Manual configuration file changes, etc.
"I scheduled a week of total downtime with all instances offline." - jon@jonproulx, http://rh.gd/1sNhiiz
Upcoming Upgrade: Havana > Icehouse
• Chef is used to roll out Icehouse (incl. configuration changes)
• The upgrade is well tested on a separate multi-server OpenStack staging system
Goal: <30 min downtime.
Let’s keep our fingers crossed :)
Looking forward to rolling upgrades with OpenStack
Icehouse http://rh.gd/1ymhViL
• No need to shut down VMs during upgrade
• No downtime of the entire cloud
VM availability
What kills VMs?
• Random kernel panics (kernel bug) http://rh.gd/1oBUeCc
• Hardware outages (hw & power failures)
• …
Availability Zones
• Build disjoint networks, racks, etc.
• Each disjoint zone = availability zone
• Tell OpenStack about availability zones
• When provisioning, you can choose the AZ
• Build Bosh releases accordingly
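The placement idea behind these bullets can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the helper name `spread_across_zones` and the zone names `az1` and `az2` are hypothetical, and the code does not call OpenStack or Bosh.

```ruby
# Hypothetical helper (not part of OpenStack or Bosh): assign the instances
# of one job to availability zones in round-robin order, so that no single
# zone holds all instances of that job.
def spread_across_zones(job, instances, zones)
  raise ArgumentError, "at least one zone is required" if zones.empty?

  (0...instances).map do |index|
    { name: "#{job}/#{index}", zone: zones[index % zones.size] }
  end
end

# Zone names are assumptions made for this example.
placement = spread_across_zones("dea", 4, ["az1", "az2"])
placement.each { |vm| puts "#{vm[:name]} -> #{vm[:zone]}" }
```

With two zones and an even instance count, each zone receives the same number of instances, so losing one zone leaves half of the job's capacity running.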
Aggregates
• Similar to AZ
• Not about failover
• Select hosts with certain attributes
• E.g. SSD-aggregate
• When provisioning, choose a host with SSD disks
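Selecting hosts by attribute can be pictured as a simple metadata filter. The sketch below is a rough model of the idea, not OpenStack's scheduler code; the `Host` struct, the `hosts_matching` helper and the `"disk"` metadata key are assumptions made for the example.

```ruby
# Hypothetical model of a compute host with free-form metadata.
Host = Struct.new(:name, :metadata)

# Return only the hosts whose metadata contains every required key/value pair.
def hosts_matching(hosts, required)
  hosts.select do |host|
    required.all? { |key, value| host.metadata[key] == value }
  end
end

# Host names and metadata are made up for illustration.
hosts = [
  Host.new("compute-1", { "disk" => "ssd" }),
  Host.new("compute-2", { "disk" => "hdd" }),
  Host.new("compute-3", { "disk" => "ssd" })
]

hosts_matching(hosts, { "disk" => "ssd" }).each { |host| puts host.name }
```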
Load Balancing
• Not inherently clustered
• LBaaS failover can be realized using
• Pacemaker/corosync
• GlusterFS (share LB configuration)
VM Failover Strategies
Resurrect
• Monitor VM
• Re-Build VMs automatically
• e.g. using Cloud Foundry Bosh
• + Easy
• - Takes a long time (minutes, not seconds)
• - OpenStack doesn’t release persistent disks automatically
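The monitor-and-rebuild loop can be sketched as follows. This is a simplified model, not how Bosh implements resurrection: heartbeats are plain numbers (seconds), and the helper names and the 30-second timeout are assumptions.

```ruby
# Return the names of VMs whose last heartbeat is older than `timeout` seconds.
# `heartbeats` maps a VM name to the time (in seconds) it was last seen.
def unresponsive_vms(heartbeats, now, timeout)
  heartbeats.select { |_name, last_seen| now - last_seen > timeout }.keys
end

# Hand every unresponsive VM to the caller's rebuild block.
# The block stands in for whatever tool actually re-creates the VM.
def resurrect(heartbeats, now, timeout)
  unresponsive_vms(heartbeats, now, timeout).each { |name| yield name }
end

# VM names and timestamps are invented for this example.
heartbeats = { "dea/0" => 100, "dea/1" => 40, "uaa/0" => 95 }
resurrect(heartbeats, 110, 30) { |name| puts "rebuilding #{name}" }
```

The rebuild step itself is what takes minutes in practice, which is why this strategy is simple but slow compared to a standby VM.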
Failover to Standby VM
• Provide stand-by VM
• Monitor VM and perform failover
• e.g. using Pacemaker
• + Fast failover (seconds)
• - Pacemaker is not easy to use
• - Increased resource usage by standby VM(s)
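The standby approach boils down to swapping roles when the active node fails a health check. The class below is a toy model of that decision only; it is not Pacemaker, and the `FailoverPair` name, node names and health-check lambda are assumptions.

```ruby
# Toy model of an active/standby pair. It only tracks which node is active;
# real cluster managers also handle fencing, quorum and resource migration.
class FailoverPair
  attr_reader :active, :standby

  def initialize(active, standby)
    @active = active
    @standby = standby
  end

  # `healthy` is any callable that returns true for a healthy node name.
  # Swaps the roles if the active node is unhealthy and the standby is not.
  def check(healthy)
    return active if healthy.call(active)
    raise "no healthy node available" unless healthy.call(standby)

    @active, @standby = @standby, @active
    active
  end
end

# Node names are made up; "pg-1" is simulated as being down.
down = ["pg-1"]
pair = FailoverPair.new("pg-1", "pg-2")
puts pair.check(->(node) { !down.include?(node) })
```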
IP Failover
Three ways to fail over IPs
Load Balancer
• + Fast
• + Easy (use lb weights)
• - LB becomes a bottleneck
• When OpenStack supports HAProxy (L3) this drawback can be eliminated
Floating-IPs
• + Easy
• + Fast
• - Only for public networking
NIC Re-attachment
• + No network bottleneck
• + No dependencies on other services
• - Slightly higher failover time (several seconds)
Implications for Cloud Foundry
Accept that VMs are ephemeral
Distribute CF components across OS availability zones
• 2 * UAA
• 2 * CC
• 2 * n * DEAs
• 2 * Health Manager
• …
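A quick way to reason about this layout is to list every component that lives in fewer than two zones. The sketch below assumes a hypothetical deployment map; the component and zone names are examples, not the actual anynines layout.

```ruby
# Return the components that run in fewer than two distinct availability zones.
# `deployment` maps a component name to the zones of its instances.
def single_points_of_failure(deployment)
  deployment.select { |_component, zones| zones.uniq.size < 2 }.keys
end

# Example layout (assumed for illustration only).
deployment = {
  "uaa"            => ["az1", "az2"],
  "cc"             => ["az1", "az2"],
  "health_manager" => ["az1", "az2"],
  "ccdb"           => ["az1"]
}

puts single_points_of_failure(deployment).inspect
```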
UAA & CC DB = SPOF
HA Postgres
• UAA and Cloud Controller database
• Single point of failure for Cloud Foundry
• Postgres is not inherently clusterable > failover with standby VM
• Master/slave replication
• Pacemaker/corosync
• IP-Failover using NIC-reattachment
That’s halfway towards a PostgreSQL CF Service
• Add a V2 Service Broker
• Add provisioning logic
• Provision a 2-node DB cluster on cf create service postgres medium-cluster
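The provisioning logic behind such a plan might look roughly like this. It is a sketch under assumptions: only the `medium-cluster` plan name comes from the slides, while the `small` plan, the `PLANS` table and the `provision` helper are hypothetical. A real broker would also expose the V2 HTTP endpoints and drive Bosh to create the VMs.

```ruby
# Hypothetical plan table: maps a plan name to the number of database nodes.
# Only "medium-cluster" appears in the slides; "small" is invented here.
PLANS = {
  "small"          => { nodes: 1 },
  "medium-cluster" => { nodes: 2 }
}.freeze

# Describe the nodes to create for a service instance: one master,
# and a standby for every additional node in the plan.
def provision(instance_name, plan)
  spec = PLANS.fetch(plan) { raise ArgumentError, "unknown plan: #{plan}" }
  roles = ["master"] + Array.new(spec[:nodes] - 1, "standby")

  roles.each_with_index.map do |role, index|
    { name: "#{instance_name}-#{index}", role: role }
  end
end

provision("postgres", "medium-cluster").each do |node|
  puts "#{node[:name]} (#{node[:role]})"
end
```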
CF Service Design
• Use clusterable services if possible
• Implement automatic failover if not
• Autoprovisioning using Bosh
• Organize self-healing
• (Semi-)Automatic recovery from degraded mode
Summary
• VMware’s high availability options are nice
• OpenStack helped us cut costs by 50%
• OS is stable enough to run Cloud Foundry on top
• OS hardening is required and feasible
Open Source OpenStack and Open Source Cloud Foundry are an SME’s best
friends!
Questions?
Thank you!
Preparing for disaster recovery
• Cinder Volume Snapshots
OpenStack Backups
OpenStack Swift
• Open source Amazon S3 replacement
• Object store with RESTful interface
• Scales horizontally to petabyte dimensions
• Fully redundant, highly available
• CF service > App Asset Storage
Code
require "fileutils"
require "find"
require "fog"

class Blobstore
  def initialize(connection_config, directory_key, cdn = nil, root_dir = nil)
    @root_dir = root_dir
    @connection_config = connection_config
    @directory_key = directory_key
    @cdn = cdn
  end

  def local?
    @connection_config[:provider].downcase == "local"
  end

  def exists?(key)
    !file(key).nil?
  end