www.nimbusproject.org
Infrastructure Clouds for Science and
Education: Platform Tools
7/16/2012
Kate Keahey, Renato J. Figueiredo, John Bresnahan, Mike Wilde, David LaBissoniere
Argonne National Laboratory
Computation Institute, University of Chicago
University of Florida
The Power of Infrastructure Clouds
Virtualization opens the floodgates
• Outsourcing
• Virtual appliances
– Freeze your stack in time
– Run it anywhere
• Multi-cloud applications
– Run many copies all over the world!
• Elasticity
Harnessing The Power
• Organization tools and techniques
Towards a Power Adapter
What Needs To Be Harnessed
• VM (appliance) creation and development – configuration management tools (Chef, Puppet)
• VM hypervisors – Infrastructure-as-a-Service (IaaS)
• Cloud applications – virtual clusters, cloudinit.d, CloudFormation
• Elasticity – auto-scaling tools, Phantom
• Workflow – Swift, etc.
VM Applications
• An entire system frozen in time
– Full software stacks (versions)
– Configuration files
– Important for science!
• A dedicated modular service
– Web service, database, AMQP node, etc.
• Demos
• A single binary file (or set of files)
– Easy to freeze
Developing Appliances
• A single binary image?
– Many developers?
– Version control?
– Merge conflicts?
• Base image with a description
– Ex: Ubuntu 11.04 base images plus a set of scripts
• Configuration Management Software
– Chef, Puppet, FG Rain, etc
Chef
• Software stack description
– Ruby and JSON
• A library of cookbooks
• Cookbooks contain recipes
– Ex: apache2 server with php4
• Attributes to customize each recipe
– Ex: on what port will apache listen
• Templates for configuration files
• Appliance developers make recipes
– Version control can be done with git/svn/cvs…
Example Recipe
app_dir = node[:appdir]
ve_dir = node[:virtualenv][:path]

git app_dir do
  repository node[:autoscale][:git_repo]
  reference node[:autoscale][:git_branch]
  action :sync
  user node[:username]
  group node[:groupname]
end

execute "run install" do
  cwd app_dir
  user node[:username]
  group node[:groupname]
  command "python setup.py install"
end
Example Template
Template:
phantom:
  system:
    type: epu
    rabbit: <%= node[:autoscale][:rabbit_host] %>
    rabbit_port: <%= node[:autoscale][:rabbit_port] %>
    rabbit_ssl: False
    rabbit_user: <%= node[:autoscale][:rabbit_username] %>
    rabbit_pw: <%= node[:autoscale][:rabbit_password] %>
    rabbit_exchange: <%= node[:autoscale][:rabbit_exchange] %>
  authz:
    type: sqldb
    dburl: <%= node[:autoscale][:dburl] %>

Rendered result:
phantom:
  system:
    type: epu
    rabbit: vm-102.uc.futuregrid.org
    rabbit_port: 5672
    rabbit_ssl: False
    rabbit_user: XXX
    rabbit_pw: PPPPPP
    rabbit_exchange: default_dashi_exchange
  authz:
    type: sqldb
    dburl: mysql://nimbus:[email protected]/testphantom
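The step from template to rendered file can be illustrated with a small sketch. Chef itself renders templates with Ruby's ERB; the Python below only imitates the substitution of `<%= node[:a][:b] %>` tags from a nested attribute dictionary. The `render` function, the regular expressions, and the two-line template are illustrative assumptions, not part of Chef.

```python
import re

# Matches output tags of the form <%= node[:a][:b] %> (a minimal, assumed subset of ERB).
ERB_TAG = re.compile(r"<%=\s*node((?:\[:\w+\])+)\s*%>")
KEY = re.compile(r"\[:(\w+)\]")


def render(template: str, node: dict) -> str:
    """Replace each <%= node[:k1][:k2] %> tag with the matching attribute value."""

    def lookup(match: re.Match) -> str:
        value = node
        for key in KEY.findall(match.group(1)):
            value = value[key]  # a KeyError here means the attribute is missing
        return str(value)

    return ERB_TAG.sub(lookup, template)


# Attribute values copied from the rendered file on the slide.
node = {"autoscale": {"rabbit_host": "vm-102.uc.futuregrid.org", "rabbit_port": 5672}}
template = (
    "rabbit: <%= node[:autoscale][:rabbit_host] %>\n"
    "rabbit_port: <%= node[:autoscale][:rabbit_port] %>\n"
)
rendered = render(template, node)
```

The same template can be rendered for another deployment by changing only the attribute dictionary, which is the point of keeping attributes separate from templates.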
Cloud Applications
• More than one VM is needed for the job
• Information exchange is needed
– Manual information exchange
• Multi-cloud
– Cloud independence required
[Figure: nginx, several web servers, and a database]
Cloud Management Tools
• Architecture description
– VM type, location, count
– Volumes
– Networks
– Other services
• Contextualization
– Exchange dynamically determined information
  • IP addresses, security information
– Bootstrap component connections
  • Ex: mount NFS, connect to DB, etc.
A Simplified Deployment Scenario
A Grid in Your Pocket…
[Figure, built up over three slides: users Pierre, Jamie, and David; clouds EC2, OOI private cloud, and FutureGrid]
CloudFormation
• Assemble AWS services
– Run AMIs
– Connect EBS volumes to AMIs
– Associate an SQS queue, etc.
• JSON descriptions
• AWS only
• No configuration management software integration
– Manual integration with Chef
cloudinit.d
• Multicloud VM dependency management
– Uses the libcloud abstraction library
• Integrated with Chef Solo
• ini file format descriptions
– Coupled with any executable script
• Launch plan end-users/operators
– Lightweight
– Copy launch plan and “one click” action
– Easily reconfigured for various clouds
• Launch plan/application developers
– Minimal software assumptions (ssh)
– “Stem cell” deployment approach
– Incremental launch plan development
[svc-alamoHTTP]
iaas_key: XXXXXX
iaas_secret: XXXX
iaas_host: alamo.futuregrid.org
iaas_port: 8443
iaas: Nimbus
image: ubuntu10.10
ssh_username: ubuntu
localsshkeypath: ~/.ssh/fg.pem
readypgm: http-test.py
bootpgm: http-boot.sh
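Because the service description is plain ini text, it can be read with a standard ini parser. The sketch below uses Python's `configparser` on a shortened copy of the section above; the `load_service` helper is hypothetical and is not how cloudinit.d itself loads launch plans.

```python
import configparser

# Shortened copy of the service section from the slide (credentials omitted).
SERVICE_INI = """
[svc-alamoHTTP]
iaas_host: alamo.futuregrid.org
iaas_port: 8443
iaas: Nimbus
ssh_username: ubuntu
readypgm: http-test.py
bootpgm: http-boot.sh
"""


def load_service(text: str, section: str) -> dict:
    """Parse one service section into a dictionary (hypothetical helper)."""
    parser = configparser.ConfigParser()  # accepts both "key: value" and "key = value"
    parser.read_string(text)
    service = dict(parser[section])
    service["iaas_port"] = int(service["iaas_port"])  # ini values are strings
    return service


service = load_service(SERVICE_INI, "svc-alamoHTTP")
```

Keeping the description in a plain format like this is what makes a launch plan easy to copy and retarget: pointing the service at another cloud means editing a few keys.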
cloudinit.d Overview
• Services
• Run Levels
– Collections of services without dependencies on each other
• Launch Plan
– An ordered set of run levels
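The relationship between services, run levels, and a launch plan can be sketched as follows. This is a minimal illustration, assuming each service is represented by a callable that returns True on success; the `launch_plan` function and its interface are hypothetical and do not reflect cloudinit.d's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A run level maps service names to a boot callable (assumed interface).
RunLevel = Dict[str, Callable[[], bool]]


def launch_plan(run_levels: List[RunLevel]) -> List[str]:
    """Boot run levels in order; services inside one level start concurrently.

    Returns the names of the launched services and raises RuntimeError
    as soon as any service in a level fails, so later levels never start.
    """
    launched: List[str] = []
    for number, level in enumerate(run_levels, start=1):
        if not level:
            continue
        with ThreadPoolExecutor(max_workers=len(level)) as pool:
            futures = {name: pool.submit(boot) for name, boot in level.items()}
            results = {name: future.result() for name, future in futures.items()}
        failed = sorted(name for name, ok in results.items() if not ok)
        if failed:
            raise RuntimeError(f"run level {number} failed: {failed}")
        launched.extend(sorted(results))
    return launched
```

Services in one level can start in parallel precisely because a run level contains no internal dependencies; ordering is expressed only between levels.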
Cloudinit.d Features
• Repeatability: write a launch plan once, deploy many times
• Deploy on cloud and non-cloud resources from many providers
• Coordination of interdependent launches
• User-defined launch tests
• Test-based monitoring and repair
[Figure: a launch plan with a database and three web servers, grouped into run level 1 and run level 2]
A Single Service Application Boot
Here we show how cloudinit.d automatically creates an HTTP server from a simple distribution base image, working through the IaaS interface of an infrastructure cloud:
• Request a new VM
• Poll the IaaS service to determine when the VM is running
• Verify ssh works: sshd needs to start up and be accessible on the new VM
• scp over the boot contextualization program (bootpgm) and run it; the VM has now been contextualized to be a web server
• scp over the ready program (readypgm) and run it
• If the ready program has a successful exit code (0), then the new simple cloud application is set to go!
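The boot sequence on this slide can be written as a short control loop. The sketch below is an assumption-laden illustration: the four callables (`request_vm`, `vm_state`, `ssh_ok`, `copy_and_run`) are hypothetical stand-ins for the IaaS and ssh/scp operations, and the `"running"` state string and polling limits are invented for the example.

```python
import time
from typing import Callable


def boot_service(
    request_vm: Callable[[], str],           # asks the IaaS cloud for a VM, returns its id
    vm_state: Callable[[str], str],          # polls the IaaS service for the VM state
    ssh_ok: Callable[[str], bool],           # True once sshd accepts connections
    copy_and_run: Callable[[str, str], int], # scp a program over, run it, return exit code
    bootpgm: str,
    readypgm: str,
    poll_interval: float = 5.0,
    max_polls: int = 100,
) -> bool:
    """Return True when the service is booted and its ready program exits with 0."""
    vm = request_vm()

    # Poll the IaaS service until the VM is reported as running.
    for _ in range(max_polls):
        if vm_state(vm) == "running":
            break
        time.sleep(poll_interval)
    else:
        return False

    # Wait for sshd to start and become reachable on the new VM.
    for _ in range(max_polls):
        if ssh_ok(vm):
            break
        time.sleep(poll_interval)
    else:
        return False

    # Contextualize the VM, then test it with the ready program.
    if copy_and_run(vm, bootpgm) != 0:
        return False
    return copy_and_run(vm, readypgm) == 0
```

Passing the operations in as callables keeps the sequence independent of any one cloud, and lets the logic be exercised with fakes before touching real resources.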
Escalation Pattern
• Operational units
• User domain (configuration and security)
• Domain management: monitor and regulate domain properties based on system-specific and application-specific metrics
• Challenge: leverage on-demand, large but unreliable provider pool
– Applications that absorb resources
– Applications that tolerate failures
Scaling Considerations
• Reasons to scale
– Business vs science
• Cost vs quota
• Lossy environment
– VMs fail more often than bare metal
– N preserving
• Spot instances
– If the price is right
• Backfill
– If resources are idle
Amazon Auto Scaling and CloudWatch
• Auto Scaling in EC2
– Policies to scale servers up and down
– Min, max, and desired size
• Integrated with AWS CloudWatch sensors
– Triggers
– CPU load, disk capacity, load balancer loads, etc.
– Custom sensors
• No contextualization
• REST API
• AWS only
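A trigger-driven policy with minimum, maximum, and desired sizes can be sketched in a few lines. This is not the AWS Auto Scaling API; the `desired_capacity` function, the CPU-load sensor, and the 0.8 and 0.2 thresholds are assumptions chosen only to illustrate the idea.

```python
def desired_capacity(
    current: int,
    cpu_load: float,
    minimum: int,
    maximum: int,
    scale_up_at: float = 0.8,    # assumed trigger threshold
    scale_down_at: float = 0.2,  # assumed trigger threshold
) -> int:
    """Return the new desired number of servers, kept within [minimum, maximum]."""
    if cpu_load > scale_up_at:
        target = current + 1
    elif cpu_load < scale_down_at:
        target = current - 1
    else:
        target = current
    # Clamp so the policy never leaves the configured size range.
    return max(minimum, min(maximum, target))
```

For example, `desired_capacity(2, 0.9, minimum=1, maximum=4)` asks for one more server, while the same load at size 4 leaves the group unchanged.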
Phantom Scaling Services
• Multi-cloud
– Fail-over and even distribution policies
• Monitor scaling factors and failures
– Generic/system qualities: deployment status, load, bank account, etc.
– Application-specific qualities, e.g., a workload queue for AliEn, PBS, AMQP, and others
• Evaluate against policies
• Scale and/or recover
– For user components
– For system components
– Across different cloud providers
• Release as a Service
• 0.1 running on FutureGrid now
– Initially available as a service on FutureGrid resources
– Provides high availability
[Figure: sensor information; apply policy; reliably provision, manage and contextualize resources]
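The two multi-cloud policies named on this slide, even distribution and fail-over, could look roughly like the sketch below. The slide does not define either policy, so both functions are one plausible reading rather than Phantom's implementation; the function names and the per-cloud capacity limits are hypothetical.

```python
from typing import Dict, List


def even_distribution(total_vms: int, clouds: List[str]) -> Dict[str, int]:
    """Spread VMs as evenly as possible; earlier clouds absorb any remainder."""
    if not clouds:
        raise ValueError("at least one cloud is required")
    base, extra = divmod(total_vms, len(clouds))
    return {cloud: base + (1 if i < extra else 0) for i, cloud in enumerate(clouds)}


def fail_over(total_vms: int, capacity: Dict[str, int]) -> Dict[str, int]:
    """Fill clouds in preference order, spilling to the next when one is full.

    `capacity` maps cloud name to the number of VMs it can currently accept
    (an assumed input, e.g. a quota or the result of a failed request).
    """
    placement: Dict[str, int] = {}
    remaining = total_vms
    for cloud, limit in capacity.items():
        placement[cloud] = min(remaining, limit)
        remaining -= placement[cloud]
    if remaining:
        raise ValueError(f"{remaining} VMs could not be placed")
    return placement
```

Under these assumptions, five VMs over three clouds are split 2/2/1, and a preferred cloud limited to three VMs sends the other two to the next cloud in the list.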
Infrastructure Platform Goals
• Multi-cloud
– Work across private, community and commercial clouds
• Any Scale
– Scale in response to a diverse set of sensors/triggers
– Both system and application sensors
• High Availability
– “Any VM can die”: system or user VMs
– Minimizing time to recovery (TTR)
• Your Policies, Our Enactment
– User-defined sensors/triggers and policies
• Engineered from the ground up to work with infrastructure clouds
• Easy on the user
How Can Science Plug Into This Power?
Example Embarrassingly Parallel Scientific Application
Demonstration
Using Nimbus Domains
[Figure, built up over two slides: the application puts M subtask messages on a task (message) queue and starts the workers; an “N preserving” policy preserves N worker VMs on the infrastructure compute cloud; workers get tasks from the queue and write results/checkpoints to Cumulus/S3]
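An "N preserving" policy keeps N worker VMs alive even though individual VMs may fail. A minimal sketch of one evaluation step follows, assuming the policy sees a mapping from VM id to state; the `n_preserving` function and the treatment of any state other than PENDING, STARTED, or RUNNING as failed are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# States counted as healthy (names taken from the time-to-scale slide).
HEALTHY = ("PENDING", "STARTED", "RUNNING")


def n_preserving(n: int, vm_states: Dict[str, str]) -> Tuple[int, List[str]]:
    """Return (number of VMs to launch, VM ids to terminate) so that N stay healthy."""
    healthy = sorted(vm for vm, state in vm_states.items() if state in HEALTHY)
    failed = sorted(vm for vm, state in vm_states.items() if state not in HEALTHY)
    to_launch = max(0, n - len(healthy))
    surplus = healthy[n:]  # healthy VMs beyond N are released
    return to_launch, failed + surplus
```

Because workers pull tasks from the queue and checkpoint results externally, replacing a failed worker this way loses at most the subtask it was running.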
Phantom Architecture
[Figure: nginx, web application (HTTPS), REST service (HTTPS), MySQL, RabbitMQ, EPUM, Provisioner, DTRS, Zookeeper cluster, FutureGrid clouds, IaaS clouds]
Adventures in Availability
• Time to scale (TTS)
– PENDING (request)
– STARTED (deployment)
– RUNNING (contextualization)
TTS: preliminary results for 2,000 VMs provisioned on AWS EC2
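One way to break time to scale into the three phases listed above is to record when a VM enters each state and take the differences. The sketch assumes timestamps in seconds and a final READY marker for the end of contextualization; READY and the `time_to_scale` function are hypothetical and not part of the slide.

```python
from typing import Dict

# PENDING, STARTED, RUNNING come from the slide; READY is an assumed end marker.
STATE_ORDER = ["PENDING", "STARTED", "RUNNING", "READY"]


def time_to_scale(entered: Dict[str, float]) -> Dict[str, float]:
    """Return the seconds spent in each state plus the total under the key 'TTS'."""
    phases = {
        state: entered[following] - entered[state]
        for state, following in zip(STATE_ORDER, STATE_ORDER[1:])
    }
    phases["TTS"] = entered["READY"] - entered["PENDING"]
    return phases
```

With made-up timestamps of 0, 12, 50, and 65 seconds, the request, deployment, and contextualization phases take 12, 38, and 15 seconds, for a TTS of 65 seconds.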
Applications
• Schedulers
• Elastic MapReduce
• Workflow systems (Swift)
• Data transfer systems
• Science gateways
• Custom applications (OOI)
Application adaptation:
• Library of generic sensors
• Application-specific sensors
• Policies
• Decision engine
Infrastructure Platform
• Contextualization, multi-cloud bridge, repeatable launches, scaling, elasticity, and high availability