Windows Azure Compute
![Page 1: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/1.jpg)
Windows Azure Compute
Brad Calder
General Manager, Windows Azure
![Page 2: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/2.jpg)
Large-Scale Workload Patterns
(Four charts, each plotting compute usage against time, with the average usage marked.)
• “On and Off” (example: RiskMetrics)
  • On-and-off workloads (e.g. batch jobs), with inactivity periods in between
  • Over-provisioned capacity is wasted
• “Growing Fast” (example: Docs.com on Facebook)
  • Successful services need to grow and scale
  • Keeping up with growth is a big IT challenge
  • Growth can be hard to predict
• “Unpredictable Bursting”
  • Unexpected or unplanned peaks in demand
  • A sudden spike impacts performance
  • Hard and costly to over-provision for extreme cases
• “Predictable Bursting” (example: Walmart)
  • Services with micro-seasonality trends
  • Peaks due to periodic increases in demand
  • Capacity is wasted off season
![Page 3: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/3.jpg)
Computing Demand: Daily Fluctuation
(Chart: Bing search query volume over one day, 12:00 AM to 10:48 PM, Japan vs. Great Britain. Source: Microsoft)
![Page 4: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/4.jpg)
Computing Demand: Yearly Variability
(Two charts of site traffic, Jan 2009 to Jan 2010. Source: Alexa)
• Tax season, ~10x normal load: turbotax.com, taxcut.com, hrblock.com, taxact.com
• Holiday shopping, ~4x normal load: target.com, walmart.com, toysrus.com, barnesandnoble.com
![Page 5: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/5.jpg)
What is a “Cloud”?
• Cloud: on-demand, scalable compute and storage resources
(Two charts of demand over time, comparing self server provisioning with cloud provisioning; regions are labeled Overprovisioned and Underprovisioned.)
![Page 6: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/6.jpg)
What is Under the Covers of a Service
• Business logic
• Datacenter (power and cooling)
• Buy and provision hardware
• Respond to hardware failures
• Monitoring and alerting infrastructure
• Reliable/secure storage and computation
• Metering and billing infrastructure
• Live upgrades and OS patches
• Add compute/storage capacity on the fly
• Overprovision for peak traffic
• Service “glue”
• …
![Page 7: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/7.jpg)
What is Windows Azure?
An operating system for the cloud
(Diagram: Service 1, Service 2, Service 3, …, Service N.)
![Page 8: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/8.jpg)
Cloud Terminology
• Infrastructure as a Service (IaaS): basic compute and storage resources
  • On-demand servers
  • Amazon EC2, VMware vCloud, etc.
• Platform as a Service (PaaS): cloud application infrastructure
  • On-demand application-hosting environment
  • Google App Engine, Salesforce.com, Windows Azure, etc.
• Software as a Service (SaaS): cloud applications
  • On-demand applications
  • Office 365, Gmail, etc.
![Page 9: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/9.jpg)
IaaS
(Diagram: Developer/Ops works from a library of VM images. One VM with its own operating system hosts the web server, another hosts the DBMS, and a load balancer sits in front. Other labels: Application, Data.)
1) Choose image, then create VM for DBMS and configure DBMS
2) Choose image, then create and configure VM(s) for application
3) Provision database, then create tables and add data
4) Install application
5) Configure load balancer
6) Manage VMs and DBMS (e.g., deploying new OS images in VMs)
![Page 10: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/10.jpg)
PaaS
(Diagram: Developer/Ops in front of the same stack of VMs, operating systems, web server, DBMS and load balancer. Other labels: Application, Data.)
1) Provision database, then create tables and add data
2) Deploy application with service model
3) Automated service management
![Page 11: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/11.jpg)
Windows Azure
• Windows Azure is an OS for the data center
  • Handles resource management, provisioning, and monitoring
  • Manages application lifecycle
  • Allows developers to concentrate on business logic
• Provides common building blocks for distributed applications
  • Reliable queuing
  • Simple unstructured and structured storage
  • SQL storage
  • Application services like access control, caching, and connectivity
![Page 12: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/12.jpg)
Windows Azure Platform
(Diagram of the platform's components.)
• Windows Azure Applications
• Windows Azure Compute
• Windows Azure Middleware Services
• Windows Azure Data Services
• AppFabric Caching, AppFabric Access Control Server, AppFabric Service Bus
• SQL Azure, Windows Azure Storage, Windows Azure CDN
• Fabric Controller, Windows Azure Networking
![Page 13: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/13.jpg)
Fabric Controller
• Owns all the hardware in the data center
• Uses the inventory to host services
• Similar to what a per-machine operating system does with applications
• Provisions the hardware as necessary
• Maintains the health of the hardware
• Deploys applications to free resources
• Maintains the health of those applications
![Page 14: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/14.jpg)
Windows Azure Fabric Controller
(Diagram labels: highly available Fabric Controller; hardware control; software control; WS08 hypervisor hosting VMs; fabric agent; switches; load-balancers.)
![Page 15: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/15.jpg)
Scaling with the Fabric Controller Service Model
![Page 16: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/16.jpg)
Scaling
• There are two basic scaling models: scale up and scale out
(Diagram: scale up shows a single compute unit; scale out shows several compute units side by side.)
![Page 17: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/17.jpg)
Scaling Lessons
• Use few, well-defined scaling units
• Define scaling boundaries
• Scale out those units as needed
![Page 18: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/18.jpg)
Scale-Out Applications
(Diagram: a network load balancer in front of the application tiers, with storage underneath.)
• Stateless front end: scale out
• Stateless “worker”: scale out
• Already provided scalable storage:
  • Shared filesystem (Azure Blobs)
  • Partitioned RDBMS (SQL Azure)
  • Key/value datastore (Azure Tables)
  • Azure Queues
![Page 19: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/19.jpg)
The Windows Azure Service Model
• A Windows Azure application is called a “service”
  • Definition information
  • Configuration information
  • At least one “role”
• A role is the scaling boundary within a service
  • Roles are like DLLs in the service “process”
  • Collection of code with an entry point that runs in its own virtual machine
• Virtual machine is the scale unit
  • Role code runs in a virtual machine
  • Role scales by instances of a virtual machine size
(Diagram labels: LB, Front End, Middle Tier, Durable Store.)
![Page 20: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/20.jpg)
Multi-Tier Cloud Application
• A cloud application is typically made up of different components
  • Front end: e.g. load-balanced stateless web servers
  • Middle worker tier: e.g. order processing, encoding
  • Backend storage: e.g. Azure Blobs, Azure Tables, SQL Azure
• Multiple instances of each for scalability and availability
  • Requires at least 2 instances of each to achieve the SLA
(Diagram labels: HTTP/HTTPS, Load Balancer, Cloud Application with two Front-End instances and a Middle-Tier, Windows Azure Storage, SQL Azure.)
![Page 21: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/21.jpg)
Service Model and Role Contents
• Definition:
  • Role name
  • Role type
  • VM size (e.g. small, medium, etc.)
  • Network endpoints
• Configuration:
  • Number of instances
  • Number of update and fault domains
• Code:
  • Web/Worker Role: hosted DLL and other executables
  • VM Role: VHD

Example service model:
• Role: Front-End
  • Definition: Type: Web; VM Size: Small; Endpoints: External-1
  • Configuration: Instances: 2; Update Domains: 3; Fault Domains: 2
• Role: Middle-Tier
  • Definition: Type: Worker; VM Size: Large; Endpoints: Internal-1
  • Configuration: Instances: 3; Update Domains: 3; Fault Domains: 2
![Page 22: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/22.jpg)
Service Model Files
• Service definition is in ServiceDefinition.csdef
• Service configuration is in ServiceConfiguration.cscfg
• The CSPack program zips the service binaries and definition into a service package file (service.cspkg)
![Page 23: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/23.jpg)
ServiceDefinition.csdef
<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="Sample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" upgradeDomainCount="3">
  <WorkerRole name="Middle-Tier" vmsize="Large">
    <Endpoints>
      <InternalEndpoint name="Internal-1" protocol="tcp" />
    </Endpoints>
  </WorkerRole>
  <WebRole name="Front-End" vmsize="Small">
    <Sites>
      <Site name="Web">
        <Bindings>
          <Binding name="Endpoint1" endpointName="External-1" />
        </Bindings>
      </Site>
    </Sites>
    <Endpoints>
      <InputEndpoint name="External-1" protocol="http" port="80" />
    </Endpoints>
    <Imports>
      <Import moduleName="Diagnostics" />
    </Imports>
  </WebRole>
</ServiceDefinition>
![Page 24: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/24.jpg)
ServiceConfiguration.cscfg
<?xml version="1.0" encoding="utf-8"?>
<ServiceConfiguration serviceName="Sample" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="2" osVersion="*">
  <Role name="Middle-Tier">
    <Instances count="3" />
  </Role>
  <Role name="Front-End">
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=key" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>
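As a rough illustration of how tooling might read the configuration file above, the following sketch pulls the instance count of each role out of a trimmed copy of the sample. The function name `instance_counts` and the embedded `CSCFG` string are assumptions made for this example; only the element names, attributes and namespace come from the slide.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample ServiceConfiguration.cscfg from the slide.
CSCFG = """<ServiceConfiguration serviceName="Sample"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"
    osFamily="2" osVersion="*">
  <Role name="Middle-Tier"><Instances count="3" /></Role>
  <Role name="Front-End"><Instances count="2" /></Role>
</ServiceConfiguration>"""

# Every element lives in the configuration namespace, so lookups need a prefix map.
NS = {"sc": "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"}


def instance_counts(cscfg_xml):
    """Return {role name: instance count} for each <Role> in the document."""
    root = ET.fromstring(cscfg_xml)
    counts = {}
    for role in root.findall("sc:Role", NS):
        instances = role.find("sc:Instances", NS)
        counts[role.get("name")] = int(instances.get("count"))
    return counts


print(instance_counts(CSCFG))
```

Because the instance count lives in the configuration file rather than the definition file, scaling a role out is a configuration change and not a repackaging of the service.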
![Page 25: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/25.jpg)
Windows Azure Push-button Deployment
• Step 1: Allocate VMs/nodes and DIPs/VIPs
  • Across fault domains
  • Across update domains
• Step 2: Place role images on nodes
• Step 3: Start roles in VM instances
• Step 4: Configure load-balancers
• Step 5: Maintain desired number of role instances
  • Failed roles are automatically restarted
  • Node failure results in new VMs being automatically allocated
(Diagram labels: allocation across fault and update domains; load-balancers.)
![Page 26: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/26.jpg)
FC Automated Management
• Windows Azure FC monitors the health of roles
  • FC detects if a role dies
  • Restarts the role to bring it back to a healthy state
• If a failed node can’t be recovered, FC migrates role instances to a new node
  • A suitable replacement location is found
  • Existing role instances are notified of the configuration change
![Page 27: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/27.jpg)
Availability and Fault/Upgrade Domains
![Page 28: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/28.jpg)
Availability Service Level Agreements (SLA)
• Windows Azure Platform SLAs:
  • Compute External Connectivity: 99.95% (2 or more instances)
  • Storage Availability: 99.9%
  • SQL Azure Availability: 99.9%

| Availability % | Downtime per year | Downtime per month* | Downtime per week |
|---|---|---|---|
| 99% ("two nines") | 3.65 days | 7.20 hours | 1.68 hours |
| 99.9% ("three nines") | 8.76 hours | 43.2 minutes | 10.1 minutes |
| 99.99% ("four nines") | 52.56 minutes | 4.32 minutes | 1.01 minutes |
| 99.999% ("five nines") | 5.26 minutes | 25.9 seconds | 6.05 seconds |
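The downtime figures in the table follow directly from the availability percentage. The sketch below reproduces the arithmetic, assuming a 365-day year and a 30-day month (the table's monthly values are consistent with 30 days, but the footnote behind the asterisk is not in the transcript). The function name `downtime_seconds` is chosen for this example.

```python
def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Seconds of permitted downtime in a period at the given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60 * 60


YEAR_DAYS, MONTH_DAYS, WEEK_DAYS = 365, 30, 7  # assumed period lengths

for pct in (99.0, 99.9, 99.95, 99.99, 99.999):
    per_year_hours = downtime_seconds(pct, YEAR_DAYS) / 3600
    per_month_minutes = downtime_seconds(pct, MONTH_DAYS) / 60
    per_week_minutes = downtime_seconds(pct, WEEK_DAYS) / 60
    print(f"{pct}%: {per_year_hours:.2f} h/year, "
          f"{per_month_minutes:.2f} min/month, {per_week_minutes:.2f} min/week")
```

Under the same assumptions, the 99.95% compute connectivity SLA works out to roughly 21.6 minutes of downtime in a 30-day month.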
![Page 29: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/29.jpg)
Maintaining Availability: Assume Failure
• Hardware fails
  • 3-5% of servers experience failures annually
• Software fails
  • Inevitable in any evolving, complex system
• Tolerating failure means:
  • Redundancy where possible
  • Need to build in retries and backoff
  • Fast recovery
  • Big red buttons
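The “retries and backoff” point can be sketched as a small wrapper around any call that may fail transiently. This is a minimal illustration, not an Azure API: the function name `call_with_retries`, its parameters, and the choice of exponential backoff with full jitter are all assumptions made for the example.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(); on an exception, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay ceiling doubles with each failed attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random wait below the ceiling, so that many clients
            # failing at once do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so the wrapper can be exercised without real waiting. A production version would normally retry only on errors known to be transient rather than on every exception.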
![Page 30: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/30.jpg)
Hardware Redundancy
(Diagram: datacenter routers connect to aggregation routers (Agg) and load balancers (LB), which connect to racks of nodes. Each rack is shown with a top-of-rack switch (TOR) and a power distribution unit (PDU).)
• Top of Rack Switch is a Single Point of Failure
![Page 31: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/31.jpg)
Maintaining Availability: Fault Domains
• Avoid single points of failure
  • Unit of failure based on data center topology is a rack
  • E.g. top-of-rack switch on a rack of machines
• Windows Azure considers fault domains when allocating service roles
  • At least 2 fault domains per service
  • Will try to spread roles out across more
(Diagram: Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2 and Middle Tier-3 spread across Fault Domains 1, 2 and 3.)
![Page 32: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/32.jpg)
Update Domains
• Update domains specify what percentage of your service will be taken offline for an upgrade
  • Specify the number of update domains for your service
  • Default is 5 and max is 20
• Role instances are evenly assigned to update domains
• Used to update only one domain at a time
  • Rolling update
(Diagram: upgrade domains allocated across fault domains.)
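To make the interplay of the two kinds of domain concrete, the sketch below assigns role instances to update domains and fault domains and then derives the batches of a rolling update. Round-robin assignment is an assumption for illustration; the slides only say that assignment is even, and the names `assign_domains` and `rolling_update_batches` are made up for this example.

```python
from collections import defaultdict


def assign_domains(instance_names, update_domains, fault_domains):
    """Round-robin each instance into an update domain and a fault domain.

    Round-robin is an illustrative assumption, not the Fabric Controller's
    documented placement algorithm.
    """
    placement = {}
    for index, name in enumerate(instance_names):
        placement[name] = {
            "update_domain": index % update_domains,
            "fault_domain": index % fault_domains,
        }
    return placement


def rolling_update_batches(placement):
    """Group instances by update domain; a rolling update takes one batch at a time."""
    batches = defaultdict(list)
    for name, domains in placement.items():
        batches[domains["update_domain"]].append(name)
    return [sorted(batches[domain]) for domain in sorted(batches)]


middle_tier = ["Middle-Tier-1", "Middle-Tier-2", "Middle-Tier-3"]
placement = assign_domains(middle_tier, update_domains=3, fault_domains=2)
for batch in rolling_update_batches(placement):
    print("update together:", batch)
```

With three instances and three update domains, each batch holds a single instance, so two of the three middle-tier instances keep serving throughout the update.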
![Page 33: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/33.jpg)
Service Deployment and Maintenance
![Page 34: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/34.jpg)
Containing Failure: Datacenter Clusters
• Datacenters are divided into “clusters”
  • Approximately 1000 rack-mounted servers (we call them “nodes”)
  • Provides a unit of fault isolation
  • Each cluster is managed by a Fabric Controller (FC)
(Diagram: Cluster 1, Cluster 2, …, Cluster n on the datacenter network, each with its own FC.)
![Page 35: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/35.jpg)
The Fabric Controller (FC)
• The “kernel” of the cloud operating system
  • Manages datacenter hardware
  • Manages Windows Azure services
• Four main responsibilities:
  • Datacenter resource allocation
  • Datacenter resource provisioning
  • Service lifecycle management
  • Service health management
• Inputs:
  • Description of the hardware and network resources it will control
  • Service model and binaries for cloud applications
![Page 36: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/36.jpg)
Service Deployment Steps
• Process service model files
  • Determine resource requirements
  • Create role images
• Allocate compute and network resources
• Prepare nodes
  • Place role images on nodes
  • Create virtual machines
  • Start virtual machines and roles
• Configure networking
  • Dynamic IP addresses (DIPs) assigned to nodes
  • Virtual IP addresses (VIPs) + ports allocated and mapped to sets of DIPs
  • Configure packet filter for VM to VM traffic within service
  • Program load balancers to allow traffic to external endpoints
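The VIP-to-DIP mapping in the networking step can be pictured as a lookup table held by the load balancer. The sketch below is a toy model under stated assumptions: the class name `VipTable`, the round-robin choice of DIP, and the example addresses are all hypothetical and not taken from the slides.

```python
import itertools


class VipTable:
    """Toy load-balancer table mapping (VIP, port) to a set of DIPs."""

    def __init__(self):
        self._pools = {}

    def map_endpoint(self, vip, port, dips):
        """Program an external endpoint: traffic to vip:port may reach these DIPs."""
        self._pools[(vip, port)] = itertools.cycle(list(dips))

    def route(self, vip, port):
        """Pick the next DIP for vip:port, or None if no endpoint was programmed."""
        pool = self._pools.get((vip, port))
        if pool is None:
            return None  # nothing mapped, so the traffic is not forwarded
        return next(pool)  # round-robin is an assumption made for this sketch


table = VipTable()
# Hypothetical addresses: one public VIP in front of two instance DIPs.
table.map_endpoint("203.0.113.10", 80, ["10.0.0.4", "10.0.0.5"])
print([table.route("203.0.113.10", 80) for _ in range(3)])
print(table.route("203.0.113.10", 443))
```

Only endpoints that were explicitly mapped are reachable, which mirrors the slide's point that load balancers are programmed to allow traffic to external endpoints.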
![Page 37: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/37.jpg)
Service Resource Allocation
• Goal: allocate service components to available resources while satisfying all hard constraints
  • Size of VM
  • HW requirements: CPU, memory, storage, network
  • Upgrade domains
  • Fault domains
• Secondary goal: satisfy soft constraints
  • Optimize network proximity: pack different roles into the same node
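The split between hard and soft constraints can be sketched as “filter, then rank”: hard constraints remove candidate nodes, and soft constraints choose among the survivors. The `Node` type, the `place_instance` function and the specific rules below are simplifications invented for this example. In particular, it requires every instance of a role to land in a different fault domain, which is stricter than the slides' “at least 2 fault domains”.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    fault_domain: int
    free_cores: int
    roles: set = field(default_factory=set)


def place_instance(nodes, role, cores, used_fault_domains):
    """Place one instance of `role`; return the chosen node, or None if none fits."""
    # Hard constraints: enough free cores, and a fault domain this role
    # does not occupy yet (a simplification made for this sketch).
    candidates = [
        node for node in nodes
        if node.free_cores >= cores and node.fault_domain not in used_fault_domains
    ]
    if not candidates:
        return None
    # Soft constraint: prefer nodes already hosting *other* roles, which packs
    # different roles onto the same node for network proximity.
    best = max(candidates, key=lambda node: len(node.roles - {role}))
    best.free_cores -= cores
    best.roles.add(role)
    used_fault_domains.add(best.fault_domain)
    return best


nodes = [Node("n1", 0, 4), Node("n2", 1, 8, {"Front-End"}), Node("n3", 1, 2)]
used = set()
print(place_instance(nodes, "Middle-Tier", 4, used).name)
print(place_instance(nodes, "Middle-Tier", 4, used).name)
```

The first instance goes to the node that already hosts a front end (the soft preference); the second is forced onto the remaining fault domain by the hard constraint.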
![Page 38: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/38.jpg)
Deploying a Service
(Diagram: www.mycloudapp.net, a load balancer, and role instances placed across fault domains and upgrade domains. Addresses shown: 10.100.0.36, 10.100.0.122.)
• Role A: Front-End Role. Count: 2, Update Domains: 3, Size: Medium
• Role B: Middle-Tier Role. Count: 3, Update Domains: 3, Size: Large
![Page 39: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/39.jpg)
Inside a Deployed Node
(Diagram: a physical node with a host partition running the FC host agent, and four guest partitions, each running a guest agent and a role instance. A trust boundary is marked. Outside the node: the primary Fabric Controller with its replicas, and an image repository holding OS VHDs and role ZIP files.)
![Page 40: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/40.jpg)
Detection: Load Balancer Operation
• FC programs load balancers (LB) to “probe” the guest agent (GA) every 15 seconds
  • If the guest misses two probes, the LB stops forwarding traffic
• The role can report “busy” status to the GA
  • GA stops responding to probes
• LB keeps an idle connection open for 60s
  • Use keep-alive commands if the connection needs to be open longer
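The probe rule above can be modeled as a tiny state machine per endpoint. This sketch assumes that the two missed probes must be consecutive and that a single answered probe puts the endpoint back into rotation; the slides state neither detail. The class name `ProbedEndpoint` is made up for the example.

```python
PROBE_INTERVAL_S = 15     # from the slide: the guest agent is probed every 15 seconds
MISSED_PROBES_LIMIT = 2   # from the slide: two missed probes stop traffic


class ProbedEndpoint:
    """Tracks whether the load balancer should forward traffic to one guest."""

    def __init__(self):
        self.missed = 0
        self.in_rotation = True

    def record_probe(self, responded: bool) -> bool:
        """Record one probe result and return whether traffic is still forwarded."""
        if responded:
            # Assumption: one good probe resets the count and restores traffic.
            self.missed = 0
            self.in_rotation = True
        else:
            self.missed += 1
            if self.missed >= MISSED_PROBES_LIMIT:
                self.in_rotation = False
        return self.in_rotation


endpoint = ProbedEndpoint()
history = [endpoint.record_probe(ok) for ok in (True, False, True, False, False, True)]
print(history)
```

Under these assumptions, a guest that stops answering is taken out of rotation about two probe intervals (roughly 30 seconds) after its last good probe. A role that reports “busy” uses the same mechanism on purpose, since the GA then stops responding to probes.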
![Page 41: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/41.jpg)
Recovery: Server and Role Health
• FC maintains service availability by monitoring the software and hardware health
  • Based primarily on heartbeats
  • Automatically “heals” affected roles

| Problem | Fabric Detection | Fabric Response |
|---|---|---|
| Role instance crashes | FC guest agent monitors role termination | FC restarts role |
| Guest VM or agent crashes | FC host agent notices missing guest agent heartbeats | FC restarts VM and hosted role |
| Host OS or agent crashes | FC notices missing host agent heartbeat | Tries to recover node; FC reallocates roles to other nodes |
| Detected node hardware issue | Host agent informs FC | FC migrates roles to other nodes; marks node “out for repair” |
![Page 42: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/42.jpg)
Updating Your Service
• There are two update types:
  • In-place: used for large-scale services and to update services with local state
  • VIP swap: for ease of testing and fail-back for smaller services
• In-place (rolling) update:
  • Role instances updated one update domain at a time
  • Two modes: automatic and manual
![Page 43: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/43.jpg)
In-Place Update
• Purpose: ensure the service stays up while updating
  • Used by Windows Azure OS updates
• System considers update domains when upgrading a service
  • 1/(number of update domains) = fraction of the service that will be offline
  • Default is 5 and max is 20
• The Windows Azure SLA is based on at least two update domains and two role instances of each role
(Diagram: Front-End-1, Front-End-2, Middle Tier-1, Middle Tier-2 and Middle Tier-3 spread across Update Domains 1, 2 and 3.)
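The 1/(update domains) rule can be checked with a couple of lines of arithmetic. The sketch assumes instances are spread as evenly as possible across update domains, so the fullest domain holds the ceiling of instances divided by domains; both function names are invented for this example.

```python
import math


def offline_fraction(update_domains: int) -> float:
    """Fraction of the service offline while one update domain is updated."""
    return 1 / update_domains


def max_instances_offline(instances: int, update_domains: int) -> int:
    """Most instances of one role that are offline at once, assuming even spreading."""
    return math.ceil(instances / update_domains)


print(offline_fraction(5))             # default of 5 update domains
print(max_instances_offline(10, 5))    # 10 instances over 5 domains
print(max_instances_offline(2, 3))     # the slide's front end: 2 instances, 3 domains
```

The 1/N figure describes the service as a whole. For a role with fewer instances than update domains, such as a two-instance front end, one instance at a time is still taken down, which is half of that role's capacity. That is consistent with the SLA requiring at least two instances of each role.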
![Page 44: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/44.jpg)
Windows Azure Compute Summary
• Platform as a Service is all about reducing management and operations overhead
• The Windows Azure Fabric Controller is the foundation for Windows Azure’s PaaS
  • Provisions machines
  • Deploys services
  • Configures hardware for services
  • Monitors service and hardware health
![Page 45: Windows Azure Compute](https://reader035.fdocuments.in/reader035/viewer/2022062411/5681685d550346895dde98aa/html5/thumbnails/45.jpg)