This work is licensed under a Creative Commons Attribution 3.0 United States License.
Roger Barga
Architect, Cloud Computing Futures Group
Microsoft Research (MSR)
Cloud Computing – A Microsoft Research Perspective
Contributors to this presentation include
Dan Reed, Dennis Gannon, Navendu Jain, and Tony Hey (MSR)
eXtreme Computing, MSR

eXtreme Computing Division (Dan Reed, CVP, Microsoft Research)
Groups: CCF, CIS, GFS
Rethink the nature of computing at extreme scale: from alternative and quantum computing models, through the transformative effects of manycore parallelism on programming systems and architectures, to massive cloud computing infrastructure designs.

Cloud Computing Futures Group (CCF)
Ab initio research and development on cloud hardware and software infrastructure. Investigate cloud computing for research empowerment through worldwide government and academic partnerships.
Talk Outline
• Data center landscape
• Rise of the cloud computing platform
• Data-intensive research and the role of cloud computing

Key takeaways…
• Data centers and HPC: like twins separated at birth [Dan Reed]
• Data centers evolving at a blistering pace, driven by economics
• The application model for cloud computing is evolving
• The economic landscape increasingly favors "pay as you go"
• There are many obstacles, but economic forces will dominate the obstacles
• Emergence of the Fourth Paradigm, synergistic with cloud computing
HPC and Clouds – Select Comparisons
• Node and system architectures
• Communication fabric
• Storage systems and analytics
• Physical plant and operations
• Reliability and resilience
• Programming models
HPC Node Architecture
Moore's "Law" favored commodity systems:
• Specialized processors and systems faltered
• "Killer micros" and industry-standard blades led
• Inexpensive clusters now dominate
www.top500.org
HPC Interconnects
• Ethernet for the low end (cost sensitive)
• High-end expectations: {nearly} flat networks and very large switches; operating system bypass for low latency (microseconds)
www.top500.org
Modern Data Center Network
[Diagram: a tiered data center network. Internet links feed paired L3 border routers (CR) and access routers (AR) in the Layer 3 domain; beneath them, Layer 2 switches (S) and load balancers (LB) fan out to racks (A), each a 20-server rack with its top-of-rack switch. Links are GigE within racks and 10 GigE above. Source: Albert Greenberg and Cisco.]
Monsoon network with Valiant routing.
HPC Storage Systems
• Local disk: scratch or non-existent
• Secondary storage: SAN and parallel file systems; hundreds of TBs (at most)
• Tertiary storage: tape robot(s), 3-5 GB/s bandwidth, ~60 PB capacity (www.nersc.gov)
I/O Implications and Scale
Typical HPC scenario:
• MPI computation with domain decomposition
• SAN-based parallel file system
• Periodic checkpoints
Scaling challenges:
• System MTBF approaching zero
• Checkpoint frequency increasing
• I/O demand becoming intolerable
Implications:
• Unlikely to extend to exascale
• Loosely consistent models required
[Chart omitted: checkpoint overhead versus component reliability (0.9999 to 0.999999). Slide by Dan Reed.]
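The scaling challenge above can be made concrete with Young's classic approximation for the optimal checkpoint interval, τ ≈ √(2δM), where δ is the time to write one checkpoint and M is the system MTBF. This formula and the 5-year per-node MTBF are standard assumptions, not figures from the slide; the point is only that as node count grows, system MTBF shrinks roughly as MTBF_node / N, so checkpoints must be taken more and more often:

```python
import math

def optimal_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: tau ~= sqrt(2 * delta * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

node_mtbf_s = 5 * 365 * 24 * 3600        # assume a 5-year per-node MTBF
for n_nodes in (1_000, 10_000, 100_000):
    system_mtbf = node_mtbf_s / n_nodes  # naive system MTBF model
    tau = optimal_checkpoint_interval(checkpoint_cost_s=600, mtbf_s=system_mtbf)
    print(f"{n_nodes:>7} nodes: system MTBF {system_mtbf/3600:6.1f} h, "
          f"checkpoint every {tau/60:6.1f} min")
```

At around 100,000 nodes the checkpoint interval approaches the failure interval itself, which is exactly why the slide concludes this model is unlikely to extend to exascale.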
Cloud/HPC Hardware Comparison
Predominant differences: network architecture and SAN storage; efficient virtualization.

Attribute         | HPC                   | Cloud
------------------|-----------------------|-------------------
Processor         | High-end x86          | x86
Memory            | 1-8 GB                | 8 GB+
Local Disk        | Scratch only          | Permanent storage
SAN Storage       | Common                | Rare
Tertiary Storage  | Common                | Rare
Interconnect      | InfiniBand or 10 GigE | 1 GigE/10 GigE
Network           | Flat                  | Hierarchical
Physical Plant    | Traditional           | Optimized
Virtualization    | -                     | Efficient
Virtualization as Enabler
Emulation of existing apps:
• Resource utilization: pool concrete resources
• Decoupling from concrete resources enables migration
• Extending existing abstractions, e.g. LUN expansion
Enablement of new services:
• Hardware, via the existing ISA, memory-mapped ports, etc.
• Storage, via SCSI LUN or another disk interface
• Application, via an underlying API
HPC Physical Plant
Facilities:
• Co-located with the operating institution
• Standard raised floor and CRAC units
• Limited UPS support
• Typically constrained to 3-5 MW
• Designed as lab showpieces
[Photos: systems at LBL, LANL, ORNL, and ANL - 38,640; 150,152; 163,840; and ~130,000 cores]
The Data Center Landscape
• Range in size from "edge" facilities to megascale
• Unprecedented economies of scale
• Each data center is 11.5 times the size of a football field

Approximate costs for a medium-sized center (1,000 servers) versus a large, 50K-server center:

Technology     | Medium-sized Data Center   | Very Large Data Center      | Ratio
---------------|----------------------------|-----------------------------|------
Network        | $95 per Mbps/month         | $13 per Mbps/month          | 7.1
Storage        | $2.20 per GB/month         | $0.40 per GB/month          | 5.7
Administration | ~140 servers/administrator | >1000 servers/administrator | 7.1

Source: James Hamilton, LADIS '08
Economies of Scale
• Electricity: put data centers at cheap power
• Network: put data centers on main trunks
• Operations: standardize and automate ops
• Hardware: containerized low-cost servers
Modern Data Center: Containers Separating Concerns
Data Center Design Issues
Where are the costs? Mid-sized facility (20 containers):
• Cost of power: $0.07/kWh
• Cost of facility: $200,000,000 (amortized over 15 years)
• Number of servers: 50,000 (3-year life) at $2K each
• Power critical load: 15 MW
• Power Usage Effectiveness (PUE): 1.7

Monthly costs (3-yr server and 15-yr infrastructure amortization):
• Servers: $2,997,090
• Power & cooling infrastructure: $1,296,902
• Power: $1,042,440
• Other infrastructure: $284,686

Observe: the fully burdened cost of power = the power consumed + the cost of the cooling and power-distribution infrastructure. As the cost of servers drops and power costs rise, power will dominate all other costs.
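A back-of-the-envelope check of the slide's parameters: multiplying the critical load by the PUE folds in the cooling and distribution overhead that the "fully burdened" definition describes. The 730 hours/month figure and this simple critical-load × PUE model are assumptions on my part, so the result lands in the same ballpark as the chart's power-related lines rather than matching its exact breakdown:

```python
# Monthly energy bill implied by the slide's parameters (assumed model).
critical_load_kw = 15_000       # 15 MW IT (critical) load
pue = 1.7                       # Power Usage Effectiveness
price_per_kwh = 0.07            # $/kWh
hours_per_month = 730           # ~24 * 365 / 12

total_facility_kw = critical_load_kw * pue   # IT load plus cooling/distribution
monthly_energy_cost = total_facility_kw * price_per_kwh * hours_per_month
print(f"~${monthly_energy_cost:,.0f}/month")  # roughly $1.3M, same order as the chart
```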
Power
The EPA released a report saying:
• In 2006, data centers used 61 terawatt-hours of power
• Total power bill: $4.5 billion
• 7 GW peak load (15 power plants)
• This was 1.5% of all US electrical energy use
• Expected to double by 2011

• Power accounts for 30% of data center costs
• Only 20%-30% CPU utilization
• Causes: uneven app fit, varying demand, over-provisioning, etc.

A deeper look and a few ideas…
Power and Cooling Is Expensive!
• Infrastructure for power & cooling costs a lot
• Infrastructure plus energy > server costs, since 2001
• Infrastructure alone > server costs, since 2004
• Energy alone > server costs, since 2008
• It is cost-effective to discard energy-inefficient servers
• Power savings ⇒ infrastructure savings!
Like airlines retiring fuel-guzzling airplanes.
What can we do about power costs?
Data centers use 1.5% of US electricity:
• $4.5 billion annually
• 7 GW peak load (15 power plants)
• 44.4 million mt CO2 (0.8% of emissions)

• Rethink environmentals: run data centers in a wider range of conditions; Christian Belady's "in tent" data center experiment
• Rethink UPS: Google's battery per server
• Rethink architecture: Intel Atom and power states; the Marlowe project
Marlowe & the Big Sleep
Adaptive resource management:
• Monitor the data center and its apps
• Use a rules engine and fuzzy logic to control resources for the current workloads
• When spare capacity is available, sleep/hibernate at 3-4 watts (vs. 28-36 watts for Atom servers)
• 5-45 sec. to reactivate a server
Created by Navendu Jain, CJ Williams, Dan Reed and Jim Larus
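To make the idea tangible, here is a toy, rule-based version of what Marlowe does: keep enough servers awake for the current load plus headroom and hibernate the rest. This is not the actual fuzzy-logic engine; the thresholds, request rates, and helper names are illustrative, and only the 3-4 W asleep versus 28-36 W active figures come from the slide:

```python
import math

ACTIVE_W, SLEEP_W = 32, 3.5   # per-server draw, from the slide's Atom figures

def servers_needed(load_rps: float, per_server_rps: float,
                   headroom: float = 0.25) -> int:
    """Servers required for the current workload plus spare capacity."""
    return math.ceil(load_rps * (1 + headroom) / per_server_rps)

def power_draw(total_servers: int, awake: int) -> float:
    """Fleet draw with the non-awake servers hibernating."""
    return awake * ACTIVE_W + (total_servers - awake) * SLEEP_W

total = 1000
awake = servers_needed(load_rps=40_000, per_server_rps=125)  # 400 servers awake
print(power_draw(total, awake), "W vs.", power_draw(total, total), "W all-awake")
```

Even this crude policy cuts the fleet's draw by more than half at 40% utilization, which is why low CPU utilization (the 20%-30% cited earlier) makes sleep states so attractive.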
Microsoft's Data Center Evolution
• Generation 1 – Data Center Collocation; deployment scale unit: server; driver: capacity
• Generation 2 – Density and Sustainability; deployment scale unit: rack
• Generation 3 – Containers; scalability: thousands of servers
• Generation 4 (future) – Modular Data Center; pre-assembled components, scalable data centers; right time to market, lower TCO (PUE)
What is "cloud computing"?
“…data as a service…”
“…software as a service…”
“cloud computing journal reports that…”
“…everything as a service...”
Cloud Computing
Using a remote data center to manage scalable, reliable, on-demand access to application services and data.
• Scalable means possibly millions of simultaneous users of the app, exploiting thousand-fold parallelism in the app
• Reliable and on-demand means five "nines" of availability, right now

Three new aspects of cloud computing:
• The illusion of infinite computing resources available on demand
• Elimination of an up-front commitment
• The ability to pay for the use of computing resources on a short-term basis, as needed
Platform Extension to the Cloud is a Continuum
What you've been using so far | Hosted             | Cloud
------------------------------|--------------------|----------------
Server                        | Hosted server      | Cloud fabric
Windows (or Linux)            | Hosted OS          | Compute fabric
DB server                     | Hosted DB server   | Storage fabric

• Hosted: a hosted version of what you have been using so far; requires few changes, if any, to what you know and do
• Cloud: new capabilities and a new cost structure, but requires embracing a specific app model
Spectrum of Application Models
Azure Programming Model
[Diagram: requests from the public Internet pass through load balancers to front-end Web role instances, which hand work to Worker role(s); both use Azure Services (storage). Switches and load balancers connect the tiers, with in-band communication for software control by a highly available Fabric Controller.]
An abstract programming model.
The Azure Fabric
• Consists of a (large) group of machines, all of which are managed by software called the fabric controller
• The fabric controller is replicated across a group of five to seven machines, and it owns all of the resources in the fabric
• Because it can communicate with a fabric agent on every computer, it is also aware of every Windows Azure application in the fabric
Roles: Scalable, Fault-Tolerant, Stateless
• A role is a mostly stateless process running in a Windows Server 2008 VM on one or more cores
• Web roles provide web-service access to the app and generate tasks for worker roles
• Worker roles do the "heavy lifting" and manage data in tables/blobs
• Communication is through queues; the number of instances can scale with load

A scalable architecture is critical to take advantage of scalable infrastructure:
• Queues decouple different parts of the app, making it easier to scale app parts independently
• Flexible resource allocation: different priority queues and separation of back-end servers to process different queues
• Queues mask faults in worker roles
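The web-role / queue / worker-role pattern described above can be sketched without any framework at all. In Azure the queue would be an Azure Storage queue and each role a separate VM instance; here an in-process queue and threads stand in for them, and all names are illustrative:

```python
import queue
import threading

task_queue: "queue.Queue[str]" = queue.Queue()
results: list[str] = []

def web_role(requests: list[str]) -> None:
    """Front end: accept requests and enqueue tasks for the workers."""
    for req in requests:
        task_queue.put(req)

def worker_role() -> None:
    """Back end: pull tasks off the queue and do the 'heavy lifting'."""
    while True:
        try:
            task = task_queue.get(timeout=0.5)
        except queue.Empty:
            return                         # no more work; a real role would idle
        results.append(task.upper())       # stand-in for real work
        task_queue.task_done()

web_role(["render", "encode", "index"])
workers = [threading.Thread(target=worker_role) for _ in range(2)]  # scale with load
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))   # ['ENCODE', 'INDEX', 'RENDER']
```

Because the queue is the only coupling between the two tiers, the number of workers can change at any time, and a crashed worker simply leaves its task on the queue for another instance — the fault-masking property the slide highlights.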
Storage: blobs, tables and queues, and a full relational database
• The simplest way to store data in Azure storage is to use blobs. A blob contains binary data and can be big: up to 50 GB each. Blobs can also have associated metadata.
• Each table holds some number of entities; an entity contains zero or more properties.
• SQL Data Services provides the SQL data platform in the cloud.
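The entity/property model above is schemaless: two entities in the same table need not share properties. A minimal sketch of that shape, mimicking Azure Table storage with plain dictionaries (the PartitionKey/RowKey names and sample data are illustrative, and the real API differs):

```python
# A "table" is just a collection of entities; each entity is a bag of properties.
table: list[dict] = []

table.append({"PartitionKey": "books", "RowKey": "1", "title": "Dune"})
table.append({"PartitionKey": "books", "RowKey": "2",
              "title": "Hamlet", "year": 1603})   # entities may differ in properties

# A simple property filter, analogous to a table query.
hits = [e for e in table if e.get("year") == 1603]
print(hits[0]["title"])   # Hamlet
```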
Back to the Future (again…)
Mid-1980s: the invention of client/server databases
• Data locked up in mainframe DBs; a closed & monolithic trust boundary
• PCs? Spreadsheets and terminal emulation
• Networks, lots of them: DECnet, IPX, SNA, Banyan Vines, TCP/IP
Client/server database challenges:
• Had to invent: a network abstraction layer, formats, protocols
• Had to consider: latency, concurrency control
• Had to move: the trust boundary
• Wound up with only 60% of the incumbent's capability; could have been easily dismissed as a failure
End result:
• Data was made accessible where it could be used in a new way
• Client/server databases are now viewed as tremendously successful
Data in a Cloud Services World
Cloud database service challenges:
• The same as the client/server DBMS shift: formats, protocols, authentication, authorization, latency, the trust boundary
• Will not do 100% of what client/server databases can do
Cloud database service capabilities:
• The data boundary moves from the corporate LAN to the internet
• A utility DBMS for cloud applications
• Expect new capabilities and a new value proposition
Cloud Platform: Strategic Differentiator and Economics
[Chart: competitive advantage versus time. Innovations introduced by the first, second, and third firms each shift the curve; motives range from strategic (competitive advantage) to utilitarian (resource & cost optimization).]
Competitive advantage AND economics.
The Economics of Elasticity, by the numbers…
Assume our service:
• Peaks at 500 servers at noon
• Trough requires 100 servers at midnight
• Average utilization is 300 servers

Actual utilization: 300 × 24 = 7,200 server-hours/day
Provisioned resources: 500 × 24 = 12,000 server-hours/day
Pay-as-you-go break-even point: 12,000 = 7,200 × 1.667
Pay as you go is cheaper whenever its servers cost less than 1.667 times purchased servers. Elasticity may be more cost-effective even with a higher per-hour charge!

This example underestimates the benefits of elasticity:
• E-commerce peaks in December; photo-sharing peaks in January
• It takes weeks to acquire and install equipment
• Seasonal demands require significant provisioning
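The slide's arithmetic, worked directly: owning hardware means provisioning for the peak, while paying by the hour means paying only for the average, so the break-even point is simply the ratio of the two.

```python
# Break-even analysis from the slide: 500-server peak, 300-server average.
peak_servers, avg_servers, hours = 500, 300, 24

provisioned = peak_servers * hours      # 12,000 server-hours/day (own the peak)
actual      = avg_servers * hours       #  7,200 server-hours/day actually used

break_even = provisioned / actual
print(f"break-even price ratio: {break_even:.3f}")   # 1.667
```

So cloud servers can cost up to two-thirds more per hour than owned ones and still come out cheaper — before even counting the seasonal-peak and lead-time effects listed above.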
Research funding:
1. Have a good idea
2. Write a proposal
3. Wait 6 months
4. If successful, wait 3 months to get $$$
5. Install computers
6. Start work

Science start-ups:
1. Have a good idea
2. Write a business plan
3. Ask VCs to fund it
4. If successful…
5. Install computers
6. Start work

Cloud computing model:
1. Have a good idea
2. Grab nodes from a cloud provider
3. Start work
4. Pay for what you actually used

The Cloud Empowers the Long Tail of Research
Slide courtesy of Paul Watson, University of Newcastle (UK)
Poised to reach a broad class of new users.
Emergence of a Fourth Research Paradigm
• A thousand years ago – experimental science: description of natural phenomena
• Last few hundred years – theoretical science: Newton's laws, Maxwell's equations…
• Last few decades – computational science: simulation of complex phenomena
• Today – data-intensive science: scientists overwhelmed with data sets from a variety of different sources: data captured by instruments and sensor networks, data generated by simulations, and data generated by computational models

With thanks to Jim Gray

Astronomy was one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey, combined with an image of the moon, synthesized within the WorldWide Telescope.
Science Example: PhyloD as an Azure Service
• A statistical tool used to analyze DNA of HIV from large studies of infected patients
• PhyloD was developed by Microsoft Research and has been highly impactful (cover of PLoS Biology, November 2008)
• Serves a small but important group of researchers: hundreds of HIV and HepC researchers actively use it, and thousands of research communities rely on its results
• Typical job: 10-20 CPU hours; extreme jobs require 1K-2K CPU hours
• Requires a large number of test runs for a given job (1-10M tests)
• Highly compressed data per job (~100 KB per job)
Metagenomics Atop Azure
• Metagenomics: ecosystem characterization
• MapReduce-style parallel BLAST: 50 roles, speedup 45; 100 roles, speedup 94
• Basic MapReduce: 2 GB database per worker, 500 MB input file
[Diagram: the user selects DBs and an input sequence through a BLAST Web role; an InputSplitter worker role partitions the input across BLAST execution worker roles #1…#n, each searching genome DBs 1…K according to the BLAST DB configuration; a Combiner worker role merges the results. All data flows through Azure blob storage.]
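The split/execute/combine flow in the diagram can be reduced to a few lines. This is only a schematic: the BLAST search is replaced by a trivial substring match, the function names are mine, and a real deployment would run one worker role per partition against a 2 GB database chunk rather than threads in one process:

```python
from concurrent.futures import ThreadPoolExecutor

def split(sequences: list[str], n: int) -> list[list[str]]:
    """InputSplitter: partition the input for n workers."""
    return [sequences[i::n] for i in range(n)]

def execute(partition: list[str], motif: str) -> list[str]:
    """Execution worker: stand-in for a BLAST search over one DB chunk."""
    return [s for s in partition if motif in s]

def combine(partials: list[list[str]]) -> list[str]:
    """Combiner: merge the per-worker results."""
    return sorted(sum(partials, []))

seqs = ["ACGTAC", "TTTT", "GGACGT", "ACACAC"]
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = split(seqs, 2)
    partials = list(pool.map(lambda p: execute(p, "ACGT"), parts))
print(combine(partials))   # ['ACGTAC', 'GGACGT']
```

Because each partition is searched independently, adding workers scales the execute stage almost linearly — consistent with the slide's measured speedups of 45 on 50 roles and 94 on 100 roles.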
Reference Data on Azure
• Ocean science data on Azure SDS-relational: two terabytes of coastal and model data
• Computational finance data on SDS-relational: BATS daily tick data for stocks (10 years); XBRL call reports for banks (10,000 banks)
• Select seismic data on Azure for an NSF-funded consortium that collects and distributes global seismological data: data sets requested by researchers worldwide, including HD videos, seismograms, images, and data from major seismic events
Takeaways
• Cloud computing: apps delivered as services over the internet, plus the data center hardware and software providing them
• Software as a service: application services delivered over the internet; utility computing: virtualized hardware and compute resources delivered over the internet
• The economics are shifting toward cloud computing: big data centers offer big economies of scale, and cloud computing transfers risk away from application providers
• The application model for cloud computing is evolving: there are advantages to being "close to the metal" and advantages to higher levels; applications typically cannot port transparently; just because the infrastructure is scalable doesn't mean the app is!
• There are many obstacles to ubiquitous cloud computing: technical obstacles to adoption and growth, and policy and business obstacles to adoption
• The economic forces will dominate the obstacles: there's too much to gain… it will grow!
Q & A

Roger Barga
Architect, Cloud Computing Futures Group
Microsoft Research (MSR)
Cloud Computing for Research