Data Centers and Cloud Computing
Prateek Sharma
CS677 Guest Lecture
Computer Science Lecture 19, page
Today’s Class
• Data Centers - physical infrastructure
• Middleware - virtualization, management
• Clouds - IaaS, PaaS
Data Centers
• Large server and storage farms
  • 1000s of servers
  • Many TBs or PBs of data
  • 100s of MW power consumption
• Used by
  • Enterprises for server applications
  • Internet companies
• Some of the biggest DCs are owned by Google, Facebook, etc.
• Used for
  • Data processing
  • Web sites
  • Business apps
Data Centers are Huge
Traditional vs “Modern”
• Data center architecture and uses have been changing
• Traditional - static
  • Applications run on physical servers
  • System administrators monitor and manually manage servers
  • Use Storage Area Networks (SAN) or Network Attached Storage (NAS) to hold data
• Modern - dynamic, larger scale
  • Run applications inside virtual machines
  • Flexible mapping from virtual to physical resources
  • Increased automation allows larger scale
Inside a Data Center
• Giant warehouse filled with:
  • Racks of servers
  • Storage arrays
  • Cooling infrastructure
  • Power converters
  • Backup generators
Power Consumption
• Metric: Power Usage Effectiveness (PUE)
  • PUE = Total Facility Power / IT Equipment Power
  • Typical PUE: 1.1 to 1.2 range (roughly 10-20% of energy spent on non-compute overhead)
  • PUE of new Facebook facility: 1.08
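The PUE formula above is a simple ratio; a minimal sketch (the facility numbers below are illustrative, not from any real data center):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power.

    A PUE of 1.0 would mean every watt goes to IT gear; anything above 1.0
    is overhead (cooling, power conversion, lighting, ...).
    """
    return total_facility_kw / it_equipment_kw

# Hypothetical facility drawing 12 MW total, 10 MW of it for IT equipment:
print(round(pue(12_000, 10_000), 2))  # 1.2 (20% overhead on top of IT power)
```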
Power
• Generally built near sources of cheap electricity
  • Solar, wind, ...
Hardware
• Custom server designs: Open Compute Project
• Custom networking infrastructure
• Thermal management using free cooling
  • “Temperature Management in Data Centers: Why Some (might) Like it Hot” [Sigmetrics ’12]
  • “Environmental Conditions and Disk Reliability in Free-cooled Datacenters” [FAST ’16]
  • James Hamilton: http://perspectives.mvdirona.com/
MGHPCC Data Center
• Data center in Holyoke
Modular Data Center
• ...or use shipping containers
  • Each container filled with thousands of servers
• Can easily add new containers
  • “Plug and play”
  • Just add electricity
• Allows data center to be easily expanded
Data Center Challenges
• Resource management
  • How to efficiently use server and storage resources?
  • Many apps have variable, unpredictable workloads
  • Want high performance and low cost
  • Automated resource management
  • Performance profiling and prediction
• Energy efficiency
  • Servers consume huge amounts of energy
  • Want to be “green”
  • Want to save money
Reliability Challenges
• Typical failures in the first year of a Google data center:
  • ~0.5 overheating events (power down most machines in under five minutes, expect 1-2 days to recover)
  • 1 PDU (Power Distribution Unit) failure (about 500-1000 machines suddenly disappear, budget 6 hours to come back)
  • 1 rack move (you have plenty of warning: 500-1000 machines powered down, about 6 hours)
  • 1 network rewiring (rolling 5% of machines down over a 2-day span)
  • 20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
  • 5 racks go wonky (40-80 machines see 50% packet loss)
  • 8 network maintenances (4 might cause ~30-minute random connectivity losses)
  • 12 router reloads (takes out DNS and external virtual IPs (VIPs) for a couple of minutes)
  • 3 router failures (have to immediately pull traffic for an hour)
  • dozens of minor 30-second blips for DNS
  • ~1000 individual machine failures
  • thousands of hard drive failures
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/stanford-295-talk.pdf
Data Center Costs
• Running a data center is expensive
http://perspectives.mvdirona.com/2008/11/28/CostOfPowerInLargeScaleDataCenters.aspx
Economy of Scale
• Larger data centers can be cheaper to buy and run than smaller ones
  • Lower prices for buying equipment in bulk
  • Cheaper energy rates
• Automation allows a small number of sysadmins to manage thousands of servers
• General trend is towards larger mega data centers
  • 100,000s of servers
• Has helped grow the popularity of cloud computing
Software Stack
The Google Stack

Source: Malte Schwarzkopf, “Operating system support for warehouse-scale computing”, PhD thesis, University of Cambridge Computer Laboratory, 2015, Chapter 2.

• Coordination & cluster management: Chubby [Bur06] (locking and coordination); Borg [VPK+15] and Omega [SKA+13] (cluster manager and job scheduler)
• Monitoring tools: Dapper [SBB+10] (pervasive tracing); CPI² [ZTH+13] (interference mitigation)
• Data storage: GFS/Colossus [GGL03] (distributed block store and file system); BigTable [CDG+06] (row-consistent multi-dimensional sparse map); MegaStore [BBC+11] (cross-DC ACID database); Spanner [CDE+13] (cross-DC multi-version DB); Dremel [MGL+10] (columnar database)
• Data processing: MapReduce [DG08] (parallel batch processing); FlumeJava [CRP+10] (parallel programming); Tenzing [CLL+11] (SQL-on-MapReduce); Percolator [PD10] (incremental processing); PowerDrill [HBB+12] (query UI & columnar store); MillWheel [ABB+13] (stream processing); Pregel [MAB+10] (graph processing)

[Figure: the Google infrastructure stack. Arrows indicate data exchange and dependencies between systems; layering alone does not imply a dependency or relation.]
Bin Packing Scheduling
Cluster Scheduling
• Google Borg, Omega, Kubernetes
• Apache Mesos, Hadoop YARN
• Sparrow (distributed), TetriSched [EuroSys ’16]
• Docker Swarm
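At its core, cluster scheduling is a bin-packing problem: fit jobs with resource demands onto machines with finite capacity. A toy first-fit sketch (one CPU dimension only; real schedulers like Borg also consider memory, constraints, and priorities):

```python
def first_fit(jobs, capacities):
    """Greedy first-fit bin packing: place each job (CPU demand) on the
    first machine with enough remaining capacity; None if nothing fits."""
    free = list(capacities)            # remaining capacity per machine
    placement = []
    for job in jobs:
        for i, slack in enumerate(free):
            if job <= slack:
                free[i] -= job         # claim the resources
                placement.append((job, i))
                break
        else:
            placement.append((job, None))  # unschedulable right now
    return placement

# Four jobs packed onto two 8-CPU machines:
print(first_fit([4, 3, 5, 2], [8, 8]))  # [(4, 0), (3, 0), (5, 1), (2, 1)]
```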
[Embedded paper, first page: “Omega: flexible, scalable schedulers for large compute clusters”, Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes, EuroSys ’13. Monolithic schedulers (one centralized algorithm) are hard to change and may not scale; two-level schedulers (e.g., Mesos, Hadoop-on-Demand) use pessimistic offers that limit parallelism and cluster-wide visibility. Omega instead runs multiple parallel schedulers over shared cluster state with lock-free optimistic concurrency control, evaluated on real Google production workloads.]
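Omega’s shared-state, optimistic-concurrency idea can be sketched as: each scheduler snapshots the cluster state, makes a placement decision, and commits with a version check, retrying on conflict. A toy single-resource version (names and structure are illustrative, not Omega’s actual implementation):

```python
class SharedState:
    """Toy shared cluster state with optimistic concurrency control."""
    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.version = 0               # bumped on every successful commit

    def snapshot(self):
        return self.free_cpus, self.version

    def try_commit(self, claim, seen_version):
        # Commit succeeds only if nobody changed state since our snapshot.
        if seen_version != self.version or claim > self.free_cpus:
            return False               # conflict: caller must retry
        self.free_cpus -= claim
        self.version += 1
        return True

def schedule(state, demand, max_retries=10):
    """One scheduler's loop: snapshot, decide, optimistically commit."""
    for _ in range(max_retries):
        free, ver = state.snapshot()
        if demand > free:
            return False               # cluster genuinely full
        if state.try_commit(demand, ver):
            return True
    return False
```

Compared to two-level offer-based designs, every scheduler here sees the full cluster state; conflicts are detected at commit time rather than prevented with locks.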
What is the cloud?
• Remotely available
• Pay-as-you-go
• High scalability
• Shared infrastructure
Hardware Virtualization
• Allows a server to be “sliced” into Virtual Machines
• Each VM has its own OS and applications
• Rapidly adjust resource allocations
• VM migration within a LAN

[Figure: a virtualization layer on one physical server hosting VM 1 (Windows) and VM 2 (Linux).]
Virtualization in Data Centers
• Virtual servers
  • Consolidate servers
  • Faster deployment
  • Easier maintenance
• Virtual desktops
  • Host employee desktops in VMs
  • Remote access with thin clients
  • Desktop is available anywhere (home or work)
  • Easier to manage and maintain
Containers
• Virtual machines are too “heavy”, especially for I/O
• VM advantages:
  • Run different operating systems (Linux, BSD, Windows, ...)
  • Completely isolated (performance and security)
• Containers promise compartmentalization with bare-metal performance
• Joyent: Solaris (illumos) Zones used for public IaaS cloud
• Others: containers inside VMs for security
The Cloud Stack
• Software as a Service (SaaS): hosted applications, managed by the provider (office apps, CRM)
• Platform as a Service (PaaS): software platforms that let you run your own apps; provider handles scalability
• Infrastructure as a Service (IaaS): raw servers & storage; can do whatever you want with it
IaaS: Amazon EC2
• Rents servers and storage to customers
• Uses virtualization to share each server among multiple customers
• Economy of scale lowers prices
• Can create a VM with the push of a button

            Smallest    Medium      Largest
  VCPUs     1           5           33.5
  RAM       613 MB      1.7 GB      68.4 GB
  Price     $0.02/hr    $0.17/hr    $2.10/hr

  Storage:   $0.10 per GB-month
  Bandwidth: $0.10 per GB
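The pay-as-you-go model makes a monthly bill simple arithmetic. A rough estimator using the (circa-2016) rates above; the workload numbers are hypothetical:

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(instance_hr_rate, storage_gb, bandwidth_gb,
                 storage_rate=0.10, bandwidth_rate=0.10):
    """Rough monthly bill: instance-hours + storage ($/GB-month) + transfer ($/GB)."""
    return (instance_hr_rate * HOURS_PER_MONTH
            + storage_rate * storage_gb
            + bandwidth_rate * bandwidth_gb)

# A medium instance ($0.17/hr) with 100 GB of storage and 50 GB of transfer:
print(round(monthly_cost(0.17, 100, 50), 2))  # roughly $139/month
```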
PaaS: Google App Engine
• Provides a highly scalable execution platform
• Must write application to meet the App Engine API
• App Engine will autoscale your application
• Strict requirements on application state
  • “Stateless” applications are much easier to scale
• Hardware virtualization not required
  • Multiple users’ threads running in the same OS
  • Allows Google to quickly increase the number of “worker threads” running each client’s application
• Simple scalability, but limited control
  • Only supports Java and Python
Programming Models
• Client/server
  • Web servers, databases, CDNs, etc.
• Batch processing
  • Business processing apps, payroll, etc.
• MapReduce
  • Data-intensive computing
  • Scalability concepts built into the programming model
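The MapReduce model can be sketched as a single-process word count: map emits key-value pairs, and reduce aggregates all values for each key. A real framework runs both phases in parallel across thousands of machines, with a shuffle step in between:

```python
from collections import defaultdict

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts per word (the dict plays the role of the shuffle)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(reduce_phase(map_phase(["big data", "big compute"])))
# {'big': 2, 'data': 1, 'compute': 1}
```

Because map calls are independent and reduce only needs the values for one key, scalability falls out of the model itself, which is the point of the bullet above.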
Public or Private
• Not all enterprises are comfortable with using public cloud services
  • Don’t want to share CPU cycles or disks with competitors
  • Privacy and regulatory concerns
• Private cloud
  • Use cloud computing concepts in a private data center
  • Automate VM management and deployment
  • Provides same convenience as public cloud
  • May have higher cost
• Hybrid model
  • Move resources between private and public depending on load
  • “Cloud bursting”
Amazon Ecosystem
Compute:
• EC2
• Elastic MapReduce
• Container Service
• Lambda (event handling)

Storage:
• EBS
• S3
• Glacier (long-term)

Databases:
• DynamoDB
• Redshift
• Relational Database Service

Applications:
• DNS (Route 53)
• Machine Learning
• Message Queueing
• Micropayments
Cloud Challenges
• Privacy / security
  • How to guarantee isolation between client resources?
• Extreme scalability
  • How to efficiently manage 1,000,000 servers?
• Programming models
  • How to effectively use 1,000,000 servers?
Cloud Economics
• Provide on-demand resources
• Need to maintain surplus, idle servers
• Solution: cheap, transient servers (EC2 spot instances)
  • Can be revoked at any time
[Figure: EC2 spot price over time ($/hr, 0.0-0.5). The spot price fluctuates 70-90% below the on-demand price; when it rises above the user’s bid, the instance is revoked.]
Using Spot Instances
• Place a bid; Amazon terminates the instance if the spot price exceeds the bid
• Termination causes loss of application state
• Mitigate revocations by checkpointing
  • Nested virtualization: SpotCheck [EuroSys ’15]
  • Spark: Flint [EuroSys ’16]
• Failures among servers of different types are uncorrelated
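The bid/revocation mechanics and the checkpointing mitigation above can be illustrated with a toy hourly simulation (prices, bid, and checkpoint interval are made-up numbers, not real spot market data):

```python
def run_with_checkpoints(prices, bid, checkpoint_every=3):
    """Simulate a spot instance over a sequence of hourly prices.

    The instance runs whenever price <= bid and checkpoints every
    `checkpoint_every` hours of work; a revocation (price > bid) loses
    all work done since the last checkpoint.
    Returns (hours_of_surviving_work, hours_lost_to_revocations).
    """
    done = lost = since_ckpt = 0
    for price in prices:
        if price > bid:                # revoked: uncheckpointed work is gone
            lost += since_ckpt
            done -= since_ckpt
            since_ckpt = 0
            continue                   # assume relaunch once price drops
        done += 1
        since_ckpt += 1
        if since_ckpt == checkpoint_every:
            since_ckpt = 0             # state saved; nothing at risk now
    return done, lost

# Four cheap hours, one price spike, one more cheap hour, bidding $0.30:
print(run_with_checkpoints([0.1, 0.1, 0.1, 0.1, 0.5, 0.1], 0.3))  # (4, 1)
```

Shorter checkpoint intervals bound the work lost per revocation, at the cost of checkpointing overhead; this trade-off is what systems like SpotCheck and Flint tune.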
[Figure: heatmap of revocation gap in hours (0-600) between pairs of ~20 spot markets, supporting the claim that revocations across different server types are uncorrelated.]
Further Resources
• “Above the Clouds” - cloud computing survey paper from Berkeley
• Workshops & conferences
  • Hot Topics in Cloud Computing (HotCloud)
  • Symposium on Cloud Computing (SoCC)
  • lots of other small workshops
  • most recent systems conferences (NSDI, USENIX ATC, OSDI, SOSP)
• Other
  • Amazon AWS blog by Jeff Barr
  • James Hamilton’s Perspectives: http://