Optimizing MapReduce Provisioning in the Cloud
University of Minnesota
Optimizing MapReduce Provisioning in the Cloud
Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
http://www.cs.umn.edu/~cardosa
Department of Computer Science, University of Minnesota
†IBM Almaden Research Center
MapReduce Provisioning Problem
- Platform:
  - Virtualized Cloud Environment, which enables Virtualized MapReduce Clusters
  - Several MapReduce Jobs from different users
- Goal: Optimize system-wide metrics, such as throughput, energy, load distribution, and user costs
- Problem: At the Cloud Service Provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?
MapReduce Platform: Hadoop
- Open-source implementation of MapReduce distributed computing framework
- Used widely: Yahoo, Facebook, NYT, (Google)
[Figure label: Input Data]
Hadoop Clusters
- Distributed data: Replicated chunks
- Distributed computation: Map/reduce tasks
- Traditional: Dedicated physical nodes
Virtual Hadoop Clusters
- Run Hadoop on top of VMs
- E.g.: Amazon Elastic MapReduce = Hadoop + Amazon EC2
[Figure labels: Server Pool; VM Pool; Hadoop Processes]
Roadmap
- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization
Spatio-Temporal Insights for Provisioning
- Initial Focus: Energy Savings
- Goal: Minimize energy usage
  - Energy + cooling ~ 42% of total cost [Hamilton08]
- Problem: How to place the VMs on available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)
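The CMU objective can be illustrated with a small calculation. The sketch below assumes (the slide does not state this) that a server stays powered on until the longest-running VM placed on it finishes and can then be shut down, so its uptime equals the maximum runtime among its VMs. The function name and both example placements are hypothetical.

```python
def cumulative_machine_uptime(placement):
    """Sum of per-server uptimes for a placement.

    `placement` is a list of servers; each server is a list of the
    estimated runtimes (in minutes) of the VMs placed on it.
    Assumption: a server runs until its longest VM finishes.
    Empty servers are treated as never powered on.
    """
    return sum(max(server) for server in placement if server)


# Hypothetical runtimes, not taken from the talk's figures.
mixed = [[20, 100], [20, 100]]      # long and short VMs share servers
grouped = [[20, 20], [100, 100]]    # similar runtimes share a server

print(cumulative_machine_uptime(mixed))    # 200
print(cumulative_machine_uptime(grouped))  # 120
```

Under this assumption, grouping VMs with similar runtimes lets one server shut down early, which is what lowers the total.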
VM Placement: Spatial Fit
- Co-place complementary workloads
[Figure labels: Job 1; Job 2; Job 3; Job 4]
Which placement is better?
[Figure: two placements, A and B, of VMs with runtimes of 10 min, 20 min, and 100 min; labels: SHUTDOWN]
Time Balancing
[Figure: Time Balance; VM runtime labels 20, 25, 90 and 20, 25, 30]
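One way to read time balancing is as a cluster-scaling decision: choose the number of VMs for a long job so that its runtime lines up with the other VMs sharing its servers. The sketch below assumes an idealized model in which runtime is inversely proportional to the number of VMs (real MapReduce jobs will scale less cleanly). The helper `vms_for_target_runtime` and the numbers are illustrative, not the talk's method.

```python
import math


def vms_for_target_runtime(single_vm_runtime, target_runtime):
    """Smallest VM count whose estimated runtime is <= target_runtime.

    Assumption: runtime(n) = single_vm_runtime / n (perfect scaling).
    """
    if single_vm_runtime <= 0 or target_runtime <= 0:
        raise ValueError("runtimes must be positive")
    return max(1, math.ceil(single_vm_runtime / target_runtime))


# Illustrative numbers: a job estimated at 90 minutes on one VM,
# to be co-placed with VMs that finish in about 30 minutes.
n = vms_for_target_runtime(90, 30)
print(n, 90 / n)  # 3 30.0
```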
Building Blocks for Provisioning
[Figure labels: Objective-driven resource provisioning; MapReduce Jobs; Job profiling; Cluster scaling; Migration; Cloud Execution Environment; Initial Provisioning; Continuous Optimization]
Building Blocks for Provisioning
- Job Profiling: MapReduce job runtime estimation
  - Based on number of VMs allocated to job
  - Based on input data size
  - Offline and Online Profiling
- Cluster Scaling: Changing number of VMs allocated to a particular MapReduce job
  - Affects runtime of job; relies on Job Profiling model
- Migration: Useful for continuous optimization
  - Load balancing, VM consolidation
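A runtime model based on the number of VMs could be built from a handful of offline profiling runs. The sketch below is one possible approach, not the one used in the talk: it assumes the model form runtime = a + b / n_vms (a fixed overhead plus a perfectly divisible part) and fits it by ordinary least squares. The function names and the sample profile are hypothetical.

```python
def fit_runtime_model(samples):
    """Least-squares fit of runtime = a + b / n_vms.

    `samples` is a list of (n_vms, runtime) pairs from profiling runs.
    Returns (a, b). The model form is an assumption for illustration.
    """
    xs = [1.0 / n for n, _ in samples]
    ys = [float(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples with at least two distinct VM counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_runtime(model, n_vms):
    """Predicted runtime for a given VM count under the fitted model."""
    a, b = model
    return a + b / n_vms


# Hypothetical offline profile: (number of VMs, measured minutes).
profile = [(1, 90), (2, 50), (4, 30)]
model = fit_runtime_model(profile)
print(round(estimate_runtime(model, 8), 2))
```

The same fitting routine could be reused for input data size by swapping the regressor, and online profiling could refit the model as new measurements arrive.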
Job Profiling: Runtime Estimation Based on Number of VMs
Job Profiling: Runtime Estimation Based on Input Data Size
Job Profiling: Runtime Estimation
- Online Profiling: Additional refinement
Cluster Scaling
- Increasing allocated resources (typical):
  - Add additional VMs to join virtualized Hadoop cluster
  - Job performance increases, runtime decreases
- E.g., for Time Balancing: Energy reasons
- E.g., Load Balancing and Deadlines: Performance
Cluster Scaling: Time Balancing
[Figure: Time Balance; VM runtime labels 20, 25, 90 and 20, 25, 30]
Case Study: Performance & Deadlines
- Goal: Meet deadlines for MapReduce jobs
  - Determine initial allocation accurately
  - Dynamically adjust allocation to meet deadline if necessary
- Monitoring: Use offline profiling to estimate number of VMs needed based on past performance
- Actuation: Online profiling: Trigger points to invoke cluster scaling
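A trigger point for cluster scaling might look like the check below. This is a minimal sketch under a strong assumption that is not from the talk: each VM keeps completing work at the average rate observed so far, so the remaining work divides evenly over any number of VMs. The function `vms_needed_for_deadline` and its parameters are hypothetical, and it only ever scales up.

```python
import math


def vms_needed_for_deadline(n_vms, elapsed, progress, deadline):
    """VM count needed to finish by `deadline`, given observed progress.

    n_vms:    VMs currently allocated
    elapsed:  minutes since the job started
    progress: fraction of the job completed so far, in (0, 1]
    deadline: minutes after job start by which the job must finish
    Assumption: each VM keeps working at the rate observed so far.
    Never returns fewer than the current allocation.
    """
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    time_left = deadline - elapsed
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    # remaining work / (per-VM rate * time left), rearranged to limit rounding
    needed = (1 - progress) * elapsed * n_vms / (progress * time_left)
    return max(n_vms, math.ceil(needed - 1e-9))


current = 4
target = vms_needed_for_deadline(current, elapsed=30, progress=0.25, deadline=60)
if target > current:
    print(f"scale cluster from {current} to {target} VMs")
```

In practice the per-VM rate would come from the job profiling model rather than a single progress reading.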
Case Study: Energy Savings
- Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs
  - Energy + cooling ~ 42% of total cost [Hamilton08]
  - Pass energy savings on to users
- Problem: How to place the VMs on available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)
Case Study: Energy Savings
- Use Job Profiling to place similar-runtime VMs together for initial provisioning
- Use Job Profiling to adjust number of VMs in each cluster to adjust runtimes if needed
- Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling
- Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
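Placing similar-runtime VMs together for initial provisioning can be sketched as a sort-and-fill heuristic. The code below is an illustration, not the talk's algorithm: it assumes each server has a fixed number of identical VM slots (ignoring CPU, memory, and I/O fit) and that a server shuts down once its longest VM finishes. All names and runtimes are hypothetical.

```python
def place_by_similar_runtime(vm_runtimes, slots_per_server):
    """Group VMs with similar estimated runtimes onto the same server.

    Sorts VMs by runtime (longest first) and fills servers in order.
    Returns a list of servers, each a list of runtimes in minutes.
    """
    if slots_per_server < 1:
        raise ValueError("slots_per_server must be at least 1")
    ordered = sorted(vm_runtimes, reverse=True)
    return [ordered[i:i + slots_per_server]
            for i in range(0, len(ordered), slots_per_server)]


def total_uptime(placement):
    # Assumption: a server is shut down once its longest VM finishes.
    return sum(max(server) for server in placement)


runtimes = [20, 100, 25, 90, 20, 95]   # hypothetical estimates, minutes
arrival_order = [runtimes[i:i + 3] for i in range(0, len(runtimes), 3)]
grouped = place_by_similar_runtime(runtimes, slots_per_server=3)
print(total_uptime(arrival_order), total_uptime(grouped))  # 195 125
```

Because the estimates come from job profiling, any error in them would show up as servers staying on longer than planned, which is where migration or cluster scaling would step in.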
Conclusion
- Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective
- Preliminary evaluations to validate usefulness of each building block
- Approaches for applying building blocks to meet specific goals, e.g. performance, energy
Thank you! Questions?