Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar...
Scheduling Bag Of Tasks Under Budget Constraints
Ana-Maria Oprescu, Thilo Kielmann (Vrije University)
Presented by Gal Cohen
Cloud Computing Seminar, CS Technion, Spring 2012
Bag Of Tasks
High-throughput computing jobs
No interactive deadline
Tasks are independent of each other
All tasks are ready for execution
Runtimes are unknown
Execution model:
◦ Allocate resources (e.g., machines)
◦ Run each task from the bag (exactly once) on some machine
Assumptions: Bag Of Tasks
The runtime distribution is unknown; however, some distribution exists
The total number of tasks is known
Tasks can be aborted
Using cloud computing to run bag of tasks: Abstractions
There are many cloud providers (EC2, Azure, Rackspace, 3Tera).
Even a single provider offers many machine types at different prices, varying in:
◦ CPU count and speed
◦ Memory size
There is a (self-imposed) upper limit on the number of machines acquired from each provider.
A machine is charged per ATU (one hour).
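To make per-ATU charging concrete, here is a minimal Python sketch. It assumes a one-hour ATU and that every started ATU is billed in full; the function name `atu_cost` and the prices are hypothetical.

```python
import math

ATU_MINUTES = 60  # one accounting unit = one hour (as on the slide)

def atu_cost(uptime_minutes: float, price_per_atu: float) -> float:
    """Cost of one machine, assuming every started ATU is charged in full."""
    atus = math.ceil(uptime_minutes / ATU_MINUTES)
    return atus * price_per_atu

# A machine used for 61 minutes pays for two full hours.
print(atu_cost(61, 0.5))  # 1.0
print(atu_cost(60, 0.5))  # 0.5
```

This rounding is why a scheduler cares where each machine stands within its current ATU: the time left in an already-paid hour costs nothing extra.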
Problem description
The goal:
◦ Run all the tasks from a given bag on cloud machines within a limited budget
◦ Minimize the makespan of the whole bag without exceeding the budget constraint
Assumption:
◦ Each machine runs one task at a time, in FIFO order
Model Description
The scheduler (BaTS) runs outside the cloud (for free)
The scheduler receives the bag of tasks
It allocates machines from each cloud
It dispatches tasks to the allocated machines
It receives feedback on task completion
![Page 7: Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar CS Technion, Spring 2012.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649dd25503460f94ac92ba/html5/thumbnails/7.jpg)
BaTS: Budget constrained task scheduler (Illustration)
BaTS: Budget constrained task scheduler (Outline)
1. Pick a sampling set of tasks of size n
2. Pick initial workers from each machine type
3. Run a test set on each machine type (in parallel)
4. Estimate the average task execution time for each type
5. Construct a configuration based on the estimates
6. Acquire machines and run the tasks
7. At regular monitoring intervals, go back to step 5
Picking the sampling set size
◦ Error level, typical values: 0.10, 0.15, 0.20, 0.25
◦ Confidence interval
Picking the sampling set size (figure: required sample size (n) vs. bag-of-tasks size (N))
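The slide's sample-size formula did not survive extraction. As an illustration only, here is the standard finite-population sample-size formula (Cochran's, with the worst-case proportion p = 0.5), parameterized by the same two inputs: an error level and a confidence level. Whether BaTS uses exactly this formula is an assumption; `sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist

def sample_size(bag_size: int, error: float, confidence: float = 0.95) -> int:
    """Finite-population sample size (Cochran), worst case p = 0.5."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = z * z * 0.25 / (error * error)                 # size for an infinite population
    n = n0 / (1 + (n0 - 1) / bag_size)                  # finite-population correction
    return min(bag_size, math.ceil(n))

for N in (100, 1000, 10000):
    print(N, sample_size(N, 0.10))
```

The curve this produces has the shape the plot suggests: n grows quickly for small bags and then levels off near n0 (about 96 for error 0.10 at 95% confidence), so large bags need only a small fraction of their tasks as samples.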
Estimating avg Task Execution Time for each machine type
Estimate the runtime of still-running tasks using the average runtime of finished tasks whose execution time is larger than the running task's elapsed time
Update a moving average of the task execution time (in minutes) for each machine type during the computation
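The slide combines two ideas: a per-type average that is updated as results arrive, and an estimate for tasks that have not finished yet. Here is a minimal Python sketch. It assumes that "tasks with larger execution time" means finished tasks that ran longer than the running task's elapsed time, and that the average covers all samples so far rather than a fixed window. `TypeRuntimeStats` is a hypothetical name.

```python
from statistics import mean

class TypeRuntimeStats:
    """Runtime statistics for one machine type, in minutes (hypothetical helper)."""

    def __init__(self):
        self.completed = []  # runtimes of tasks that finished on this type

    def record(self, runtime: float) -> None:
        self.completed.append(runtime)

    def estimate_running(self, elapsed: float) -> float:
        # A still-running task is assumed to take as long as the average finished
        # task that ran longer than its elapsed time; with none, fall back to `elapsed`.
        longer = [r for r in self.completed if r > elapsed]
        return mean(longer) if longer else elapsed

    def average(self, running_elapsed=()) -> float:
        # Average over finished tasks plus estimates for tasks still running
        # (needs at least one sample).
        samples = self.completed + [self.estimate_running(e) for e in running_elapsed]
        return mean(samples)
```

Including the running tasks matters early in the run. If only the finished tasks were counted, the short tasks that happen to complete first would bias the average downward.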
Construct a configuration based on estimates
We need to decide how many machines of each type i to acquire.
We want to minimize the makespan while not exceeding the budget B,
where each machine of type i is charged a fixed cost per ATU.
Construct a configuration based on estimates (cont.)
The problem is reformulated as a maximization subject to the budget constraint and solved using BKP (Bounded Knapsack Problem).
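The slide's exact objective and constraint were lost, so the following is one plausible way to pose the problem, not necessarily the paper's formulation. Treat each machine of type i as a knapsack item with an integer cost per ATU and a value of 1/T_i tasks per minute, where T_i is the estimated average task time. The count of each item is bounded by the provider limit. The sketch then looks for the smallest number of ATUs k such that the best configuration affordable with B // k per ATU finishes all N tasks within k ATUs. Integer prices, a 60-minute ATU, ignored startup time, and all names are assumptions.

```python
from typing import List, Optional, Tuple

def bounded_knapsack(costs: List[int], values: List[float], limits: List[int],
                     capacity: int) -> Tuple[float, List[int]]:
    """Max total value using at most limits[i] copies of item i, total cost <= capacity."""
    best = [0.0] * (capacity + 1)                    # best[c]: best value with cost <= c
    picks = [[0] * len(costs) for _ in range(capacity + 1)]
    for i, (cost, value, limit) in enumerate(zip(costs, values, limits)):
        new_best, new_picks = best[:], [row[:] for row in picks]
        for cap in range(capacity + 1):
            for k in range(1, limit + 1):            # try k machines of type i
                if k * cost > cap:
                    break
                cand = best[cap - k * cost] + k * value
                if cand > new_best[cap]:
                    new_best[cap] = cand
                    new_picks[cap] = picks[cap - k * cost][:]
                    new_picks[cap][i] = k
        best, picks = new_best, new_picks
    return best[capacity], picks[capacity]

def plan_configuration(n_tasks: int, avg_minutes: List[float], cost_per_atu: List[int],
                       max_machines: List[int], budget: int, atu_minutes: float = 60,
                       max_atus: int = 1000) -> Optional[Tuple[int, List[int], float]]:
    """Smallest ATU count k whose per-ATU budget (budget // k) buys enough throughput."""
    rates = [1.0 / t for t in avg_minutes]           # tasks per minute, per machine
    for k in range(1, max_atus + 1):
        throughput, counts = bounded_knapsack(cost_per_atu, rates, max_machines, budget // k)
        if throughput > 0 and n_tasks / throughput <= k * atu_minutes:
            return k, counts, n_tasks / throughput   # ATUs, machines per type, makespan
    return None                                      # budget too small for the bag

print(plan_configuration(100, [10, 5], [1, 3], [10, 10], 40))
```

Searching over k avoids a circular dependency. The budget per ATU depends on how long the run takes, and the run time depends on which machines the budget buys.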
Refining the initial configuration
Continuous monitoring is needed:
◦ The configuration was chosen based on estimates of average speeds that might not be accurate
◦ The estimated speed of each machine type converges during the run
◦ The estimated budget and makespan neglect machine startup time
◦ Machines start their ATUs at different times, so we cannot simply monitor just before an ATU ends
Refining the initial configuration (cont.)
Thus, BaTS continuously tries to avoid budget violations.
In theory this is easy: as the execution continues, both the remaining bag and the remaining budget shrink.
The difficulty is estimating the size of the bag at a given moment (some machines will finish their current task before their ATU ends).
Refining the initial configuration (cont.)
For every machine type i, we maintain a list of all machines that participated in the computation at some point.
For every machine we record:
◦ The number of executed tasks
◦ The total uptime
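Refining the initial configuration (cont.)
From these records, BaTS derives for each machine:
◦ The total uptime after executing its tasks
◦ The machine speed
◦ The remaining unused time of its current ATU
◦ The expected number of future tasks the machine will execute
Across all machines, it then derives the number of tasks that still have to be paid for.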
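The formulas on this slide were lost. Here is a minimal Python sketch of the same bookkeeping. It assumes that speed is measured as average minutes per executed task, and that time left in a machine's current, already-paid ATU is free capacity that will absorb whole tasks. `MachineRecord` and `tasks_to_pay_for` are hypothetical names.

```python
import math
from dataclasses import dataclass

ATU_MINUTES = 60.0  # length of one accounting unit (assumption: one hour)

@dataclass
class MachineRecord:
    """Bookkeeping for one machine (hypothetical helper)."""
    tasks_done: int   # number of executed tasks
    uptime: float     # total uptime in minutes

    def minutes_per_task(self) -> float:
        # "Machine speed", expressed as average minutes per executed task
        return self.uptime / self.tasks_done

    def unused_atu_time(self) -> float:
        # Time left in the ATU that has already been paid for
        used = self.uptime % ATU_MINUTES
        return 0.0 if used == 0 else ATU_MINUTES - used

    def expected_future_tasks(self) -> int:
        # Whole tasks expected to fit into that already-paid remainder
        if self.tasks_done == 0:
            return 0
        return math.floor(self.unused_atu_time() / self.minutes_per_task())

def tasks_to_pay_for(remaining_tasks: int, machines: list) -> int:
    """Remaining tasks minus those the already-paid ATU time is expected to absorb."""
    free = sum(m.expected_future_tasks() for m in machines)
    return max(0, remaining_tasks - free)
```

For example, a machine that ran 4 tasks in 40 minutes has 20 paid minutes left and should absorb 2 more tasks. Only the tasks beyond such free capacity count against the remaining budget.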
Refining the initial configuration (cont.)
The potential number of executed tasks also accounts for the remaining time from the previous ATU that was not large enough for a whole task.
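The formula itself was lost. Here is one way to account for that leftover time, as a minimal Python sketch. It assumes that the task started in the leftover time keeps running, so that time effectively extends the next ATU. `tasks_in_next_atu` is a hypothetical name.

```python
import math

def tasks_in_next_atu(minutes_per_task: float, leftover: float, atu: float = 60.0):
    """Potential tasks in the next ATU and the leftover carried into the one after.

    `leftover` is the time from the previous ATU that was too short for a whole task;
    the task started in that time keeps running, so the time counts toward the next ATU.
    """
    available = atu + leftover
    tasks = math.floor(available / minutes_per_task)
    return tasks, available - tasks * minutes_per_task

# Three consecutive ATUs on a machine that needs 25 minutes per task:
leftover, total = 0.0, 0
for _ in range(3):
    done, leftover = tasks_in_next_atu(25.0, leftover)
    total += done
print(total)  # 7 == floor(180 / 25)
```

Without the carry-over, each ATU would be credited with only floor(60 / 25) = 2 tasks, which undercounts to 6 over three hours.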
Refining the initial configuration (cont.)
A budget violation is prevented by checking a budget condition at each monitoring interval. If the condition does not hold, BaTS uses the remaining budget and remaining tasks to compute a new, slower and cheaper configuration.
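The actual condition was lost in extraction. Here is a minimal Python sketch of one such check. It assumes that the current configuration keeps running until the tasks still to be paid for are done, that each machine completes floor(ATU / task time) whole tasks per ATU (leftover carry-over ignored), and that all machines are billed for the same number of further ATUs. `budget_ok` and the configuration layout are hypothetical.

```python
import math

def budget_ok(tasks_to_pay: int, config: dict, remaining_budget: float,
              atu: float = 60.0) -> bool:
    """config maps machine type -> (machines, minutes_per_task, price_per_atu)."""
    tasks_per_atu = sum(n * math.floor(atu / t) for n, t, _ in config.values())
    if tasks_per_atu == 0:
        return tasks_to_pay == 0               # this configuration cannot make progress
    atus_needed = math.ceil(tasks_to_pay / tasks_per_atu)
    cost_per_atu = sum(n * price for n, _, price in config.values())
    return atus_needed * cost_per_atu <= remaining_budget

config = {"small": (4, 20.0, 1.0), "large": (2, 10.0, 3.0)}
print(budget_ok(50, config, 30.0), budget_ok(50, config, 29.0))
```

When the check fails, the remaining budget and the tasks still to be paid for become the inputs for a new, cheaper configuration.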
BaTS Algorithm
1. Compute n = sample size
2. Construct the initial configuration C; acquire machines
3. While the bag has tasks, do:
   1. Wait for any machine M to ask for work
   2. If M returned the result of task T:
      1. Update the stats for machine M
      2. Update the estimated average task execution time for M's type
   3. If the sample-set tasks for M's type have finished:
      1. Update the cluster stats for M's type
   4. If (monitoring interval || first cluster stats ready):
      1. Compute estimates
      2. If (constraint violation || first cluster stats ready):
         1. Call BKP to compute a new configuration; acquire/release machines
   5. Send M a random task T'; remove T' from the bag
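The dispatch part of the loop above can be sketched as a small Python discrete-event simulation. The sketch assumes that task runtimes are known to the simulator (but not to the scheduler) and that each machine type simply scales runtimes by a fixed factor. Sampling, cluster statistics, and BKP re-planning are deliberately left out and marked by a comment. `run_bag` and its parameters are hypothetical.

```python
import heapq
import random
from collections import defaultdict

def run_bag(task_minutes, machines, slowdown, seed=0):
    """Machines ask for work and each receives a random task from the bag.

    task_minutes: nominal task runtimes (hypothetical input)
    machines:     machine type of each acquired machine, e.g. ["fast", "slow"]
    slowdown:     type -> runtime multiplier for that type
    Returns the makespan and the average observed runtime per type.
    """
    rng = random.Random(seed)
    bag = list(task_minutes)
    observed = defaultdict(list)                     # type -> runtimes of returned tasks
    events = [(0.0, m, None) for m in range(len(machines))]  # (time, machine, runtime)
    heapq.heapify(events)
    makespan = 0.0
    while events:
        now, m, finished = heapq.heappop(events)     # wait for any machine to ask for work
        mtype = machines[m]
        if finished is not None:                     # it returned a result: update stats
            observed[mtype].append(finished)
            makespan = now
        # (here BaTS would re-estimate and possibly call BKP at monitoring intervals)
        if not bag:
            continue                                 # nothing left: the machine stays idle
        i = rng.randrange(len(bag))                  # send a random task, remove it from the bag
        bag[i], bag[-1] = bag[-1], bag[i]
        runtime = bag.pop() * slowdown[mtype]
        heapq.heappush(events, (now + runtime, m, runtime))
    averages = {t: sum(r) / len(r) for t, r in observed.items()}
    return makespan, averages

print(run_bag([10, 10, 10], ["fast", "slow"], {"fast": 1, "slow": 2}))
```

Pulling tasks on demand, rather than pre-assigning them, lets faster machines naturally take more of the bag. The per-type averages this produces are what step 4 of the outline feeds into the configuration step.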
Evaluation
Emulating 2 clouds with 32 identical machines each
A bag of tasks with normally distributed task lengths, and a sampling set drawn from it
Evaluation
“Machine speed” in each “cloud” was simulated according to 5 scenarios:

| Profitability of C2 w.r.t. C1 | Cloud 2 cost | Cloud 2 speed |
|---|---|---|
| 1/4 | 4 | 1 |
| 3/4 | 4 | 3 |
| 1 | 1 | 1 |
| 4/3 | 3 | 4 |
| 4 | 1 | 4 |
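The profitability column is consistent with speed divided by cost if Cloud 1 is taken as the baseline (cost 1, speed 1). That baseline is an assumption read off the table, not stated on the slide. A short Python check:

```python
from fractions import Fraction

# (Cloud 2 cost, Cloud 2 speed) per scenario, assuming Cloud 1 = (cost 1, speed 1)
scenarios = [(4, 1), (4, 3), (1, 1), (3, 4), (1, 4)]

def profitability(cost: int, speed: int) -> Fraction:
    """Work delivered per unit of money, relative to the baseline: speed / cost."""
    return Fraction(speed, cost)

for cost, speed in scenarios:
    p = profitability(cost, speed)
    best = "Cloud 2" if p > 1 else "Cloud 1" if p < 1 else "either"
    print(f"cost={cost} speed={speed} profitability={p} -> most profitable: {best}")
```

Using `Fraction` keeps values such as 4/3 exact, so they print in the same form as the table.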
Evaluation
In each scenario, RR is compared to BaTS:
◦ RR always uses 32+32 machines
◦ BaTS starts from an initial configuration of 30+30 machines, with two budgets:
   ◦ Budget B = the cost of running RR for that scenario
   ◦ Budget B = the cost of running only on the most “profitable” machine type (computed offline)
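The second budget is computed offline. Here is a minimal Python sketch of that arithmetic. It assumes the bag's tasks take a known average time on the chosen machine type, are spread evenly over its machines, and that every started ATU is charged in full. The inputs in the example are made up, not the experiment's, and `single_type_budget` is a hypothetical name.

```python
import math

def single_type_budget(n_tasks: int, minutes_per_task: float, machines: int,
                       price_per_atu: float, atu: float = 60.0) -> float:
    """Cost of running the whole bag on `machines` machines of a single type."""
    tasks_per_machine = math.ceil(n_tasks / machines)     # even spread, rounded up
    atus = math.ceil(tasks_per_machine * minutes_per_task / atu)
    return atus * machines * price_per_atu

# Made-up example: 1000 tasks of 15 minutes on 32 machines at price 1 per ATU
print(single_type_budget(1000, 15, 32, 1))  # 256
```

In the example, each machine gets at most 32 tasks, or 480 minutes of work. That is 8 ATUs, so the 32 machines cost 8 × 32 = 256.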
![Page 28: Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar CS Technion, Spring 2012.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649dd25503460f94ac92ba/html5/thumbnails/28.jpg)
Evaluation
![Page 29: Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar CS Technion, Spring 2012.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649dd25503460f94ac92ba/html5/thumbnails/29.jpg)
Evaluation
![Page 30: Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar CS Technion, Spring 2012.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649dd25503460f94ac92ba/html5/thumbnails/30.jpg)
Evaluation
![Page 31: Ana-Maria Oprescu, Thilo Kielmann (Vrije University) Presented By Gal Cohen Cloud Computing Seminar CS Technion, Spring 2012.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649dd25503460f94ac92ba/html5/thumbnails/31.jpg)
Conclusions
BaTS helps choose the cloud resources suitable for an application
BaTS helps schedule within budget while still performing reasonably well
Conclusions
Limitations:
◦ The provided tests “cheat” because the number of machines is very small
◦ The “tail phase” is not handled well (the “faster” machines are released before the “slow” ones)
◦ Guessing a proper budget
◦ Actual bags on actual clouds
◦ What about data transfer costs?
◦ Storage constraints?
◦ Another metric: maximize the profitability (or minimize the budget) while not exceeding a given makespan