Twister4Azure - Iterative MapReduce for Azure Cloud
-
Upload
thilina-gunarathne -
Category
Technology
-
view
3.279 -
download
4
description
Transcript of Twister4Azure - Iterative MapReduce for Azure Cloud
![Page 1: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/1.jpg)
Twister4Azure : Iterative MapReduce for Azure Cloud
CCA 2011April 12 – 13, 2011
Thilina Gunarathne, Judy Qiu, Geoffrey Fox{tgunarat, xqiu,gcf}@indiana.edu
![Page 2: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/2.jpg)
MapReduceRoles for Azure
Familiar MapReduce programming model Built using highly-available and scalable Azure cloud
services Co-exist with eventual consistency & high latency of
cloud services Decentralized control
No single point of failure. Supports dynamically scaling up and down of the
compute resources. MapReduce fault tolerance
![Page 3: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/3.jpg)
MapReduceRoles for Azure
![Page 4: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/4.jpg)
Twister for Azure
Reduce
Reduce
MergeAdd
Iteration? No
Map Combine
Map Combine
Map Combine
Data Cache
Yes
Hybrid scheduling of the new iteration
Job Start
Job Finish
Merge Step In-Memory Caching of static data Cache aware hybrid scheduling using Queues as well
as using a bulletin board (special table)
![Page 5: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/5.jpg)
Twister for Azure
Map 1
Map 2
Map n
Map Workers
Red 1
Red 2
Red n
Reduce Workers
In Memory Data Cache
Task Monitoring
Role Monitoring
Worker Role
MapID ……. Status
Map Task Table
MapID ……. Status
Job Bulleting Board
Scheduling Queue
![Page 6: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/6.jpg)
Performance – Kmeans Clustering
0%
20%
40%
60%
80%
100%
120%
140%
160%
0
200
400
600
800
1000
1200
1400
1600
8 X 16M 16 X 32M 32 X 64M 48 X 96M 64 X 128M
Rela
tive
Par
alle
l Effi
cien
cy
Tim
e (s
)
Num Instances X Num Data Points
Relative ParallelEfficiencyTime(s)
Performance with/without data caching. Speedup gained using data cache
Scaling speedup Increasing number of iterations
![Page 7: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/7.jpg)
Performance Comparisons
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
90.00%
100.00%
128 228 328 428 528 628 728
Par
alle
l Effi
cien
cy
Number of Query Files
Twister4Azure
Hadoop-Blast
DryadLINQ-Blast
0
500
1000
1500
2000
2500
3000
Adju
sted
Time (
s)
Num. of Cores * Num. of Blocks
Twister4Azure
Amazon EMR
Apache Hadoop
50%55%60%65%70%75%80%85%90%95%
100%
Para
llel E
ffici
ency
Num. of Cores * Num. of Files
Twister4Azure
Amazon EMR
Apache Hadoop
BLAST Sequence Search
Cap3 Sequence Assembly
Smith Watermann Sequence Alignment
![Page 8: Twister4Azure - Iterative MapReduce for Azure Cloud](https://reader036.fdocuments.in/reader036/viewer/2022081414/5480df2eb4af9f142f8b45f4/html5/thumbnails/8.jpg)
Conclusion
Enables users to easily and efficiently perform large scale iterative data analysis and scientific computations on Azure cloud.
Utilizes a novel hybrid scheduling mechanism to provide the caching of static data across iterations.
Utilize cloud infrastructure services effectively to deliver robust and efficient applications.
http://salsahpc.indiana.edu/twister4azure