Mapping Heuristics in Heterogeneous Computing
Multiple Processor Systems (EECC756)
Dr. Shaaban
Alexandru Samachisa, Dmitriy Bekker
May 18, 2006
Multiple Processor Systems (EECC756) Spring 2006
Overview
- Introduction
- Mapping overview
- Homogeneous computing vs. heterogeneous computing
- Task mapping heuristics (for HC)
  - Overview (static, dynamic, on-line, batch)
  - DAG used in examples
  - Performance metrics
- Selected algorithms: HEFT, CPOP, MCT, Min-min, Max-min, Sufferage
- Conclusion
Mapping Overview
- Mapping = matching and scheduling of processes onto processors
- Goal: assign each process to the best-suited machine to maximize performance
- The mapping problem is NP-complete; heuristics exist to approximate the optimal mapping
- The best heuristic to use depends on:
  - Static or dynamic mapping?
  - Task arrival rate
  - Number of tasks

[Figure: tasks p0-p3 of a parallel program mapped onto processors P0-P3]
Homogeneous vs. Heterogeneous Computing
- Homogeneous computing
  - Best for fixed task grain size (most common)
  - Supports a single mode of parallelism
  - Custom architecture
- Heterogeneous computing (HC)
  - Variable task grain size
  - Supports multiple modes of parallelism
  - Machines in the cluster may differ
- Mapping is much simpler in homogeneous computing
  - Task sizes are known
  - Task execution time on any node is known
- In HC, task sizes and machine performance vary, so the mapping heuristic must find the optimal (or a good sub-optimal) machine for each task
Mapping Heuristics
- Static mapping
  - Performed at compile time (off-line)
  - Expected Time to Compute (ETC) is known for all tasks
- Dynamic mapping
  - Performed at run time
  - ETC is known only for the tasks currently being mapped
  - Two modes:
    - On-line: one task mapped at a time
    - Batch: a number of tasks is collected into a meta-task, then mapped together
DAG Used in Examples

[Figure: DAG of ten tasks n1-n10 used throughout the examples]
Performance Metrics
- Expected execution time (EET): assumes no load on the system
- Expected completion time (ECT): ECT = begin time + EET
- Makespan: the maximum ECT over all tasks
- Average sharing penalty: based on the waiting time of a task; a good measure of quality of service
- Schedule length ratio (SLR): makespan normalized to the sum of minimum computation costs along the minimum critical path (for static mapping)
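The SLR definition above can be written out explicitly; a sketch using the usual notation (assumed here: $w(n_i, p)$ is the computation cost of task $n_i$ on processor $p$, and $CP_{\min}$ is the critical path computed from minimum costs):

```latex
\mathrm{SLR} \;=\; \frac{\text{makespan}}{\displaystyle\sum_{n_i \in CP_{\min}} \min_{p \in P} w(n_i,\, p)}
```

An SLR close to 1 indicates a near-optimal schedule, since no makespan can be shorter than the minimum-cost critical path.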
HEFT and CPOP
- Both are static algorithms (the whole DAG is known in advance)
- Both prioritize tasks based on ranks, computed recursively:
  - Upward rank: length of the longest path from a task to the exit task
  - Downward rank: length of the longest path from the entry task to a task
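The two ranks can be sketched as short recursions; a minimal Python illustration over a made-up four-task DAG (none of these numbers come from the slides; `w[i]` is the mean computation cost of task `i`, `c[(i, j)]` the communication cost of edge `i -> j`):

```python
# Hypothetical DAG: 1 -> {2, 3} -> 4
w = {1: 10, 2: 12, 3: 8, 4: 9}                     # mean computation costs
c = {(1, 2): 4, (1, 3): 6, (2, 4): 5, (3, 4): 3}   # edge communication costs
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}

def rank_u(i):
    """Upward rank: length of the longest path from task i to the exit task."""
    return w[i] + max((c[(i, j)] + rank_u(j) for j in succ[i]), default=0)

def rank_d(i):
    """Downward rank: length of the longest path from the entry task to task i."""
    return max((rank_d(j) + w[j] + c[(j, i)] for j in pred[i]), default=0)
```

HEFT orders tasks by decreasing `rank_u`; CPOP uses `rank_u + rank_d`, which is constant along the critical path.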
HEFT
- Goal: schedule each task on the "best" processor, minimizing its finish time
- Gives each task in the graph a priority based on communication and computation costs (decreasing upward rank)
- Iterates through the tasks in priority order and assigns each task to the processor giving the earliest finish time
- Complexity: O(e x p), where e is the number of edges and p the number of processors
  - For very dense graphs e = O(n^2), where n is the number of tasks
[Figure: HEFT schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
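The HEFT loop described above can be sketched compactly. This is a simplified, processor-ready-time version (the paper's variant also inserts tasks into idle slots); the ETC rows, edges, and communication costs are illustrative, not the slides' example:

```python
# Hypothetical 4-task DAG on 3 processors; ETC[i][p] = time of task i on p.
ETC = {1: [14, 16, 9], 2: [13, 19, 18], 3: [11, 13, 19], 4: [13, 8, 17]}
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
comm = {(1, 2): 4, (1, 3): 6, (2, 4): 5, (3, 4): 3}

def rank_u(i):
    w = sum(ETC[i]) / len(ETC[i])                  # mean computation cost
    return w + max((comm[(i, j)] + rank_u(j) for j in succ[i]), default=0)

def heft(num_procs=3):
    avail = [0.0] * num_procs                      # processor ready times
    aft, where = {}, {}                            # actual finish time, placement
    for i in sorted(ETC, key=rank_u, reverse=True):    # decreasing upward rank
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor pays the edge cost.
            ready = max((aft[j] + (comm[(j, i)] if where[j] != p else 0)
                         for j in pred[i]), default=0)
            eft = max(avail[p], ready) + ETC[i][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        aft[i], where[i] = best
        avail[best[1]] = best[0]
    return aft, where
```

The makespan is the finish time of the exit task; with these made-up costs, `heft()` finishes task 4 at time 39 on processor P2 (index 1).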
CPOP
- Similar to HEFT
- Prioritizes tasks by communication and computation cost (the sum of upward and downward ranks)
- Determines the critical-path processor: the one that minimizes the cumulative computation cost of the tasks on the critical path
- Assigns every critical-path task to the critical-path processor
- A task not on the critical path is assigned to the processor that minimizes its earliest finish time
- Same complexity as HEFT: O(e x p)
[Figure: CPOP schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
On-line Mode
- Each task is assigned as soon as it arrives
- Usually easy to implement, since selection is based on simple criteria
- Heuristics: minimum completion time (MCT), minimum execution time, switching algorithm, K-percent best, etc.
- On-line heuristics usually need O(m) time to assign a task, where m is the number of machines
MCT
- Assigns each task, on arrival, to the machine that yields the minimum completion time (MCT) for that task
- Complexity: O(m)

ETC table (task: P1, P2, P3):

  N1:  14  16   9
  N2:  40  46  27
  N3:  32  34  46
  N4:  45  26  44
  N5:  44  39  37
  N6:  45  42  46
  N7:  39  80  61
  N8:  62  57  71
  N9:  68  69  70
  N10: 89  88  97

[Figure: MCT schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
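The MCT rule is a one-line minimization per arriving task; a minimal sketch, with the machine-ready-time bookkeeping made explicit (`mct_assign` is a hypothetical helper name):

```python
# On-line MCT: each arriving task goes to the machine yielding the minimum
# completion time. `ready` holds each machine's current ready time.
def mct_assign(etc_row, ready):
    """Return the machine index minimizing completion time; update ready."""
    best = min(range(len(ready)), key=lambda p: ready[p] + etc_row[p])  # O(m)
    ready[best] += etc_row[best]
    return best

# Feeding it the first two rows of the ETC table above:
ready = [0, 0, 0]
mct_assign([14, 16, 9], ready)    # N1 -> P3 (index 2), completes at 9
mct_assign([40, 46, 27], ready)   # N2 -> P3 again (9 + 27 = 36 beats 40 and 46)
```

Note that completion time, not execution time, drives the choice: N2 still prefers the already-loaded P3.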
Batch Mode
- Batch-mode heuristics assume that tasks are independent
- The set of independent tasks being processed together is called a meta-task
- All unmapped tasks are remapped at every mapping event
- Our assumptions:
  - Each level of dependency in the DAG constitutes a meta-task
  - Once a task is assigned, it is not remapped at the next mapping event

[Figure: DAG partitioned into four dependency levels, meta-tasks 1-4]
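The "each dependency level is a meta-task" assumption amounts to grouping tasks by their depth in the DAG. A sketch, over an assumed edge set shaped like the ten-task example (the actual edges of the slides' DAG are not recoverable here):

```python
# Group DAG tasks into meta-tasks by dependency depth.
def meta_tasks(pred):
    depth = {}
    def level(i):
        # Depth = 1 + deepest predecessor (entry tasks have depth 1).
        if i not in depth:
            depth[i] = 1 + max((level(j) for j in pred[i]), default=0)
        return depth[i]
    groups = {}
    for i in pred:
        groups.setdefault(level(i), []).append(i)
    return groups

# Assumed predecessor lists for a DAG with levels {1}, {2..6}, {7..9}, {10}.
pred = {1: [], 2: [1], 3: [1], 4: [1], 5: [1], 6: [1],
        7: [2, 3], 8: [4], 9: [5, 6], 10: [7, 8, 9]}
```

With these assumed edges, `meta_tasks(pred)` yields four groups matching the four meta-tasks above, e.g. level 2 contains tasks 2-6.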
Min-min (meta-task 2)
- Iteratively maps the task whose minimum completion time is smallest
- Complexity: O(S^2 m) per meta-task, where m is the number of machines and S is the number of tasks in the meta-task

Completion times per iteration (task: P1, P2, P3):

Iteration 1:  N2: 40 46 27 | N3: 32 34 28 | N4: 31 26 26 | N5: 32 33 19 | N6: 36 39 18  -> map N6 to P3
Iteration 2:  N2: 40 46 36 | N3: 32 34 37 | N4: 31 26 35 | N5: 32 33 28                 -> map N4 to P2
Iteration 3:  N2: 40 46 36 | N3: 32 34 37 | N5: 32 39 28                                -> map N5 to P3
Iteration 4:  N2: 40 46 46 | N3: 32 34 47                                               -> map N3 to P1
Iteration 5:  N2: 45 46 46                                                              -> map N2 to P1

[Figure: Min-min schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
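The iteration above can be sketched directly: at each step, find every unmapped task's best completion time, then map the task whose best is smallest. A minimal version over execution times plus machine ready times (the tiny input below is illustrative, not the meta-task 2 data):

```python
# Min-min over one meta-task: etc[t][p] = execution time of task t on machine p.
def min_min(etc, ready):
    schedule = {}
    tasks = set(etc)
    while tasks:                                   # S iterations ...
        # ... each scanning all S x m completion times: O(S^2 m) total.
        t = min(tasks,
                key=lambda t: min(ready[p] + etc[t][p] for p in range(len(ready))))
        p = min(range(len(ready)), key=lambda p: ready[p] + etc[t][p])
        ready[p] += etc[t][p]
        schedule[t] = p
        tasks.remove(t)
    return schedule
```

For example, with `etc = {'a': [3, 5], 'b': [4, 2], 'c': [6, 6]}` and `ready = [0, 0]`, 'b' (best time 2) is mapped first, then 'a', then 'c'.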
Max-min (meta-task 2)
- Like Min-min, but maps first the task whose minimum completion time is largest
- Complexity: O(S^2 m) per meta-task, where m is the number of machines and S is the number of tasks in the meta-task

Completion times per iteration (task: P1, P2, P3):

Iteration 1:  N2: 40 46 27 | N3: 32 34 28 | N4: 31 26 26 | N5: 32 33 19 | N6: 36 39 18  -> map N3 to P3
Iteration 2:  N2: 40 46 46 | N4: 31 26 45 | N5: 32 33 38 | N6: 36 39 37                 -> map N2 to P1
Iteration 3:  N4: 53 26 45 | N5: 52 33 38 | N6: 53 39 37                                -> map N6 to P3
Iteration 4:  N4: 53 26 54 | N5: 52 33 47                                               -> map N5 to P2
Iteration 5:  N4: 53 41 54                                                              -> map N4 to P2

[Figure: Max-min schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
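Max-min differs from Min-min in a single comparison: the task selected each round is the one whose *minimum* completion time is largest, so long tasks are placed early instead of last. A sketch with the same illustrative inputs as the Min-min example:

```python
# Max-min over one meta-task: etc[t][p] = execution time of task t on machine p.
def max_min(etc, ready):
    schedule = {}
    tasks = set(etc)
    while tasks:
        # max (not min) over each task's best completion time.
        t = max(tasks,
                key=lambda t: min(ready[p] + etc[t][p] for p in range(len(ready))))
        p = min(range(len(ready)), key=lambda p: ready[p] + etc[t][p])
        ready[p] += etc[t][p]
        schedule[t] = p
        tasks.remove(t)
    return schedule
```

With `etc = {'a': [3, 5], 'b': [4, 2], 'c': [6, 6]}` the long task 'c' is now placed first, which tends to balance makespan when one task dominates.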
Sufferage (meta-task 2)
- Maps first the task that would "suffer" most if denied its best machine
- Sufferage value S = (second-minimum completion time) - (minimum completion time)
- Complexity: O(S^2 m) per meta-task, where m is the number of machines and S is the number of tasks in the meta-task

Completion times and sufferage per iteration (task: P1, P2, P3 | S):

Iteration 1:  N2: 40 46 27 | S=13   N3: 32 34 28 | S=4   N4: 31 26 26 | S=0   N5: 32 33 19 | S=13   N6: 36 39 18 | S=18  -> map N6 to P3
Iteration 2:  N2: 40 46 36 | S=4    N3: 32 34 37 | S=2   N4: 31 26 35 | S=5   N5: 32 33 28 | S=4                         -> map N4 to P2
Iteration 3:  N2: 40 46 36 | S=4    N3: 32 39 37 | S=7   N5: 32 39 28 | S=4                                              -> map N3 to P1
Iteration 4:  N2: 45 46 36 | S=9    N5: 44 39 28 | S=11                                                                  -> map N5 to P3
Iteration 5:  N2: 45 46 46 | S=1                                                                                         -> map N2 to P1

[Figure: Sufferage schedule of tasks n1-n10 on processors P1-P3, Gantt chart with time axis 0-90]
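A simplified Sufferage sketch (the original heuristic resolves machine conflicts within a mapping event; here the task with the largest sufferage is simply mapped one at a time, using the same illustrative input style as the Min-min example):

```python
# Sufferage over one meta-task: etc[t][p] = execution time of task t on machine p.
def sufferage(etc, ready):
    schedule = {}
    tasks = set(etc)
    while tasks:
        def suff(t):
            # Sufferage = second-best completion time minus best.
            cts = sorted(ready[p] + etc[t][p] for p in range(len(ready)))
            return cts[1] - cts[0] if len(cts) > 1 else cts[0]
        t = max(tasks, key=suff)                  # the task that suffers most
        p = min(range(len(ready)), key=lambda p: ready[p] + etc[t][p])
        ready[p] += etc[t][p]
        schedule[t] = p
        tasks.remove(t)
    return schedule
```

With `etc = {'a': [3, 9], 'b': [4, 5], 'c': [10, 2]}`, task 'c' has the biggest gap between its two machines (10 vs. 2) and so is mapped first.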
Conclusion
- Static mapping heuristics tend to be dropped in favor of dynamic ones because of the latter's advantages
- On-line heuristics are the easiest to implement and usually compare well against other dynamic mapping heuristics
- Batch heuristics are harder to implement and can suffer from task starvation; an aging mechanism is needed to solve this issue
- Which dynamic heuristic to use when?
  - On-line when the task arrival rate is low (tasks are mapped immediately)
  - Batch when the task arrival rate is high (tasks are grouped into meta-tasks, then mapped)
Conclusion (cont'd)
- The performance results for the examples shown here do not reflect the actual performance of the presented heuristics
- However, the selected heuristics are among the best-performing ones in their categories
- Mapping on heterogeneous systems is far more complex (if done correctly) than on homogeneous systems
References
1. T. D. Braun, H. J. Siegel, N. Beck, L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, R. F. Freund, and D. Hensgen, "A Comparison Study of Static Mapping Heuristics for a Class of Meta-tasks on Heterogeneous Computing Systems," 8th IEEE Heterogeneous Computing Workshop (HCW '99), Apr. 1999.
2. M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems," 8th IEEE Heterogeneous Computing Workshop (HCW '99), Apr. 1999.
3. M. Shaaban, EECC756 Lecture Slides, Spring 2006.
4. H. Topcuoglu, S. Hariri, and M. Wu, "Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, pp. 260-274, Mar. 2002.