DIET Workflow Management
DIET MaDag v2.0 Implementation
Benjamin ISNARD / Raphaël BOLZELIP – ENS Lyon
June 2008
Content
1. What is the MaDag?
2. Workflow scheduling overview
3. MaDag Configuration
4. Usage
5. Future developments
Definition of a workflow
➲ An application with a graph structure
● MaDag executes only DAG workflows
➲ Information is passed from one node to another
➲ Different types of workflows:
● Node = TASK (a task graph) - ex: MaDag XML
● Node = SERVICE (a process) - ex: BPEL
● Node = ACTIVITY (an abstract process) - ex: Makefile
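Since only DAG workflows are accepted, a natural first check on a task graph is that it contains no cycle. The following is a minimal sketch of that check using Kahn's algorithm; it is not taken from the MaDag sources, and `workflow_t`, `MAX_NODES`, and `workflow_topo_order` are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define MAX_NODES 16  /* illustrative limit, not a MaDag constant */

/* Hypothetical in-memory workflow: edge[i][j] != 0 means that
 * node j consumes an output of node i. Requires n <= MAX_NODES. */
typedef struct {
    size_t n;
    int edge[MAX_NODES][MAX_NODES];
} workflow_t;

/* Kahn's algorithm. Writes a dependency-respecting execution order
 * into order[] and returns 1 if the graph is acyclic (a DAG),
 * or 0 if a cycle prevents some node from ever becoming ready. */
int workflow_topo_order(const workflow_t *wf, size_t order[])
{
    size_t indeg[MAX_NODES] = {0};
    size_t done = 0, head = 0;

    /* Count the predecessors of every node. */
    for (size_t i = 0; i < wf->n; i++)
        for (size_t j = 0; j < wf->n; j++)
            if (wf->edge[i][j]) indeg[j]++;

    /* Nodes without predecessors are ready; order[] doubles as the queue. */
    for (size_t j = 0; j < wf->n; j++)
        if (indeg[j] == 0) order[done++] = j;

    /* Completing a node may release its successors. */
    while (head < done) {
        size_t u = order[head++];
        for (size_t v = 0; v < wf->n; v++)
            if (wf->edge[u][v] && --indeg[v] == 0)
                order[done++] = v;
    }
    return done == wf->n;
}
```

For a three-node chain (node 0 feeds node 1, which feeds node 2) the function returns 1 and fills `order` with 0, 1, 2; adding an edge from node 2 back to node 0 makes it return 0.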
1. What is the MaDag?
[Diagram: abstract process, process, task graph, and executable workflow, linked by "define data" and "define resources" steps]
Workflow example (MaDag XML language)
<dag>
  <node id="n1" path="greyscale">
    <arg name="in" type="DIET_FILE" value="logo_diet.jpg" />
    <out name="out" type="DIET_FILE" />
  </node>
  <node id="n2" path="flip">
    <in name="in" type="DIET_FILE" source="n1#out" />
    <out name="out" type="DIET_FILE" />
  </node>
  <node id="n3" path="duplicate">
    <in name="in" type="DIET_FILE" source="n2#out" />
    <out name="out1" type="DIET_FILE" />
    <out name="out2" type="DIET_FILE" />
  </node>
</dag>
logo_diet.jpg → greyscale → flip → duplicate
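In the example, data dependencies are written as `source="n1#out"`, which reads as a node ID and an output port name separated by `#`. The sketch below shows one way such a reference could be split in C. It is an illustration only: `split_port_ref` is a hypothetical helper, not a function of the DIET API, and the assumption that a reference contains exactly one meaningful `#` comes from the example alone.

```c
#include <stddef.h>
#include <string.h>

/* Splits a reference of the form "node#port" (e.g. "n1#out") into
 * its two parts. Returns 0 on success, or -1 if there is no '#',
 * if either part is empty, or if a destination buffer is too small. */
int split_port_ref(const char *ref,
                   char *node, size_t node_cap,
                   char *port, size_t port_cap)
{
    const char *hash = strchr(ref, '#');
    if (hash == NULL) return -1;

    size_t node_len = (size_t)(hash - ref);
    size_t port_len = strlen(hash + 1);
    if (node_len == 0 || port_len == 0) return -1;
    if (node_len >= node_cap || port_len >= port_cap) return -1;

    memcpy(node, ref, node_len);
    node[node_len] = '\0';
    memcpy(port, hash + 1, port_len + 1);  /* copies the terminator too */
    return 0;
}
```

Called on `"n1#out"`, the helper stores `n1` as the node ID and `out` as the port name, which is enough to rebuild the edge from n1 to the consuming node.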
MaDag role
➲ Allow users to execute a workflow on a grid managed by DIET
➲ Practically, this means mapping tasks of the workflow to SeD services on the grid
[Diagram: Client, MaDag, MA, and LAs, with arrows labelled "Workflow", "WF task & SeD", and "Task exec."]
MaDag requirements & constraints
➲ Manage DAG execution
➲ Manage several client DAG submissions (in parallel)
➲ Optimize DAG makespan
➲ Optimize platform usage (total makespan)
➲ Avoid data transfers (no client data on the MaDag)
➲ Adapt dynamically to the platform load
➲ Use SeD estimations of computation time
Workflow & Multi-workflow scheduling
➲ Workflow scheduling
● Mapping tasks to (heterogeneous) resources
● Ordering tasks (data dependencies)
➲ Multi-workflow scheduling
● Workflow arrival times are not known in advance
● A static schedule is not applicable
● Concurrent workflow execution
● Competition for access to resources
2. Workflow scheduling overview
State of the problem
Scheduling heuristics
➲ HEFT (Heterogeneous Earliest Finish Time)
● Order tasks using a metric based on computation time
● Map each task to the server that gives the best EFT (earliest finish time)
➲ Variants of HEFT for multi-workflow
● Multi-HEFT
● FOFT (Fairness on Finish Time)
● Aging HEFT
● FIFO
● SRPT (Shortest Remaining Processing Time)
● LRPT (Longest Remaining Processing Time)
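The mapping step of HEFT can be illustrated with a short sketch. For each candidate server, the earliest finish time is the later of "server becomes free" and "input data is ready", plus the estimated computation time on that server; the task goes to the server with the smallest value. This is a simplified illustration, not the MaDag implementation: the names `eft` and `map_task_best_eft` are hypothetical, data transfer times are ignored, and the task-ordering step is left out.

```c
#include <stddef.h>

/* Earliest finish time of a task on one server: the task starts once
 * the server is free and its input data is ready. */
double eft(double server_free_at, double data_ready_at, double comp_time)
{
    double start = server_free_at > data_ready_at ? server_free_at
                                                  : data_ready_at;
    return start + comp_time;
}

/* Picks the server with the smallest EFT for one task and reserves it
 * by advancing server_free_at[] for that server.
 * comp_time[s] is the estimated computation time on server s (assumed
 * here to come from the SeD estimations). Requires n_servers >= 1.
 * Returns the index of the chosen server. */
size_t map_task_best_eft(double server_free_at[], const double comp_time[],
                         size_t n_servers, double data_ready_at)
{
    size_t best = 0;
    double best_eft = eft(server_free_at[0], data_ready_at, comp_time[0]);

    for (size_t s = 1; s < n_servers; s++) {
        double e = eft(server_free_at[s], data_ready_at, comp_time[s]);
        if (e < best_eft) {
            best_eft = e;
            best = s;
        }
    }
    server_free_at[best] = best_eft;  /* server is busy until the task ends */
    return best;
}
```

With two servers free at times 0 and 4, computation estimates of 10 and 3, and data ready at time 1, the EFTs are 11 and 7, so the second server is chosen even though it becomes free later. Calling the function once per ready task, in priority order, yields a greedy schedule of the kind the multi-workflow variants build on.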
[Slide annotation: two of the heuristics listed above are marked "implemented"]
Test: without/with Multi-HEFT
GoDIET configuration to launch MaDag
3. MaDag configuration
<diet_configuration>
  <diet_hierarchy>
    <master_agent>
      <ma_dag>
        <config server="localHost" remote_binary="maDagAgent"/>
        <parameters string="--heft"/>
      </ma_dag>
    </master_agent>
  </diet_hierarchy>
</diet_configuration>
➲ One configuration parameter specifies the multi-workflow scheduling heuristic to apply (the default is basic-heft)
Client program for workflow execution
4. Usage
#include "DIET_client.h"  /* assumed header for the DIET client API */

int main(int argc, char *argv[]) {
  diet_wf_desc_t *profile;
  char *fileName;
  int res;
  char *path = NULL;   /* output file info */
  size_t out_size;

  if (argc < 3) return 1;
  fileName = argv[2];  /* file containing the XML workflow description */

  profile = diet_wf_profile_alloc(fileName);
  res = diet_wf_call(profile);
  if (!res) {
    /* "n3#out1" = node ID "n3" + output port "out1" */
    diet_wf_file_get(profile, "n3#out1", &out_size, &path);
    /* OR */
    get_all_results(profile);
  }
  /* free all memory and all node results stored on the SeDs */
  diet_wf_free(profile);
  return res;
}
Enhancements to the MaDag
➲ Implementation of new heuristics for multi-workflow scheduling
➲ Use data containers to allow execution of workflows with multiple outputs (the number of data items produced by a service is not known in advance)
➲ Use a functional workflow description language for data-intensive workflows
5. Future developments
Thank you for your attention!