20 October 2006 Workflow Optimization in Distributed Environments
Dynamic Workflow Management Using Performance Data
David W. Walker, Yan Huang, Omer F. Rana, and Lican Huang
Cardiff School of Computer Science
Outline of Talk
• Background and introduction.
• The WOSE architecture for dynamic Web services.
• Performance experiments and results.
• Summary and conclusions.
The WOSE Project
• The Workflow Optimisation Services for e-Science Applications (WOSE).
• Funded by EPSRC Core e-Science Programme.
• Collaboration between:
– Cardiff University
– Imperial College (Prof John Darlington)
– Daresbury Lab (Drs Martyn Guest and Robert Allan)
Workflow Optimisation
• Types of workflow optimisation:
– Through service selection
– Through workflow re-ordering
– Through exploitation of parallelism
• When is optimisation performed?
– At design time (early binding)
– Upon submission (intermediate binding)
– At runtime (late binding)
Service Binding Models
• Late binding of an abstract service to a concrete service instance means:
– We use up-to-date information to decide which service to use when there are multiple semantically equivalent services.
– We are less likely to try to use a service that is unavailable.
Late Binding Case
• Search registry for all services that are consistent with abstract service description.
• Select the optimal service based on current information, e.g., host load.
• Execute this service. If it is not currently available then try the next best service.
• This approach does not take into account the time needed to transfer inputs to the service.
• In the early and intermediate binding cases we can optimise the overall workflow.
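The late-binding steps above (search, rank, invoke, fall back) can be sketched as follows. This is an illustrative sketch only, not WOSE code: the `query_registry`, `estimate_response_time`, and `invoke` callbacks are hypothetical stand-ins for the registry query, performance-based ranking, and service invocation.

```python
def invoke_best_service(abstract_name, query_registry,
                        estimate_response_time, invoke):
    """Late binding: rank candidate services, invoke the best, fall back on failure."""
    # 1. Find all concrete services matching the abstract service description.
    candidates = query_registry(abstract_name)
    # 2. Rank them by up-to-date estimated response time (lowest first).
    ranked = sorted(candidates, key=estimate_response_time)
    # 3. Invoke the best; if it is unavailable, try the next best.
    for service in ranked:
        try:
            return invoke(service)
        except ConnectionError:
            continue  # service unavailable; try the next best one
    raise RuntimeError("no candidate service was available")
```

Note the fallback loop: an unavailable service costs one failed invocation attempt rather than a failed workflow.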
WOSE Architecture
Work at Cardiff has focused on implementing a late binding model for dynamic service discovery, based on a generic service proxy, and service discovery and optimisation services.
[Architecture diagram. Components: user, workflow script, configuration script, converter, ActiveBPEL workflow engine, generic service proxy, Web service instances, Discovery Service, Optimization Service, registry services (such as UDDI), Performance Monitor Service, history database.]
Service Discovery Issues
• Discovery of equivalent services could be based on:
– Service name. Applicable when all service providers agree on the naming of services.
– Service metadata.
– Service ontology.
• So far we have used the service name.
Performance-Based Service Selection
• In general, “performance” could refer to:
– Service response time.
– The availability of the service.
– The accuracy of the results returned by the service.
– The security of the service.
• In our work we have used service response time as the basis for service selection.
• Our approach can be readily adapted for other performance metrics.
Estimating Service Response Time
• Two methods for estimating the expected service response time:
1. Based on current performance metrics from the service hosts, e.g., load averages.
2. Based on the history of previous service invocations on the service hosts. In general, this requires a model that, for a given set of service inputs on a given service host, will return the expected service response time.
• So far we have used current (or very recent) performance metrics returned by the Ganglia monitoring system.
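Method 1 above reduces to ranking candidate hosts by a current load metric. A minimal sketch, assuming a stand-in `get_load_average` callback in place of a real query to a monitoring system such as Ganglia (this is not Ganglia's actual API):

```python
def pick_least_loaded(hosts, get_load_average):
    """Method 1: choose the host reporting the lowest current load average."""
    # get_load_average(host) is assumed to return a recent load-average
    # reading for that host, e.g. as collected by a monitoring system.
    return min(hosts, key=get_load_average)
```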
Estimating Service Response Time (Continued)
• Distributed job management systems such as Nimrod use the rate at which a computer completes jobs as an indicator of how “good” the computer is.
• Nimrod doesn’t distinguish between different jobs.
• This approach requires a substantial long-term record of job statistics in order to give satisfactory results.
• The same approach could be applied to the dynamic invocation of Web services. This avoids the need for a performance model for each Web service.
• Such an approach will sometimes make bad decisions in individual cases, but overall should be effective.
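A Nimrod-style, history-based estimator might look like the sketch below. The job-history record format (a list of dicts with `host` and `duration` fields) is an illustrative assumption, not a WOSE or Nimrod data structure:

```python
def completion_rate(history, host):
    """Jobs completed per unit of wall-clock compute time on `host`."""
    jobs = [record for record in history if record["host"] == host]
    total_time = sum(record["duration"] for record in jobs)
    # A host with no recorded jobs gets rate 0, i.e. lowest preference.
    return len(jobs) / total_time if total_time > 0 else 0.0

def pick_fastest_host(hosts, history):
    """Prefer the host with the highest historical completion rate."""
    return max(hosts, key=lambda host: completion_rate(history, host))
```

As the slide notes, because the rate lumps all jobs together regardless of size, individual selections can be wrong even when the ranking is good on average.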
WOSE Sequence Diagram
[Sequence diagram. A workflow script is deployed to the workflow engine via an XSLT converter. Participants: WOSE client, workflow engine, Proxy Service, Discovery Service, Optimisation Service, Performance Service, Web service. Steps: 1. Request; 2. Dynamic invocation through proxy; 2A. Direct invocation; 3. Service query; 3A. Direct result; 4. List of services; 5. Performance query; 6. Performance data; 7. List of services; 8. Selected service; 9. Invoke service; 10. Result; 11. Result through proxy; 12. Result.]
WOSE can either invoke a static Web service directly (steps 2A and 3A) or a dynamic Web service (steps 2–11).
Dynamic Service Selection within a Workflow
[Diagram: a workflow invokes Service A and then, via the proxy service, an abstract Service B that is bound at runtime to one of the semantically equivalent services B1 – B5.]
Dynamic invocation is worthwhile only for sufficiently long-running services, since the performance gained must offset the overhead of service discovery and selection.
If the selected service is not available, WOSE will automatically try the next best one.
Performance Experiments
• Is there any relationship between the current load and service response time?
• This will depend on how variable the load is over the duration of the service execution, as well as how the OS schedules jobs.
• In general, we would expect the load-response time relationship to be stronger when the service hosts are lightly loaded.
Experiment 1
• Try to keep load constant during service execution by running N instances of a long-running computation to create a background workload.
• Then invoke Web service and measure response time, i.e., time from invoking dynamic service to receiving back the result.
• The blastall Web service was used.
Experiment 1: Results
[Figure: scatter plot of service response time (y-axis, 0 – 6000) against load average (x-axis, 0 – 12).]
Experiment 1: Discussion
• Plot shows that a higher load average results in a longer service response time.
• The scatter in the results for any particular value of the load average is probably due to the experiments being run on a machine used by others, so we could not fully control the load.
Experiment 2
• Create a synthetic, varying background workload.
• Then invoke Web service and measure response time.
• The blastall Web service was used.
Experiment 2: Results
[Figure: scatter plot of service response time (y-axis, 0 – 6000) against load average (x-axis, 0 – 15).]
Experiment 2: Discussion
• Both experiments show a general tendency for high load averages to result in longer service response times.
• Large amount of scatter results from the fact that the load changes while the Web service is running.
• No method can predict what the future load will be, and hence any method of estimating which service host will complete execution the soonest will give the wrong answer sometimes.
Experiment 3
• Is selection based on the current load average better than making a random selection?
• If services are hosted on heterogeneous machines we have to take into account the processing speed.
• Thus, we base service selection on the performance factor, P, defined as:

P = (CPU speed in GHz) / (load average + 1)
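The performance factor P = (CPU speed in GHz) / (load average + 1), and the host ranking it induces, can be sketched as below. The host tuples are illustrative, not measured data:

```python
def performance_factor(cpu_speed_ghz, load_average):
    """P = CPU speed in GHz divided by (load average + 1)."""
    # The +1 accounts for the new job itself joining the run queue,
    # and keeps P finite on an idle host (load average 0).
    return cpu_speed_ghz / (load_average + 1)

def select_host(hosts):
    """hosts: list of (name, cpu_speed_ghz, load_average) tuples.
    Return the host with the highest performance factor."""
    return max(hosts, key=lambda h: performance_factor(h[1], h[2]))
```

For example, an idle 1 GHz host (P = 1.0) outranks a 3 GHz host with load average 5 (P = 0.5), so current load can outweigh raw CPU speed.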
Experiment 3 (continued)
• Run synthetic workload on one computer. Record service response time for several executions of the workflow, and compute the average.
• Run synthetic workload on N computers each hosting the service. Run the workflow and dynamically select the service host based on the performance factor. Do this several times and compute the average.
Experiment 3: Results
• The average service response time for the single machine was 4252 seconds.
• The average service response time when selecting the optimal service from three hosts was 932 seconds.
• Since all the machines used are of the same type, this indicates that dynamic selection based on the current load average does result in better performance.
Conclusions and Future Work
• Dynamic service selection based on the load and CPU speed can result in faster execution of a workflow.
• We are currently repeating the experiments using a service that performs a molecular dynamics simulation.
• In the future we will also investigate dynamic service selection based on performance history data, such as the rate at which a host completes service requests.
• We would like to develop a statistical model of dynamic service selection for different types of background workload.
Thank you.
Questions?