ALEAE : Handling Uncertainties in Large-Scale Distributed Systems
Emmanuel Jeannot
LORIA - INRIA - CNRS
ALEAE Kick-off
April 1st 2009
Managing uncertainties E. Jeannot 2/16
Introduction
What is a grid?
An infrastructure that is:
• Distributed
• Heterogeneous
But also:
• Dynamic
• Shared
Hence a lot of uncertainties
Uncertainties
Uncertainties:
• Unpredictable behavior
• Behavior not as expected
Where do they come from?
• Infrastructure (hardware)
• Application (software)
• Users
Uncertainty at the infrastructure level
The hardware that composes a grid can:
• Fail
• Be volatile (be removed or added)
• Suffer performance degradation (due to shared usage)
Uncertainty at the application level
It is often assumed that:
• The duration of each component of an application is known
• Its resource usage is known
• It does not fail
However, this is not always the case.
Uncertainties due to the users
Users:
• Submit jobs/requests randomly
• May behave maliciously (intentionally or not)
Examples:
• DoS attacks
• Desktop grids: returning wrong answers
Rationale
Just as resource management algorithms
cope with heterogeneity and distribution,
they must also cope with uncertainty.
Ways to cope with uncertainty
Proactive methods (static):
• Redundancy
• Duplication
Reactive methods (dynamic):
• Checkpoint/restart
• Migration
Mixed methods (provide a static solution and adapt it dynamically)
Functional Goals
Different kinds of uncertainty call for different desired behaviors.
Reliability, fault tolerance:
• Hardware failure
• Software failure
Robustness:
• Hardware performance degradation
• Software unpredictability
Correctness:
• Bad usage
Etc.
Multi-criteria approach
The good old metrics are still valid:
• Makespan
• Load balance
• Response time
• Lateness
• Etc.
Most of the time, these metrics conflict with the functional goals (reliability, robustness, correctness).
Hence the need for a multi-criteria approach (e.g., makespan vs. reliability).
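One common way to handle two conflicting criteria is to keep only the non-dominated (Pareto-optimal) solutions. The sketch below is an illustration, not a method taken from the project: each candidate schedule is reduced to a hypothetical pair (makespan, failure probability), both to be minimized, and all numbers are made up.

```python
from typing import List, Tuple

# Hypothetical summary of a schedule: (makespan, failure_probability).
# Both criteria are minimized; the values below are invented for illustration.
Schedule = Tuple[float, float]

def dominates(a: Schedule, b: Schedule) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(schedules: List[Schedule]) -> List[Schedule]:
    """Keep the schedules that no other schedule dominates."""
    return [s for s in schedules
            if not any(dominates(t, s) for t in schedules)]

candidates = [(10.0, 0.20), (12.0, 0.05), (15.0, 0.01), (13.0, 0.10), (16.0, 0.02)]
front = pareto_front(candidates)
print(front)
```

Here (13.0, 0.10) and (16.0, 0.02) are discarded because another candidate is better on both criteria; the remaining three are genuine trade-offs between speed and reliability.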
Open issues (1)
Gather traces:
- What is the behavior of users/programs/infrastructure?
- Ease the extraction of useful information
- Ensure generality
Open issues (2)
Model the uncertainty:
• Trace the behavior
• Analyze
• Provide models
Open issues (3): carefully define metrics
Mapping a goal onto a metric is not trivial.
Example: robustness
• Intuitive notion
• Many metrics (one per paper)
• Question: how do these metrics relate to each other?
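To illustrate why the choice of metric matters, the sketch below compares two plausible robustness metrics on the same data: the standard deviation of the makespan, and the fraction of runs that miss a deadline. Both metrics, the sample makespans, and the deadline are assumptions chosen for illustration; they are not taken from the talk.

```python
from statistics import pstdev

def makespan_std(samples):
    """Robustness as stability: lower standard deviation is better."""
    return pstdev(samples)

def miss_rate(samples, deadline):
    """Robustness as deadline safety: fraction of runs exceeding the deadline."""
    return sum(1 for m in samples if m > deadline) / len(samples)

# Hypothetical observed makespans of two schedules over five runs.
schedule_a = [100, 100, 100, 100, 140]  # stable, with one bad outlier
schedule_b = [95, 100, 105, 110, 115]   # evenly spread
deadline = 108                          # hypothetical deadline

for name, runs in (("A", schedule_a), ("B", schedule_b)):
    print(name, round(makespan_std(runs), 2), miss_rate(runs, deadline))
```

With these numbers, schedule B is the more robust one under the standard-deviation metric, while schedule A is the more robust one under the miss-rate metric, so the two metrics rank the schedules in opposite order.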
Open issues (4)
Provide resource management (scheduling) algorithms:
• Mono-criterion/multi-criteria
• Static/dynamic/mixed
• Work well in the worst case/on average
• Etc.
Open issues (5)
Static vs. Dynamic?
Each approach has advantages and drawbacks:
• Dynamic (e.g., checkpoint/restart): costly in time, but handles almost every case
• Static (e.g., duplication): costly in resources, but can provide some guarantees
Which approach is best depends on the problem.
• Is the mixed approach always possible/profitable?
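The time/resource trade-off can be made concrete with a toy cost model. This is a rough sketch under strong assumptions, not a model from the project: copies fail independently with a fixed probability, each failure under checkpointing loses at most one checkpoint interval of work, and no failure strikes while a checkpoint is being written. All function and parameter names are hypothetical.

```python
import math

def duplication(work, replicas, p_fail):
    """Static approach: run `replicas` copies in parallel.

    Returns (completion_time, machine_time, success_probability),
    assuming each copy fails independently with probability p_fail.
    """
    return work, replicas * work, 1.0 - p_fail ** replicas

def checkpoint_restart(work, interval, ckpt_cost, restart_cost, failures):
    """Dynamic approach: one copy, checkpointing every `interval` units of work.

    Returns (completion_time, machine_time) as a rough estimate in which
    each failure costs one interval of lost work plus a restart.
    """
    n_ckpt = math.ceil(work / interval) - 1   # no checkpoint after the last segment
    lost = failures * (interval + restart_cost)
    total = work + n_ckpt * ckpt_cost + lost
    return total, total                       # one machine: both costs coincide

print(duplication(work=100, replicas=3, p_fail=0.1))
print(checkpoint_restart(work=100, interval=10, ckpt_cost=1,
                         restart_cost=2, failures=2))
```

With these made-up parameters, duplication finishes in 100 time units but consumes 300 units of machine time, while checkpoint/restart consumes 133 units of both: the static approach pays in resources, the dynamic one in time.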
Open issues: real-scale experimentation
Provide detection mechanisms:
• Failure
• Malicious behavior
• Resource usage
• Correctness
• Etc.
Program and test solutions:
• Real scale (Grid'5000, DAS-3)
• Simulation
• Emulation?
Validation of the models.
Today
Kick-off:
• ALEAE: a two-year INRIA-funded project (20 k€/year)
• Presentation on each item
• Technical presentations on sub-items
• Work plan:
  - Other/next meetings
  - Visits/exchanges
  - Missions
  - Post-doc
  - Synergies between teams
• Important: I am moving to INRIA Bordeaux.
Conclusion
Grid environments are full of uncertainties
These uncertainties come from different factors
Handling them is difficult (especially with the traditional criteria)
Finding the best way to tackle this problem (dynamic, static, or mixed) is of crucial interest.
The goal of ALEAE is to tackle such issues.