VIP: design and implementation of the portal and execution service

VIP: design and implementation of the portal and execution service. VIP Launching Workshop, Lyon, December 14th 2012. Rafael FERREIRA DA SILVA (CNRS, CREATIS, INSA-Lyon, Université Lyon 1, INSERM), for the VIP Project Consortium.

description

Virtual Imaging Platform (VIP) Launching Workshop December 14th, 2012 - Lyon, France More information: www.rafaelsilva.com

Transcript of VIP: design and implementation of the portal and execution service

Page 1: VIP: design and implementation of the portal and execution service

VIP: design and implementation of the portal and execution service


VIP Launching Workshop Lyon, December 14th 2012

Rafael Ferreira da Silva – [email protected]

Rafael FERREIRA DA SILVA CNRS, CREATIS, INSA-Lyon, Université Lyon 1, INSERM

For the VIP Project Consortium:

Page 2: VIP: design and implementation of the portal and execution service

Outline

- Introduction
- VIP Architecture
    - Web Portal
    - Data Transfers
    - Workflow Execution
- Workflow Self-Healing
- Conclusions

2 Rafael Ferreira da Silva – [email protected] http://vip.creatis.insa-lyon.fr

Page 3: VIP: design and implementation of the portal and execution service

Platform goals

- Multi-modality medical image simulators
    - MRI, US, CT and PET
- Objectives
    - Workflow execution on EGI
    - Access to storage resources
    - High-level interface for non-experts
- No IT required
    - Software as a Service (SaaS)
    - No client software installation
    - New features automatically available
    - Consolidated support and troubleshooting

Page 4: VIP: design and implementation of the portal and execution service

VIP – Architecture

[Figure: VIP architecture diagram. Labelled components: GASW, Object Model Repository, Simulated Data Repository, Workflow Engine, Job Generation, Job Scheduler, Data Management.]

Page 5: VIP: design and implementation of the portal and execution service

VIP – Web Portal

- User Front-End
    - Openly accessible web portal
    - Access point to models and simulators
    - User-friendly interface that assists users in using image simulators
    - Modular code design (GWT + SmartGWT)

Page 6: VIP: design and implementation of the portal and execution service

Users/Apps Management

[Figure: management entities shown: Users, Groups, Application Classes, Applications.]

Page 7: VIP: design and implementation of the portal and execution service

VIP – GRIDA

- Grid Data Management Agent
    - Handles file catalog and transfer operations by pooling
    - Performs data replication

Page 8: VIP: design and implementation of the portal and execution service

Data Transfers Management

[Figure: transfers between the User Machine, the VIP Server and Grid Storage.]

Upload:
1. User uploads file to VIP Server
2. GRIDA uploads file to the grid (replication)

Download:
1. GRIDA downloads file to VIP Server
2. User downloads the file

Page 9: VIP: design and implementation of the portal and execution service

VIP – Data Repositories

- Easy integration of third-party libraries
    - NeuSemStore-Provenance for simulated data
    - NeuSemStore-Simulated-Objects for the model catalog
- Encapsulation of objects as GWT serialized beans

More details in the presentation of B. Gibaud

[Figure labels: Databases, NeuSemStore, GWT Server, GWT Client, RPC call, GWT Bean.]

Page 10: VIP: design and implementation of the portal and execution service

VIP – Workflow Engine

- MOTEUR workflow engine
    - Applications described in a formal language
    - http://modalis.i3s.unice.fr/softwares/moteur
- Generic Application Service Wrapper (GASW)
    - Bash scripts wrapped in grid jobs
- Self-healing of workflow execution

Page 11: VIP: design and implementation of the portal and execution service

VIP – Architecture

- Workload Management System with Pilot Jobs
    - Distributed Infrastructure with Remote Agent Control (DIRAC) [CPPM-LHCb]
      http://diracgrid.org
    - Hosted by CC-IN2P3 French National Instance
- Data Storage and Computing Back-End
    - EGI infrastructure, Biomed VO
      http://www.egi.eu

Page 12: VIP: design and implementation of the portal and execution service

Workflow Execution

1. Input data upload
2. User launches a simulation
3. MOTEUR generates invocations
4. GASW generates grid jobs
5. Jobs are submitted to DIRAC
6. Pilot jobs are submitted to EGI
7. Pilot jobs fetch grid jobs
8. Inputs download
9. Execution
10. Results upload
11. Download results
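The pull model behind the pilot-job steps (pilots fetch grid jobs once they are running on a resource) can be illustrated with a small Python sketch. It is a toy simulation under stated assumptions, not DIRAC code: the function name `run_pilots`, the round-robin order and the job and pilot names are all hypothetical.

```python
from collections import deque


def run_pilots(grid_jobs, pilot_names):
    """Simulate pilots pulling grid jobs from a central queue.

    Returns a list of (pilot, job) pairs in the order jobs were fetched.
    Hypothetical helper for illustration only; it does not call DIRAC.
    """
    if grid_jobs and not pilot_names:
        raise ValueError("jobs are queued but no pilot is available")

    queue = deque(grid_jobs)      # central task queue holding submitted grid jobs
    assignments = []
    while queue:
        for pilot in pilot_names:  # pilots take turns asking for work
            if not queue:
                break
            job = queue.popleft()  # the pilot fetches the next grid job
            # Inputs download, execution and results upload would happen here.
            assignments.append((pilot, job))
    return assignments


if __name__ == "__main__":
    for pilot, job in run_pilots(["job-1", "job-2", "job-3"], ["pilot-A", "pilot-B"]):
        print(pilot, "runs", job)
```

The point of the sketch is that a job is bound to a resource only when a pilot asks for it, so a pilot that never starts does not hold any job.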

Page 13: VIP: design and implementation of the portal and execution service

Outline

- Introduction
- VIP Architecture
    - Web Portal
    - Data Transfers
    - Workflow Execution
- Workflow Self-Healing
- Conclusions

Page 14: VIP: design and implementation of the portal and execution service

Workflow Self-Healing

- Problem: costly manual operations
    - Rescheduling tasks, restarting services, killing misbehaving experiments or replicating data files
- Objective: automated platform administration
    - Autonomous detection of operational incidents
    - Perform appropriate set of actions
- Assumptions: online and non-clairvoyant
    - Only partial information available
    - Decisions must be fast
    - Production conditions: no prediction of user activity or workloads

Page 15: VIP: design and implementation of the portal and execution service

General MAPE-K loop

[Figure: MAPE-K loop with Monitoring, Analysis, Planning and Execution phases around shared Knowledge. The loop is triggered by an event (job completion and failures) or a timeout, and is fed with monitoring data.]

Incident degrees (each incident is also classified into level 1, level 2 or level 3):

    Incident    Degree η
    1           0.8
    2           0.4
    3           0.1

Roulette wheel selection: incident i is selected with probability

    $p(I_i) = \frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$

In the example, incident 1 is selected.

Association rules for incident 1 (ρ × η uses the degree η of the incident on the left of the rule):

    Rule     Confidence (ρ)    ρ × η
    2 → 1    0.8               0.32
    3 → 1    0.2               0.02
    1 → 1    1.0               0.80

Roulette wheel selection based on association rules: in the example, incident 2 is selected, which determines the set of actions.
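The two roulette-wheel draws in the figure can be sketched in Python as follows. This is a minimal illustration under assumptions, not the platform's implementation: the function name `roulette_select`, the dictionary layout, and the reading of the rule table as rules `u → 1` weighted by ρ × η of incident u are illustrative choices.

```python
import random


def roulette_select(weights, rng=random):
    """Return a key of `weights`, chosen with probability weight / sum(weights)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total   # a point on the wheel, in [0, total)
    cumulative = 0.0
    for key, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return key
    return key  # only reached through floating-point round-off


# Incident degrees from the slide.
degrees = {1: 0.8, 2: 0.4, 3: 0.1}

# Confidence of each rule "u -> 1" from the slide's table.
confidence_to_1 = {1: 1.0, 2: 0.8, 3: 0.2}

# Weights of the second wheel: rho * eta for each rule.
rule_weights = {u: confidence_to_1[u] * degrees[u] for u in degrees}

if __name__ == "__main__":
    first = roulette_select(degrees)         # first draw, weighted by eta
    second = roulette_select(rule_weights)   # second draw, assuming incident 1 was selected
    print("first draw:", first, "second draw:", second)
```

With these numbers the first wheel picks incident 1 with probability 0.8 / 1.3, and the second wheel uses the weights 0.80, 0.32 and 0.02 from the table.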

Page 16: VIP: design and implementation of the portal and execution service

Incident: Activity Blocked

- An invocation is late compared to the others
- Possible causes
    - Longer waiting times
    - Lost tasks (e.g. killed by site due to quota violation)
    - Resources with poor performance

[Figures: invocations completion rate for a simulation; job flow for a simulation.]

Page 17: VIP: design and implementation of the portal and execution service

Activity blocked: degree

- Degree computed from all completed jobs of the activity
    - Job phases: setup, inputs download, execution, outputs upload
    - Assumption: bag-of-tasks (all jobs have equal durations)
- Median-based estimation: see the table
- Incident degree: job performance w.r.t. median

    $d = \frac{E_i}{M_i + E_i} \in [0, 1]$

    Phase              Median duration    Real duration        Estimated duration
    setup              50s                42s (completed)      42s
    inputs download    250s               300s (completed)     300s
    execution          400s               20s (current)        400s*
    outputs upload     15s                ?                    15s
    Total              M_i = 715s                              E_i = 757s

    *: max(400s, 20s) = 400s
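The median-based estimation and the degree formula can be worked through in a few lines of Python, using the durations shown on the slide. The function names and argument layout are assumptions made for this sketch: completed phases keep their real duration, the current phase counts as the larger of its median and its elapsed time, and phases not yet started count as their median.

```python
def estimated_duration(medians, completed, current_elapsed):
    """Estimate the total job duration E_i in seconds.

    medians:         median duration of each phase, in phase order
    completed:       real durations of the phases already finished
    current_elapsed: time spent so far in the phase now running
    """
    n_done = len(completed)
    if n_done >= len(medians):
        return float(sum(completed))   # the job has finished every phase
    current = max(medians[n_done], current_elapsed)
    remaining = sum(medians[n_done + 1:])
    return float(sum(completed) + current + remaining)


def incident_degree(medians, completed, current_elapsed):
    """Compute d = E_i / (M_i + E_i), which lies in [0, 1]."""
    m = float(sum(medians))
    e = estimated_duration(medians, completed, current_elapsed)
    if m + e == 0:
        raise ValueError("durations must not all be zero")
    return e / (m + e)


if __name__ == "__main__":
    medians = [50, 250, 400, 15]   # setup, inputs download, execution, outputs upload
    completed = [42, 300]          # setup and inputs download are done
    elapsed = 20                   # seconds spent so far in execution
    print(estimated_duration(medians, completed, elapsed))
    print(incident_degree(medians, completed, elapsed))
```

For the slide's example this gives E_i = 42 + 300 + max(400, 20) + 15 = 757s and d = 757 / (715 + 757), about 0.51. A job that runs exactly at the median has d = 0.5, and d grows towards 1 as the job falls behind.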

Page 18: VIP: design and implementation of the portal and execution service

Activity blocked: levels and actions

- Levels: identified from the platform logs
    - Level 1 (degree d below the threshold τ1): no actions
    - Level 2 (degree d above the threshold τ1): action: replicate jobs
- Actions
    - Job replication
    - Cancel replicas with bad performance
    - Replicate only if all active replicas are running

[Figure: replication process for one task.]
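The replication rule ("replicate only if all active replicas are running") can be written as a small predicate. This is a sketch under assumptions: the state names `"queued"` and `"running"`, the function name `should_replicate` and the threshold value used in the example are hypothetical, since the slide gives no value for τ1.

```python
ACTIVE_STATES = ("queued", "running")  # hypothetical state names


def should_replicate(degree, threshold, replica_states):
    """Decide whether to submit one more replica of a job.

    A replica is added only when the job has reached level 2
    (degree at or above the threshold) and no active replica is
    still waiting in a queue.
    """
    active = [state for state in replica_states if state in ACTIVE_STATES]
    all_running = all(state == "running" for state in active)
    return degree >= threshold and all_running


if __name__ == "__main__":
    tau_1 = 0.6  # illustrative value only
    print(should_replicate(0.7, tau_1, ["running"]))
    print(should_replicate(0.7, tau_1, ["running", "queued"]))
```

Waiting until every active replica is running avoids piling up replicas that are all stuck in queues.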

Page 19: VIP: design and implementation of the portal and execution service

Experimental results

- Goal: Self-Healing vs No-Healing
    - Cope with recoverable errors
- Metrics
    - Makespan of the activity execution
    - Resource waste

    $w = \frac{(\mathrm{CPU} + \mathrm{data})_{\text{self-healing}}}{(\mathrm{CPU} + \mathrm{data})_{\text{no-healing}}} - 1$

    For w < 0: self-healing consumed fewer resources
    For w > 0: self-healing wasted resources

Self-healing speeds up FIELD-II execution by a factor of up to 4.

    Repetition    w
    1             -0.10
    2             -0.15
    3             -0.09
    4              0.05
    5             -0.26

The self-healing process reduced resource consumption by up to 26% compared to the No-Healing execution.

R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of operational workflow incidents on distributed computing infrastructures, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012.
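The resource waste metric w can be computed directly from the formula. The sketch below is illustrative: the function name `resource_waste` is an assumption, and the input times in the example are made-up numbers chosen to reproduce w = -0.26, not measurements from the experiments.

```python
def resource_waste(cpu_self, data_self, cpu_none, data_none):
    """Return w = (CPU + data)_self-healing / (CPU + data)_no-healing - 1.

    All arguments are resource times in the same unit (for example seconds).
    """
    baseline = cpu_none + data_none
    if baseline <= 0:
        raise ValueError("no-healing consumption must be positive")
    return (cpu_self + data_self) / baseline - 1


if __name__ == "__main__":
    # Hypothetical values: 740 s with self-healing against 1000 s without.
    w = resource_waste(cpu_self=700, data_self=40, cpu_none=900, data_none=100)
    print(round(w, 2))
```

A negative result means the self-healing run consumed less CPU and data transfer time than the control run, and a positive result means it consumed more.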

Page 20: VIP: design and implementation of the portal and execution service

VIP – Facts

- 321 registered users from 38 countries
- Most used portal certificate in EGI (August 2012)
  https://wiki.egi.eu/wiki/EGI_robot_certificate_users
- Consumed 379 CPU years from January 2011 to August 2012
  http://accounting.egi.eu
- 1/10 of the total activity of the biomed international VO; one of the most active users

Page 21: VIP: design and implementation of the portal and execution service

VIP – Facts

1155 executed simulations during the last year (~3/day)

[Figure, Applications: distribution of application executions in VIP (Nov 2011 – Oct 2012).]

[Figure, Users: distribution of portal users on EGI (August 2012). Source: https://wiki.egi.eu/wiki/EGI_robot_certificate_users]

Page 22: VIP: design and implementation of the portal and execution service

Concluding remarks

- VIP is an openly accessible web portal for multi-modality medical image simulators
    - MRI, US, CT, PET and other tools
    - Workflow execution on EGI
    - Access to storage resources
    - High-level interface for non-experts
    - No IT required (Software as a Service)
- Facts
    - 321 registered users from 38 countries
    - Consumed about 400 CPU years / year
- Limits and perspectives
    - Fair resource allocation among workflows
    - User support
    - Heavy data transfers

Page 23: VIP: design and implementation of the portal and execution service

VIP: design and implementation of the portal and execution services


Thank you for your attention. Questions?

http://vip.creatis.insa-lyon.fr