INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Scenarios for Integrating Data and Job Scheduling Peter Kunszt On behalf of the JRA1-DM Cluster, CERN JRA1 All Hands Meeting Brno, Czech Republic June 20-22, 2005



INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Scenarios for Integrating Data and Job Scheduling
Peter Kunszt
On behalf of the JRA1-DM Cluster, CERN

JRA1 All Hands Meeting
Brno, Czech Republic
June 20-22, 2005


How to spend these 45 minutes

• Short presentation
– Background info on existing services and capabilities
– Motivation for DM/WMS integration
– Some options on how to actually do it

• The rest of the time may be spent discussing the way forward and deciding our focus for the near and mid-term future


Data Movement and Scheduling

gLite DM Cluster:
• Dedicated high-level component: Data Scheduler
– Not in the plan for Release 2!!!
– Some FTS bugs can only be addressed by this component.
– Risk: once people understand what it’s about, it may creep back into the plan (wouldn’t be the first time)

• Low-level site-based components: FPS and FTS.

Others in EGEE:
• Condor Stork
• Globus Reliable File Transfer

Externals:
• P2P systems (BitTorrent and friends)


Why do we have our own at all?

• None of the available services has fulfilled our requirements, especially for
– Security
– Channel management
– Extensibility


FTS Capabilities

• Transfer a file, or a list of files, given by SURL or TURL (srm or gsiftp protocol) between two sites
• Manage the whole transfer as a single job
• Run the job through a set of states
• Apply site policies (extension)
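The idea of one job covering a list of files and moving through a set of states can be sketched as a small state machine. This is only an illustration: the state names, the allowed transitions, the `TransferJob` class, and the example URLs are assumptions made for the sketch, since the slide does not list the actual FTS states or interface.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Illustrative state names; the slide does not list the real FTS states.
    SUBMITTED = "submitted"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Allowed transitions for this sketch (an assumption, not FTS behaviour).
TRANSITIONS = {
    State.SUBMITTED: {State.ACTIVE, State.FAILED},
    State.ACTIVE: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}


@dataclass
class TransferJob:
    """One job covering a list of (source, destination) URL pairs."""
    files: list
    state: State = State.SUBMITTED
    history: list = field(default_factory=list)

    def advance(self, new_state):
        """Move the whole job to new_state, rejecting illegal transitions."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.history.append(self.state)
        self.state = new_state


# Hypothetical example URLs; two files handled as a single job.
job = TransferJob(files=[
    ("srm://site-a.example.org/data/f1", "srm://site-b.example.org/data/f1"),
    ("srm://site-a.example.org/data/f2", "srm://site-b.example.org/data/f2"),
])
job.advance(State.ACTIVE)
job.advance(State.DONE)
```

Keeping the state on the job rather than on each file mirrors the "single job" bullet; a real service would presumably also track per-file status.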


FPS Capabilities

• The FPS is an FTS with some extra configuration
• Also accepts jobs specified with LFNs and GUIDs
• Resolves LFNs and GUIDs through a catalog
• Registers the replica at the end of a successful transfer

• Biggest usage difference between FTS and FPS today (also a matter of configuration):
– FTS transfers the files using the User Proxy
– FPS transfers the files using the Service Proxy (should be dual proxy!!), since access is enforced through the catalogs
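The resolve, transfer, register sequence described for the FPS might look roughly like the following. Everything here is a toy stand-in: `InMemoryCatalog`, `place_file`, and the `transfer` callable are hypothetical names for this sketch, not gLite catalog or FPS APIs.

```python
class InMemoryCatalog:
    """Toy stand-in for a replica catalog (hypothetical, not a gLite API)."""

    def __init__(self):
        self.lfn_to_guid = {}
        self.replicas = {}  # guid -> list of SURLs

    def add(self, lfn, guid, surl):
        self.lfn_to_guid[lfn] = guid
        self.replicas.setdefault(guid, []).append(surl)

    def resolve(self, name):
        """Accept an LFN or a GUID and return (guid, known SURLs)."""
        guid = self.lfn_to_guid.get(name, name)
        if guid not in self.replicas:
            raise KeyError(f"unknown LFN or GUID: {name}")
        return guid, list(self.replicas[guid])

    def register_replica(self, guid, surl):
        self.replicas[guid].append(surl)


def place_file(catalog, name, dest_surl, transfer):
    """Resolve name, copy the first known replica, then register the copy."""
    guid, sources = catalog.resolve(name)
    ok = transfer(sources[0], dest_surl)
    if ok:
        # Registration happens only after a successful transfer.
        catalog.register_replica(guid, dest_surl)
    return ok


catalog = InMemoryCatalog()
catalog.add("lfn:/grid/demo/file1", "guid-0001", "srm://site-a.example.org/f1")

# The lambda stands in for the actual file copy and always succeeds here.
place_file(catalog, "lfn:/grid/demo/file1",
           "srm://site-b.example.org/f1", transfer=lambda src, dst: True)
```

The sketch leaves out the proxy question entirely; which credential performs the copy is exactly the FTS/FPS difference noted above.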


Stork Capabilities

• Performs managed transfers
– No multi-file transfer yet, but promised
• Job described as a ClassAd
• Integrated with DAGs
• Protocol translation possible (in-memory modules; each transfer is a running process at the Stork node)
• Security model needs more work (promised)
• No channel management, poor extensibility (cataloguing) – ongoing discussions

Usage:
• Beneath FTS: use Stork instead of srm/gsi copy
• Over FTS: write a plugin for Stork that submits into FPS to do the transfer. This would take care of a lot of issues, except for security.


Globus RFT Capabilities

• Provides managed transfer between two servers
• The job is managed: retries are possible and server deaths are taken care of
• Can support splicing and multi-target transfers
• No channel concept – would need to be set up implicitly (i.e. servers per channel)
• Callbacks for security hooks exist
• Name resolution hooks may exist; need to look at this in detail

Usage of RFT:
• Beneath FTS – trivial, just submit to RFT instead of calling globus-url-copy
• Over FTS – not possible


What do we want?

• Seamless integration of data and CPU jobs from the user's point of view
• Re-use of the infrastructure where possible
– Especially common security infrastructure for proxy management
• JDL for transfer jobs
• Policies for data placement through WMS
• Mixed DAGs – a transfer may be a DAG node
• Transfer job optimization (policy-based?)
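To make the "mixed DAG" wish concrete, here is a minimal sketch in which transfer nodes and compute nodes sit in the same dependency graph and are ordered together. The node names, the `kind` labels, and the `dispatch` helper are assumptions for illustration; this is not JDL or DAGMan syntax.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# node name -> (kind, set of predecessor nodes); all names are hypothetical.
nodes = {
    "stage_in": ("transfer", set()),
    "analyse": ("compute", {"stage_in"}),
    "stage_out": ("transfer", {"analyse"}),
}


def run_order(nodes):
    """Return node names in an order that respects the dependencies."""
    graph = {name: deps for name, (_, deps) in nodes.items()}
    return list(TopologicalSorter(graph).static_order())


def dispatch(nodes):
    """Pair each node with the kind of service that would execute it."""
    plan = []
    for name in run_order(nodes):
        kind, _ = nodes[name]
        target = "data service" if kind == "transfer" else "compute service"
        plan.append((name, target))
    return plan


plan = dispatch(nodes)
```

From the user's side only one graph is described; which backend runs each node is decided at dispatch time, which is the kind of seamlessness the first bullet asks for.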


How? Options

• We resurrect the Data Scheduler
– Implements JDL interface
– Extensible
– WMS just hands over transfer jobs to the proper DS
– DAG integration in DAGMan is a question mark – can you do callouts?

• WMS does it all
– Manages the transfer jobs it gets through the JDL/DAG and translates them into proper FTS/FPS.submit() calls
– Either WMS monitors FTS/FPS, or FTS puts state into L&B
– Or other notification mechanisms

• Stork/Condor does it all
– WMS just submits the whole DAG as is to Condor
– Stork works its magic and might call the FPS/FTS to actually perform the transfers
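As a rough picture of the "WMS does it all" option, the sketch below turns a transfer node into a submit call and then polls for a terminal state. `FakeTransferService`, its `submit()` and `status()` methods, the state strings, and the node layout are all invented for the example; the real FTS/FPS interface may differ, and the L&B or notification variants are not shown.

```python
import itertools


class FakeTransferService:
    """Stand-in for FTS/FPS; submit() and status() are hypothetical."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._polls = {}

    def submit(self, pairs):
        job_id = f"job-{next(self._ids)}"
        self._polls[job_id] = 0
        return job_id

    def status(self, job_id):
        # Pretend every job finishes on the second poll.
        self._polls[job_id] += 1
        return "Done" if self._polls[job_id] >= 2 else "Active"


def run_transfer_node(service, node, max_polls=10):
    """Translate one transfer node into a submit call and poll its state."""
    pairs = list(zip(node["sources"], node["destinations"]))
    job_id = service.submit(pairs)
    for _ in range(max_polls):
        state = service.status(job_id)
        if state in ("Done", "Failed"):
            return job_id, state
    return job_id, "Timeout"


# Hypothetical transfer node as the WMS might hold it after parsing a job.
node = {
    "sources": ["srm://site-a.example.org/f1"],
    "destinations": ["srm://site-b.example.org/f1"],
}
result = run_transfer_node(FakeTransferService(), node)
```

Polling corresponds to the "WMS monitors FTS/FPS" variant; with state pushed into L&B the loop would be replaced by a lookup or a callback.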


Discussion

• More options?
• Every option has pros and cons
• We should decide today which path to go down.