PDC’06 – production status and issues

Latchezar Betev

TF meeting – May 04, 2006


Running status

- Central services – all OK, no intervention necessary
  - With the exception of the ProxyServer – solution is being discussed (Andreas, Pablo, Predrag)
- Site services – all OK (on running sites)
- Running standard production jobs, old AliRoot
  - Job duration – 8 hours
  - Job output – CERN storage still firewalled: prevents us from storing data at CERN
- Stable running since 25/04 – 9 days
- Currently 15 sites (2 T1s, 13 T2s)


Site profiles

- Average 520 jobs, max 1180 jobs


Site profiles (2)

- Job statistics


Site profiles (CERN)

- Periodic drop in jobs accepted by LCG


Site profiles (T2s)

- Uneven job acceptance, no method yet to track and enforce ALICE's resource share


Distribution of completed jobs

- Approximately 40/60 % split T2/T1
- Muenster (Opteron) is boosting the T2 share, T1s are underrepresented


Issues

- Storage at CERN – still unresolved
- Monitoring and submitting jobs at sites:
  - Sites typically advertise 0 free CPUs
  - Current system is auto-calculating the number of jobs to submit – penalizing ALICE
  - Have to go back to the AliEn system of deterministic values for number of CPUs and number of submitted job agents, irrespective of advertised resources
- Job communication (Proxy) with the central services
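The job submission issue above can be illustrated with a minimal sketch. It is not AliEn code: the class, the function names, and the limits max_running and max_queued are all hypothetical, chosen only to contrast a policy that trusts advertised free CPUs with one that uses fixed per-site values.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    """Snapshot of one site. All fields are illustrative, not AliEn's schema."""
    advertised_free_cpus: int  # value published by the site's information system
    running_agents: int        # job agents currently running at the site
    queued_agents: int         # job agents submitted but not yet started


def agents_auto(site: SiteState, waiting_jobs: int) -> int:
    """Auto-calculated policy (assumed form): trust the advertised free CPUs."""
    return max(0, min(site.advertised_free_cpus, waiting_jobs))


def agents_deterministic(site: SiteState, waiting_jobs: int,
                         max_running: int, max_queued: int) -> int:
    """Deterministic policy (assumed form): fixed per-site limits,
    irrespective of what the site advertises."""
    room_running = max_running - site.running_agents - site.queued_agents
    room_queued = max_queued - site.queued_agents
    return max(0, min(room_running, room_queued, waiting_jobs))


# A site that advertises 0 free CPUs although it could accept more agents.
site = SiteState(advertised_free_cpus=0, running_agents=40, queued_agents=5)
print(agents_auto(site, waiting_jobs=200))
print(agents_deterministic(site, waiting_jobs=200, max_running=100, max_queued=20))
```

With a site advertising 0 free CPUs, the first policy submits nothing, while the second still submits agents up to the configured limits. This is one plausible reading of why auto-calculation penalizes ALICE.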


Loss of connectivity with central services (CS)

- Occurs simultaneously across sites, correlated with ERROR_S, ERROR_IB


Issues (2)

- Deployment of VO-boxes:
  - Process is steadily ongoing, however not as fast as we would like it to be
  - Mix of problems – some LCG, some AliEn services related
  - Deployment experts are working around the clock
  - Hopefully after the initial setup phase, further updates will be much faster
  - Hope that gLite 3.0 is not going to change the rules completely
- List of sites – to be discussed after this presentation