efi.uchicago.edu ci.uchicago.edu
FAX status report
Ilija Vukotic on behalf of the atlas-adc-federated-xrootd working group
S&C week, Jun 2, 2014
Content
• Status
  – Coverage
  – Traffic
  – Failover
  – Overflow
• Changes in localSetupFAX
• Monitoring changes
  – Changes in GLED collector, dashboard
  – Failover & overflow monitoring
  – FaxStatusBoard
• Meetings
  – Tutorial, 23–27 June: dedicated to instructing on xAOD and the new analysis model
  – ROOTIO, 25–27 June
FAX topology
• Topology change in North America:
  – added East and West redirectors
  – they will serve the CA cloud
  – all hosted at BNL
• Will need an NL cloud redirector
FAX in Europe
To come:
• Sara
• Nikhef
• IL cloud: IL-TAU, Technion, Weizmann
FAX in North America
To come:
• TRIUMF (June?)
• McGill (end of June)
• SCINET (end of June)
• Victoria (~August)
FAX in Asia
To come:
• Beijing (~two weeks)
• Tokyo
• Australia (a few weeks)
Status
• Most sites running stably
• Glitches do happen, but are usually fixed within a few hours
• SSB issues solved
• New sites added:
  – IFAE
  – PIC
  – IN2P3-LPC
• In need of restart:
  – UNIBE-LHEP
Coverage
• Now an auto-updated Twiki page: https://twiki.cern.ch/twiki/bin/view/AtlasComputing/FaxCoverage
• Coverage is good (~85%), but we should aim for >95%!
• Info fetched from http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary
Traffic
• Slowly increasing
• Maximum peak output record broken
• Still small compared to what we expect to come
Failover
• Running stably
Overflow status
• The whole chain is ready
• I have set all the US queues to allow 3 Gbps to be delivered both to and from the sites.
• Test tasks were submitted to sites that don't have the data, so that transfertype=FAX is invoked.
• This does not test the JEDI decision making (the part based on the cost matrix).
• Waiting for actual jobs to check the full chain:
  – Users not yet instructed to use the JEDI client
  – Waiting for the JEDI monitor
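To put the 3 Gbps queue setting in perspective, the Python sketch below converts the allowance to MB/s and shows a simple admission check that such a cap implies. The function names, the separate per-job check of destination inbound and source outbound traffic, and the example rates are illustrative assumptions. These notes do not describe how PanDA actually enforces the setting.

```python
# Hypothetical sketch: checking overflow (FAX) jobs against a per-site
# bandwidth allowance. Function names and rates are illustrative assumptions.

MB_PER_S_PER_GBPS = 1000 / 8  # 1 Gbit/s = 125 MB/s (decimal units)


def allowance_mb_per_s(limit_gbps: float) -> float:
    """Convert a bandwidth allowance from Gbit/s to MB/s."""
    return limit_gbps * MB_PER_S_PER_GBPS


def can_admit(dest_inbound: float, source_outbound: float, job_rate: float,
              limit_gbps: float = 3.0) -> bool:
    """Admit one more remote-reading job only if neither the destination's
    inbound traffic nor the source's outbound traffic would exceed the
    allowance. All rates are in MB/s."""
    limit = allowance_mb_per_s(limit_gbps)
    return (dest_inbound + job_rate <= limit
            and source_outbound + job_rate <= limit)


def max_concurrent_jobs(job_rate: float, limit_gbps: float = 3.0) -> int:
    """Number of jobs at a constant per-job rate that fit under the allowance."""
    return int(allowance_mb_per_s(limit_gbps) // job_rate)


if __name__ == "__main__":
    print(allowance_mb_per_s(3.0))       # 375.0
    print(max_concurrent_jobs(4.2))      # 89
    print(can_admit(370.0, 100.0, 4.2))  # True
    print(can_admit(372.0, 100.0, 4.2))  # False
```

At 4.2 MB/s per job, the remote-read rate reported in the overflow tests, about 89 such jobs would fill a 3 Gbps (375 MB/s) allowance.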
Overflow tests
• The test is the hardest I/O test: 100% of events, all branches read, standard TTC (TTreeCache), no AsyncPrefetch.
• Site-specific FDR datasets (10 datasets, 744 files, 2.7 TB)
• All source/destination combinations of US sites
• All of it submitted in 3 batches, but not all started simultaneously; affected by priority degradation.
• Three input files per job.
• If the site is a copy2scratch site, the pilot does xrdcp to scratch; if not, jobs access the files remotely.
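The access-mode rule in the last bullet can be sketched in Python as follows. The `Site` record, the `plan_inputs` helper, the redirector host and the file paths are hypothetical. Only the distinction between copy2scratch and direct access, and the use of `xrdcp`, come from the slide. The `root://host//path` form is the usual XRootD URL for an absolute path.

```python
# Hypothetical sketch of input handling for a job at a given site:
# copy2scratch sites stage each file with xrdcp, other sites read remotely.
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    copy2scratch: bool  # True if the pilot stages inputs to local scratch


def plan_inputs(site: Site, redirector: str, files: List[str],
                scratch_dir: str = "/scratch") -> List[str]:
    """Return one entry per input file: an xrdcp command for copy2scratch
    sites, otherwise the root:// URL the job opens directly."""
    plan = []
    for path in files:
        url = f"root://{redirector}//{path.lstrip('/')}"
        if site.copy2scratch:
            local_name = path.rsplit("/", 1)[-1]
            plan.append(f"xrdcp {url} {scratch_dir}/{local_name}")
        else:
            plan.append(url)
    return plan


if __name__ == "__main__":
    # Three input files per job, as in the tests (names are made up).
    inputs = [f"/atlas/fdr/file{i}.root" for i in range(1, 4)]
    for line in plan_inputs(Site("SITE_A", True), "redirector.example.org", inputs):
        print(line)
```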
Overflow tests
• Error rate
  – Total: 9188 jobs
  – Finished: 9052
  – Failed: 117 (1.3%)
    o 24: OU reading OU (no FAX involved)
    o 66: reading from WT2 (files are corrupted)
    o 27 (0.29%): actual FAX errors, where SWT2 did not deliver the files; will be investigated
    o The rest: "Payload run out of memory"
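The quoted percentages can be checked directly from the job counts on the slide. The arithmetic below also shows that the three listed categories already add up to the 117 failed jobs, and that 19 jobs appear in neither the finished nor the failed count.

```python
# Recompute the failure rates quoted above from the raw job counts.
total, finished, failed = 9188, 9052, 117
failure_causes = {
    "OU reading OU (no FAX involved)": 24,
    "reading from WT2 (corrupted files)": 66,
    "FAX errors (SWT2 did not deliver files)": 27,
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


failure_rate = percent(failed, total)
fax_error_rate = percent(failure_causes["FAX errors (SWT2 did not deliver files)"], total)
categorized = sum(failure_causes.values())
neither = total - finished - failed  # jobs counted as neither finished nor failed

print(f"failure rate:   {failure_rate:.2f} %")    # 1.27 %
print(f"FAX error rate: {fax_error_rate:.2f} %")  # 0.29 %
print(f"categorized failures: {categorized} of {failed}")
print(f"neither finished nor failed: {neither}")
```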
Overflow tests
• Jobs reading from local scratch, for comparison
  – Direct access site, reading locally, per job: 7.2 MB/s, 67% CPU efficiency, 71 ev/s
  – Copy2scratch site, per job: 11.0 MB/s, 97% CPU efficiency, 109 ev/s
[Figure: per-job performance at the direct access and copy2scratch sites, scout jobs marked]
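The test reads every event and every branch, so the data volume per event should be about the same at both sites. The ratio of read rate to event rate therefore makes a quick consistency check. The sketch below uses only the per-job numbers above.

```python
# Per-job figures from the local-read comparison: (MB/s, events/s).
measurements = {
    "direct access, reading locally": (7.2, 71),
    "copy2scratch": (11.0, 109),
}


def kb_per_event(rate_mb_s: float, events_per_s: float) -> float:
    """Average data read per event, in kB (decimal units)."""
    return 1000.0 * rate_mb_s / events_per_s


for label, (rate, evs) in measurements.items():
    print(f"{label}: {kb_per_event(rate, evs):.0f} kB/event")

speedup = measurements["copy2scratch"][1] / measurements["direct access, reading locally"][1]
print(f"copy2scratch event rate vs direct access: {speedup:.2f}x")
```

Both cases come out near 100 kB per event. The roughly 1.5x higher event rate at the copy2scratch site therefore comes with faster data delivery, not with less data read.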
Overflow tests
• Jobs reading remote sources
  – Direct access site, reading remotely, per job: 4.2 MB/s, 43% CPU efficiency, 42 ev/s (no saturation)
  – Direct access site, reading remotely, per job: 3.5 MB/s, 29% CPU efficiency, 34 ev/s (possibly a start of saturation)
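The remote-read numbers can be expressed as fractions of the local direct-access baseline (7.2 MB/s, 67% CPU efficiency, 71 ev/s). A small sketch using only the per-job values from these slides:

```python
# Per-job metrics: the local direct-access baseline and the two remote cases.
baseline = {"mb_s": 7.2, "cpu_eff": 0.67, "ev_s": 71}
remote = {
    "no saturation": {"mb_s": 4.2, "cpu_eff": 0.43, "ev_s": 42},
    "possible start of saturation": {"mb_s": 3.5, "cpu_eff": 0.29, "ev_s": 34},
}


def relative(case: dict, metric: str) -> float:
    """Remote value of `metric` as a fraction of the local baseline."""
    return case[metric] / baseline[metric]


for label, case in remote.items():
    print(f"{label}: event rate {relative(case, 'ev_s'):.2f}, "
          f"CPU efficiency {relative(case, 'cpu_eff'):.2f} of local")
```

In these two cases, remote reading delivered about 59% and 48% of the local event rate.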
Overflow tests
• MWT2 reading from OU and SWT2 simultaneously
• In aggregate, reached 850 MB/s, the limit for MWT2 at that time
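For scale, 850 MB/s is 6.8 Gbit/s. The back-of-envelope estimate below gives the number of concurrent remote readers that this aggregate corresponds to. It assumes every job reads at one of the per-job remote rates from the earlier tests (3.5 to 4.2 MB/s), which is an assumption, not something measured here.

```python
AGGREGATE_MB_S = 850.0     # aggregate rate reached by MWT2
PER_JOB_MB_S = (4.2, 3.5)  # per-job remote read rates from the earlier tests


def mb_s_to_gbit_s(rate_mb_s: float) -> float:
    """Convert MB/s to Gbit/s (decimal units, 8 bits per byte)."""
    return rate_mb_s * 8 / 1000


print(f"{mb_s_to_gbit_s(AGGREGATE_MB_S):.1f} Gbit/s")  # 6.8 Gbit/s
for rate in PER_JOB_MB_S:
    # Assumes all jobs read at the same constant rate.
    print(f"at {rate} MB/s per job: ~{AGGREGATE_MB_S / rate:.0f} concurrent jobs")
```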
Cost matrix
[Figure: cost matrix, source vs. destination]
http://1-dot-waniotest.appspot.com/
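A cost matrix like the one shown can be held as a mapping from (source, destination) pairs to a measured figure of merit. The Python sketch below assumes each cell is a source-to-destination throughput in MB/s, where higher is better, and picks the best source for a given destination. The site names and values are placeholders, not measurements, and the actual metric and selection logic used by JEDI may differ.

```python
from typing import Dict, Iterable, Optional, Tuple

# (source, destination) -> throughput in MB/s. Placeholder values only.
CostMatrix = Dict[Tuple[str, str], float]

matrix: CostMatrix = {
    ("SITE_A", "SITE_C"): 40.0,
    ("SITE_B", "SITE_C"): 65.0,
    ("SITE_A", "SITE_B"): 80.0,
}


def best_source(m: CostMatrix, destination: str,
                candidates: Iterable[str]) -> Optional[str]:
    """Among candidate sources holding the data, return the one with the
    highest measured throughput to `destination`, or None if no pair has
    been measured."""
    measured = {src: m[(src, destination)]
                for src in candidates if (src, destination) in m}
    if not measured:
        return None
    return max(measured, key=measured.get)


print(best_source(matrix, "SITE_C", ["SITE_A", "SITE_B"]))  # SITE_B
```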
localSetupFAX
• Added command fax-ls
  – Made by Shuwei YE
  – Will eventually replace isDSinFAX
  – He will move all the other tools to Rucio
• Change in fax-get-best-redirector
  – Each call made three queries:
    o SSB, to get the endpoints and their status
    o AGIS, to get the sites hosting the endpoints
    o AGIS, to get the site coordinates
  – Each call returned hundreds of kB
  – Could not scale to a large number of requests
  – Solution:
    o Made Google App Engine servlets that every 30 min take the info from SSB and AGIS and deliver it from memory
    o Information slimmed to what is actually needed: ~several kB
    o Requests now served in a few tens of ms
    o "Infinitely" scalable
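The Python sketch below illustrates the caching idea behind the new service: fetch from SSB and AGIS on a fixed schedule, keep only the fields needed, and answer requests from memory. The `fetch` callable stands in for the SSB and AGIS queries, and the record fields are hypothetical. Choosing the geographically nearest working endpoint by great-circle distance is an assumption about what the site coordinates are used for.

```python
import math
import time
from typing import Callable, List, Optional

REFRESH_SECONDS = 30 * 60  # refresh interval from the slide: every 30 min


class EndpointCache:
    """Keeps a slimmed list of working endpoints in memory and refreshes it
    from the (slow) upstream sources at most once per REFRESH_SECONDS."""

    def __init__(self, fetch: Callable[[], List[dict]],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch  # stands in for the SSB + AGIS queries
        self._clock = clock
        self._data: List[dict] = []
        self._loaded_at: Optional[float] = None

    def get(self) -> List[dict]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= REFRESH_SECONDS:
            # Keep only what clients need: endpoints with OK status and coordinates.
            self._data = [
                {"endpoint": e["endpoint"], "lat": e["lat"], "lon": e["lon"]}
                for e in self._fetch() if e.get("status") == "OK"
            ]
            self._loaded_at = now
        return self._data


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_endpoint(cache: EndpointCache, lat: float, lon: float) -> Optional[str]:
    """Return the closest working endpoint to the client, or None."""
    endpoints = cache.get()
    if not endpoints:
        return None
    best = min(endpoints, key=lambda e: haversine_km(lat, lon, e["lat"], e["lon"]))
    return best["endpoint"]
```

Only the first request in each 30-minute window triggers an upstream fetch. Every other request is a small in-memory computation, which is the property that makes this design scale.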
Monitoring – collector, dashboard
• Problem: support of multi-VO sites
• Meeting: Alex, Matevz, me
• Issues:
  – Site name:
    o ATLAS reports it
    o CMS does not, or reports it badly; will fix it
  – Requesting user's VO:
    o ATLAS reports it
    o CMS is not strict about it; US-CMS uses GUMS; will fix it
• Proposal:
  – During the summer, Matevz develops an XrdMon that can handle multi-VO messages
  – Messages from multi-VO sites are sent to a special "mixed" AMQ; the dashboard splits the traffic according to the user's VO
  – Details: https://docs.google.com/document/d/1Syx3_vkwCfc5lj2lQzbUUrKT0Je238w6lcwVL7IY1GY/edit#
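The proposed split in the dashboard amounts to routing each message from the "mixed" queue by the requesting user's VO. The minimal Python sketch below assumes a hypothetical `user_vo` field in each message. Messages without a usable VO, as in the CMS case above, go to an `unknown` bucket instead of being dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_vo(messages: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group monitoring messages by the requesting user's VO.

    The 'user_vo' key is a hypothetical field name. Missing or empty values
    are collected under 'unknown'."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for msg in messages:
        vo = (msg.get("user_vo") or "unknown").strip().lower()
        buckets[vo or "unknown"].append(msg)
    return dict(buckets)


if __name__ == "__main__":
    mixed = [{"user_vo": "ATLAS", "bytes": 10}, {"user_vo": "cms", "bytes": 5}, {"bytes": 1}]
    print(sorted(split_by_vo(mixed)))  # ['atlas', 'cms', 'unknown']
```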
Monitoring
• Failover
  – Not flexible enough
• Overflow
  – No monitoring yet
  – Need to compare jobs grouped by transfer type
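The comparison the overflow monitoring needs can be sketched as a group-by over job records. The record fields (`transfertype`, `cpu_eff`, `ev_s`) and the `local` label for jobs without a transfer type are assumptions for illustration. Only `transfertype=FAX` appears in the slides.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarize_by_transfer_type(jobs: Iterable[dict]) -> Dict[str, dict]:
    """Group job records by transfer type and report count and mean metrics.

    Jobs with no transfer type are labelled 'local' (hypothetical label)."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for job in jobs:
        groups[job.get("transfertype") or "local"].append(job)
    return {
        ttype: {
            "jobs": len(group),
            "mean_cpu_eff": mean(j["cpu_eff"] for j in group),
            "mean_ev_s": mean(j["ev_s"] for j in group),
        }
        for ttype, group in groups.items()
    }
```

Grouping first and averaging afterwards keeps FAX and non-FAX jobs in separate populations. This is the comparison the slide asks for.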