Having it both ways: Bring Data to Computation & Computation to Data with iRODS
Nirav Merchant, The University of Arizona, nirav@email.arizona.edu
http://www.cyverse.org Twitter: @CyVerseOrg

Topic Coverage:
• Motivation/Use case
• Constraints, challenges
• Technology options
• Our solution, early results
• Next steps

CyVerse: Platform Philosophy
• Strive to provide the CI Lego blocks
• Danish 'leg godt' - 'play well'
• Also translates as 'I put together' in Latin
• If desired functionality is not available, the community can craft their own by using and extending CyVerse CI components (like Lego blocks)
• Through these extensible and customized platforms, create an ecosystem of interoperable tools that benefit the broad community (and not just a few lab groups)
• Provide the tools to allow the community to manage their digital assets (cloud, HPC etc.)
• Improve Computational Productivity
6/3/16 3

The CyVerse Technology Stack: A Blueprint for Cyberinfrastructure Design
[Figure: technology stack with layers labeled Ready to use Platforms, Foundational Capabilities, Established CI Components and Extensible Services, plus the labels Ease of Use and Flexibility]
http://www.cyverse.org

How is it being used?
• Users build their own systems (powered by CyVerse components) but managed by them
• Share analysis methods, algorithms, data (reproducibility)
• Consume specific components (a la carte, Data Store, Atmosphere)
• Directly use applications (DE)
• Custom design appliances (Atmosphere)
• Publish their findings (PNAS, Nature)
• Advocate use and build “your” community
• Create new learning material and courses, special topics workshops
6/3/16 5 Licensed under CC BY 2015 http://iplantc.org

Cohesive Platform for Data Lifecycle

The eternal question: Data to Compute or Compute to Data

Toolchest
• iRODS
• Condor
• Docker
• Rethinking the role of a “resource server”

Motivation: Data to Compute
• Most of our use cases operated on ~100-200 GB of data at a time
• Many of the analysis steps needed only a few cores (~12) and reasonable RAM (~128 GB)
• Tasks were “naturally data parallel”
• Easier to provision, share, scale and maintain “shared nothing” (or not much) computing infrastructure

Our Solution: Data to Compute
[Diagram, conceptual view, with components: Discovery Env., Condor Master (Docker), iRODS, a pool of Condor Workers, Other Compute infrastructure (HPC, Cloud)]

Motivation: Compute to Data
• Moving data to compute is not feasible in many cases (100 TB+, large repositories)
• Availability of “fat nodes” (or choice for resource servers)
• Availability of specialized compute with storage systems (Wrangler)
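
The contrast between the two motivations (~100-200 GB per analysis versus 100 TB+ repositories) can be made concrete with a rough transfer-time estimate. The sketch below is only back-of-the-envelope arithmetic: the 10 Gbit/s link speed is an assumption chosen for illustration and does not come from the slides, and real throughput is usually lower because of protocol overhead and storage limits.

```python
def transfer_hours(size_bytes: float, link_gbit_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and disk speed."""
    bits = size_bytes * 8
    seconds = bits / (link_gbit_per_s * 1e9)
    return seconds / 3600


GB = 1e9
TB = 1e12
LINK_GBIT = 10.0  # assumed link speed, NOT taken from the slides

# Upper end of the "data to compute" use case from the slides.
small = transfer_hours(200 * GB, LINK_GBIT)
# Lower end of the "compute to data" use case from the slides.
large = transfer_hours(100 * TB, LINK_GBIT)

print(f"200 GB: {small * 60:.1f} minutes")
print(f"100 TB: {large:.1f} hours")
```

Under that assumed link, staging the larger dataset takes 500 times longer than staging the smaller one, which is the gap that motivates sending the job to a node that already holds the data.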

Our Solution: Compute to Data
[Diagram, conceptual view, with components: Discovery Env., Condor Master (Docker), iRODS, a pool of Condor Workers, Other Compute infrastructure (HPC, Cloud); the diagram also carries the labels "R", "R" and "Res"]

Steps
• Bring in data (choose your method)
• Register the data with iRODS
• Apply the metadata (ipc_data_set=IPCC-WG2)
• Let Condor announce it (ClassAds), and also configure limits (number of concurrent jobs, cores, RAM, space to write output etc.)
• Submit the job with its ClassAd and let the Condor scheduler match and manage it
• If you need more, create more copies (replicas) and profit
• If you need to send it elsewhere (HPC etc.), use glidein and BOSCO

ireme
ireme is a command-line utility that registers dataset(s) with iRODS and assigns metadata to those datasets. The metadata is then used with Condor's ClassAds mechanism to match jobs with machines. ireme is also responsible for orchestrating the process of advertising the metadata and datasets present on the Condor worker / resource worker, in the form of machine ClassAds.
Usage
    -p --path : Physical path of the resource to be registered with iRODS
    -c --coll : Collection name within the iRODS database where files are registered
    -m --meta : Comma-separated metadata tags (key:value pairs) associated with the collection
Example Syntax
    ireme -p /home/user/sample_folder -c /tempZone/home/user/sample_coll -m key1:value1,key2:value2
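
The slides do not show how ireme works internally, so the following is only a sketch of what "register, then tag" could look like. It mirrors the three options above with `argparse` and plans equivalent calls to the standard iRODS icommands `ireg` (register a directory) and `imeta` (add collection metadata). The names `parse_meta`, `plan_commands` and `ireme-sketch` are hypothetical, the commands are printed rather than executed, and the icommand flags should be checked against the installed iRODS version.

```python
import argparse
import shlex


def parse_meta(meta: str) -> dict[str, str]:
    """Parse 'key1:value1,key2:value2' into a dict (format taken from the usage text)."""
    pairs: dict[str, str] = {}
    for item in meta.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {item!r}")
        pairs[key] = value
    return pairs


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI that mirrors the -p / -c / -m options from the slide."""
    parser = argparse.ArgumentParser(prog="ireme-sketch")
    parser.add_argument("-p", "--path", required=True, help="physical path to register")
    parser.add_argument("-c", "--coll", required=True, help="target iRODS collection")
    parser.add_argument("-m", "--meta", default="", help="comma-separated key:value tags")
    return parser


def plan_commands(path: str, coll: str, meta: str) -> list[list[str]]:
    """Return the icommands one might run; this is an assumption, not ireme's source."""
    commands = [["ireg", "-C", path, coll]]  # register the directory in place
    for key, value in parse_meta(meta).items():
        commands.append(["imeta", "add", "-C", coll, key, value])  # tag the collection
    return commands


if __name__ == "__main__":
    args = build_parser().parse_args([
        "-p", "/home/user/sample_folder",
        "-c", "/tempZone/home/user/sample_coll",
        "-m", "key1:value1,key2:value2",
    ])
    for command in plan_commands(args.path, args.coll, args.meta):
        print(shlex.join(command))
```

Planning the commands as lists keeps the sketch free of side effects; a real tool would also have to publish the resulting tags as machine ClassAds, which is not modeled here.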

iRODS ClassAds
IRODS_RESOURCE is the custom ClassAd variable that advertises the iRODS resource required by the job, in the form of metadata tags or a dataset name (collection name). The Condor Negotiator matches the job ClassAd requirement (metadata or dataset) against the ClassAds advertised by the Condor Worker.
Sample Job ClassAd w/ iRODS requirement
    Executable=test2
    Log=test.log
    Output=test.out
    error=test.error
    +IRODS_RESOURCE="key=value"
    Requirements=TARGET.meta_available==true
    Queue
Condor Negotiator
Sample Machine ClassAd w/ iRODS Ads
    meta_available = isMetaAvaialbe(TARGET.IRODS_RESOURCE)
    STARTD_EXPRS=meta_available , $(STARTD_EXPRS)
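
The matching idea can be illustrated with a toy model. This is not HTCondor code and does not reproduce the Negotiator's algorithm: the `Machine` class, its `meta_available` method (a stand-in for the machine-side expression in the sample ad) and the worker names are all hypothetical. It only shows a job's IRODS_RESOURCE string, either a key=value tag or a dataset name, being checked against what each worker advertises.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Machine:
    """Toy stand-in for a Condor worker that advertises local iRODS content."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)
    datasets: set[str] = field(default_factory=set)

    def meta_available(self, irods_resource: str) -> bool:
        """True if the requested 'key=value' tag or dataset name is on this machine."""
        key, sep, value = irods_resource.partition("=")
        if sep:
            return self.metadata.get(key) == value
        return irods_resource in self.datasets


def match(irods_resource: str, machines: list[Machine]) -> Optional[Machine]:
    """Return the first machine that satisfies the job's requirement, if any."""
    for machine in machines:
        if machine.meta_available(irods_resource):
            return machine
    return None


machines = [
    Machine("worker1"),
    Machine("worker2", metadata={"ipc_data_set": "IPCC-WG2"}),
    Machine("worker3", datasets={"/tempZone/home/user/sample_coll"}),
]

chosen = match("ipc_data_set=IPCC-WG2", machines)
print(chosen.name if chosen else "no match")
```

A first-match loop is the simplest possible policy; the real Negotiator also weighs ranks, priorities and resource limits, none of which are modeled here.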

[Diagram with components: Condor Master / Negotiator / Collector, three Condor Worker / Resource Server nodes, ICAT server]

Data Flow
[Diagram with components: User Job, iRODS Dataset ClassAd, iRODS Meta Data ClassAd, Condor Master / Negotiator, three Condor Worker / Resource Server nodes (one with the required iRODS resource); the label "ClassAds" appears three times]

Data Flow after ClassAds Matching
[Diagram with components: User Job, iRODS Dataset ClassAd, iRODS Meta Data ClassAd, Condor Master / Negotiator, three Condor Worker / Resource Server nodes (one with the required iRODS resource)]

Syndicate: Using CDN & beyond (Edge Computing)
[Diagram with components: S3, DropBox, GenBank, Metadata Service, Shared Volume, CyVerse, and seven nodes labeled SG]