Virtual Data Tools Status Update
![Page 1: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/1.jpg)
Virtual Data Tools Status Update
ATLAS Grid Software Meeting
BNL, 6 May 2002
Mike Wilde
Argonne National Laboratory
An update on work by Jens Voeckler, Yong Zhao, Gaurang Mehta, and many others.
![Page 2: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/2.jpg)
2
The Virtual Data Model
Data suppliers publish data to the Grid
Users request raw or derived data from Grid, without needing to know
– Where data is located
– Whether data is stored or computed on demand
Users and applications can easily determine
– What it will cost to obtain data
– Quality of derived data
Virtual Data Grid serves requests efficiently, subject to global and local policy constraints
![Page 3: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/3.jpg)
3
CMS Pipeline in VDL
Pipeline stages: pythia_input, pythia.exe, cmsim_input, cmsim.exe, writeHits, writeDigis

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end
begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end
begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end
begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
![Page 4: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/4.jpg)
4
Virtual Data for Real Science: A Prototype Virtual Data Catalog
Architecture of the System:
[Diagram components: Virtual Data Language; VDL Interpreter (VDLI); Virtual Data Catalog (PostgreSQL); Local File Storage; Job Submission Sites (ANL, SC, …) with Condor-G Agent, Globus Client, and GridFTP Server; Job Execution Sites at U of Chicago, U of Wisconsin, and U of Florida, each with GridFTP Client, Globus GRAM, and a Condor Pool; links labeled GSI; all within a Grid testbed]
Production DAG of Simulated CMS Data:
– Simulate Physics
– Simulate CMS Detector Response
– Copy flat-file to OODBMS
– Simulate Digitization of Electronic Signals
![Page 5: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/5.jpg)
5
Cluster-finding Data Pipeline
[Diagram: data-flow graph with nodes labeled field, tsObj, brg, core, cluster, and catalog, and processing steps numbered 1 through 5]
![Page 6: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/6.jpg)
6
Virtual Data Tools
Virtual Data API
– A Java class hierarchy to represent transformations and derivations
Virtual Data Language
– Textual for illustrative examples
– XML for machine-to-machine interfaces
Virtual Data Database
– Makes the objects of a virtual data definition persistent
Virtual Data Service
– Provides an OGSA interface to persistent objects
![Page 7: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/7.jpg)
7
Languages
VDLt – textual version
– Mainly for documentation for now
– May eventually implement a translator
– Can dump data structures in this representation
VDLx – XML version – app-to-VDC interchange
– Useful for bulk data entry – catalog import-export
aDAGx – XML version of abstract DAG
cDAG – actual DAGman DAG
![Page 8: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/8.jpg)
8
Components and Interfaces
Java API
– Manage Catalog objects (tr, dv, args…)
– Create / Locate / Update / Delete
– Same API at client and within server
– Can embed Java classes in an App for now
Virtual Data Catalog Server
– Web (eventually OGSA)
– SOAP interface mirrors Java API operations
XML processor
Database – managed by VDCS
![Page 9: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/9.jpg)
9
System Architecture
[Diagram components: Client App; Client API; Virtual Data Catalog Service; Virtual Data Catalog Objects; Virtual Data Catalog Database]
![Page 10: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/10.jpg)
10
Initial Release Architecture
Client App
Virtual Data Catalog Objects
Virtual Data Catalog Database
Client API
![Page 11: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/11.jpg)
11
Application interfaces
Invoke Java client API (to make OGSA calls)
Invoke Java server API (for now, embed VDC processing directly in App)
Make OGSA calls directly
Formulate XML (VDLx) to load the catalog or request derivations
![Page 12: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/12.jpg)
12
Example VDL-Text
TR t1( output a2, input a1, none env="100000", none pa="500" )
{ app = "/usr/bin/app3";
argument parg = "-p "${none:pa};
argument farg = "-f "${input:a1};
argument xarg = "-x -y ";
argument stdout = ${output:a2};
profile env.MAXMEM = ${none:env};
}
![Page 13: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/13.jpg)
13
Example Derivation
DV t1 (
a2=@{output:run1.exp15.T1932.summary},
a1=@{input:run1.exp15.T1932.raw},
env="20000", pa="600"
);
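Read together, the transformation and derivation slides suggest that a derivation supplies actual values for the formal arguments of a transformation, with the TR defaults filling in anything left out. The Python sketch below illustrates that reading only. The dictionary layout, the `render_command` helper, and the way stdout and the environment profile are returned are assumptions made for illustration, not the Java API or the VDL interpreter.

```python
# Hypothetical in-memory model of TR t1 and DV t1 from the slides.
TR_T1 = {
    "app": "/usr/bin/app3",
    "defaults": {"env": "100000", "pa": "500"},
    # Each argument is a list of literal strings and ("ref", name) placeholders.
    "arguments": [["-p ", ("ref", "pa")], ["-f ", ("ref", "a1")], ["-x -y "]],
    "stdout": ("ref", "a2"),
    "profile": {"env.MAXMEM": ("ref", "env")},
}

DV_T1 = {
    "a2": "run1.exp15.T1932.summary",
    "a1": "run1.exp15.T1932.raw",
    "env": "20000",
    "pa": "600",
}


def render_command(tr, dv):
    """Substitute derivation values (or TR defaults) into the TR template."""
    values = {**tr["defaults"], **dv}  # derivation values override defaults

    def resolve(part):
        return values[part[1]] if isinstance(part, tuple) else part

    # Concatenate the parts of each argument, then join arguments with spaces.
    args = " ".join(
        "".join(resolve(p) for p in arg).strip() for arg in tr["arguments"]
    )
    return {
        "command": f'{tr["app"]} {args}',
        "stdout": resolve(tr["stdout"]),
        "env": {k.split(".", 1)[1]: resolve(v) for k, v in tr["profile"].items()},
    }


result = render_command(TR_T1, DV_T1)
print(result["command"])
```

Omitting `env` and `pa` from the derivation would fall back to the defaults declared in the TR signature ("100000" and "500").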
![Page 14: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/14.jpg)
14
Derivations with dependencies
TR trans1( output a2, input a1 )
{ app = "/usr/bin/app1";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}
TR trans2( output a2, input a1 )
{ app = "/usr/bin/app2";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}
DV trans1( a2=@{output:file2}, a1=@{input:file1} );
DV trans2( a2=@{output:file3}, a1=@{input:file2} );
![Page 15: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/15.jpg)
15
Expressing Dependencies
[Diagram: generate produces f.a; two findrange steps each read f.a and produce f.b and f.c; analyze reads f.b and f.c and produces f.d]
![Page 16: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/16.jpg)
16
Define the transformations
TR generate( output a )
{ app = "generator.exe";
  argument stdout = ${output:a};
}
TR findrange( output b, input a, none p="0.0" )
{ app = "ranger.exe";
  argument arg = "-i "${:p};
  argument stdin = ${input:a};
  argument stdout = ${output:b};
}
TR default.analyze( input a[], output c )
{ pfnHint vanilla = "analyze.exe";
  argument files = ${:a};
  argument stdout = ${output:c};
}
![Page 17: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/17.jpg)
17
Derivations forming a DAG
DV generate( a=@{output:f.a} );
DV findrange( b=@{output:f.b}, a=@{input:f.a}, p="0.5" );
DV findrange( b=@{output:f.c}, a=@{input:f.a}, p="1.0" );
DV analyze( a=[ @{input:f.b}, @{input:f.c} ], c=@{output:f.d} );
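The dependencies between these derivations are implicit: one derivation depends on another whenever it reads a logical file that the other writes. A minimal Python sketch of how such a DAG could be recovered is shown below. The dictionary form of the derivations and the names `findrange_1` and `findrange_2` are assumptions for illustration; the real catalog classes are not shown in the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical in-memory form of the four derivations on the slide.
DERIVATIONS = {
    "generate":    {"inputs": [],             "outputs": ["f.a"]},
    "findrange_1": {"inputs": ["f.a"],        "outputs": ["f.b"]},
    "findrange_2": {"inputs": ["f.a"],        "outputs": ["f.c"]},
    "analyze":     {"inputs": ["f.b", "f.c"], "outputs": ["f.d"]},
}


def build_dag(derivations):
    """Map each derivation to the set of derivations that produce its inputs."""
    producer = {
        lfn: name for name, dv in derivations.items() for lfn in dv["outputs"]
    }
    return {
        name: {producer[lfn] for lfn in dv["inputs"] if lfn in producer}
        for name, dv in derivations.items()
    }


dag = build_dag(DERIVATIONS)
# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The two findrange derivations have no dependency on each other, so any valid ordering may list them either way round; only generate-first and analyze-last are guaranteed.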
![Page 18: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/18.jpg)
18
Virtual Data Class Diagram
Diagram by Jens Voeckler
![Page 19: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/19.jpg)
19
Virtual Data Catalog Structure
![Page 20: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/20.jpg)
20
Virtual Data Language - XML
![Page 21: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/21.jpg)
21
VDL Searches
Locate the derivations that can produce a specific lfn
General queries for catalog maintenance
Locate transforms that can produce a specific file type (what does a type mean in this context?)
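The first kind of search, finding the derivations that can produce a given lfn, can be pictured as a filter over catalog rows. The sketch below is only an illustration in Python: the row layout, the `dv-…` names, and both helper functions are hypothetical and do not reflect the actual catalog schema or query interface.

```python
# Hypothetical catalog rows; field names are illustrative, not the VDC schema.
CATALOG = [
    {"dv": "dv-generate",    "tr": "generate",  "inputs": [],             "outputs": ["f.a"]},
    {"dv": "dv-findrange-1", "tr": "findrange", "inputs": ["f.a"],        "outputs": ["f.b"]},
    {"dv": "dv-findrange-2", "tr": "findrange", "inputs": ["f.a"],        "outputs": ["f.c"]},
    {"dv": "dv-analyze",     "tr": "analyze",   "inputs": ["f.b", "f.c"], "outputs": ["f.d"]},
]


def derivations_producing(catalog, lfn):
    """Derivations whose outputs include the given logical file name."""
    return [row["dv"] for row in catalog if lfn in row["outputs"]]


def derivations_consuming(catalog, lfn):
    """Derivations that read the given logical file name (useful for maintenance)."""
    return [row["dv"] for row in catalog if lfn in row["inputs"]]


print(derivations_producing(CATALOG, "f.d"))
print(derivations_consuming(CATALOG, "f.a"))
```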
![Page 22: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/22.jpg)
22
Virtual Data Issues
Param file support
Param structures
Sequences
Virtual datasets
![Page 23: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/23.jpg)
23
Execution Environment Profile
Condor / DAGman / GRAM / WP1
Concept of an EE driver
– Allows plug-in of DAG generating code for: DAGman, Condor, GRAM, WP1 JM/RB
Execution Profile: Global, User/Group, Transformation, Derivation, Invocation
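The slide lists five profile levels but does not say how they combine. One plausible reading, and it is only an assumption, is that more specific levels override more general ones. The Python sketch below shows that layering with `collections.ChainMap`; the function name and the example keys other than `env.MAXMEM` and `condor.getenv` (which echo the earlier VDL examples) are hypothetical.

```python
from collections import ChainMap


def effective_profile(global_, user_group, transformation, derivation, invocation):
    """Merge profile levels, assuming the most specific level wins."""
    # ChainMap looks keys up left to right, so the most specific level goes first.
    return dict(ChainMap(invocation, derivation, transformation, user_group, global_))


profile = effective_profile(
    {"env.MAXMEM": "100000", "condor.universe": "vanilla"},  # global
    {},                                                       # user/group
    {"condor.getenv": "true"},                                # transformation
    {"env.MAXMEM": "20000"},                                  # derivation
    {},                                                       # invocation
)
print(profile)
```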
![Page 24: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/24.jpg)
24
First Release – June 2002
Java Catalog Classes
XML import – export
Textual VDL formatting
DAX – (abstract) DAG in XML
Simple planner for constrained Grid
– Will generate Condor DAGs
![Page 25: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/25.jpg)
25
Next Releases - Features
RLS Integration
Compound Transformations
Database persistency
OGSA Service
Other needed clients: C, TCL, ?
Expanded execution profiles / planners
– Support for WP1 scheduler / broker
– Support for generic RSL-based schedulers
![Page 26: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/26.jpg)
26
Longer-term Feature Preview
Instance tracking
Virtual files and virtual transformations
Multi-modal data
Structured namespaces
Grid-wide distributed catalog service
Metadata database integration
Knowledge-base integration
![Page 27: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/27.jpg)
29
SDSS Extension: Dynamic Dependencies
Data is organized into spatial cells
Scope of search is not known until run time
In this case – nearest 9 or 25 cells to a centroid
Need a dynamic algorithmic spec for what the range of cells to process is – a nested loop that generates the actual file names to examine.
In complex cases, might be a sequence of such centroid-based sequences.
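The "nested loop that generates the actual file names" can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: cells are taken to lie on an integer grid, "nearest 9 or 25" is read as a 3×3 or 5×5 block around the centroid cell, and the file-name pattern is invented because the slides do not give one.

```python
def neighbor_cells(cx, cy, radius):
    """Cells in the (2*radius+1)-square block centred on (cx, cy).

    radius=1 gives 9 cells, radius=2 gives 25 cells.
    """
    return [
        (cx + dx, cy + dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]


def cell_files(cx, cy, radius, pattern="field-{x:04d}-{y:04d}.fit"):
    # The file-name pattern is hypothetical, chosen only for illustration.
    return [pattern.format(x=x, y=y) for x, y in neighbor_cells(cx, cy, radius)]


files = cell_files(10, 20, 1)
print(len(files), files[0], files[-1])
```

Because the centroid is only known at run time, a list like this could not be written into a static derivation up front, which is the dynamic-dependency problem the slide describes.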
![Page 28: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/28.jpg)
30
LIGO Example
Consider 3 (fictitious) channels: c, p, t
Operations are extract and concatenate
  ex –i a –s t0 –e tb >ta
  ex –i e –s te –e t1 >te
  cat ta b c d te | filter
  exch p <a –s t0 –e t1
  filter –v p,t
Examine whether derived metadata handles this concept
[Diagram: timeline with the requested range t0 to t1 laid over files a, b, c, d, e, with time marks ta, tb, tc, td, te, tf]
![Page 29: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/29.jpg)
31
Distributed Virtual Data Service
Will parallel the service architecture of the RLS
…but probably can’t use soft-state approach – needs consistency; can accept latency
Need a global name space for collaboration-wide information and knowledge sharing
May use distributed database technology below the covers
Will leverage a distributed, structured namespace
Preliminary – not yet designed
![Page 30: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/30.jpg)
32
Distributed Virtual Data Service
[Diagram: apps use a distributed virtual data service made up of VDC instances at Tier 1 centers, Regional Centers, and Local sites]
![Page 31: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/31.jpg)
33
End of presentation
![Page 32: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/32.jpg)
34
Supplementary Material
![Page 33: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/33.jpg)
35
Knowledge Management Architecture
Knowledge-based requests are formulated in terms of science data
– E.g., give me this transform of channels c, p, & t over time range t0-t1
Finder finds the data files
– Translates range “t0-t1” into a set of files
Coder creates an execution plan and defines derivations from known transformations
– Can deal with missing files (e.g., file c in LIGO example)
K-B request is formulated in terms of virtual datasets
Coder translates into logical files
Planner translates into physical files
[Diagram components: knowledge-based request; Coder; Finder; Metadata Catalog; Virtual Data Catalog; Planner; aDAG]
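The Finder step, translating a range "t0-t1" into a set of files, amounts to an interval-overlap query against file metadata. The Python sketch below illustrates the idea and also reports uncovered gaps, which is where a Coder would need to plan derivations for missing files. The catalog contents, the half-open interval convention, and both function names are assumptions for illustration.

```python
# Hypothetical metadata catalog: each file covers the half-open interval [start, end).
FILES = [("a", 0, 10), ("b", 10, 20), ("c", 20, 30), ("d", 30, 40), ("e", 40, 50)]


def find_files(catalog, t0, t1):
    """Return (name, clip_start, clip_end) for every file overlapping [t0, t1)."""
    hits = []
    for name, start, end in catalog:
        if start < t1 and end > t0:
            hits.append((name, max(start, t0), min(end, t1)))
    return hits


def find_gaps(hits, t0, t1):
    """Return the sub-ranges of [t0, t1) not covered by any hit."""
    gaps, cursor = [], t0
    for _, start, end in sorted(hits, key=lambda h: h[1]):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < t1:
        gaps.append((cursor, t1))
    return gaps


hits = find_files(FILES, 5, 45)
without_c = find_files([f for f in FILES if f[0] != "c"], 5, 45)
print(hits)
print(find_gaps(without_c, 5, 45))
```

In the first query, files a and e are only partly inside the range, matching the extract steps of the LIGO example, while b, c, and d are used whole.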
![Page 34: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/34.jpg)
36
User View of the Virtual Data Grid
Scott Koranda, Miron Livny, others
Participants: Caltech workstation; master Condor job running at Caltech; secondary Condor job on WI pool; NCSA Linux cluster; NCSA UniTree (GridFTP-enabled FTP server)
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~ 1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
![Page 35: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/35.jpg)
37
Production Pipeline GriPhyN-CMS Demo
SC2001 Demo Version:
Stage:  pythia      cmsim    writeHits  writeDigis
CPU:    2 min       8 hours  5 min      45 min
Data:   0.5 MB      175 MB   275 MB     105 MB
File:   truth.ntpl  hits.fz  hits.DB    digis.DB
1 run = 500 events
Other figure labels: 1 run (four times), 1 event
![Page 36: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/36.jpg)
38
GriPhyN: Virtual Data Tracking Complex Dependencies
Dependency graph is:
– Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2
– Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate
[Diagram: simulate –t 10 … produces file1 and file2; reformat –f fz … produces files 3, 4, 5; conv –I esd –o aod produces file6; summarize –t 10 … produces file7; psearch –t 10 … produces file8, the requested file]
![Page 37: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/37.jpg)
39
Re-creating Virtual Data
To recreate file 8: Step 1
– simulate > file1, file2
![Page 38: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/38.jpg)
40
Re-creating Virtual Data
To re-create file8: Step 2
– files 3, 4, 5, 6 derived from file 2
– reformat > file3, file4, file5
– conv > file 6
![Page 39: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/39.jpg)
41
Re-creating Virtual Data
To re-create file 8: step 3
– File 7 depends on file 6
– Summarize > file 7
![Page 40: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/40.jpg)
42
Re-creating Virtual Data
To re-create file 8: final step
– File 8 depends on files 1, 3, 4, 5, 7
– psearch < file1, file3, file4, file5, file 7 > file 8
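The four re-creation steps follow mechanically from the dependency graph: walk back from the requested file, and run each producing program once its inputs exist. The Python sketch below illustrates that walk. The two tables are transcribed from the "Files" and "Programs" lines of the dependency slide; the `plan` function and the `existing` parameter (files already materialized, which need not be recomputed) are illustrative assumptions, not the planner described in the talk.

```python
# Which program produces each file, per the "Programs" line of the slide.
PRODUCER = {
    "file1": "simulate", "file2": "simulate",
    "file3": "reformat", "file4": "reformat", "file5": "reformat",
    "file6": "conv", "file7": "summarize", "file8": "psearch",
}
# Which files each program reads, per the "Files" line of the slide.
INPUTS = {
    "simulate": [],
    "reformat": ["file2"],
    "conv": ["file2"],
    "summarize": ["file6"],
    "psearch": ["file1", "file3", "file4", "file5", "file7"],
}


def plan(target, existing=frozenset()):
    """Return the programs to run, in order, to materialize `target`."""
    steps = []

    def visit(lfn):
        if lfn in existing:
            return  # already materialized, nothing to run
        program = PRODUCER[lfn]
        if program in steps:
            return  # already scheduled by an earlier file
        for dep in INPUTS[program]:
            visit(dep)
        steps.append(program)

    visit(target)
    return steps


print(plan("file8"))
print(plan("file8", existing={"file1", "file3", "file4", "file5", "file6"}))
```

With nothing materialized, the plan reproduces the order on the slides: simulate, then reformat and conv, then summarize, then psearch. When intermediate files already exist, only the missing tail of the chain is re-run.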
![Page 41: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/41.jpg)
43
SDSS Galaxy Cluster Finding
![Page 42: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/42.jpg)
44
Cluster-finding Grid
Work of: Yong Zhao, James Annis, & others
![Page 43: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/43.jpg)
45
Cluster-finding pipeline execution
![Page 44: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/44.jpg)
46
Virtual Data in CMS
Virtual Data Long Term Vision of CMS: CMS Note 2001/047, GRIPHYN 2001-16
![Page 45: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/45.jpg)
47
CMS Data Analysis
Dominant use of Virtual Data in the Future
[Diagram: for Event 1, Event 2, and Event 3, data objects and algorithms labeled raw data (simulated or real), calibration data, reconstruction algorithm, reconstructed data, jet finder 1, jet finder 2, tag 1, and tag 2, with the note "(produced by physics analysis jobs)". Object sizes shown: 100b, 200b, 5K, 7K, 50K, 100K, 200K, 300K. Legend: uploaded data, virtual data, algorithms.]
![Page 46: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/46.jpg)
48
Topics – Planner
Does the planner have a queue? What does presence and absence of queue imply?
How is responsibility between planner and the executor (cluster scheduler) partitioned?
How does planner estimate times if it only has partial responsibility for when/where things run?
How does a cluster scheduler assign CPUs – dedicated or shared?
See Miron's email on NeST for more questions
Use of an execution profiler in the planner architecture?
– Characterize the resource requirements of an app over time
– Parameterize the resource requirements of an app w.r.t. its (salient) parameters
![Page 47: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/47.jpg)
49
Planner Context
Map of grid resources
Status of grid resources
– State (up/down)
– Load
– Dedication (commitment of resource to VO or group based on policy)
Policy
Request Queue (w/ lookahead, or process sequentially?)
![Page 48: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/48.jpg)
50
CAS and SAS
Site Authorization Service
– How does a physical site control the policy by which its resources get used?
– How do a SAS and a CAS interact?
– Can a resource interpret restricted proxies from multiple CASes? (Yes, but not from arbitrary CASes)
– Consider MPI and MPICH-G jobs – how would the latter be handled?
– Consider: if P2 schedules a whole DAG up front – causes schedule to use outdated information
![Page 49: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/49.jpg)
51
Planner Architecture
[Diagram: Planner 1 shown with the Virtual Data Service (vdb) and the Replica Location Service (RIS). VO-A has CAS-A and sites S1A, S2A, … SnA; VO-C has CAS-C and site S1C. Each site has a shared SE, compute elements (C), and an LRC. Site 1 also shows SAS-1.]
![Page 50: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/50.jpg)
52
Policy
Focuses on Security and Configuration (controlled resource sharing/allocation)
Allocation example:
– “cms should get 90% of the resources at Caltech”
– Issues of fair share scheduling
How to factor in time quanta: CPU-hours; GB-Days
Relationship to accounting
![Page 51: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/51.jpg)
53
Policy and the Planner
Planner considers:
– Policy (fairly static, from CAS/SAS)
– Grid status
– Job (user/group) resource consumption history
– Job profiles (resources over time) from Prophesy
[Diagram: inputs to the planner are policy, status, job usage info derived from accounting records, and job profile records produced by Prophesy (predictor) from job profiling data]
![Page 52: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/52.jpg)
54
GriPhyN/PPDG Data Grid Architecture
[Diagram: Application passes a DAG (abstract) to the Planner, which passes a DAG (concrete) to the Executor (DAGMAN, Kangaroo). Supporting services: Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Repl. Mgmt. (GDMP), Reliable Transfer Service (Globus). Resources: Compute Resource (GRAM), Storage Resource (GridFTP; GRAM; SRM).]
![Page 53: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/53.jpg)
55
(evolving) View of Data Grid Stack
Data Transport (GridFTP)
Storage Element
Local Repl Catalog (Flat or Hierarchical)
Reliable File Transfer
Replica Location Service
Publish-Subscribe Service (GDMP)
Storage Element Manager
Reliable Replication
![Page 54: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/54.jpg)
56
Executor Example: Condor DAGMan
Directed Acyclic Graph Manager
Specify the dependencies between Condor jobs using DAG data structure
Manage dependencies automatically
– (e.g., “Don’t run job B until job A has completed successfully.”)
Each job is a “node” in DAG
Any number of parent or children nodes
No loops
[Diagram: Job A at the top, Job B and Job C in the middle, Job D at the bottom]
Slide courtesy Miron Livny, U. Wisconsin
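For the four-job picture on this slide, a DAGMan input file is essentially a list of `JOB` lines plus `PARENT … CHILD …` lines. The Python sketch below generates such a file as text. The diamond-shaped edges (A before B and C, both before D) are an assumption read off the diagram layout, and the submit-file names are hypothetical; `dagman_text` is an illustrative helper, not part of Condor.

```python
def dagman_text(parents, submit_files):
    """Render JOB and PARENT/CHILD lines in Condor DAGMan input-file syntax."""
    lines = [f"JOB {job} {submit_files[job]}" for job in submit_files]
    for parent, children in parents.items():
        if children:
            lines.append(f"PARENT {parent} CHILD {' '.join(children)}")
    return "\n".join(lines) + "\n"


# Hypothetical submit-file names: a.sub, b.sub, c.sub, d.sub.
SUBMIT = {job: f"{job.lower()}.sub" for job in "ABCD"}
# Assumed edges for the diagram: A -> B, A -> C, B -> D, C -> D.
EDGES = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

text = dagman_text(EDGES, SUBMIT)
print(text)
```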
![Page 55: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/55.jpg)
57
Executor Example: Condor DAGMan (Cont.)
DAGMan acts as a “meta-scheduler”
– Holds & submits jobs to the Condor queue at the appropriate times based on DAG dependencies
If a job fails, DAGMan continues until it can no longer make progress and then creates a “rescue” file with the current state of the DAG
– When failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
[Diagram: DAGMan holding jobs A, B, C, D and feeding them into the Condor job queue]
Slide courtesy Miron Livny, U. Wisconsin
![Page 56: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/56.jpg)
58
DAG Usage
Abstract DAG
– Represents user requests
– Simplest case: request for one or more data products
– Complex case: request execution of a chained set of applications
– No file or execution locations need be present
Concrete DAG
– Specifies any application invocations needed to derive data
– Specifies locations of all invocations (to the site level)
– Includes explicit job steps to move data
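The difference between the two DAG forms can be made concrete with a toy conversion: take location-free jobs, assign each one a site, and insert an explicit transfer step whenever an input is not already at that site. The Python sketch below is a simplified illustration under several assumptions: site selection is given rather than computed, each file is tracked at a single location, and the tuple format, function name, and site labels are all hypothetical.

```python
def concretize(abstract_jobs, site_for, replica_site):
    """Expand abstract jobs into concrete nodes with explicit transfer steps.

    abstract_jobs: list of (job, inputs, outputs) in dependency order.
    site_for:      job -> chosen execution site (a planner decision).
    replica_site:  lfn -> site where a copy already exists.
    """
    location = dict(replica_site)  # simplification: one known site per file
    concrete = []
    for job, inputs, outputs in abstract_jobs:
        site = site_for[job]
        for lfn in inputs:
            if location[lfn] != site:
                # Explicit job step to move data to the execution site.
                concrete.append(("transfer", lfn, location[lfn], site))
                location[lfn] = site
        concrete.append(("run", job, site))
        for lfn in outputs:
            location[lfn] = site
    return concrete


ABSTRACT = [
    ("generate", [], ["f.a"]),
    ("findrange", ["f.a"], ["f.b"]),
    ("analyze", ["f.b"], ["f.d"]),
]
SITES = {"generate": "uchicago", "findrange": "wisconsin", "analyze": "wisconsin"}

concrete = concretize(ABSTRACT, SITES, {})
for node in concrete:
    print(node)
```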
![Page 57: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/57.jpg)
59
Strawman Architecture
[Diagram: VDLx is loaded into the VDC; Strawman Planner 1 produces an aDAG; Planner 2 produces a cDAG (concrete DAGman dag)]
![Page 58: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/58.jpg)
60
The GriPhyN Charter
“A virtual data grid enables the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”
![Page 59: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/59.jpg)
61
GriPhyN-LIGO SC2001 Demo
Desired Result: Single channel time series
[Diagram components: HTTP frontend with CGI interface (xml); MyProxy server; Planner; Replica Catalog; Transformation Catalog; Replica Selection; Monitoring; Logs; GridCVS; Executor (CondorG/DAGMan) running a G-DAG (DAGMan); SC floor with GridFTP, GRAM, and Compute Resource; LDAS at UWM with GridFTP and GRAM/LDAS; LDAS at Caltech with GridFTP and GRAM/LDAS; UWM with GridFTP; data labeled Frame. Legend: In integration; Prototype exclusive; In design; Globus component.]
![Page 60: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/60.jpg)
62
GriPhyN CMS SC2001 Demo
Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics
[Diagram: a “Tag” database of ~140,000 small objects, a Full Event Database of ~100,000 large objects, and a Full Event Database of ~40,000 large objects, linked by Request arrows and Parallel tuned GSI FTP transfers]
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
![Page 61: Virtual Data Tools Status Update](https://reader035.fdocuments.in/reader035/viewer/2022081603/568148ba550346895db5d4e4/html5/thumbnails/61.jpg)
63
Virtual Data in Action
Data request may
– Access local data
– Compute locally
– Compute remotely
– Access remote data
Scheduling & execution subject to local & global policies
[Diagram tiers: Major Archive Facilities; Network caches & regional centers; Local sites]