EDG Application
The European DataGrid Project Team
http://www.eu-datagrid.org
EDG Applications Tutorial – n° 2
EDG Application Areas
High Energy Physics
Biomedical Applications
Earth Observation Science Applications
EDG Applications Tutorial – n° 3
High Energy Physics
4 experiments on the LHC, including CMS, ATLAS and LHCb
~6-8 PetaBytes / year
~10^8 events / year
~10^3 batch and interactive users
EDG Applications Tutorial – n° 4
Europe: 267 institutes, 4603 users
Elsewhere: 208 institutes, 1632 users
CERN’s Network in the World
EDG Applications Tutorial – n° 5
Data Flow in LHC
[Diagram: two parallel chains.
Real data: DAQ and Trigger, RAW Data, Reconstruction, Event Summary Data (ESD) and Reconstruction Tags; also shown: RAW Tags, Conditions / Calibration Data.
Monte Carlo: Physics Generator, Generator Data, Detector Simulation, RAWmc Data, Reconstruction, Event Summary Data (ESD) and Reconstruction Tags; also shown: RAWmc Tags, Conditions / Calibration Data.]
EDG Applications Tutorial – n° 6
Example: CMS Monte Carlo Production
EDG Applications Tutorial – n° 7
CMS jobs description
CMKIN: MC generation of the proton-proton interaction for a physics channel (dataset)
CMSIM: Detailed simulation of the CMS detector, processing the data produced during the CMKIN step
[Diagram: the CMKIN job writes its output data to a Grid Storage Element; the CMSIM job reads from a Grid Storage Element and writes its output data to a Grid Storage Element.]
Job      Size/event    Time*/event
CMKIN    ~0.05 MB      ~0.4-0.5 sec
CMSIM    ~1.8 MB       ~6 min
* PIII 1 GHz, 512 MB, 46.8 SI95
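As a rough worked example of what the per-event figures imply, the sketch below scales them to a production of a given size. The event count of 250,000 is only an illustrative input, 0.45 s is an assumed midpoint of the quoted CMKIN range, and all times refer to the reference CPU of the footnote.

```python
# Per-event figures quoted on the slide (reference CPU: PIII 1 GHz).
# CMKIN time uses 0.45 s, the midpoint of the quoted 0.4-0.5 s range (assumption).
PER_EVENT = {
    "CMKIN": {"size_mb": 0.05, "time_s": 0.45},
    "CMSIM": {"size_mb": 1.8, "time_s": 6 * 60},
}


def production_cost(n_events):
    """Return total output size (GB) and CPU time (hours) for each step."""
    cost = {}
    for step, fig in PER_EVENT.items():
        cost[step] = {
            "size_gb": n_events * fig["size_mb"] / 1000.0,
            "cpu_hours": n_events * fig["time_s"] / 3600.0,
        }
    return cost


cost = production_cost(250_000)  # illustrative event count
for step, c in cost.items():
    print(f"{step}: {c['size_gb']:.1f} GB, {c['cpu_hours']:.0f} CPU hours")
```

The simulation step dominates both storage and CPU time by several orders of magnitude, which is why it is the step that benefits most from being spread over many sites.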
EDG Applications Tutorial – n° 8
CMS production components interfaced to EDG middleware
[Diagram: CMS side and EDG side. Labels: UI with IMPALA/BOSS, BOSS DB, RefDB (parameters), JDL, Workload Management System, CEs with CMS software, SEs. Legend: push data or info; pull info.]
Production is managed from the EDG User Interface with IMPALA/BOSS
CMS Virtual Organization server at NIKHEF (Amsterdam)
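To make the JDL label in the diagram concrete, here is a minimal sketch of how a production tool might render a job description before handing it to the Workload Management System. It is not IMPALA/BOSS code. The attribute names are the ones commonly shown in EDG JDL examples and should be checked against the JDL documentation; the script name, arguments and file names are hypothetical.

```python
def make_jdl(executable, arguments, sandbox_in, sandbox_out):
    """Render a minimal JDL job description as text (illustrative only)."""

    def quoted_list(items):
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "job.out";',
        'StdError = "job.err";',
        f"InputSandbox = {quoted_list(sandbox_in)};",
        f"OutputSandbox = {quoted_list(sandbox_out)};",
    ]
    return "\n".join(lines)


# Hypothetical wrapper script and arguments for one CMKIN job.
jdl = make_jdl("cmkin.sh", "run 1", ["cmkin.sh"], ["job.out", "job.err"])
print(jdl)
```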
EDG Applications Tutorial – n° 9
CMS production components interfaced to EDG middleware
[Diagram: CMS side and EDG side. Labels: UI with IMPALA/BOSS, BOSS DB, RefDB (parameters), JDL, Workload Management System, Replica Manager, data registration, input data location, CEs with CMS software, WN, SEs. Legend: push data or info; pull info.]
CMKIN jobs running on all EDG Testbed sites with CMS software installed
CMSIM jobs running on CE close to the input data
Produced data: scripts for batch replication to a dedicated SE
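The idea of running CMSIM on a CE close to its input data can be sketched as a lookup: find which SEs hold a replica of the input file, then keep the CEs that count one of those SEs as local. Both tables below are hypothetical stand-ins for what the replica catalogue and the information system would provide; all host and file names are invented.

```python
# Hypothetical replica locations: logical file name -> SEs holding a copy.
REPLICAS = {
    "lfn:cmkin_run001.ntpl": ["se.site-a.example", "se.site-b.example"],
}

# Hypothetical "close SE" relation: CE -> SEs it can reach locally.
CLOSE_SES = {
    "ce.site-a.example": ["se.site-a.example"],
    "ce.site-b.example": ["se.site-b.example"],
    "ce.site-c.example": ["se.site-c.example"],
}


def ces_close_to(lfn, replicas=REPLICAS, close_ses=CLOSE_SES):
    """Return the CEs that have a replica of `lfn` on one of their close SEs."""
    holders = set(replicas.get(lfn, []))
    return sorted(ce for ce, ses in close_ses.items() if holders & set(ses))


print(ces_close_to("lfn:cmkin_run001.ntpl"))
```

A file with no registered replica yields an empty list, so such a job would have to wait or be preceded by a replication step.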
EDG Applications Tutorial – n° 10
CMS production components interfaced to EDG middleware
[Diagram: CMS side and EDG side. Labels: UI with IMPALA/BOSS, BOSS DB, RefDB (parameters), JDL, Workload Management System, Replica Manager, data registration, input data location, job output filtering, runtime monitoring, CEs with CMS software, WN, SEs. Legend: push data or info; pull info.]
Job monitoring and bookkeeping: BOSS DBs, EDG Logging & Bookkeeping service
EDG Applications Tutorial – n° 11
CMS use of the system (Statistics)
[Chart: number of events versus time; labels: CEs, SEs.]
Events Production within EDG is part of the Official CMS production
http://cmsdoc.cern.ch/cms/production/www/html/general/index.html
EDG Applications Tutorial – n° 12
Summary of CMS work and the planning for use of EDG middleware
RESULTS
We can distribute and run CMS s/w in the EDG environment
We have generated ~250K events for physics with ~10000 jobs in a 3-week period
OBSERVATIONS and PLANNING for the future
We were able to quickly add new sites to provide extra resources
There was a fast turnaround in bug fixing and installing new software
The stress test was labor intensive (since software was developing and th [...]
Release EDG 2.0 should fix the major problems and allow for enhanced scalability, and we look forward to evaluating it and using it in our Data Challenge work
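The headline numbers translate into average rates with simple arithmetic. The sketch below assumes the three weeks were 21 full days and that work was spread evenly, which the slide does not state.

```python
events = 250_000   # ~250K events, as quoted
jobs = 10_000      # ~10000 jobs, as quoted
days = 3 * 7       # 3-week period, assumed to be 21 full days

events_per_job = events / jobs
jobs_per_day = jobs / days
events_per_day = events / days

print(f"{events_per_job:.0f} events per job")
print(f"{jobs_per_day:.0f} jobs per day on average")
print(f"{events_per_day:.0f} events per day on average")
```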
EDG Applications Tutorial – n° 13
EDG EO challenge: Processing / validation of 1 year of GOME data
Level 1: Raw satellite data from the GOME instrument (~75 GB, ~5000 orbits/y)
ESA (IT) – KNMI (NL): Processing of raw GOME data to ozone profiles. 2 alternative algorithms, ~28000 profiles/day
Level 2 (example of 1 day total O3)
IPSL (FR): Validate some of the GOME ozone profiles (~10^6/y) coincident in space and time with ground-based measurements
LIDAR data (7 stations, 2.5 MB per month)
Visualization & analysis
DataGrid environment
EDG Applications Tutorial – n° 14
EO WebMap Portal
EDG Applications Tutorial – n° 15
Processing Sequence
Components: Web Portal, EO Product Catalogue, EO Product Archive, EO Grid Engine, EO Replica Catalogue, EDG User Interface, EDG Resource Broker, EDG Computing Element, EDG Storage Element
1. Search Level-1 catalogue
2. Retrieve Level-2 products
3. Level-2 products already registered in RC?
   Yes? 4. Return available Level-2 products
   No? 5. Perform GRID processing on-the-fly
6. Transfer Level-1 data from Archive to the Grid
7. Register Level-1 data
8. Submit jobs to process Level-1 data
9. Process Level-1 data
10. Transfer Level-2 data to SE
11. Register Level-2 data
12. Return new Level-2 products
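The processing sequence is essentially a lookup with on-the-fly processing as the fallback. The sketch below follows the numbered steps with plain dictionaries standing in for the replica catalogue and the product archive; the function name, the key layout and the fake processing step are all illustrative assumptions, not the portal's actual code.

```python
def get_level2(orbit, replica_catalogue, archive, process):
    """Return Level-2 products for an orbit, processing on the fly if needed."""
    key = f"level2/{orbit}"
    if key in replica_catalogue:                    # step 3: already registered?
        return replica_catalogue[key]               # step 4: return what exists
    level1 = archive[orbit]                         # step 6: fetch Level-1 data
    replica_catalogue[f"level1/{orbit}"] = level1   # step 7: register Level-1
    level2 = process(level1)                        # steps 8-9: submit and run
    replica_catalogue[key] = level2                 # steps 10-11: store, register
    return level2                                   # step 12: return new products


calls = []


def fake_process(level1):
    """Stand-in for the grid job; records each call."""
    calls.append(level1)
    return level1.upper()


catalogue = {}
archive = {"orbit-1": "raw"}
first = get_level2("orbit-1", catalogue, archive, fake_process)
second = get_level2("orbit-1", catalogue, archive, fake_process)
print(first, second, len(calls))
```

The second request is served from the catalogue, so the processing job runs only once per orbit.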
EDG Applications Tutorial – n° 16
GOME Ozone Profile Validation
Goals of the DataGrid application: validate satellite data against all available ground-based data in an easy way.
Comparison of ozone profiles provided by the satellite with lidar data at different locations and times (see the web portal)
Statistical comparison and analysis in order to improve algorithms
[Figure: ERS/GOME satellite above the lidar at the Haute Provence Observatory; ozone layer with altitude marks at 10 km and 50 km.]
EDG Applications Tutorial – n° 17
Validation Processing Sequence
[Diagram, steps numbered 1 to 4: a GRID Portal at the Lidar site (satellite data validation) and the GRID, with a Computing Element, Storage Elements with Lidar data, Storage Elements with Gome L2 data, the Level 2 Catalogue and the Lidar data catalogue.]
Queries and data information retrieval from the Lidar metadata catalogue
Queries and data information retrieval from the Gome Level 2 orbit or pixel metadata catalogues
Submission of the job to the GRID
When completed, comparison between lidar and satellite ozone profiles
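Selecting satellite profiles that are coincident in space and time with a lidar measurement can be sketched as a distance and time-window test. The thresholds (500 km, 12 hours), the record layout and the coordinates are assumptions for illustration; the slides do not give the actual coincidence criteria.

```python
import math
from datetime import datetime, timedelta


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula), Earth radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def coincident(sat, lidar, max_km=500.0, max_dt=timedelta(hours=12)):
    """True if the two measurements are close in both space and time."""
    close = distance_km(sat["lat"], sat["lon"], lidar["lat"], lidar["lon"]) <= max_km
    return close and abs(sat["time"] - lidar["time"]) <= max_dt


# Approximate, illustrative coordinates and times.
lidar = {"lat": 43.93, "lon": 5.71, "time": datetime(2000, 10, 2, 22, 0)}
near = {"lat": 44.5, "lon": 6.0, "time": datetime(2000, 10, 2, 10, 30)}
far = {"lat": 60.0, "lon": 6.0, "time": datetime(2000, 10, 2, 10, 30)}
print(coincident(near, lidar), coincident(far, lidar))
```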
EDG Applications Tutorial – n° 18
Validation Output
Figure 1: Estimation of the bias between Gome and Lidar using one month of data.
Figure 2: Example of 2 profiles: comparison between Gome profile and lidar profile for the 2nd October 2000.
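One simple way to estimate a bias of this kind is the mean relative difference per altitude level over all matched profile pairs. This definition is an assumption (the slides do not say how the bias in Figure 1 was computed), and the numbers below are made up purely to exercise the function.

```python
def mean_relative_bias(gome_profiles, lidar_profiles):
    """Per-level mean of (gome - lidar) / lidar over matched profile pairs."""
    n_levels = len(lidar_profiles[0])
    bias = []
    for k in range(n_levels):
        diffs = [
            (g[k] - l[k]) / l[k]
            for g, l in zip(gome_profiles, lidar_profiles)
        ]
        bias.append(sum(diffs) / len(diffs))
    return bias


# Two matched pairs, three altitude levels each (made-up values).
gome = [[1.1, 2.0, 2.7], [0.9, 2.2, 3.3]]
lidar = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
bias = mean_relative_bias(gome, lidar)
print([round(b, 3) for b in bias])
```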
EDG Applications Tutorial – n° 19
Perspectives for Biomedical Applications
Grids open new perspectives in large scale genomics analysis
Complete genome annotation
Cross-genomes analysis
Data mining on distributed databases
Pipelining of huge automatic bio-informatics analysis
Medical image processing
Large databases processing
Anatomy and physiology modeling
Epidemiological studies
EDG Applications Tutorial – n° 20
Biomedical Applications
Bio-informatics
Phylogenetics: BBE Lyon (T. Sylvestre)
Search for primers: Centrale Paris (K. Kurata)
Statistical genetics: CNG Evry (N. Margetic)
Bio-informatics web portal: IBCP (C. Blanchet)
Parasitology: LBP Clermont, Univ B. Pascal (N. Jacq)
Data-mining on DNA chips: Karolinska (R. Médina, R. Martinez)
Geometrical protein comparison: Univ. Padova (C. Ferrari)
Medical imaging
MR image simulation: CREATIS (H. Benoit-Cattin)
Medical data and metadata management: CREATIS (J. Montagnat)
Mammographies analysis: ERIC/Lyon 2 (S. Miguet, T. Tweed)
Simulation platform for PET/SPECT based on Geant4: GATE collaboration (L. Maigne)
Legend: applications deployed; applications tested on EDG; applications under preparation
EDG Applications Tutorial – n° 21
Medical Imaging
[Diagram: medical images and a metadata table with columns LFN, image, patient, hospital, ...]
1. query
2. visualisation
3. similarity search
4. scores
5. best results visualisation
EDG Applications Tutorial – n° 22
Graphic layer
Job Monitoring
Grid File Browsing
File registration and retrieval
EDG Applications Tutorial – n° 23
Graphical Interfaces
Image registration
Image retrieval
[Screenshots: local files, Grid files, metadata; query over metadata, query result.]
EDG Applications Tutorial – n° 24
Image Registration
LFN image patient hospital ...
Imager
SE
EDG Applications Tutorial – n° 25
Similarity search
Similarity computation
Results visualization
[Screenshots: job monitoring; ranked list of images; source image, most similar images, low score images.]
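The ranked list of images can be illustrated with a toy similarity search: score every candidate against the source image and sort by score. The metric used here (one over one plus the mean absolute pixel difference) is an assumption chosen for brevity, not the measure used by the application, and the logical file names are invented.

```python
def similarity(a, b):
    """Score in (0, 1]; 1.0 means identical. Uses mean absolute difference."""
    mad = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mad)


def rank(source, candidates):
    """Return (score, name) pairs, most similar first."""
    scored = [(similarity(source, img), name) for name, img in candidates.items()]
    return sorted(scored, reverse=True)


# Images as flat lists of pixel values (toy data, hypothetical LFNs).
source = [0, 10, 20]
candidates = {
    "lfn:img-a": [0, 10, 20],
    "lfn:img-b": [0, 11, 21],
    "lfn:img-c": [50, 60, 70],
}
ranked = rank(source, candidates)
for score, name in ranked:
    print(f"{name}: {score:.3f}")
```

On the grid, each score could be computed by a separate job, with only the final sort done by the client.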
EDG Applications Tutorial – n° 26
Future: Interfacing medical data with the Grid
[Diagram: Medical (trusted) site: imager, medical server holding the master file, with a core, Client 1 and Client 2 interfaces, a grid-server interface, an RS interface, an RC interface and a metadata interface; header blanking and encryption are applied. Grid middleware: Storage Element (MSS) holding the replica, Replica Catalog, Replication Service.]
File metadata: ACL, size, checksum, ...
Application metadata: ACL, encryption key, sensitive metadata, ...
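Two of the operations named in the diagram, header blanking and the derivation of file metadata (size, checksum), can be sketched with the standard library. The header field names and the choice of SHA-256 are assumptions for illustration; encryption is deliberately left out, since the slides do not say which scheme is intended.

```python
import hashlib

# Assumed names of patient-identifying header fields (hypothetical).
SENSITIVE_FIELDS = {"patient_name", "patient_id", "birth_date"}


def blank_header(header):
    """Return a copy of the header with sensitive fields emptied."""
    return {k: ("" if k in SENSITIVE_FIELDS else v) for k, v in header.items()}


def file_metadata(payload):
    """Size and checksum of the image bytes, as would accompany a replica."""
    return {"size": len(payload), "checksum": hashlib.sha256(payload).hexdigest()}


header = {"patient_name": "Jane Doe", "modality": "MR", "hospital": "H1"}
blanked = blank_header(header)
meta = file_metadata(b"pixel-data")
print(blanked)
print(meta["size"], meta["checksum"][:16])
```

The original header is left untouched, so the master copy kept at the trusted site retains the full information.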
EDG Applications Tutorial – n° 27
Parallel Processing
Magnetic Resonance Images simulation using the grid
3 levels of parallelism:
Parallel isochromat computations
Multi-slice MRI computation
Parallel magnetization kernel
[Diagram: virtual object and MRI sequence feed the magnetisation computation kernel, followed by the reconstruction algorithm, producing the MRI image.]
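The multi-slice level of parallelism can be illustrated by mapping an independent per-slice computation over a pool of workers. The per-slice function below is a trivial placeholder, not the magnetisation kernel, and threads merely stand in for what would be separate grid jobs (CPython threads do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_slice(slice_values):
    """Placeholder for the per-slice MRI computation (not the real kernel)."""
    return [2 * v for v in slice_values]


def simulate_volume(slices, workers=4):
    """Process slices independently; results come back in slice order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_slice, slices))


volume = simulate_volume([[1, 2], [3], [4, 5, 6]])
print(volume)
```

Because slices do not depend on each other, the same structure carries over to submitting one job per slice and collecting the outputs afterwards.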
EDG Applications Tutorial – n° 28
Summary
Use Cases
High Energy Physics
Earth Observation
Biomedical Applications
EDG Applications Tutorial – n° 29
Further Information
High Energy Physics
http://datagrid-wp8.web.cern.ch/DataGrid-WP8/
Bio-Informatics
http://marianne.in2p3.fr/datagrid/wp10/index.html
Earth Observation
http://styx.esrin.esa.it/grid/