Data Centric Issues: Particle Physics and Grid Data Management
Tony Doyle, University of Glasgow
Outline: Data to Metadata to Data
Introduction
Yesterday ".. all my troubles seemed so far away": (non-Grid) database access; data hierarchy.
Today ".. is the greatest day I've ever known": Grids and metadata management; file replication; replica optimisation.
Tomorrow ".. never knows": event replication; query optimisation.
Grid Services: Context

Grid Fabric (Resources): resource-specific implementations of basic services, e.g. transport protocols, name servers, differentiated services, CPU schedulers, public key infrastructure, site accounting, directory service, OS bypass.

Grid Services (Middleware): resource-independent and application-independent services, e.g. authentication, authorisation, resource location, resource allocation, events, accounting, remote data access, information, policy, fault detection.

Application Toolkits: distributed computing toolkit; data-intensive applications toolkit; collaborative applications toolkit; remote visualisation applications toolkit; problem solving applications toolkit; remote instrumentation applications toolkit.

Applications: Chemistry, Biology, Cosmology, High Energy Physics, Environment.
Online Data Rate vs Size

[Figure: Level 1 trigger rate (Hz, 10^2 to 10^6) versus event size (bytes, 10^4 to 10^7) for LHCb, KLOE, HERA-B, CDF II, CDF, H1, ZEUS, UA1, LEP, NA49, ALICE, ATLAS and CMS. The LHC experiments sit at the extremes: high Level-1 trigger rate (1 MHz), high numbers of channels, high bandwidth (500 Gbit/s), high data archive (PetaByte).]

Online data reduction by a factor of O(1000) via trigger selection. "How can this data reach the end user?" It doesn't...
Offline Data Hierarchy: "RAW, ESD, AOD, TAG"

RAW: recorded by DAQ; triggered events; detector digitisation; ~1 MB/event.
ESD: reconstructed information; pseudo-physical information: clusters, track candidates (electrons, muons), etc.; ~100 kB/event.
AOD: selected information; physical information: transverse momentum, association of particles, jets, (best) id of particles, physical info for relevant "objects"; ~10 kB/event.
TAG: analysis information; relevant information for fast event selection; ~1 kB/event.
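The per-event sizes above make yearly storage volumes easy to estimate. A back-of-envelope sketch, assuming the 10^9 events per year of data-taking quoted later in the talk (the TAG result reproduces the ~1 TByte/yr figure given for local TAG access; the other levels are order-of-magnitude estimates only):

```python
# Rough yearly data volumes per hierarchy level, assuming 1e9 events/year
# (the "1000 million events from 1 year's data-taking" quoted later).
EVENTS_PER_YEAR = 1_000_000_000

SIZE_PER_EVENT = {          # bytes/event, from the RAW/ESD/AOD/TAG hierarchy
    "RAW": 1_000_000,       # ~1 MB/event
    "ESD": 100_000,         # ~100 kB/event
    "AOD": 10_000,          # ~10 kB/event
    "TAG": 1_000,           # ~1 kB/event
}

def yearly_volume_tb(level: str) -> float:
    """Yearly volume in TB (1 TB = 1e12 bytes) for one hierarchy level."""
    return SIZE_PER_EVENT[level] * EVENTS_PER_YEAR / 1e12

for level in SIZE_PER_EVENT:
    print(f"{level}: {yearly_volume_tb(level):8.1f} TB/year")
# TAG comes out at ~1 TB/year, matching the local-access figure quoted later.
```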
Physics Analysis

ESD: data or Monte Carlo. Event tags drive event selection; analysis and skims run over the selected data, together with calibration data and raw data, producing Analysis Object Data (AOD) and, from it, physics objects.

Tier 0, 1: collaboration-wide.
Tier 2: analysis groups.
Tier 3, 4: physicists.

Data flow increases down the tiers. REAL and SIMULATED data required; central and distributed production.
Data Structure

Real data: the data acquisition system (Level 3 trigger) writes Raw Data and Trigger Tags; reconstruction, using Calibration Data and Run Conditions from the trigger system, produces Event Summary Data (ESD) and Event Tags.

Simulated data: Physics Models generate Monte Carlo Truth Data; Detector Simulation produces MC Raw Data; reconstruction produces MC Event Summary Data and MC Event Tags.
A Running (non-Grid) Experiment

Three steps to select an event today:
1. Remote access to O(100) TBytes of ESD data,
2. via remote access to 100 GBytes of TAG data,
3. using offline selection, e.g. the ZeusIO variable cut (Ee>20.0)and(Ntrks>4).

Access to the remote store is via batch job; ~1% database event-finding overhead; O(1M) lines of reconstruction code; no middleware; 20k lines of C++ "glue" from the Objectivity (TAG) to the ADAMO (ESD) database.

100 million selected events from 5 years' data; TAG selection via 250 variables/event.
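The TAG-level cut quoted above, (Ee>20.0)and(Ntrks>4), is just a predicate over a small per-event record; only events passing it need the expensive ESD access. A minimal sketch (the record layout is illustrative, not the actual ZEUS TAG schema):

```python
# Sketch of TAG-level event selection: filter the small TAG records first,
# then fetch full ESD only for the survivors. Field names are illustrative,
# not the real ZEUS TAG schema (which holds ~250 variables/event).
tag_store = [
    {"event_id": 1, "Ee": 25.3, "Ntrks": 6},
    {"event_id": 2, "Ee": 12.1, "Ntrks": 9},
    {"event_id": 3, "Ee": 31.8, "Ntrks": 3},
    {"event_id": 4, "Ee": 22.0, "Ntrks": 5},
]

def select_events(tags, cut):
    """Return the ids of events whose TAG record passes the cut."""
    return [t["event_id"] for t in tags if cut(t)]

# ZeusIO-style cut: (Ee > 20.0) and (Ntrks > 4)
selected = select_events(tag_store, lambda t: t["Ee"] > 20.0 and t["Ntrks"] > 4)
print(selected)  # only these event ids require ESD access
```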
A Future (Grid) Experiment

Three steps to (analysis) heaven:
1. 10 (1) PByte of RAW (ESD) data/yr,
2. 1 TByte of TAG data (local access)/yr,
3. offline selection, e.g. the ATLASIO variable cut (Mee>100.0)and(Njets>4).

Interactive access to the local TAG store; automated batch jobs to distributed Tier-0, -1, -2 centres; O(1M) lines of reconstruction code; O(1M) lines of middleware... NEW... O(20k) lines of Java/C++ provide the "glue" from the TAG to the ESD database. All working? Efficiently?

1000 million events from 1 year's data-taking; TAG selection via 250 variables.
Grid Data Management: Requirements

1. "Robust" – software development infrastructure
2. "Secure" – via Grid certificates
3. "Scalable" – non-centralised
4. "Efficient" – optimised replication

Examples: GDMP, Spitfire, Reptor, Optor.
1. Robust? Development Infrastructure

CVS Repository: management of DataGrid source code; all code available (some mirrored).
Bugzilla.
Package Repository: public access to packaged DataGrid code.
Development of Management Tools: statistics concerning DataGrid code; auto-building of DataGrid RPMs; publishing of generated API documentation.
Latest build = Release 1.2 (August 2002).

[Figure: testbed 1 source-code lines by language: java, cpp, ansic, python, perl, sh, csh, sed, sql, makefile.] 140,506 lines of code, 10 languages (Release 1.0).
1. Robust? Software Evaluation

Each component was assessed against the criteria ETT, UT, IT, NI, NFF, MB and SD (the per-component marks are in the original matrix):

Resource Broker; Job Desc. Lang.; Info. Index; User Interface; Log. & Book. Svc.; Job Sub. Svc.; Broker Info. API; SpitFire; GDMP; Rep. Cat. API; Globus Rep. Cat.; Schema; FTree; R-GMA; Archiver Module; GRM/PROVE; LCFG; CCM; Image Install.; PBS Info. Prov.; LSF Info. Prov.; SE Info. Prov.; File Elem. Script; Info. Prov. Config.; RFIO; MSS Staging; Mkgridmap & daemon; CRL update & daemon; Security RPMs; EDG Globus Config.; PingER; UDPMon; IPerf; Globus2 Toolkit.

Legend: ETT = Extensively Tested in Testbed; UT = Unit Testing; IT = Integrated Testing; NI = Not Installed; NFF = Some Non-Functioning Features; MB = Some Minor Bugs; SD = Successfully Deployed.
1. Robust? Middleware Testbed(s)

(B. Jones, July 2002)

Testing activities: WPs add unit-tested code to the CVS repository; nightly builds and automated tests run, and any errors are fixed. The code is then installed on the certification testbed and backward-compatibility tests are run; once any problems are fixed, a candidate beta release goes to the applications for testing, followed by a candidate public release for use by the apps.

Testbeds: "Development", "Certification", "Application" and "WP-specific" testbeds, supported from office hours up to 24x7 by the I-Team, TSTG, ATG, the applications and the WPs. Validation/maintenance => testbed(s). EU-wide development.
1. Robust? Code Development Issues

Reverse engineering (C++ code analysis and restructuring; coding standards) => abstraction of existing code to UML architecture diagrams.
Language choice (currently 10 used in DataGrid): Java = C++ minus "features" (global variables, pointer manipulation, goto statements, etc.).
Constraints (performance, libraries, legacy code).
Testing (automation, object-oriented testing).
Industrial strength? OGSA-compliant? O(20 year) future-proof??
PRODUCTION: Simulation (use case; swim lanes: production team, experiment-specific modules, Grid, physics application)

The production team logs in. If the actor's proxy is certified (else the use case ends), the experiment-specific modules select the experiment-wide database, get LFNs for database access, allocate output LFNs, set storage preferences for output files (SE, MSS, closest...), define execution criteria (CE, priority...), write the submission job (JDL?) and submit it to the Grid.

The Grid matches the job to resources (displaying available resources/JDL), allocates a job id, records the job parameters (JDL, input, ...), optimises the CE choice per VO, submits the job to a CE and on to a worker node, and prepares the execution environment, associating PFNs with LFNs.

The physics application executes, accessing data via standard POSIX calls - open(LFN), read/write, close - or a grid wrapper to the POSIX calls; the application is never recompiled or relinked to run on the Grid (???????). File management and PFN selection follow: output files are managed and the file catalogue updated (LFN-PFN), e.g. automatic file replication or file transfer plus file-catalogue update; attributes (LFN) are registered/updated in the VO metadata catalogue; execution info and job-related information are recorded and published.

Supporting services: VO metadata data-description catalogue, VO metadata configuration catalogue, VO replica catalogue, VO job submission/bookkeeping service, job execution accounting service, and VO database access via a Grid API.
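The "grid wrapper to POSIX calls" amounts to resolving a logical file name to a physical replica before the ordinary open(). A hedged sketch, with an in-memory stand-in for the VO replica catalogue (catalogue contents and the lfn:// naming are invented for illustration):

```python
# Sketch of a grid wrapper around POSIX open(): resolve LFN -> PFN via a
# (mock) VO replica catalogue, then open the physical file as usual.
# The catalogue, its entries and the lfn:// scheme are illustrative only.
import os
import tempfile

replica_catalog = {}  # LFN -> list of PFNs (replicas)

def register_replica(lfn, pfn):
    replica_catalog.setdefault(lfn, []).append(pfn)

def grid_open(lfn, mode="r"):
    """Open an LFN by picking the first locally available physical replica."""
    for pfn in replica_catalog.get(lfn, []):
        if os.path.exists(pfn):
            return open(pfn, mode)  # from here on it is a plain POSIX file
    raise FileNotFoundError(f"no available replica for {lfn}")

# Usage: the application code never sees the PFN and needs no relinking.
tmp = tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False)
tmp.write("event data")
tmp.close()
register_replica("lfn://experiment/run42/events.dat", tmp.name)

with grid_open("lfn://experiment/run42/events.dat") as f:
    print(f.read())
```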
Data Management on the Grid

"Data in particle physics is centred on events stored in a database... Groups of events are collected in (typically GByte) files... In order to utilise additional resources and minimise data analysis time, Grid replication mechanisms are currently being used at the file level."

Access to a database via Grid certificates (Spitfire/OGSA-DAI).
Replication of files on the Grid (GDMP/Giggle).
Replication and optimisation simulation (Reptor/Optor).
2. Spitfire

"Secure?" At the level required in particle physics.

Request flow: an HTTP + SSL request carrying the client certificate reaches the servlet container (SSLServletSocketFactory, TrustManager). The security servlet asks: is the certificate signed by a trusted CA? Has it been revoked (checked against the revoked-certs repository)? Does the user specify a role? If yes, the role is checked against the role repository; if not, a default is found. The authorization module maps the role to a connection id via the connection mappings, and the translator servlet requests a connection from the connection pool to the RDBMS.
2. Database Client API

A database client API has been defined and is implemented as a grid service using standard web-service technologies; development is ongoing with OGSA-DAI.

Talk: "Project Spitfire - Towards Grid Web Service Databases".
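The Spitfire-style decision chain (trusted-CA check, revocation check, role lookup, role-to-connection mapping) can be expressed in a few lines. A toy model only; all repositories, names and the function signature are invented for illustration, not the Spitfire implementation:

```python
# Toy model of the security-servlet decision chain described above.
# Repositories and identifiers are invented for illustration.
TRUSTED_CAS = {"CERN CA", "UK e-Science CA"}
REVOKED_CERTS = {"cert-0042"}
ROLE_REPOSITORY = {"alice": {"reader", "admin"}, "bob": {"reader"}}
CONNECTION_MAPPINGS = {"reader": "conn-ro", "admin": "conn-rw", "default": "conn-guest"}

def authorize(cert_id, issuer, user, requested_role=None):
    """Return a connection id, or None if the request is rejected."""
    if issuer not in TRUSTED_CAS:      # is the certificate signed by a trusted CA?
        return None
    if cert_id in REVOKED_CERTS:       # has the certificate been revoked?
        return None
    if requested_role is None:         # no role specified: find the default
        return CONNECTION_MAPPINGS["default"]
    if requested_role not in ROLE_REPOSITORY.get(user, set()):  # role ok?
        return None
    return CONNECTION_MAPPINGS[requested_role]  # map role -> connection id

print(authorize("cert-0001", "CERN CA", "alice", "admin"))  # conn-rw
print(authorize("cert-0042", "CERN CA", "alice", "admin"))  # None (revoked)
print(authorize("cert-0007", "CERN CA", "bob"))             # conn-guest
```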
3. GDMP and the Replica Catalogue

Storage Elements 1, 2 and 3 register their files in the Globus 2.0 Replica Catalogue: centralised and LDAP-based. This is the Replica Catalogue TODAY.

GDMP 3.0 is a file mirroring/replication tool, originally written to replicate CMS Objectivity files for High Level Trigger studies, now used widely in HEP.
3. Giggle: "Hierarchical P2P"

Hierarchical indexing: a higher-level Replica Location Index (RLI) contains pointers to lower-level RLIs or Local Replica Catalogs (LRCs); the LRCs index the files held on the Storage Elements.

RLI = Replica Location Index; LRC = Local Replica Catalog.

"Scalable?" The trade-off: consistency versus efficiency.
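The hierarchy can be modelled as an index tree: an RLI answers "which lower node might know this LFN", and only LRCs hold the actual LFN-to-storage-element mappings. A minimal sketch using the slide's terminology; the implementation is invented, and real RLIs hold periodically refreshed soft-state summaries rather than querying children on demand (hence the consistency-versus-efficiency trade-off):

```python
# Minimal model of hierarchical replica location: RLIs index child
# RLIs/LRCs; only LRCs hold the actual LFN -> storage-element mappings.
class LRC:
    def __init__(self):
        self.mappings = {}              # LFN -> set of storage elements

    def add(self, lfn, se):
        self.mappings.setdefault(lfn, set()).add(se)

    def knows(self, lfn):
        return lfn in self.mappings

    def locate(self, lfn):
        return self.mappings.get(lfn, set())

class RLI:
    def __init__(self, children):
        self.children = children        # lower-level RLIs or LRCs

    def knows(self, lfn):
        return any(c.knows(lfn) for c in self.children)

    def locate(self, lfn):              # descend only into branches that know the LFN
        found = set()
        for child in self.children:
            if child.knows(lfn):
                found |= child.locate(lfn)
        return found

lrc_a, lrc_b = LRC(), LRC()
lrc_a.add("lfn:higgs-candidates", "SE1")
lrc_b.add("lfn:higgs-candidates", "SE3")
root = RLI([RLI([lrc_a]), RLI([lrc_b])])
print(root.locate("lfn:higgs-candidates"))  # replicas found via two branches
```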
4. Reptor/Optor: File Replication / Simulation

Reptor defines the replica architecture: each site hosts a Replica Manager, Storage Element, Computing Element, Optimiser and Local Replica Catalogue, exposed through core, optimisation and processing APIs with pre-/post-processing; Replica Location Indexes and a Replica Metadata Catalogue span the sites, beneath the Resource Broker and User Interface.

Optor tests file replication strategies, e.g. an economic model.

Demo and poster: "Studying Dynamic Grid Optimisation Algorithms for File Replication".

"Efficient?" Requires simulation studies...
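An "economic model" of replication can be caricatured in a few lines: a site replicates a requested file locally only when its estimated future value (here, crude access counts) exceeds that of the least-valuable file it would evict. A toy sketch under those assumptions, not the actual Optor algorithm:

```python
# Toy economic replication decision in the spirit of Optor: a site with a
# fixed-size cache replicates a requested file only when it is "worth"
# more than the least-valuable cached file. Illustrative only.
from collections import Counter

class Site:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = set()
        self.accesses = Counter()       # crude proxy for future value

    def request(self, lfn):
        self.accesses[lfn] += 1
        if lfn in self.cache:
            return "local"              # already replicated here
        if len(self.cache) < self.capacity:
            self.cache.add(lfn)         # free space: always replicate
            return "replicated"
        victim = min(self.cache, key=lambda f: self.accesses[f])
        if self.accesses[lfn] > self.accesses[victim]:
            self.cache.discard(victim)  # worth more than the victim: swap
            self.cache.add(lfn)
            return "replicated"
        return "remote"                 # not worth replicating: read remotely

site = Site(capacity=2)
for lfn in ["A", "B", "A", "C", "A", "C", "C"]:
    print(lfn, "->", site.request(lfn))
```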
Application Requirements

"The current EMBL production database is 150 GB, which takes over four hours to download at full bandwidth capability at the EBI. The EBI's data repositories receive 100,000 to 250,000 hits per day with 20% from UK sites; 563 unique UK domains with 27 sites have more than 50 hits per day." (MyGrid proposal)

This suggests: less emphasis on efficient data access and data-hierarchy aspects (application specific); large gains in biological applications from efficient file replication; larger gains still from application-specific replication?
Events.. to Files.. to Events

An "Interesting Events List" (event 1, event 2, event 3, ...) selected at the TAG level must be mapped back to the RAW, ESD and AOD data files that hold those events, replicated across Tier-0 (international), Tier-1 (national), Tier-2 (regional) and Tier-3 (local) sites.

Not all pre-filtered events are interesting... non-pre-filtered events may be... File replication overhead.
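The file-replication overhead is easy to quantify: when interesting events are scattered across files, whole-file replication moves many uninteresting events too. A sketch with an invented file layout (fixed-size files of consecutive event ids):

```python
# Quantify the overhead of file-level replication for an event-level
# selection: whole files must be shipped even if only a few events per
# file are interesting. The file layout is illustrative.
EVENTS_PER_FILE = 1000

def files_for_events(event_ids, events_per_file=EVENTS_PER_FILE):
    """Map an interesting-events list to the set of files holding them."""
    return {eid // events_per_file for eid in event_ids}

def replication_overhead(event_ids, events_per_file=EVENTS_PER_FILE):
    """Fraction of replicated events that were never asked for."""
    shipped = len(files_for_events(event_ids, events_per_file)) * events_per_file
    return 1 - len(set(event_ids)) / shipped

interesting = [3, 1500, 1501, 7999]     # 4 events scattered over 3 files
print(sorted(files_for_events(interesting)))       # files 0, 1 and 7
print(f"overhead: {replication_overhead(interesting):.1%}")
```

With 4 wanted events in 3 shipped files of 1000 events each, almost everything moved is unwanted; event-level replication avoids exactly this cost, at the price of a more complex catalogue.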
Events.. to Events: Event Replication and Query Optimisation

Here the "Interesting Events List" is replicated at the event level: events 1, 2 and 3 are extracted individually across the RAW/ESD/AOD/TAG hierarchy at Tier-0 (international), Tier-1 (national), Tier-2 (regional) and Tier-3 (local), rather than shipping whole files.
Data Grid for the Scientist

Knowledge ("Stars in Stripes") on top of a distributed (replicated) database (@#%&*!): the scientist works at the level of E = mc2 while the Grid middleware does the rest... in order to get back to the real (or simulated) data.

An incremental process... At what level does the metadata sit? File?... event?... sub-event?...
Summary

Yesterday's data access issues are still here; they just got bigger (by a factor of 100). A data hierarchy is required to access more data more efficiently... but is insufficient.

Today's Grid tools are developing rapidly: they enable replicated file access across the grid; file replication is standard (lfn://, pfn://); standards for Grid data access are emerging.

Tomorrow ".. never knows": replicated "events" on the Grid?.. distributed databases?.. or did that diagram look a little too monolithic?