Dirk Düllmann dirk.duellmann@cern.ch CCS Meeting, April 18th 2002


Transcript of Dirk Düllmann dirk.duellmann@cern.ch CCS Meeting, April 18th 2002

Page 1: Dirk Düllmann dirk.duellmann@cern.ch CCS Meeting, April 18 th  2002

Summary of the Persistency RTAG Report

(heavily based on David Malon’s slides)

- and some personal remarks

Dirk Düllmann

dirk.duellmann@cern.ch

CCS Meeting, April 18th 2002

Page 2: SC2 mandate to the RTAG

Write the product specification for the Persistency Framework for Physics Applications at LHC

Construct a component breakdown for the management of all types of LHC data

Identify the responsibilities of Experiment Frameworks, existing products (such as ROOT) and as yet to be developed products

Develop requirements/use cases to specify (at least) the metadata/navigation component(s)

Estimate resources (manpower) needed to prototype missing components

Page 3: Guidance from the SC2

The RTAG may decide to address all types of data, or may decide to postpone some topics for other RTAGs, once the components have been identified.

The RTAG should develop a detailed description at least for the event data management.

Issues of schema evolution, dictionary construction and storage, object and data models should be addressed.

Page 4: RTAG Composition

One member from each experiment, one from IT/DB, one from ROOT team:
– Fons Rademakers (Alice)
– David Malon (ATLAS)
– Vincenzo Innocente (CMS)
– Pere Mato (LHCb)
– Dirk Düllmann (IT/DB)
– Rene Brun (ROOT)

Quoting Vincenzo’s report at CMS Week (6 March 02)
– “Collaborative, friendly atmosphere”
– “Real effort to define a common product”

This is already an accomplishment.

Page 5: Response of RTAG to mandate and guidance (excerpted from report)

Intent of this RTAG is to assume an optimistic posture regarding the potential for commonality among the LHC experiments in all areas related to data management

Limited time available to the RTAG precludes treatment of all components of a data management architecture at equal depth
– will propose areas in which further work, and perhaps additional RTAGs, will be needed

Page 6: Response of RTAG to mandate and guidance (excerpted from report)

Consonant with SC2 guidance, the RTAG has chosen to focus its initial discussions on the architecture of a persistence management service based upon a common streaming layer, and on the associated services needed to support it
– Even if we cannot accomplish everything we aspire to, we want to ensure that we have provided a solid foundation for a near-term common project

While our aim is to define components and their interactions in terms of abstract interfaces that any implementation must respect, it is not our intention to produce a design that requires a clean-slate implementation

Page 7: Response of RTAG to mandate and guidance (excerpted from report)

For the streaming layer and related services, we plan to provide a foundation for an initial common project that can be based upon the capabilities of existing implementations, and upon ROOT’s I/O capabilities in particular

While new capabilities required of an initial implementation should not be daunting, we do not wish at this point to underestimate the amount of repackaging and refactoring work required to support common project requirements

Page 8: Status

Reasonable agreement on design criteria, e.g.,
– Component oriented, communication through abstract interfaces, no back channels, components make no assumptions about implementation technology of components with which they communicate
– Persistence for C++ data models is the principal target, but our environments are already multilingual; should avoid constructions that make language migration and multi-language support difficult
– Architecture should not preclude multiple persistence technologies
– Experiments’ transient data models should not need compile-time/link-time dependencies on persistence technology in order to use persistence services

Page 9: Status II

Reasonable agreement on design criteria, e.g.,
– Transient object types may have several persistent representations, the type of a transient object restored from a persistent one may be different than the type of the object that was saved, a persistent object cannot assume it “knows” what type of transient object will be built from it
– …more…

Component discussions and requirement discussions have been uneven: extremely detailed and highly technical in some areas, with other areas neglected thus far for lack of time

Primary focus has been on issues and components involved in defining a common persistence service
– Cache manager, persistence manager, storage manager, streamer service, placement service, dictionary service(s), …
– Object identification, navigation, …

Page 10: The proposed initial project

Charge to the initial project is to deliver the components of a common file-based streaming layer sufficient to support persistence for all four experiments’ event models, with management of the resulting files hosted in a relational layer
– Elaboration of what this means appears in the report
– Note that persistence service is intended to support all kinds of data; it is not specific to event data

Page 11: Initial project components

The specification in the report describes agreed-upon common project components, including
– persistence manager
– placement service
– streaming service
– storage manager
– externalizable technology-independent references
– services to support event collections
– connections to grid-provided replica management and LFN/PFN services

Page 12: Initial project implementation technologies

Streaming layer should be implemented using the ROOT framework’s I/O services

Components with relational implementations should make no deep assumptions about the underlying technology
– Nothing intentionally proposed that precludes implementation using such open source products as MySQL

Page 13: RTAG’s First Component Diagram (under discussion)

[Diagram. Components: PersistencyMgr, StreamerSvc, DictionarySvc, StorageMgr, CacheMgr, PlacementSvc. Interfaces: IReflection, IPReflection, IPlacement, ICnv, IReadWrite, IPers. Client side labelled C++.]

Page 14: Persistence Manager

Principal point of contact between experiment-specific frameworks and persistence services

Handles requests to store state of an object, returning a token (e.g., a persistent address)

Handles requests to retrieve state of an object corresponding to a token

Like Gaudi/Athena persistence service
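
The store/retrieve contract described on this slide could be sketched roughly as follows. This is only an illustration: the names IPersistencyMgr, Token and InMemoryPersistencyMgr are assumptions made for the example and are not taken from the RTAG report, and the in-memory map merely stands in for a real storage technology.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical token type: an opaque persistent address handed back to the caller.
struct Token {
    std::string value;
};

// Hypothetical abstract interface (name chosen for this sketch only).
class IPersistencyMgr {
public:
    virtual ~IPersistencyMgr() {}
    // Store the (already streamed) state of an object and return a token for it.
    virtual Token store(const std::string& state) = 0;
    // Retrieve the state for a token; returns false if the token is unknown.
    virtual bool retrieve(const Token& token, std::string& state) const = 0;
};

// Toy in-memory implementation standing in for a real storage technology.
class InMemoryPersistencyMgr : public IPersistencyMgr {
public:
    InMemoryPersistencyMgr() : next_(0) {}
    Token store(const std::string& state) override {
        Token t;
        t.value = "mem:" + std::to_string(next_++);
        states_[t.value] = state;
        return t;
    }
    bool retrieve(const Token& token, std::string& state) const override {
        std::map<std::string, std::string>::const_iterator it = states_.find(token.value);
        if (it == states_.end()) return false;
        state = it->second;
        return true;
    }
private:
    unsigned long next_;
    std::map<std::string, std::string> states_;
};
```

A framework would hold only an IPersistencyMgr pointer, so the concrete technology behind it can change without touching client code.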

Page 15: Dictionary services

Manages descriptions of transient (persistence-capable) classes and their persistent representations

Interface to obtain reflective information about classes

Entries in dictionary may be generated by disparate sources
– Rootcint, ADL, LHCb XML, …

With automatic converter/streamer generation, likely that some persistent representations will be derivable from transient representation, but possibility of multiple persistent representations suggests separate dictionaries
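
To make the idea of several persistent representations per transient class concrete, here is a minimal sketch. DictionarySketch and PersistentShape are hypothetical names invented for the example; a real dictionary would be filled by tools such as those listed above rather than by hand.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one persistent representation of a class.
struct PersistentShape {
    std::string name;                 // e.g. "Track_v1"
    std::vector<std::string> fields;  // names of the stored fields
};

// Toy dictionary: transient class name -> all known persistent shapes.
class DictionarySketch {
public:
    void declare(const std::string& transientClass, const PersistentShape& shape) {
        shapes_[transientClass].push_back(shape);
    }
    // Returns the shapes registered for a transient class (empty if unknown).
    std::vector<PersistentShape> shapesFor(const std::string& transientClass) const {
        std::map<std::string, std::vector<PersistentShape> >::const_iterator it =
            shapes_.find(transientClass);
        return it == shapes_.end() ? std::vector<PersistentShape>() : it->second;
    }
private:
    std::map<std::string, std::vector<PersistentShape> > shapes_;
};
```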

Page 16: Streaming or conversion service

Consults transient and persistent representation dictionaries to produce persistent (or “persistence-ready”) representation of a transient object, or vice versa
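
A hand-written converter pair gives a feel for what such a service produces. The Hit class and the two functions are assumptions for illustration; the layout is hard-coded here, whereas the service described above would derive it from the dictionaries, and this sketch ignores byte order and schema evolution.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical transient class from an experiment's event model.
struct Hit {
    int channel;
    double energy;
};

// Toy conversion to a "persistence-ready" form: the fields are copied into a
// byte buffer in a fixed order (machine-dependent layout, illustration only).
std::vector<unsigned char> toPersistent(const Hit& hit) {
    std::vector<unsigned char> bytes(sizeof(int) + sizeof(double));
    std::memcpy(&bytes[0], &hit.channel, sizeof(int));
    std::memcpy(&bytes[sizeof(int)], &hit.energy, sizeof(double));
    return bytes;
}

// Inverse conversion; returns false if the buffer has the wrong size.
bool toTransient(const std::vector<unsigned char>& bytes, Hit& hit) {
    if (bytes.size() != sizeof(int) + sizeof(double)) return false;
    std::memcpy(&hit.channel, &bytes[0], sizeof(int));
    std::memcpy(&hit.energy, &bytes[sizeof(int)], sizeof(double));
    return true;
}
```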

Page 17: Placement service

Supports runtime control of physical placement, equivalent to “where” hints in ODMG

Intended to support physical clustering within an event, and to separate events written to different physics streams

Interface seen by experiments’ application frameworks is independent of persistence technology

Insufficiently specified in the RTAG report
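
Since the report leaves this service underspecified, the following is only one guess at what a technology-independent placement hint might look like. PlacementHint, PlacementSketch and the "file#container" location string are all invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical technology-independent placement hint.
struct PlacementHint {
    std::string stream;     // physics stream, e.g. "dimuon"
    std::string container;  // clustering unit within an event, e.g. "tracks"
};

// Toy placement service: routes each physics stream to its own file and
// falls back to a default file for streams that were not configured.
class PlacementSketch {
public:
    explicit PlacementSketch(const std::string& defaultFile) : defaultFile_(defaultFile) {}
    void route(const std::string& stream, const std::string& file) { files_[stream] = file; }
    // Returns "<file>#<container>" as a stand-in for a physical location.
    std::string locate(const PlacementHint& hint) const {
        std::map<std::string, std::string>::const_iterator it = files_.find(hint.stream);
        const std::string& file = (it == files_.end()) ? defaultFile_ : it->second;
        return file + "#" + hint.container;
    }
private:
    std::string defaultFile_;
    std::map<std::string, std::string> files_;
};
```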

Page 18: References and reference services

Common class for encapsulating persistent addresses

Externalizable

Independent of storage technology, likely with an opaque payload that is technology-specific

In hybrid model, intent is that from a Ref, one can determine what “file” is needed without consulting the particular storage technology
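
One way to picture such a reference is a small struct whose file id is readable by anyone while the payload stays opaque. The field names and the '|'-separated external form below are assumptions of this sketch, not a format defined by the RTAG.

```cpp
#include <cassert>
#include <string>

// Hypothetical reference layout: a technology-independent file id plus a
// payload that only the owning storage technology can interpret.
struct Ref {
    std::string fileId;      // readable without asking the storage technology
    std::string technology;  // e.g. "ROOT"
    std::string payload;     // opaque, technology-specific
};

// Externalize as "fileId|technology|payload". The '|' separator is an
// assumption of this sketch; fileId and technology must not contain it.
std::string externalize(const Ref& ref) {
    return ref.fileId + "|" + ref.technology + "|" + ref.payload;
}

// Parse an externalized reference; returns false on malformed input.
bool internalize(const std::string& text, Ref& ref) {
    const std::string::size_type first = text.find('|');
    if (first == std::string::npos) return false;
    const std::string::size_type second = text.find('|', first + 1);
    if (second == std::string::npos) return false;
    ref.fileId = text.substr(0, first);
    ref.technology = text.substr(first + 1, second - first - 1);
    ref.payload = text.substr(second + 1);
    return true;
}
```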

Page 19: Store manager

Stores and retrieves variable-length stream of bytes

Deals with issues at the file level
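
As a rough illustration of storing and retrieving variable-length byte streams, the sketch below appends each stream to one in-memory buffer that plays the role of a file. StoreSketch and its record index are hypothetical; real file-level concerns (opening, closing, recovery) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy store manager: appends variable-length byte streams to one in-memory
// "file" and remembers where each stream starts and how long it is.
class StoreSketch {
public:
    // Appends the bytes and returns a record index usable with read().
    std::size_t write(const std::vector<unsigned char>& bytes) {
        Record rec;
        rec.offset = file_.size();
        rec.length = bytes.size();
        file_.insert(file_.end(), bytes.begin(), bytes.end());
        records_.push_back(rec);
        return records_.size() - 1;
    }
    // Copies the stored bytes of a record into out; false if the index is invalid.
    bool read(std::size_t index, std::vector<unsigned char>& out) const {
        if (index >= records_.size()) return false;
        const Record& rec = records_[index];
        out.assign(file_.begin() + rec.offset, file_.begin() + rec.offset + rec.length);
        return true;
    }
private:
    struct Record {
        std::size_t offset;
        std::size_t length;
    };
    std::vector<unsigned char> file_;
    std::vector<Record> records_;
};
```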

Page 20: Ref → LFN service

Translates an Object Reference into the logical filename of the file containing the referenced objects

Expected that Ref will have some kind of file id, that can be used to determine logical file name in the grid sense
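
Assuming the Ref does carry a file id, the lookup could be as simple as the table below. RefToLfnSketch and the ids and names used with it are invented for the example; in practice the table would live in the relational layer mentioned earlier.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy Ref-to-LFN lookup: maps the file id carried by a reference to a
// logical file name. Ids and names are invented for illustration.
class RefToLfnSketch {
public:
    void registerFile(const std::string& fileId, const std::string& lfn) {
        lfns_[fileId] = lfn;
    }
    // Returns the logical file name, or an empty string if the id is unknown.
    std::string lfnFor(const std::string& fileId) const {
        std::map<std::string, std::string>::const_iterator it = lfns_.find(fileId);
        return it == lfns_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> lfns_;
};
```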

Page 21: LFN → {PFN} → PosixName service

Translate a logical filename into the posix name of one physical replica of this file

Expect to get this from grid projects, though common project may need to deliver a service that hides several possible paths behind a single interface
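
The "several possible paths behind a single interface" idea can be sketched as a chain of resolvers tried in order. IFileResolver, TableResolver and ChainedResolver are hypothetical names; the lookup tables stand in for whatever the grid projects eventually provide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical interface hiding how a logical file name is resolved.
class IFileResolver {
public:
    virtual ~IFileResolver() {}
    // Returns the POSIX path of one replica, or an empty string if none is known.
    virtual std::string posixName(const std::string& lfn) const = 0;
};

// One possible path: a fixed lookup table (stand-in for a replica catalogue).
class TableResolver : public IFileResolver {
public:
    void add(const std::string& lfn, const std::string& path) { paths_[lfn] = path; }
    std::string posixName(const std::string& lfn) const override {
        std::map<std::string, std::string>::const_iterator it = paths_.find(lfn);
        return it == paths_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> paths_;
};

// Facade that tries several resolvers in order behind the single interface.
class ChainedResolver : public IFileResolver {
public:
    void append(const IFileResolver* resolver) { chain_.push_back(resolver); }
    std::string posixName(const std::string& lfn) const override {
        for (std::size_t i = 0; i < chain_.size(); ++i) {
            const std::string path = chain_[i]->posixName(lfn);
            if (!path.empty()) return path;
        }
        return std::string();
    }
private:
    std::vector<const IFileResolver*> chain_;  // not owned
};
```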

Page 22: Event collection services

Support for explicit event collections (not just collections by containment)
– Support for collections of event references

Queryable collections: Like a list of events, together with queryable tags
– Possibly indexed
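
A queryable collection in this sense might look like the sketch below: a list of event references, each with a few named tags, and a selection over those tags. TaggedEvent and EventCollectionSketch are assumed names, and the linear scan is used only for brevity where a real service might use an index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One entry of an explicit event collection: a reference plus queryable tags.
struct TaggedEvent {
    std::string ref;                     // externalized event reference
    std::map<std::string, double> tags;  // e.g. "nMuons" -> 2
};

// Toy queryable collection: a linear scan over the tags (no index).
class EventCollectionSketch {
public:
    void add(const TaggedEvent& event) { events_.push_back(event); }
    // Returns references of events whose tag exists and is >= minValue.
    std::vector<std::string> select(const std::string& tag, double minValue) const {
        std::vector<std::string> refs;
        for (std::size_t i = 0; i < events_.size(); ++i) {
            std::map<std::string, double>::const_iterator it = events_[i].tags.find(tag);
            if (it != events_[i].tags.end() && it->second >= minValue) {
                refs.push_back(events_[i].ref);
            }
        }
        return refs;
    }
private:
    std::vector<TaggedEvent> events_;
};
```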

Page 23: One possible mapping to a ROOT implementation (under discussion)

[Diagram. Components: PersistencyMgr, StreamerSvc, DictionarySvc, StorageMgr, CacheMgr, CustomCacheMgr, PlacementSvc, IteratorSvc, SelectorSvc, FileCatalog, MetaDataCatalog. Interfaces: IReflection, IPReflection, IPlacement, ICnv, IReadWrite, IPers, ICache, IFCatalog, IMCatalog. ROOT classes shown: TFile, TDirectory, TSocket, TClass etc., TBuffer, TMessage, TRef, TKey, TGrid, TTree, TStreamerInfo, TChain, TEventList, TDSet. Client side labelled C++.]

Page 24: Clarify Resources, Responsibilities & Risks

Expected resources at project start and their evolution
– Commitments from experiments, the ROOT team and IT

Who does what (and until when)?
– Who develops which software component(s)?
– Who maintains those components afterwards?
– Who develops production services around those?

What is the procedure for dropping any of those services?
– Any “in house” development involves a significant maintenance commitment by somebody and risk for somebody else
– Need to agree on these commitments

Page 25: Components & Interfaces

Need to rapidly agree on concrete component interfaces
– Could we classify/prioritize interfaces by risk? E.g. by damage caused by a later interface change
– At least the major (aka external) interfaces need the official blessing of the Architects’ Forum and cannot be modified without its agreement

No component infrastructure defined so far
– Component inheritance hierarchy, component factories, component mapping to shared libraries etc.

LCG approach needs to be “compatible” with
– several LCG sub-projects
– several different experiment frameworks
– several existing HEP packages, e.g. ROOT I/O
– several RDBMS implementations

May need to assume some instability until solid foundation is accepted for LCG applications area

Page 26: Components & Interfaces cont.

Fast adoption of existing “interface” classes may be tempting but is also very risky
– Should not just bless existing header files which were conceived as implementation headers
– Should take the opportunity to (re-)design a minimal, but complete set of abstract interfaces
– And then implement them using existing technology

Page 27: Timescales, Functionality & Technologies of an Initial Prototype

Experiments have somewhat differing timescales for their first use of LCG components
– Synchronization of initial release schedule would definitely improve chances of success

Experiments may favor different subsets of full functionality for a first prototype
– Need to agree on main requirements for prototype s/w and associated services to guide implementation and technology choices
– Synchronization of feature content and implementation technology is required

Which RDBMS backend? What are the deployment requirements?
– “Lightweight” system (end-user managed) - maybe reduced requirements on scalability and fault tolerance and even on functionality
– Fully managed production system - based on established database services (incl. backup, recovery from h/w fault …)
– May need prototype implementation for both

Page 28: Summary

Persistency RTAG has delivered its final report to the SC2
– Significant agreement on requirements and a component breakdown have been achieved
– Report does not define all components and their interaction in full or equal depth
– A common project on object persistency is proposed

Possible next steps
– Clarify available resources and responsibilities
– Agree on scope, timescale and deployment model of first project prototype
– Rapidly agree on concrete set of component interfaces and spawn work packages to implement them
– Continue to resolve remaining open questions