Status Report of the RD45 Project, Based on LCB Review, November 1999


Page 1

Status Report of the RD45 Project

Based on LCB Review, November 1999

Page 2

Overview

• Initial Goals of the Project
• Summary of April 1998 LCB Review
• Work on Milestones
• Evolution of O(R)DBMS market
• Risk Analysis: Summary and Conclusions
• Future Activities
• Summary

Page 3

RD45: Introduction

• Proposed in 1994; approved early 1995
• Goal: provide persistency for LHC data
  • Objects: event, calibrations, histograms, etc.
• Assumed: O-O environment, C++ (initially)
  • Now also Java, including interoperability issues
• Emphasis: standards (OMG, ODMG, ...), potential use of commercial solutions
• Anticipated 3 project phases:
  • Requirements gathering, limited prototyping
  • Detailed prototyping and evaluation
  • Implementation

Page 4

Guiding Principles

“In particular, the data should be presented in as consistent a way as possible. The data themselves may be stored in a variety of formats but this should be hidden from the user…”

“The ODMG ... binding is based on one fundamental principle: the programmer should perceive the binding as a single language for expressing both database and programming operations, not two separate languages with arbitrary boundaries between them.”

• Capability of scaling to LHC data volumes & rates
• Capable of satisfying wide variety of HEP needs: DAQ, SIM, REC, Analysis, ...
• Use of “standard”, widely-used solutions if applicable

CMS

Page 5

Phase 2 - Milestones

• Impact of using an ODBMS
  • Object model, physical data organisation, use of CASE tools, 3rd party class libraries, C++ application code
• Evaluation of ODBMS features & suitability for HEP
  • Schema Evolution, Object Versioning, Data Replication
• Performance comparisons with existing solutions
  • PAW + Ntuples
• Use of ODBMS for typical simulation, reconstruction and analysis scenarios with data volumes of up to 1 TB
• Impact of ODBMS on end-user physicist (including private schema & collections for simulation, reconstruction and analysis)
• Demonstrate the feasibility of using an ODBMS and MSS at data rates sufficient for ATLAS and CMS 1997 test-beam requirements

(Year labels on slide: 1996, 1997)

Page 6

LCB Review, April 1998

“The project has achieved the initial R&D goal of investigating and identifying potential solutions to the problem of persistent data storage for LHC experiments.”

“The proposed solution: ODBMS (Objectivity/DB) is now adopted for data persistency not only by all the LHC experiments but by many others (BaBar, NA45, COMPASS, RHIC) ready to take data in 1-2 years.”

No longer valid

(except ALICE)

Page 7

Milestones (April 98)

• Provide, together with the IT/PDP group, production data management services based on Objectivity/DB and HPSS with sufficient capacity to solve the requirements of ATLAS and CMS test beam and simulation needs, COMPASS and NA45 tests for their '99 data taking runs.
• Develop and provide appropriate database administration tools, (meta-)data browsers and data import/export facilities, as required for
• Develop and provide production versions of the HepODBMS class libraries, including reference and end-user guides.
• Continue R&D, based on input and use cases from the LHC collaborations to produce results in time for the next versions of the collaborations' Computing Technical Proposals (end 1999).

Page 8

Milestones - Results

• Production servers set up; used for CDR and other activities. Milestones: ATLAS: 1TB, CMS: 100MB/s. COMPASS, CHORUS, NA45, others, ...
• Federated DB backup tool developed (based on multiple FDs); numerous DB browsers (CERN DRO_Tool, SLAC BDB, Micram Hudson, …); DB import/export based on SLAC model
• New release of HepODBMS including scalable event collections + import of BaBar “conditions DB”. Revised user doc (XML) + ref. manual (DOC++), CSC tutorials + examples
• R&D activities:
  • Database usage over a wide area network
  • Clustering and re-clustering strategies
  • Multi-user, multi-federation issues
  • Database integration with MSS

MONARC

Examples follow...

Page 9
Page 10

Multi-user, Multi-FD Issues

Multiple FDs used mainly to work around limitations in Objectivity/DB, e.g.

• lock contention on global resources (catalogue)
  • e.g. online / offline systems (BaBar, CMS, …)
• lack of private schema/catalogue
  • e.g. user schema / data
• lack of security
  • see Objy V5.2
• lack of support of partial backups
  • described above

One of the main issues for the meeting with Objy in late Feb

Page 11

Multi-FD Example: CMS

[Diagram labels: Online side with Prod Boot and Prod FD (RunDB, LogDB, BConfDB); Offline side (cmsc01) with Prod Boot and Prod FD (RunDB, LogDB, BConfDB); databases DB1, DB2, DB3, ... DBn on each side; "Clone FD" between them.]

Page 12

Multiple FDs & User Data

• Production FD cloned by users
• Users can add private data / schema
• Can share data
• Scalability?

[Diagram labels: Prod FD with Prod Boot and DB1, DB2, DB3, ... DBn; "Clone FD"; User1 FD with User1 Boot, U1DB1, U1DB2; User2 FD with User2 Boot, U2DB1, U2DB2; Private Schema; Objects using new Schema.]

Approach also used by other large Objy users, e.g. COMPASS, Space telescope
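The cloning idea on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Federation` class, its fields, and the file locations are hypothetical and are not part of Objectivity/DB or of the RD45 tools. It shows one plausible reading of the diagram, in which a user clone copies the catalogue and schema, keeps pointing at the shared production databases, and takes private additions without touching production.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Federation:
    """Toy model (hypothetical): a boot file name, a catalogue and a schema."""
    boot: str
    catalogue: Dict[str, str] = field(default_factory=dict)  # DB name -> file location
    schema: Set[str] = field(default_factory=set)            # known class names

    def clone(self, boot: str) -> "Federation":
        """Copy catalogue and schema; the DB files they point to stay shared."""
        return Federation(boot, dict(self.catalogue), set(self.schema))


# Illustrative names and paths only.
prod = Federation("ProdBoot", {"DB1": "/prod/DB1", "DB2": "/prod/DB2"}, {"Event"})

user1 = prod.clone("User1Boot")
user1.catalogue["U1DB1"] = "/user1/U1DB1"  # private data
user1.schema.add("MyCandidate")            # private schema
```

In this model the production catalogue and schema are unchanged after the user adds private entries, which is the property the slide attributes to the approach. The "Scalability?" question is not addressed by the sketch.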

Page 13

Federation Backup Procedure

• Is production FD in a consistent state?
• Copy all relevant DB files
• Install on backup FD
• Check consistency
• Copy to tape

[Diagram labels: Production Lock Server; Fallback Lock Server; Disk Server A; Federation Catalogue & Schema with DB1, DB2, DB3; DB5, DB6, DB7; HPSS Controlled Tapes with DB8, DB9, DB10; Backup Tapes (one set labelled "Backup Tapes 1.7.99"), each holding Federation Catalogue & Schema and DB1, DB2, DB3. Arrows: "1. FD Copy & Test", "2. Copy to Tape".]

Partial backup procedure high on wish-list of Objy customers (ETF)
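The five steps of the backup procedure can be sketched as a small script. This is a minimal illustration, not the RD45 backup tool: the function names are hypothetical, the consistency check is a caller-supplied placeholder (no Objectivity/DB utility is invoked), and "tape" is simply a second directory.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical type: returns True when a set of files is considered consistent.
ConsistencyCheck = Callable[[Sequence[Path]], bool]


def backup_federation(
    db_files: Sequence[Path],
    backup_dir: Path,
    tape_dir: Path,
    is_consistent: ConsistencyCheck,
) -> List[Path]:
    """Run the five steps from the slide; return the files written to 'tape'."""
    # Step 1: is the production FD in a consistent state?
    if not is_consistent(db_files):
        raise RuntimeError("production federation is not consistent")

    # Steps 2 and 3: copy all relevant DB files and install them on the backup FD.
    backup_dir.mkdir(parents=True, exist_ok=True)
    installed = [Path(shutil.copy2(f, backup_dir / f.name)) for f in db_files]

    # Step 4: check consistency of the copy before trusting it.
    if not is_consistent(installed):
        raise RuntimeError("backup federation failed the consistency check")

    # Step 5: copy to tape (a directory stands in for the tape system here).
    tape_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, tape_dir / f.name)) for f in installed]


def all_non_empty(files: Sequence[Path]) -> bool:
    """Placeholder check: every file exists and is non-empty."""
    return all(f.is_file() and f.stat().st_size > 0 for f in files)
```

Passing the check in as a function keeps the sketch independent of any particular database tool; a real check would have to validate the federation itself rather than file sizes.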

Page 14

Database Production Service - What is missing?

• Transparent non-blocking interface with MSS
• User capability to:
  • export, extract, replicate data and schema
  • manipulate data and schema outside production database and while accessing data and schema from production database
• Fully functional, reliable high-quality database system including VLDB support (>>1PB), management tools

From L. Silvestris: Review of application software services for the LHC era, FOCUS 07/10/99

Objy V5.2

Objy V6?

BaBar

Page 15

O(R)DBMS Evolution

From CMS Computing Technical Proposal:

“If the ODBMS industry flourishes it is very likely that by 2005 CMS will be able to obtain products, embodying thousands of man-years of work, that are well matched to its worldwide data management and access needs. The cost of such products to CMS will be equivalent to at most a few man-years. We believe that the ODBMS industry and the corresponding market are likely to flourish. However, if this is not the case, a decision will have to be made in approximately the year 2000 to devote some tens of man-years of effort to the development of a less satisfactory data management system for the LHC experiments.”

Page 16

ODBMS / RDBMS / ORDBMS

• RDBMS + “object extensions”; can store ADTs; “methods” on server
• Complex data with queries
• $8B in 1996
• Likely to become dominant DBMS technology
• Complex data; performance, scalability; tight language binding; OQL - SQL3 query subset
• Growth similar to RDBMS in ’80s
• ~$1B market by 2001
• ~$100M?

Page 17

Risk Analysis: Issues

• Choice of Technology
  • ODBMS, ORDBMS, RDBMS, light-weight POM, files + meta-data etc.
• Choice of Vendor
  • #1 Objectivity, #2 Versant
• The Home-Grown approach
  • Estimate resources required
  • Implies proof-of-concept prototype

Versant

Page 18

Risk Analysis: Summary of Options

• Evaluate C++ binding to e.g. ORACLE
• Add ESCROW clause to Objectivity contract
• Pursue possibility of source license
• Visit key Objectivity customers
• Produce new requirements list
• Estimate manpower to support Objy in house
• Estimate manpower for “clean-sheet” solution
• Continue to monitor alternatives

The LCB agrees with the other suggested steps to mitigate risk, with the addition of trying to ensure that user code in reconstruction and analysis programs is kept as standards-compliant as possible.

Page 19

Risk Analysis: Conclusions

• A solution is certainly possible!
• How much should we align ourselves with industry trends / standards?
• ODBMS unlikely to dominate DBMS market
• Likely to survive foreseeable future - market!
• Need to complete current prototype to make meaningful manpower estimates
  • Target: end-1999; present at this workshop!

Page 20

Future Activities

Production Services

• Considered essential by several experiments
• Tools, documentation, regular releases, … general production level support
• Push for VLDB and other enhancements

“2001” milestone

• Revise requirements
• Visit other HEP labs (BNL, FNAL, SLAC, …)
• Provide ODBMS-independent s/w layer
• Estimate man-power for alternative POM
• Evaluate ORDBMS technology

Feb. meeting at Objy
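One way to picture the "ODBMS-independent s/w layer" item is as a narrow interface that user code programs against, with the storage product hidden behind it. The sketch below is an assumption-laden illustration in Python rather than the C++ used by the experiments: `ObjectStore`, both backends, and `record_calibration` are hypothetical names, and neither backend talks to a real database.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class ObjectStore(ABC):
    """Hypothetical storage-neutral interface seen by user code."""

    @abstractmethod
    def put(self, key: str, obj: Dict[str, Any]) -> None:
        """Store an object under a key."""

    @abstractmethod
    def get(self, key: str) -> Dict[str, Any]:
        """Return the object stored under a key."""


class InMemoryStore(ObjectStore):
    """Stand-in backend that keeps objects as Python dictionaries."""

    def __init__(self) -> None:
        self._objects: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, obj: Dict[str, Any]) -> None:
        self._objects[key] = dict(obj)

    def get(self, key: str) -> Dict[str, Any]:
        return dict(self._objects[key])


class JsonTextStore(ObjectStore):
    """Second stand-in backend that keeps each object as a JSON string."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, obj: Dict[str, Any]) -> None:
        self._rows[key] = json.dumps(obj)

    def get(self, key: str) -> Dict[str, Any]:
        return json.loads(self._rows[key])


def record_calibration(store: ObjectStore, run: int, gain: float) -> None:
    """User code depends only on ObjectStore, not on a particular backend."""
    store.put(f"calib/{run}", {"run": run, "gain": gain})
```

Because `record_calibration` never names a backend, swapping one store for another leaves user code unchanged, which is the kind of insulation a migration or fallback scenario would rely on.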

Page 21

Summary (+)

We have a good understanding of ODBMS technology & Objectivity/DB in particular

System has been demonstrated to work in production up to level of today’s (BaBar) experiments

Many enhancements have been delivered, others in pipeline

Production experience will be invaluable for LHC (product enhancements, tools, etc.)

Page 22

Summary (-)

The ODBMS market has not taken off as was previously predicted

We need to assure ourselves that there is sufficient non-HEP demand (and $$$)

We need to (in any case) understand how an eventual migration could be handled

We need to develop at least one realistic fallback scenario

Page 23

Conclusions

R&D phase of RD45 has now led to production ODBMS services

Risks of current strategy well understood - risk management must continue

We are well placed to prepare for “2001 milestone”

Future focus: Production Road-map to 2001 and beyond