Event Data History David Adams BNL Atlas Software Week December 2001.


Event Data History

David Adams

BNL

Atlas Software Week

December 2001


Contents

Definitions

Event Processing

Use cases

Requirements

Design issues

Status

Future


Definitions

Data object

• Unit of data transfer to and from data store

Event data object (EDO)

• Data object associated with a particular event

• Restrict to the highest level (not subobjects)

Event data contained object

• Objects contained in EDO’s

• E.g. cluster, track or electron

• Only found in and accessed through an EDO


Definitions (cont)

Event

• A collection of EDO’s (and their histories) associated with one beam crossing

• Plus virtual EDO’s (histories without data)

• Not necessarily all such data anywhere

• Depends on scope; for example:

– All EDO’s in a file

– All EDO’s accessible at a site

– All EDO’s registered in central store

– Someone’s view of the data

– All EDO’s visible to an algorithm


Definitions (cont)

Algorithm

• Event data is processed by running a series of algorithms

• Inputs to an algorithm are EDO’s and possibly non-event data such as geometry or calibration

• Output is typically one EDO (can be more)

• Characterized by type, version and a collection of run-time parameters

• Similar to the Gaudi Algorithm class except that it does not include specification of input data

– Gaudi definition might depend on event


Definitions (cont)

Parent EDO

• Each EDO is constructed by an algorithm from a well-specified collection of input EDO’s

• These input EDO’s are the parents

• Each EDO has a well defined ancestry

– Parents

– Parents of parents

– And so on
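
The ancestry described above can be walked mechanically once each history records the indices of its parents. The following is a minimal sketch in Python, not the Athena implementation; the `History` record, its field names, the `ancestry` helper and the sample indices are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class History:
    """Hypothetical per-EDO history: the EDO index, producer and parent indices."""
    edo_index: int
    algorithm: str
    parents: list = field(default_factory=list)

def ancestry(index, histories):
    """Return all ancestor indices of an EDO, nearest generation first."""
    seen, order = set(), []
    frontier = list(histories[index].parents)
    while frontier:
        next_frontier = []
        for parent in frontier:
            if parent in seen:
                continue                      # shared ancestor, already listed
            seen.add(parent)
            order.append(parent)
            if parent in histories:           # a history may be missing
                next_frontier.extend(histories[parent].parents)
        frontier = next_frontier
    return order

# Illustrative chain: raw data -> clusters -> tracks -> refit tracks.
histories = {
    1: History(1, "readout"),
    2: History(2, "Cluster", [1]),
    3: History(3, "Find 1", [2]),
    4: History(4, "Refit", [3, 2]),
}
print(ancestry(4, histories))
```

Parents come first, then parents of parents, and so on; an ancestor reached by more than one path is listed once.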


Definitions (cont)

Replicated data

• Copy a collection of EDO’s

• Levels:

– File replication

– EDO replication (more difficult)

Regenerated data

• Data reconstructed by providing input and then running a collection of algorithms equivalent to those used in the original data generation

• Regenerate at EDO level


Definitions (cont)

Event data history

• Includes

– Input data

– Algorithms used to produce the data

– Run-time environment

– Nonessential information (time stamp…)

• Provide history for each EDO

– Combine with ancestor histories to recover the full production chain

– Share information to save space


Event Processing

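[Figure: event processing flow. EDO’s (each with its history): RawData, TrackClusters, Tracks 1, Tracks 2, Tracks 3. Algorithms: Cluster, Find 1, Find 2, Refit. Legend: EDO + history; algorithm; const and non-const access.]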


Use cases

1. Check history

• User wishes to discover the track fitting algorithm used for an electron

• Electron consists of a track and an EM cluster

– The track and EM cluster EDO’s are parents of the electron EDO

• The history for the electron EDO is used to find the history for the parent track EDO

• The specification of the track fitting algorithm is obtained from the history of the track EDO
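
A minimal sketch of this lookup in Python follows. The `AlgorithmSpec` and `DataHistory` records, the `kind` label and the sample algorithm names are hypothetical; the slides do not specify how histories are stored or queried.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AlgorithmSpec:
    type: str
    version: str
    parameters: tuple = ()      # (name, value) pairs of run-time parameters

@dataclass(frozen=True)
class DataHistory:
    edo_index: int
    kind: str                   # assumed label, e.g. "track" or "electron"
    algorithm: AlgorithmSpec
    parents: tuple = ()         # indices of the parent EDO's

def algorithm_of_parent(child_index, parent_kind, histories):
    """Return the algorithm spec of the first parent of the requested kind."""
    for parent_index in histories[child_index].parents:
        parent = histories.get(parent_index)
        if parent is not None and parent.kind == parent_kind:
            return parent.algorithm
    return None

# Placeholder algorithm names and parameters, for illustration only.
histories = {
    10: DataHistory(10, "track",
                    AlgorithmSpec("ExampleFit", "1.2", (("chi2_cut", 5.0),))),
    11: DataHistory(11, "em_cluster", AlgorithmSpec("ExampleCluster", "0.9")),
    12: DataHistory(12, "electron",
                    AlgorithmSpec("ExampleElectron", "2.0"), (10, 11)),
}

fit = algorithm_of_parent(12, "track", histories)
print(fit.type, fit.version, dict(fit.parameters))
```

The electron history only has to name its parents; the fitting details live in the track history and are never copied into the electron.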


Use cases (cont)

2. Select on history

• User has a collection of track EDO’s and wishes to find those generated with particular algorithm characteristics

• User iterates over the associated EDO histories

– Extract the algorithm data for each

– Save histories with matching algorithm characteristics

• User fetches the EDO’s associated with the saved histories
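
The selection can be sketched as a filter over histories followed by a fetch. This is an illustration under assumptions: histories are shown as plain dictionaries, and `select_histories`, `fetch_edos`, the `pt_min` parameter and the store layout are hypothetical.

```python
def select_histories(histories, alg_type, **required_params):
    """Keep histories whose algorithm type and run-time parameters match."""
    selected = []
    for history in histories:
        algorithm = history["algorithm"]
        if algorithm["type"] != alg_type:
            continue
        parameters = algorithm["parameters"]
        if all(parameters.get(k) == v for k, v in required_params.items()):
            selected.append(history)
    return selected

def fetch_edos(selected, data_store):
    """Fetch the EDO's named by the saved histories; skip absent (virtual) ones."""
    return [data_store[h["edo_index"]] for h in selected
            if h["edo_index"] in data_store]

track_histories = [
    {"edo_index": 1, "algorithm": {"type": "Find1", "parameters": {"pt_min": 0.5}}},
    {"edo_index": 2, "algorithm": {"type": "Find2", "parameters": {"pt_min": 0.5}}},
    {"edo_index": 3, "algorithm": {"type": "Find1", "parameters": {"pt_min": 1.0}}},
]
data_store = {1: "tracks-1", 2: "tracks-2", 3: "tracks-3"}

saved = select_histories(track_histories, "Find1", pt_min=0.5)
tracks = fetch_edos(saved, data_store)
```

Only histories are read during selection, so the (typically larger) EDO's are touched only for the matches.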


Use cases (cont)

3. Virtual (regenerated) data

• User wishes to reproduce refit tracks which have been deleted (or never created)

• Original (parent) track EDO is present

• Refit track history is present and provides

– fitting algorithm (including runtime parameters)

– parent track EDO

• The fitting algorithm is (re)run

– Original tracks are used as input

– Regenerated refit tracks as output
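
As a rough sketch, regeneration amounts to looking up the algorithm named in the history and rerunning it on the parents. Everything named here is an assumption: the `REGISTRY` mapping, the `regenerate` helper, the dictionary layout and the stand-in `refit` function, which only scales numbers so the example can run.

```python
def refit(tracks, scale):
    """Stand-in for a fitting algorithm; the real one is not specified here."""
    return [t * scale for t in tracks]

# Hypothetical registry mapping (type, version) to an executable algorithm.
REGISTRY = {("Refit", "1.0"): refit}

def regenerate(edo_index, histories, data_store):
    """Return the EDO, rerunning its algorithm on the parents if it is absent."""
    if edo_index in data_store:
        return data_store[edo_index]
    history = histories[edo_index]        # KeyError if the history is missing too
    inputs = [regenerate(p, histories, data_store) for p in history["parents"]]
    algorithm = REGISTRY[(history["type"], history["version"])]
    data_store[edo_index] = algorithm(*inputs, **history["parameters"])
    return data_store[edo_index]

histories = {
    3: {"type": "Refit", "version": "1.0",
        "parameters": {"scale": 2.0}, "parents": [1]},
}
data_store = {1: [1.0, 2.5]}              # original tracks present, refit absent

refit_tracks = regenerate(3, histories, data_store)
```

The recursion also covers the case where a parent is itself virtual, provided its history is present.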


Use cases (cont)

4. Replicated data

• Track EDO’s for interesting events are replicated in a file at a remote site

• Later a user desires to refit these tracks:

– Parent clusters are replicated in a separate file

– Job is run with both files as input

– The cluster EDO’s in the second file are recognized as parents of the track EDO’s in the first file

– These clusters are used to refit the tracks
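
Recognizing parents across files works only if indices mean the same thing in every file. A small sketch of that idea, with files modeled as dictionaries keyed by a global EDO index; `open_files`, `find_parents` and the sample indices are hypothetical.

```python
def open_files(*files):
    """Merge several files into one index -> EDO view for the job."""
    view = {}
    for f in files:
        view.update(f)
    return view

def find_parents(track_index, histories, view):
    """Return the parent EDO's of a track that are visible in this job."""
    return {i: view[i] for i in histories[track_index]["parents"] if i in view}

# File 1: replicated track EDO's; their histories carry global parent indices.
track_file = {20: "track EDO"}
histories = {20: {"algorithm": "Find1", "parents": [7]}}
# File 2: the parent clusters, replicated separately.
cluster_file = {7: "cluster EDO"}

parents_alone = find_parents(20, histories, open_files(track_file))
parents_both = find_parents(20, histories, open_files(track_file, cluster_file))
```

With only the track file open the parent is not visible; adding the cluster file resolves it without any change to the track history.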


Use cases (cont)

5. History creation in Gaudi

• The history data is extracted while running in the Gaudi framework

– A historian is created at the beginning of the job and it extracts job level history from the OS and Gaudi

– The historian extracts algorithm-specific history from each algorithm

– Each time an EDO is created, the historian uses the algorithm and input data (EDO’s and global) to construct history for that EDO

– The job and algorithm histories can be shared
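
The historian's role can be sketched as follows. This is a Python illustration only: it is not the interface of the C++ classes in Control/AthenaHistory, the fields shown are a subset chosen for brevity, and the algorithm is modeled as a plain dictionary rather than a Gaudi algorithm.

```python
import platform
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class JobHistory:
    release: str
    os_name: str
    hostname: str
    start_time: float

@dataclass(frozen=True)
class AlgorithmHistory:
    type: str
    version: str
    name: str
    parameters: tuple

@dataclass(frozen=True)
class DataHistory:
    edo_index: int
    job: JobHistory
    algorithm: AlgorithmHistory
    parents: tuple
    global_data: tuple

class Historian:
    def __init__(self, release):
        # Job-level history is extracted once, at the beginning of the job.
        self.job = JobHistory(release, platform.system(),
                              platform.node(), time.time())
        self._algorithms = {}

    def algorithm_history(self, alg):
        # One AlgorithmHistory per algorithm name, built on first use.
        if alg["name"] not in self._algorithms:
            self._algorithms[alg["name"]] = AlgorithmHistory(
                alg["type"], alg["version"], alg["name"],
                tuple(sorted(alg["properties"].items())))
        return self._algorithms[alg["name"]]

    def record(self, edo_index, alg, parents, global_data=()):
        # Called each time an EDO is created.
        return DataHistory(edo_index, self.job, self.algorithm_history(alg),
                           tuple(parents), tuple(global_data))

historian = Historian(release="2.4.1")
finder = {"type": "TrackFinder", "version": "1.0", "name": "Find1",
          "properties": {"pt_min": 0.5}}     # hypothetical algorithm description
h1 = historian.record(101, finder, parents=[100])
h2 = historian.record(201, finder, parents=[200])
```

Both data histories point at the same job and algorithm history objects, which is the sharing referred to in the last bullet.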


Requirements

1. History includes essential information:

• Parent EDO’s

• Relevant global data (calibration, alignment, …)

• Algorithm

– Type, version and run-time parameters

• Release version

• Run-time environment

– OS, OS version and shared library versions

2. Above must be sufficient to reproduce EDO


Requirements (cont)

3. History includes nonessential information:

• Event identifier

• Time stamp

• Computer identifier

• Job identifier

• CPU time consumed

• Algorithm return status

• Checksum to verify data


Requirements (cont)

4. History can exist even if its EDO is deleted

5. EDO index

• There must be a way to label (index) an EDO so references to parent EDO’s can be persistent

6. EDO indices provide identity

• Indices are unique

• Indices span files, federations, DB technologies and geographical locations

• References to parent EDO’s remain valid when parents are replicated or regenerated


Requirements (cont)

7. EDO’s can be replicated or regenerated

• Copies have the same data, essential history and index as the original

• For regeneration, much of the nonessential history will differ

8. Replicated and regenerated EDO’s are equivalent to the originals

• Either may be provided in place of the original

9. The history for a regenerated EDO should indicate its secondary nature
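
One way to read requirements 7 to 9 is as an equivalence test that ignores nonessential history. The sketch below assumes a hypothetical `EDORecord` with separate essential and nonessential parts and a `secondary` flag; none of these names come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class EDORecord:
    index: int
    data: tuple
    essential: dict                       # parents, algorithm, release, environment
    nonessential: dict = field(default_factory=dict)   # time stamp, host, ...
    secondary: bool = False               # True for a regenerated copy

def equivalent(a, b):
    """Copies match on index, data and essential history only."""
    return (a.index, a.data, a.essential) == (b.index, b.data, b.essential)

original = EDORecord(
    5, (1.0, 2.0),
    {"algorithm": ("Refit", "1.0"), "parents": (3,)},
    {"hostname": "node-a"})
regenerated = EDORecord(
    5, (1.0, 2.0),
    {"algorithm": ("Refit", "1.0"), "parents": (3,)},
    {"hostname": "node-b"}, secondary=True)

print(equivalent(original, regenerated), original == regenerated)
```

The two records are interchangeable under `equivalent` even though a field-by-field comparison still tells them apart.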


Design issues

1. Object identifiers

• We require a mechanism to assign a unique index to each EDO and its associated history

• Here is an example implementation:

– Use 64 bits (20 years of 10^9 events/yr with 5k EDO’s/event uses 47 bits)

– A central source serves collections of unused indices to local disks

– Each job gains exclusive access to a collection

– Collection hands out unique indices
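
The 47-bit figure follows from 20 × 10^9 × 5000 = 10^14 indices, which lies between 2^46 and 2^47. The sketch below checks that arithmetic and illustrates the allocation scheme in a single process; `CentralSource`, `IndexCollection` and the collection size are hypothetical, and a real central source would need persistence and locking that are not shown.

```python
# Size estimate: years * events per year * EDO's per event.
total_edos = 20 * 10**9 * 5000
bits_needed = (total_edos - 1).bit_length()   # bits to label every EDO

class IndexCollection:
    """A contiguous range of unused indices held exclusively by one job."""
    def __init__(self, start, stop):
        self._next, self._stop = start, stop

    def new_index(self):
        if self._next >= self._stop:
            raise RuntimeError("collection exhausted; request another")
        index = self._next
        self._next += 1
        return index

class CentralSource:
    """Serves non-overlapping collections of unused 64-bit indices."""
    def __init__(self, collection_size=1000):
        self._next = 0
        self._size = collection_size

    def reserve(self):
        start = self._next
        if start + self._size > 2**64:
            raise OverflowError("64-bit index space exhausted")
        self._next = start + self._size
        return IndexCollection(start, start + self._size)

source = CentralSource()
job_a, job_b = source.reserve(), source.reserve()
indices_a = [job_a.new_index() for _ in range(3)]
indices_b = [job_b.new_index() for _ in range(3)]
```

Because each job draws from its own collection, jobs never need to contact the central source per EDO, only per collection.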


Design issues (cont)

2. Distributing history information

• Much history info is common to many EDO’s

– Within a file share data without duplicating it

– Separate out job and algorithm histories

• Job history

– Release version

– Runtime environment

> OS, shared libraries and their versions

– CPU identifier (e.g. hostname)

– Start time


Design issues (cont)

• Algorithm history

– Type

– Version

– Name or identifier (Gaudi name)

– Run time parameters (Gaudi properties)

– Subalgorithm histories


Design issues (cont)

• Data history (for each EDO)

– Event ID

– EDO

– Job history

– Algorithm history

– Parent EDO’s

– Global data indices (calibration, alignment, …)

– Start and stop times

– CPU time consumed

– Algorithm return status

– Data checksum


Design issues (cont)

3. Transient interface

• For now we define transient classes describing the three types of histories

• StoreGate converters will make these persistent

4. Historian

• Provides a convenient mechanism for generating history objects


Status

Current implementation

• All classes in package Control/AthenaHistory

– JobHistory

– AlgorithmHistory

– DataHistory

– Historian

• Each class has a component test

• Builds and tests successfully in 2.4.1

– Tests must be run by hand

> (waiting for support from ATLAS/CMT)


Future

1. Modify Athena to create history objects

2. Make history objects persist

3. Teach algorithm to specify parents

• Or should this come from Athena?

4. Teach algorithm to return parameters

• Relevant parameters instead of all properties

5. Design and implement EDO identifiers


Future (cont)

6. Add history for secondary objects

• Flag for replicas

• More data for regenerated data

7. Missing history

• Because history never written or was deleted

• Ancestry chain is broken

• Merge missing history into that of the child


Future (cont)

8. Mutable EDO’s

• Updates

– If an EDO is updated in a separate algorithm, then history must include the updates

• Early references

– Complications if an EDO is updated after it is used as a parent

> Child may not be reproducible from the update


– Reference to parent will need to be extended to include the state of the EDO

– Or treat each state as a separate EDO
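
The first option, extending the parent reference with the state of the EDO, might look like the sketch below. It assumes that every earlier state is retained, which the slides do not promise; `ParentRef` and `VersionedEDO` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParentRef:
    edo_index: int
    state: int                 # which update of the parent was read

class VersionedEDO:
    """An EDO that keeps every state so that early references stay resolvable."""
    def __init__(self, index, data):
        self.index = index
        self._states = [data]  # state 0 is the data as first written

    @property
    def state(self):
        return len(self._states) - 1

    def update(self, data):
        self._states.append(data)

    def read(self, state=None):
        return self._states[self.state if state is None else state]

    def ref(self):
        return ParentRef(self.index, self.state)

tracks = VersionedEDO(30, [1.0, 2.0])
early_ref = tracks.ref()       # taken by a child built before the update
tracks.update([1.1, 2.1])
late_ref = tracks.ref()        # taken by a child built after the update

original_input = tracks.read(early_ref.state)
```

A child holding `early_ref` can still be reproduced from the data it actually saw. Treating each state as a separate EDO would give the same effect by assigning a new index on every update instead of a state number.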