NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER

Charles Leggett

A Lightweight Histogram Interface Layer

CHEP 2000, Session F (F320)

Thursday Feb 10 2000

Introduction

Current histogramming software packages, such as PAW, ROOT and JAS, have enormous functionality.

They are no longer simply histogramming packages, but have added data analysis and visualization features.

The tight integration between these features has made it difficult to separate the statistical data gathering feature from the analysis and graphical presentation features.

This results in significant overhead when only the histogramming aspect is needed.

Introduction (cont)

Many histogramming packages are wedded to a specific i/o format.

Very few translation programs exist to convert between various formats.

This makes it very hard to use analysis and visualization tools that are not part of the package used to generate the histogram.

Users have very little freedom to choose the package best suited to their needs, or the one they are most familiar with.

Why an “Interface Layer”

Because the interface layer is format independent and has no i/o (file or visual) requirements, it is not wedded to a specific part of the analysis procedure.

It can sit between components, such as between the data acquisition component and the analysis component, allowing different applications to use different formats.

Design Requirements

Platform and i/o format independent

Lightweight - low overhead, minimal non-histogram features

Possibility to histogram any data type

Ability to use within an analysis schema, as an interface between different components, or as a standalone utility

Ability to use as a translator between various i/o formats

i/o formats user extensible

Easy implementation by user

Required Qualities of a Histogram

A collection of statistical data related to a particular process.

Should not contain any information unrelated to the statistical data, such as colour, fitting parameters, line width, cuts, etc.

Number of bins + overflow/underflow

Bin edges

Entries per bin + associated errors

Identification information, such as an ID or name

Total for $n$ bins: $(n+3) + 2n$
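One way to read the count above, offered as a sketch rather than the package's actual layout: with $n$ bins there are $n+1$ bin edges plus underflow and overflow ($n+3$ values), and one content and one error per bin ($2n$ values), with the number of bins implied by the edges. The hypothetical `HistRecord` struct below holds exactly that and nothing presentational.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal 1D histogram record: statistical data only, no
// colour, line width, fit parameters or cuts.
struct HistRecord {
    std::string id;                // identification: an ID or name
    std::vector<double> edges;     // n+1 bin edges (number of bins is implied)
    std::vector<double> contents;  // n entries per bin
    std::vector<double> errors;    // n associated errors
    double underflow = 0.0;        // entries below the first edge
    double overflow = 0.0;         // entries at or above the last edge

    HistRecord(std::string name, std::vector<double> binEdges)
        : id(std::move(name)),
          edges(std::move(binEdges)),
          contents(binsFor(edges), 0.0),
          errors(binsFor(edges), 0.0) {}

    std::size_t nbins() const { return contents.size(); }

    // Numbers stored, excluding the ID: (n+1) + 2 + 2n = (n+3) + 2n.
    std::size_t storedValues() const {
        return edges.size() + 2 + contents.size() + errors.size();
    }

private:
    static std::size_t binsFor(const std::vector<double>& e) {
        assert(e.size() >= 2 && "need at least two bin edges");
        return e.size() - 1;
    }
};
```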

Minimal Set of Useful Methods

weighted entries

reset()

bin contents, errors, centers, edges

bin numbers <-> bin edges/centers

simple operations: =, +, -

mean(), rms()

min(), max()

rebin(), resize()

change title
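A few of these methods can be sketched for a fixed-width histogram. The class `FixedHist` and its member names are hypothetical. Here `rms()` is taken as the spread of bin centres about the mean, weighted by bin contents, and under/overflow entries are excluded from both statistics; both choices are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width 1D histogram showing a handful of the minimal
// methods: weighted fill, reset(), bin number <-> bin centre, mean(), rms().
class FixedHist {
public:
    FixedHist(double lo, double hi, std::size_t nbins)
        : lo_(lo), hi_(hi), width_((hi - lo) / nbins), contents_(nbins, 0.0) {
        assert(nbins > 0 && hi > lo);
    }

    // Bin number for a value: -1 = underflow, nbins() = overflow.
    long findBin(double x) const {
        const long n = static_cast<long>(contents_.size());
        if (x < lo_) return -1;
        if (x >= hi_) return n;
        // min() guards against rounding placing x just below hi_ in bin n
        return std::min(static_cast<long>((x - lo_) / width_), n - 1);
    }

    double binCenter(std::size_t b) const { return lo_ + (b + 0.5) * width_; }

    void fill(double x, double w = 1.0) {
        const long b = findBin(x);
        if (b < 0) underflow_ += w;
        else if (b >= static_cast<long>(contents_.size())) overflow_ += w;
        else contents_[static_cast<std::size_t>(b)] += w;
    }

    void reset() {
        std::fill(contents_.begin(), contents_.end(), 0.0);
        underflow_ = overflow_ = 0.0;
    }

    double sumOfWeights() const {
        double s = 0.0;
        for (double c : contents_) s += c;
        return s;
    }

    double mean() const {
        const double sw = sumOfWeights();
        if (sw == 0.0) return 0.0;
        double s = 0.0;
        for (std::size_t b = 0; b < contents_.size(); ++b) s += contents_[b] * binCenter(b);
        return s / sw;
    }

    double rms() const {
        const double sw = sumOfWeights();
        if (sw == 0.0) return 0.0;
        const double m = mean();
        double s = 0.0;
        for (std::size_t b = 0; b < contents_.size(); ++b) {
            const double d = binCenter(b) - m;
            s += contents_[b] * d * d;
        }
        return std::sqrt(s / sw);
    }

private:
    double lo_, hi_, width_;
    std::vector<double> contents_;
    double underflow_ = 0.0, overflow_ = 0.0;
};
```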

What Gets Histogrammed

Traditionally we have histogrammed ints and floats. What about entire objects?

To histogram an object, we have to define which aspect of the object is used to order the histogram.

This ordering could be supplied every time the histogram is filled, but it is nicer to associate an ordering mechanism with the histogram itself.

Define a function which provides this ordering, and give the histogram object a pointer to it.
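A minimal sketch of attaching the ordering function to the histogram itself. It assumes the function maps an object to a `float` and is stored as a plain function pointer. The setter mirrors the `SetQuantFunction` call in the talk's usage example, but `ObjectHist`, the `Muon` layout and `MuonPt` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical histogram of arbitrary objects. The user-supplied
// quantization function maps each object to the value used for binning.
template <typename T>
class ObjectHist {
public:
    using QuantFunction = float (*)(const T&);

    ObjectHist(float lo, float hi, std::size_t nbins)
        : lo_(lo), hi_(hi), contents_(nbins, 0.0f) {}

    void SetQuantFunction(QuantFunction f) { quant_ = f; }

    void Fill(const T& obj, float w = 1.0f) {
        assert(quant_ != nullptr && "quantization function not set");
        const float x = quant_(obj);
        if (x < lo_ || x >= hi_) return;  // under/overflow dropped in this sketch
        std::size_t b = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * contents_.size());
        if (b >= contents_.size()) b = contents_.size() - 1;  // rounding guard
        contents_[b] += w;
    }

    float binContent(std::size_t b) const { return contents_.at(b); }

private:
    float lo_, hi_;
    std::vector<float> contents_;
    QuantFunction quant_ = nullptr;
};

// Hypothetical user type and quantization function: order muons by pT.
struct Muon { float px, py; };
float MuonPt(const Muon& m) { return std::sqrt(m.px * m.px + m.py * m.py); }
```

Because the function is stored with the histogram, each call site only passes the object to `Fill()`.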

Types of Histograms

BINNED

– bin edges defined when created.

– Either fixed or variable width

UNBINNED

– only for very small data samples

– can be converted to BINNED

AUTO-BINNED

– starts off as UNBINNED, automatically converted to BINNED after a set number of entries.

– Conversion routines calculate bin edges with either fixed width, or to maximize occupancy in each bin.
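The two conversion strategies for an AUTO-BINNED histogram can be sketched as functions that turn the buffered UNBINNED entries into bin edges. Reading "maximize occupancy" as equal-population (quantile) binning is an assumption, and the function names are hypothetical. The last edge is nudged up with `std::nextafter` so the largest entry falls inside the last half-open bin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Fixed width: nbins equal bins spanning the buffered entries.
std::vector<double> fixedWidthEdges(const std::vector<double>& x, std::size_t nbins) {
    assert(!x.empty() && nbins > 0);
    const auto mm = std::minmax_element(x.begin(), x.end());
    const double lo = *mm.first, hi = *mm.second;
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i < nbins; ++i)
        edges[i] = lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(nbins);
    // Bins are [low, high): move the top edge just past the maximum.
    edges[nbins] = std::nextafter(hi, std::numeric_limits<double>::infinity());
    return edges;
}

// Equal occupancy: edges at quantiles of the sorted entries, so each bin
// receives about x.size() / nbins entries. Repeated values can produce
// equal neighbouring edges; a fuller version would merge such bins.
std::vector<double> equalOccupancyEdges(std::vector<double> x, std::size_t nbins) {
    assert(!x.empty() && nbins > 0 && nbins <= x.size());
    std::sort(x.begin(), x.end());
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i < nbins; ++i)
        edges[i] = x[i * x.size() / nbins];
    edges[nbins] = std::nextafter(x.back(), std::numeric_limits<double>::infinity());
    return edges;
}
```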

Use Overview

Book as:
• Binned
• Unbinned
• Auto

Output:
• hbook/PAW
• ROOT
• JAS
• text
• User Defined

Basic Operations:
• Fill
• Weighted Fill
• Add, Subtract, ...
• Resize, Rebin
• Convert Type
• etc.

User-defined quantization function

User Object

Continued Analysis

Internal Storage

If memory is very tight, the user may want to limit the precision of the statistical data.

The user can choose between 4- and 8-byte internal record keeping for:

– bin contents

– bin errors

– number of entries

– number of equivalent entries
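A sketch of the 4-byte/8-byte choice as a template parameter, which also shows why it matters: a `float` counter stops changing at $2^{24} = 16777216$ because the next integer is not representable, while a `double` keeps counting. `BinRecord` is hypothetical. The number of equivalent entries is taken here as $(\sum w)^2 / \sum w^2$, the usual effective-entries definition; that the package used exactly this definition is an assumption.

```cpp
#include <cassert>

// Hypothetical per-bin record; R is the record-keeping type
// (float = 4 bytes, double = 8 bytes on common platforms).
template <typename R>
struct BinRecord {
    R sumw = 0;     // bin content (sum of weights)
    R sumw2 = 0;    // sum of squared weights; bin error = sqrt(sumw2)
    R entries = 0;  // number of fills

    void fill(R w = 1) {
        sumw += w;
        sumw2 += w * w;
        entries += 1;
    }

    // Number of equivalent (effective) entries: (sum w)^2 / sum w^2.
    R equivalentEntries() const { return sumw2 > 0 ? sumw * sumw / sumw2 : R(0); }
};
```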

Memory Usage

Dynamic memory allocation is neat, but the implementation is (often) poor, and there will always be an overhead to using it.

Pre-allocate memory - fairly easy to do with a BINNED histogram.

Limit use of dynamic structures. We only run into trouble if we need to re-size or re-bin a histogram after it has been created.

UNBINNED histograms can either pre-allocate memory, or dynamically allocate it on the fly.

Total overhead per histogram: 80 bytes.
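The two UNBINNED memory policies can be sketched with `std::vector`. A pre-allocated store reserves its storage once and refuses entries beyond it, while a dynamic store lets the vector grow on the fly. `UnbinnedStore` and its `Policy` names are hypothetical. The point is that after `reserve()`, `push_back()` does not reallocate until the reserved capacity is exceeded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical UNBINNED storage with a choice of memory policy.
template <typename T>
class UnbinnedStore {
public:
    enum class Policy { PreAllocated, Dynamic };

    UnbinnedStore(Policy p, std::size_t capacity) : policy_(p), limit_(capacity) {
        entries_.reserve(capacity);  // single allocation up front
    }

    // Returns false when a pre-allocated store is full; a dynamic store
    // grows (and may reallocate) instead.
    bool fill(const T& x) {
        if (policy_ == Policy::PreAllocated && entries_.size() == limit_) return false;
        entries_.push_back(x);
        return true;
    }

    std::size_t size() const { return entries_.size(); }
    const T* data() const { return entries_.data(); }

private:
    Policy policy_;
    std::size_t limit_;  // requested capacity (reserve() may allocate more)
    std::vector<T> entries_;
};
```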

Implementation Details

The requirement to be able to histogram objects has a serious implication: the use of templates.

The histogram becomes a templated class whose parameters are the type of object to be histogrammed and the type used for internal record keeping:

Histogram<object type, (float|double)>

For UNBINNED histograms, STL vectors are used if dynamic memory management is chosen.

Similar syntax for 2D histograms.
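A sketch of such a template signature with defaults, so that `Histogram<>` means a float histogram with float (4-byte) records as in the usage examples. The class is named `HistogramSketch` because it is a hypothetical illustration, not the package's code. It handles arithmetic types only; objects would go through a quantization function. Variable-width bins are located by binary search with `std::upper_bound`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical histogram templated on the filled type T and the internal
// record-keeping type R (float or double).
template <typename T = float, typename R = float>
class HistogramSketch {
public:
    // Fixed bin width: nbins equal bins on [lo, hi).
    HistogramSketch(double lo, double hi, std::size_t nbins) : contents_(nbins, R(0)) {
        assert(nbins > 0 && hi > lo);
        for (std::size_t i = 0; i <= nbins; ++i)
            edges_.push_back(lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(nbins));
    }

    // Variable bin width: n+1 increasing edges supplied by the caller.
    explicit HistogramSketch(std::vector<double> edges)
        : edges_(std::move(edges)), contents_(binsFor(edges_), R(0)) {}

    void Fill(const T& x, R w = R(1)) {
        const double v = static_cast<double>(x);
        if (v < edges_.front()) { underflow_ += w; return; }
        if (v >= edges_.back()) { overflow_ += w; return; }
        // First edge strictly greater than v; the bin is the one just below it.
        const auto it = std::upper_bound(edges_.begin(), edges_.end(), v);
        contents_[static_cast<std::size_t>(it - edges_.begin()) - 1] += w;
    }

    R binContent(std::size_t b) const { return contents_.at(b); }
    R underflow() const { return underflow_; }
    R overflow() const { return overflow_; }

private:
    static std::size_t binsFor(const std::vector<double>& e) {
        assert(e.size() >= 2 && "need at least two bin edges");
        return e.size() - 1;
    }

    std::vector<double> edges_;
    std::vector<R> contents_;
    R underflow_ = 0, overflow_ = 0;
};
```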

Usage

Simple histogram of floats, fixed bin width

Histogram<> h1(-10.,10.,100);

h1.Fill(X);

Histogram of ints, variable bin width, double precision

Histogram<int,double> h2(Xedge);

Histogram of Muon object, automatically binned to maximize occupancy

float MuonQuantFunction(const Muon &M);  // returns the quantity used to order the histogram

Histogram<Muon> h3(AUTOBINNED);

h3.SetQuantFunction( MuonQuantFunction );

I/O

A file manager class is used to read and write histograms from/to disk in a variety of formats.

Internal histograms are only converted to a particular format when they are written.

The file manager can easily be extended to encompass new file formats.

Current formats:

– ASCII flat file

– HBOOK

– ROOT

– XDR / DSL
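A sketch of how such a file manager could stay extensible. Each format is a writer behind a common interface, and conversion from the internal representation happens only inside `write()`. All names (`HistData`, `HistWriter`, `AsciiWriter`, `FileManager`) are hypothetical. Only an ASCII flat-file writer is shown; HBOOK, ROOT or XDR writers would call those libraries and are not sketched here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical format-neutral histogram data handed to every writer.
struct HistData {
    std::string id;
    std::vector<double> edges;     // n+1 bin edges
    std::vector<double> contents;  // n bin contents
};

// One writer per output format; conversion happens only at write time.
class HistWriter {
public:
    virtual ~HistWriter() = default;
    virtual void write(const HistData& h, std::ostream& out) const = 0;
};

// ASCII flat file: a "# id" line, then "low high content" per bin.
class AsciiWriter : public HistWriter {
public:
    void write(const HistData& h, std::ostream& out) const override {
        out << "# " << h.id << '\n';
        for (std::size_t i = 0; i < h.contents.size(); ++i)
            out << h.edges[i] << ' ' << h.edges[i + 1] << ' ' << h.contents[i] << '\n';
    }
};

// File manager: supporting a new format means registering another writer.
class FileManager {
public:
    void registerFormat(const std::string& name, std::unique_ptr<HistWriter> w) {
        writers_[name] = std::move(w);
    }

    // Returns false if no writer is registered for the requested format.
    bool write(const std::string& format, const HistData& h, std::ostream& out) const {
        const auto it = writers_.find(format);
        if (it == writers_.end()) return false;
        it->second->write(h, out);
        return true;
    }

private:
    std::map<std::string, std::unique_ptr<HistWriter>> writers_;
};
```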

Ntuples

Ntuples are trickier than histograms, as there are several different types (column-wise vs. row-wise, ROOT trees, etc.).

For the moment, they have been implemented in the most trivial way: arrays/vectors of structs.

struct S { float E; int np; Muon M; };

ntuple<S> nt;
S s;
s.E = .... ;
nt.Fill(s);

Simple accessor methods also provided.
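A minimal sketch of the vector-of-structs ntuple with a couple of accessors. `NtupleSketch`, `row()`, `column()` and the `Event` struct are hypothetical names. The column accessor takes a pointer to data member, so any field of the user's struct can be pulled out as a vector without the ntuple knowing the struct's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-wise ntuple: just a vector of user structs.
template <typename S>
class NtupleSketch {
public:
    void Fill(const S& entry) { rows_.push_back(entry); }
    std::size_t entries() const { return rows_.size(); }
    const S& row(std::size_t i) const { return rows_.at(i); }

    // Extract one field of every row, e.g. column(&S::E), through a
    // pointer to data member.
    template <typename M>
    std::vector<M> column(M S::*member) const {
        std::vector<M> out;
        out.reserve(rows_.size());
        for (const S& r : rows_) out.push_back(r.*member);
        return out;
    }

private:
    std::vector<S> rows_;
};

// Hypothetical row type (the talk's example also carries a Muon member).
struct Event {
    float E;
    int np;
};
```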

Additional Functionality

Even though no complex functions are provided within the package, users may find it necessary to create them as needed.

Library functions can easily be added to provide user-specific histogram/ntuple operations.

For instance, if a user needs to perform a double Gaussian fit to a histogram, it is very easy to add this function in an external library, declared as a friend of the histogram class.
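A sketch of the friend mechanism with a deliberately simple external routine. A bin-range integral stands in for something like a double Gaussian fit. `Hist` and `integral` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal histogram whose storage is private; extra analysis
// routines live outside the class and are granted access as friends.
class Hist {
public:
    explicit Hist(std::size_t nbins) : contents_(nbins, 0.0) {}
    void setBin(std::size_t b, double v) { contents_.at(b) = v; }

    // Declared here; implemented in a separate (user) library.
    friend double integral(const Hist& h, std::size_t first, std::size_t last);

private:
    std::vector<double> contents_;
};

// "External library" routine: sums bins first..last inclusive, reading the
// private contents directly.
double integral(const Hist& h, std::size_t first, std::size_t last) {
    assert(first <= last && last < h.contents_.size());
    double sum = 0.0;
    for (std::size_t b = first; b <= last; ++b) sum += h.contents_[b];
    return sum;
}
```

In C++ the friend declaration itself still has to appear inside the class. Only the implementation can live in the external library.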

Additions in the Pipeline

Ability to use shared memory

Extend i/o format to include JAS

Internal conversion to ROOT/HBOOK/JAS

Profile histograms

Further support for ntuples

Adhere to AIDA interface

Pipedreams

Create an adaptor to a memory resident histogram object to allow multi-format access.

The basic histogram object sits in memory and presents different representations of itself to various components, e.g. it looks like an HBOOK histogram to MINUIT and like a ROOT histogram to a ROOT-specific process. If other applications modify the histogram, it can re-synchronize and update itself.

Conclusions

Makes a clean break between statistical data gathering and the analysis and visualization tasks.

Enables histogramming of complex types.

Simple and small implementation that is well suited to memory-restricted tasks, such as online data taking.

Provides the user with the freedom to choose from a wide variety of analysis and visualization tools.

Easily extensible, whether to new i/o formats or specific analysis functions.