NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Charles Leggett <[email protected]>
A Lightweight Histogram Interface Layer
CHEP 2000, Session F (F320)
Thursday Feb 10 2000
Introduction
Current histogramming software packages, such as PAW, ROOT, and JAS, have enormous functionality.
They are no longer simply histogramming packages, but have added data analysis and visualization features.
The tight integration between these features has made it difficult to separate the statistical data gathering feature from the analysis and graphical presentation features.
This results in significant overhead when only the histogramming aspect is needed.
Introduction (cont)
Many histogramming packages are wedded to a specific i/o format.
Very few translation programs exist to convert between various formats.
This makes it very hard to use analysis and visualization tools that are not part of the package used to generate the histogram.
Users have very little freedom to choose the package best suited to their needs, or the ones they are most familiar with.
Why an “Interface Layer”
Since it is format independent, and has no i/o (file or visual) requirements, it is not wedded to a specific part of the analysis procedure.
It can sit between components, such as between the data acquisition component and the analysis component, offering the ability to use various formats in different applications.
Design Requirements
• Platform and i/o format independent
• Lightweight - low overhead, minimal non-histogram features
• Possibility to histogram any data type
• Ability to use within an analysis schema, as an interface between different components, or as a standalone utility
• Ability to use as a translator between various i/o formats
• i/o formats user extensible
• Easy implementation by user
Required Qualities of a Histogram
A collection of statistical data related to a particular process.
Should not contain any information unrelated to the statistical data, such as colour, fitting parameters, line width, cuts, etc.
• Number of bins + overflow/underflow
• Bin edges
• Entries per bin + associated errors
• Identification information, such as an ID or name
= n+3 + 2n
Minimal Set of Useful Methods
• weighted entries
• reset()
• bin contents, errors, centers, edges
• bin numbers <-> bin edges/centers
• simple operations: =, +, -
• mean(), rms()
• min(), max()
• rebin(), resize()
• change title
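The mapping between bin numbers and bin edges/centers listed above might look like the following sketch for variable-width bins. The names `FindBin` and `BinCenter`, and the convention that bin 0 is underflow and bin n+1 is overflow, are assumptions made for illustration; the talk does not show the package's actual method signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a value to a bin number, given n+1 sorted bin edges.
// Assumed convention: 0 = underflow, 1..n = regular bins, n+1 = overflow.
// Regular bins are half-open: [edges[i-1], edges[i]).
std::size_t FindBin(const std::vector<double>& edges, double x) {
    assert(edges.size() >= 2);
    assert(std::is_sorted(edges.begin(), edges.end()));
    // upper_bound finds the first edge strictly greater than x; its index
    // is the bin number under the convention above.
    return static_cast<std::size_t>(
        std::upper_bound(edges.begin(), edges.end(), x) - edges.begin());
}

// Hypothetical helper: center of a regular bin (1..n).
double BinCenter(const std::vector<double>& edges, std::size_t bin) {
    assert(bin >= 1 && bin < edges.size());
    return 0.5 * (edges[bin - 1] + edges[bin]);
}
```

Using a binary search over the edges keeps the lookup logarithmic in the number of bins and works equally for fixed and variable widths.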
What Gets Histogrammed
Normally we histogram ints and floats. What about entire objects?
To histogram an object, have to define which aspect of the object is used to order the histogram.
Can provide this ordering every time a histogram is filled, but nicer to associate an ordering mechanism with the histogram itself.
Define a function which provides this ordering, give pointer to histogram object.
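A minimal sketch of this idea, assuming a fixed-width histogram: the histogram stores a pointer to a user-supplied quantization function and calls it on every fill. The class `ObjectHistogram`, the `Track` type, and `TrackPt` are hypothetical names invented for this example, not part of the package described in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical user type, invented for this sketch.
struct Track {
    float pt;
    float eta;
};

// Quantization function: selects the aspect of the object to histogram.
float TrackPt(const Track& t) { return t.pt; }

// Fixed-width histogram of arbitrary objects (illustrative only).
template <typename T>
class ObjectHistogram {
public:
    typedef float (*QuantFunction)(const T&);

    ObjectHistogram(float lo, float hi, std::size_t nbins, QuantFunction quant)
        : lo_(lo), hi_(hi), nbins_(nbins), quant_(quant), counts_(nbins + 2, 0.0f) {
        assert(nbins > 0 && hi > lo && quant != nullptr);
    }

    // The ordering is taken from the stored function, not from the caller.
    void Fill(const T& obj, float weight = 1.0f) {
        counts_[BinOf(quant_(obj))] += weight;
    }

    // Bin 0 is underflow, bins 1..nbins are regular, nbins+1 is overflow.
    float Content(std::size_t bin) const {
        assert(bin < counts_.size());
        return counts_[bin];
    }

private:
    std::size_t BinOf(float x) const {
        if (x < lo_) return 0;
        if (x >= hi_) return nbins_ + 1;
        std::size_t i = static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * nbins_);
        if (i >= nbins_) i = nbins_ - 1;  // guard against rounding at the upper edge
        return i + 1;
    }

    float lo_, hi_;
    std::size_t nbins_;
    QuantFunction quant_;
    std::vector<float> counts_;
};
```

With this arrangement the caller writes `h.Fill(track)` and never repeats which member of `Track` is being histogrammed.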
Types of Histograms
BINNED
– bin edges defined when created
– either fixed or variable width
UNBINNED
– only for very small data samples
– can be converted to BINNED
AUTO-BINNED
– starts off as UNBINNED, automatically converted to BINNED after a set number of entries
– conversion routines calculate bin edges with either fixed width, or to maximize occupancy in each bin
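The two conversion strategies could be sketched as below. This reads "maximize occupancy in each bin" as choosing edges so that each bin receives roughly the same number of entries, which is an interpretation rather than something the talk spells out. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-width strategy: nbins equal bins spanning [lo, hi].
std::vector<double> FixedWidthEdges(double lo, double hi, std::size_t nbins) {
    assert(nbins > 0 && hi > lo);
    std::vector<double> edges(nbins + 1);
    for (std::size_t i = 0; i <= nbins; ++i)
        edges[i] = lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(nbins);
    return edges;
}

// Equal-occupancy strategy (assumed interpretation): sort the unbinned sample
// and place each interior edge midway between the two neighbouring entries at
// the i/nbins quantile. The sample is taken by value so it can be sorted.
std::vector<double> EqualOccupancyEdges(std::vector<double> sample, std::size_t nbins) {
    assert(nbins > 0 && sample.size() >= nbins);
    std::sort(sample.begin(), sample.end());
    std::vector<double> edges(nbins + 1);
    edges.front() = sample.front();
    edges.back() = sample.back();  // with half-open bins the maximum lands on the last edge
    for (std::size_t i = 1; i < nbins; ++i) {
        std::size_t k = i * sample.size() / nbins;  // first entry belonging to bin i
        edges[i] = 0.5 * (sample[k - 1] + sample[k]);
    }
    return edges;
}
```

An AUTO-BINNED histogram would buffer entries until the configured count is reached, call one of these routines, and then refill itself from the buffer.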
Use Overview
[Diagram: use overview]
User Object
User-defined quantization function
Book as:
• Binned
• Unbinned
• Auto
Basic Operations:
• Fill
• Weighted Fill
• Add, Subtract, ...
• Resize, Rebin
• Convert Type
• etc.
Output:
• hbook/PAW
• ROOT
• JAS
• text
• User Defined
Continued Analysis
Internal Storage
If memory utilization is very tight, the user may want to limit the precision of the statistical data.
User can choose between 4 and 8 byte internal record keeping for:
– bin contents
– bin errors
– number of entries
– number of equivalent entries
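One way to express this choice is a template parameter for the record-keeping type, as in the sketch below. The class `BinStore` and its method names are hypothetical. The number of equivalent entries is computed here as (Σw)² / Σw², the definition used in HBOOK; whether the package uses exactly this formula is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Illustrative storage class: all statistics are kept in Store,
// which is either float (4 bytes) or double (8 bytes).
template <typename Store = float>
class BinStore {
    static_assert(std::is_same<Store, float>::value || std::is_same<Store, double>::value,
                  "Store must be float (4 bytes) or double (8 bytes)");

public:
    explicit BinStore(std::size_t nbins)
        : sumw_(nbins, Store(0)), sumw2_(nbins, Store(0)),
          entries_(0), totw_(0), totw2_(0) {
        assert(nbins > 0);
    }

    void Add(std::size_t bin, Store w = Store(1)) {
        assert(bin < sumw_.size());
        sumw_[bin] += w;       // bin content
        sumw2_[bin] += w * w;  // squared weights, for the bin error
        totw_ += w;
        totw2_ += w * w;
        entries_ += Store(1);
    }

    Store Content(std::size_t bin) const { return sumw_[bin]; }
    Store Error(std::size_t bin) const { return std::sqrt(sumw2_[bin]); }
    Store Entries() const { return entries_; }
    // Assumed definition: (sum of weights)^2 / (sum of squared weights).
    Store EquivalentEntries() const {
        return totw2_ > Store(0) ? totw_ * totw_ / totw2_ : Store(0);
    }

    // Per-bin storage cost: one content plus one squared-weight sum.
    static constexpr std::size_t BytesPerBin() { return 2 * sizeof(Store); }

private:
    std::vector<Store> sumw_, sumw2_;
    Store entries_, totw_, totw2_;
};
```

Switching from `BinStore<double>` to `BinStore<float>` halves the per-bin footprint at the cost of precision for large or heavily weighted samples.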
Memory Usage
Dynamic memory allocation is neat, but implementation (often) sucks. There will always be an overhead to using it.
Pre-allocate memory - fairly easy to do with a BINNED histogram.
Limit use of dynamic structures.
Only run into trouble if need to re-size or re-bin a histogram after it’s been created.
UNBINNED histograms can either pre-allocate memory, or dynamically allocate on the fly.
Total overhead per histogram: 80 bytes.
Implementation Details
The requirement to be able to histogram objects has a serious implication - use of templates.
The histogram object becomes a templated object, whose parameters are the type of object to be histogrammed and the type of internal record keeping data:
Histogram<object type, (float|double)>
For UNBINNED histograms, STL vectors are used if dynamic memory management is chosen.
Similar syntax for 2D histograms.
Usage
Simple histogram of floats, fixed bin width
Histogram<> h1(-10.,10.,100);
h1.Fill(X);
Histogram of ints, variable bin width, double precision
Histogram<int,double> h2(Xedge);
Histogram of Muon object, automatically binned to maximize occupancy
float MuonQuantFunction(const Muon &M);  // user-defined: returns the value to histogram
Histogram<Muon> h3(AUTOBINNED);
h3.SetQuantFunction( MuonQuantFunction );
I/O
File manager class used to read and write histograms from/to disk in a variety of formats.
Internal histograms are only converted to a particular format when they are written.
File manager can easily be extended to encompass new file formats.
Current formats:
– ASCII flat file
– HBOOK
– ROOT
– XDR / DSL
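An extensible file manager of this kind could be sketched with one writer class per format, registered by name. Everything here (`HistData`, `FormatWriter`, `AsciiWriter`, this `FileManager` interface, and the ASCII layout) is hypothetical; the talk does not show the real class. Writing to a `std::ostream` rather than a file keeps the sketch easy to exercise.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Format-neutral view of a histogram (illustrative).
struct HistData {
    std::string title;
    std::vector<double> edges;     // n+1 bin edges
    std::vector<double> contents;  // n bin contents
};

// One subclass per output format; conversion happens only when writing.
class FormatWriter {
public:
    virtual ~FormatWriter() {}
    virtual void Write(const HistData& h, std::ostream& out) const = 0;
};

// Assumed flat-file layout: a title line, then "low high content" per bin.
class AsciiWriter : public FormatWriter {
public:
    void Write(const HistData& h, std::ostream& out) const override {
        assert(h.edges.size() == h.contents.size() + 1);
        out << "# " << h.title << '\n';
        for (std::size_t i = 0; i < h.contents.size(); ++i)
            out << h.edges[i] << ' ' << h.edges[i + 1] << ' ' << h.contents[i] << '\n';
    }
};

class FileManager {
public:
    // Adding a new format means registering one more writer.
    void Register(const std::string& format, std::unique_ptr<FormatWriter> writer) {
        writers_[format] = std::move(writer);
    }

    // Returns false if no writer is registered for the requested format.
    bool Write(const std::string& format, const HistData& h, std::ostream& out) const {
        auto it = writers_.find(format);
        if (it == writers_.end()) return false;
        it->second->Write(h, out);
        return true;
    }

private:
    std::map<std::string, std::unique_ptr<FormatWriter> > writers_;
};
```

A user-defined format then needs only a new `FormatWriter` subclass; the histogram classes themselves stay untouched.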
Ntuples
Ntuples are trickier than histograms, as there are several different types (column-wise vs. row-wise, ROOT trees, etc.).
For the moment, have implemented them in the most trivial way: arrays/vectors of structs.
struct S { float E; int np; Muon M; };
ntuple<S> nt;
S s;
s.E = .... ;
nt.Fill(s);
Simple accessor methods also provided.
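A vector-of-structs ntuple along these lines might look like the sketch below. The class name `Ntuple`, the accessors `Size`, `At`, and `Column`, and the `Event` row type are assumptions for illustration; the talk does not list the actual accessor methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-wise ntuple stored as a vector of user-defined structs (illustrative).
template <typename Row>
class Ntuple {
public:
    void Fill(const Row& row) { rows_.push_back(row); }

    std::size_t Size() const { return rows_.size(); }

    const Row& At(std::size_t i) const {
        assert(i < rows_.size());
        return rows_[i];
    }

    // Extracts one column by pointer-to-member, e.g. nt.Column(&Event::E).
    template <typename V>
    std::vector<V> Column(V Row::*member) const {
        std::vector<V> out;
        out.reserve(rows_.size());
        for (std::size_t i = 0; i < rows_.size(); ++i)
            out.push_back(rows_[i].*member);
        return out;
    }

private:
    std::vector<Row> rows_;
};

// Hypothetical row type for the example.
struct Event {
    float E;
    int np;
};
```

Because rows are plain structs, any member type that can be copied, including user objects, can sit in a row without extra bookkeeping.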
Additional Functionality
Even though no complex functions are provided within the package, users may find it necessary to create them as needed.
Library functions can easily be added to provide user-specific histogram/ntuple operations.
For instance, if a user needs to perform a double gaussian fit to a histogram, it is very easy to add this function in an external library, declared as a friend.
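The friend mechanism can be illustrated with a much simpler function than a double Gaussian fit. In the sketch below, `SimpleHist` and `MeanBinIndex` are hypothetical stand-ins: the function lives outside the class, as it would in an external library, and the friend declaration gives it direct access to the private bin contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in histogram class (illustrative only).
class SimpleHist {
public:
    explicit SimpleHist(const std::vector<double>& contents) : contents_(contents) {}

    // Grants an externally defined function access to the private data.
    friend double MeanBinIndex(const SimpleHist& h);

private:
    std::vector<double> contents_;
};

// Would live in a separate user library; reads the bins directly.
// Returns the content-weighted mean bin index (0-based).
double MeanBinIndex(const SimpleHist& h) {
    double sumw = 0.0;
    double sumwi = 0.0;
    for (std::size_t i = 0; i < h.contents_.size(); ++i) {
        sumw += h.contents_[i];
        sumwi += h.contents_[i] * static_cast<double>(i);
    }
    assert(sumw > 0.0);
    return sumwi / sumw;
}
```

A fitting routine would follow the same pattern, with the fit logic replacing the weighted mean.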
Additions in the Pipeline
• Ability to use shared memory
• Extend i/o format to include JAS
• Internal conversion to ROOT/HBOOK/JAS
• Profile histograms
• Further support for ntuples
• Adhere to AIDA interface
Pipedreams
Create an adaptor to a memory resident histogram object to allow multi-format access.
Basic histogram object sits in memory and presents different representations of itself to various components - e.g. it looks like an HBOOK histogram to minuit, and like a ROOT histogram to a ROOT-specific process. If modifications are made to the histogram by other applications, it can re-synchronize and update itself.
Conclusions
Makes a clean break between statistical data gathering, and analysis and visualization tasks.
Enables histogramming of complex types.
Simple and small implementation that is well suited to memory restricted tasks, such as online data taking.
Provides the user with the freedom to choose from a wide variety of different analysis and visualization tools.
Easily extensible, whether to new i/o formats or specific analysis functions.