
10/15/08 HDF & HDF-EOS Workshop XII 11

Introduction to HDF5

HDF & HDF-EOS Workshop XII

October 15, 2008


Topics Covered

- Introduce HDF5

- Describe HDF5 Data and Programming Models

- Walk Through Example Code


For More Information …

All workshop slides will be available from:

http://hdfeos.org/workshops/ws12/workshop_twelve.php


What is HDF5?

HDF = Hierarchical Data Format

• Data model, library and file format for managing data

• Tools for accessing data in the HDF5 format


Brief History of HDF

1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library. AEHOO (All Encompassing Hierarchical Object Oriented format) became HDF.

Early 1990’s: NASA adopted HDF for the Earth Observing System project.

1996: DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF”. (The increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files.) “Big HDF” became HDF5.

1998: HDF5 was released with support from National Labs, NASA, NCSA.

2006: The HDF Group spun off from University of Illinois as a non-profit corporation.

Why HDF5?

In one sentence ...

Answering big questions …

- Matter and the universe

- Weather and climate

- Life and nature

[Figure: Total Column Ozone (Dobson), August 24, 2001 and August 24, 2002; color scale from 60 to 610]

… involves big data …


… varied data …

Thanks to Mark Miller, LLNL


… and complex relationships …

[Diagram of related data objects: Contig Summaries, Discrepancies, Contig Qualities, Coverage Depth, Read quality, Aligned bases, Contig, Reads, Percent match, Trace, SNP Score]

… on big computers …

… and small computers …


How do we…

• Describe our data?

• Read it? Store it? Find it? Share it? Mine it?

• Move it into, out of, and between computers and repositories?

• Achieve storage and I/O efficiency?

• Give applications and tools easy access to our data?

Solution: HDF5!

• Can store all kinds of data in a variety of ways

• Runs on most systems

• Lots of tools to access data

• Emphasis on standards (HDF-EOS, CGNS)

• Library and format emphasis on I/O efficiency and storage

Structure of HDF5 Library

Layers, from top to bottom:

Applications

Object API (C, F90, C++, Java)

Library internals

Virtual file I/O

File or other “storage”

HDF Tools

- HDFView and Java Products

- Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)

HDF5 Applications & Domains

Applications: Simulation, visualization, remote sensing…

Examples: Thermonuclear simulations, Product modeling, Data mining tools, Visualization tools, Climate models

Communities: HDF-EOS, CGNS, ASC

HDF5 Data Model & API

Virtual File Layer (I/O Drivers): Stdio, Split Files, MPI I/O, Custom

HDF5 format

Storage: File, Split metadata and raw data files, File on parallel file system, User-defined device (?)

Lots of Layers in HDF5!

“Ogres are like onions.”

Shrek    HDF5 Monster??

Just like Shrek, once you get to know HDF5 you will really like it!!

The HDF5 Format


An HDF5 file is a container…

…into which you can put your data objects.

[Diagram: a file holding two palettes and a table]

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

HDF5 Structures for Organizing Objects

[Diagram: the root group “/” and a group “foo” organizing a palette, two raster images, a 3-D array, a 2-D array, and a table (lat, lon, temp)]

HDF5 Data Model

Primary Objects

• Groups

• Datasets

Additional ways to organize and annotate data

• Attributes

• Storage and access properties

Everything else is built from these parts.

HDF5 Dataset

Data

Metadata:

Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7

Datatype: Integer

Attributes: Time = 32.4, Pressure = 987, Temp = 56

Storage info: Chunked, Compressed

Dataspaces

Two roles:

• Dataspace contains spatial info about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of dataset definition

• Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O

Examples: Rank = 2, Dimensions = 4x6; Rank = 1, Dimension = 10

Write – from memory to disk

[Diagram: a dataset in memory written to disk]

Partial I/O

Move just part of a dataset

(a) Slab from a 2D array to the corner of a smaller 2D array

(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array

[Diagrams: the selection in memory and the selection on disk for each case]

The number of elements selected in each must be the same.

Datatypes (array elements)

• Datatype – how to interpret a data element

• Permanent part of the dataset definition

• Two classes: atomic and compound

Datatypes

• HDF5 atomic types include:
  - integer & float
  - user-definable (e.g., 13-bit integer)
  - variable length types (e.g., strings)
  - references to objects/dataset regions
  - enumeration - names mapped to integers

• HDF5 compound types
  - Comparable to C structs (“records”)
  - Members can be atomic or compound types

HDF5 dataset: array of records

Dimensionality: 5 x 3

Datatype: Record of int8, int4, int16, and a 2x3x2 array of float32

Properties

• Properties are characteristics of HDF5 objects that can be modified

• Default properties handle most needs

• By changing properties, you can take advantage of the more powerful features in HDF5

Special Storage Properties

chunked: Better subsetting access time; extensible

compressed: Improves storage efficiency, transmission speed

extensible: Arrays can be extended in any direction

split file: Metadata in one file, raw data in another

[Diagram: Dataset “Fred”, with the metadata for Fred in File A and the data for Fred in File B]

Attributes (optional)

• Attribute – data of the form “name = value”, attached to an object

• Operations similar to dataset operations, but …
  - Not extensible
  - No compression or partial I/O

• Can be overwritten, deleted, added during the “life” of a dataset

HDF5 Dataset (again)

Data

Metadata:

Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7

Datatype: Integer

Attributes: Time = 32.4, Pressure = 987, Temp = 56

Storage info: Chunked, Compressed

Groups

[Diagram: root group “/” with members A, B, C, k, l and m]

• A mechanism for organizing collections

• Every file starts with a root group

• Similar to UNIX directories

• Can have attributes

Path to HDF5 Object in a File

/ (root)
/x
/foo
/foo/temp
/foo/bar/temp

[Diagram: root group “/” containing x and foo; foo contains temp and bar; bar contains temp]

Shared Objects

/A/P
/B/R
/C/P

[Diagram: root group “/” with groups A, B and C, and the objects reached through the links P, R and P]

Questions So Far?


Useful Tools For New Users

h5dump: Tool to “dump” or display contents of HDF5 files

h5cc, h5c++, h5fc: Scripts to compile applications

HDFView: Java browser to view HDF4 and HDF5 files

H5dump Command-line Utility To View HDF5 File

h5dump [--header] [-a <names>] [-d <names>] [-g <names>] [-l <names>] [-t <names>] [-p] <file>

--header      Display header only; no data is displayed.
-a <names>    Display the specified attribute(s).
-d <names>    Display the specified dataset(s).
-g <names>    Display the specified group(s) and all the members.
-l <names>    Displays the value(s) of the specified soft link(s).
-t <names>    Display the specified named datatype(s).
-p            Display properties.

<names> is one or more appropriate object names.

Example of h5dump Output

HDF5 "dset.h5" {
GROUP "/" {
   DATASET "dset" {
      DATATYPE { H5T_STD_I32BE }
      DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
      DATA {
         1, 2, 3, 4, 5, 6,
         7, 8, 9, 10, 11, 12,
         13, 14, 15, 16, 17, 18,
         19, 20, 21, 22, 23, 24
      }
   }
}
}

[Diagram: root group “/” containing the dataset ‘dset’]

HDF5 Compile Scripts

• h5cc – HDF5 C compiler command

• h5fc – HDF5 F90 compiler command

• h5c++ – HDF5 C++ compiler command

To compile:

% h5cc h5prog.c

% h5fc h5prog.f90

Compile option: -show

-show: displays the compiler commands and options without executing them

% h5cc -show Sample_c.c

gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c

gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib


Browsing HDF5 Files with HDFView


HDFView

[Screenshot: HDFView showing the structure of the file and the contents of a dataset]

HDFView File Menu


Simple HDF5 File in HDFView

Right-click and select “Open” with mouse

Right-click and select “Show Properties” with mouse

Simple HDF5 File in HDFView


HDF-EOS5 File in HDFView

Right-click and select “Open As” with mouse

What you can’t see with slides:

- Picture displayed instantly

- File size is 906,229,176

Introduction to HDF5 Programming Model and APIs

Operations Supported by the API

• Create objects (groups, datasets, attributes, complex data types, …)

• Assign storage and I/O properties to objects

• Perform complex subsetting during read/write

• Use variety of I/O “devices” (parallel, remote, etc.)

• Transform data during I/O

• Make inquiries on file and object structure, content, properties


General Programming Paradigm

• Properties of object are optionally defined
  - Creation properties
  - Access property lists

• Object is opened or created

• Object is accessed, possibly many times

• Object is closed

Order of Operations

• An order is imposed on operations by argument dependencies

For Example:

A file must be opened before a dataset -because-

the dataset open call requires a file handle as an argument.

• Objects can be closed in any order.


The General HDF5 API

• Currently C, Fortran 90, Java, and C++ bindings.

• C routines begin with prefix H5?, where ? is a character corresponding to the type of object the function acts on

Example Functions:

H5D : Dataset interface, e.g., H5Dread

H5F : File interface, e.g., H5Fopen

H5S : dataSpace interface, e.g., H5Sclose

HDF5 Defined Types

For portability, the HDF5 library has its own defined types:

hid_t: object identifiers (native integer)

hsize_t: size used for dimensions (unsigned long or unsigned long long)

hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)

herr_t: function return value

hvl_t: variable length datatype

For C, include hdf5.h in your HDF5 application.

The HDF5 API

• For flexibility, the API is extensive: 300+ functions

• This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed

[Image: Victorinox Swiss Army CyberTool 34]

Basic Functions

H5Fcreate (H5Fopen) create (open) File

H5Screate_simple create dataSpace

H5Dcreate (H5Dopen) create (open) Dataset

H5Dread, H5Dwrite access Dataset

H5Dclose close Dataset

H5Sclose close dataSpace

H5Fclose close File


Other Common Functions

DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O)

Groups: H5Gcreate, H5Gopen, H5Gclose

Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite

Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

High Level APIs

• Included along with the HDF5 library

• Simplify steps for creating, writing, and reading objects

• Do not entirely ‘wrap’ HDF5 library

Example HDF5 Code


Steps to Create a File

1. Decide on special properties the file should have
   • Creation properties, like size of user block
   • Access properties, such as metadata cache size
   • Use default properties (H5P_DEFAULT)

2. Create property lists, if necessary

3. Create the file

4. Close the file and the property lists, as needed

Code: Create a File

hid_t file_id;
herr_t status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

status = H5Fclose (file_id);

Note: Return codes not checked for errors in code samples.

[Diagram: the new file contains only the root group “/”]

Dataset Components

Data

Metadata:

Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7

Datatype: Integer

Attributes: Time = 32.4, Pressure = 987, Temp = 56

Storage info: Chunked, Compressed

Steps to Create a Dataset

1. Define dataset characteristics
   • Dataspace – 4x6
   • Datatype – integer
   • Properties if needed, or use H5P_DEFAULT

2. Decide where to put it
   • Obtain location ID:
     - Group ID puts it in a Group
     - File ID puts it in Root Group

3. Create dataset in file

4. Close everything

[Diagram: dataset A in the root group “/”]

HDF5 Pre-defined Datatype Identifiers

HDF5 defines* a set of Datatype Identifiers per HDF5 session. For example:

C Type    HDF5 File Type                    HDF5 Memory Type
int       H5T_STD_I32BE, H5T_STD_I32LE      H5T_NATIVE_INT
float     H5T_IEEE_F32BE, H5T_IEEE_F32LE    H5T_NATIVE_FLOAT
double    H5T_IEEE_F64BE, H5T_IEEE_F64LE    H5T_NATIVE_DOUBLE

* Value of datatype is NOT fixed

Pre-defined File Datatype Identifiers

Examples:

H5T_IEEE_F64LE    Eight-byte, little-endian, IEEE floating-point
H5T_STD_I32LE     Four-byte, little-endian, signed two's complement integer

In each name, the part after H5T_ is the Architecture* and the last part is the Programming Type.

NOTE: What you see in the file. Name is the same everywhere and explicitly defines a datatype.

*STD= “An architecture with a semi-standard type like 2’s complement integer, unsigned integer…”

Pre-defined Native Datatypes

Examples of predefined native types in C:

H5T_NATIVE_INT (int)
H5T_NATIVE_FLOAT (float)
H5T_NATIVE_UINT (unsigned int)
H5T_NATIVE_LONG (long)
H5T_NATIVE_CHAR (char)

NOTE: Memory types. Different for each machine. Used for reading/writing.

Dataset Creation Property List

Dataset creation property list: information on how to organize data in storage.

[Diagram of three storage layouts: H5P_DEFAULT: contiguous; Chunked; Chunked & compressed]

Code: Create a Dataset

hid_t file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t status;

file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4 x 6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple (2, dims, NULL);

/* Create a dataset: pathname "A", datatype, dataspace, property list (default) */
dataset_id = H5Dcreate (file_id, "A", H5T_STD_I32BE, dataspace_id, H5P_DEFAULT);

/* Terminate access to dataset, dataspace, file */
status = H5Dclose (dataset_id);
status = H5Sclose (dataspace_id);
status = H5Fclose (file_id);

Example Code - H5Dwrite

status = H5Dwrite (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);

Arguments, in order:

• dataset_id: Dataset Identifier from H5Dcreate or H5Dopen

• H5T_NATIVE_INT: Memory Datatype

• H5S_ALL (first): Memory Dataspace

• H5S_ALL (second): File Dataspace

• H5P_DEFAULT: Data Transfer Property List (MPI I/O, Transformations, …)

H5S_ALL selects entire dataspace

Partial I/O

[Diagram: Memory Dataspace (H5S_ALL) and File Dataspace on disk (H5S_ALL)]

Get a Dataspace: H5Screate_simple, H5Dget_space

Modify Dataspace: H5Sselect_hyperslab, H5Sselect_elements

Example Code – H5Dread

status = H5Dread (dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_rdata);


High Level APIs: HDF5 Lite (H5LT)

#include "H5LT.h"
…
file_id = H5Fcreate ("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

status = H5LTmake_dataset (file_id, "A", 2, dims, H5T_STD_I32BE, data);

status = H5Fclose (file_id);

High Level APIs

• HDF5 Lite

• HDF5 Image

• HDF5 Table

• HDF5 Dimension Scales

• HDF5 Packet Table

Example: Create a Group

[Diagram: file.h5 with the root group “/” containing dataset A (4x6 array of integers) and group B]

Steps to Create a Group

1. Decide where to put it – “root group”
   • Obtain location ID

2. Decide name – “B”

3. Create group in file

4. (Eventually) close the group.

Code: Create a Group

hid_t file_id, group_id;
herr_t status;
...

/* Open "file.h5" */
file_id = H5Fopen ("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create group "/B" in file. The last argument is a size hint for the
   number of bytes to store names of objects; 0 = default. */
group_id = H5Gcreate (file_id, "B", 0);

/* Close group and file. */
status = H5Gclose (group_id);
status = H5Fclose (file_id);

Thank you!

This work was supported by the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNX06AC83A and NNX08A077A. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.