Using HDF5 tools for performance tuning and troubleshooting

06/20/22 HDF and HDF-EOS Workshop X, Landover, MD

Introduction

• HDF5 tools may be very useful for performance tuning and troubleshooting
  • Discover objects and their properties in HDF5 files: h5dump -p
  • Get file size overhead information: h5stat
  • Get locations of the objects in a file: h5ls
  • Discover differences: h5diff, h5ls
  • Location of raw data: h5ls -vra

h5stat

• Prints various statistics about an HDF5 file
• Helps
  • To troubleshoot size overhead in HDF5 files
  • To choose specific object properties and storage strategies
• To use:
    h5stat --help
    h5stat file.h5
• The spec can be found at http://www.hdfgroup.org/RFC/h5stat/
• Let us know if you need some “special” type of statistics

h5stat

• Reports two types of statistics:
  • High-level information about objects (examples):
    • Number of different objects (groups, datasets, datatypes) in a file
    • Number of unique datatypes
    • Size of raw data in a file
  • Information about objects’ structural metadata
    • Sizes of structural metadata (total/free)
      • Object headers, local and global heaps
      • Sizes of B-trees
    • Object header fragmentation

h5stat

• Examples of high-level information:

File information
    # of unique groups: 10008
    # of unique datasets: 30
    # of unique named datatypes: 0
………
    Max. # of links to object: 1
    Max. depth of hierarchy: 4
    Max. # of objects in group: 19
………
Group bins:
    # of groups of size 0: 10000
    # of groups of size 1 - 9: 7
    # of groups of size 10 - 99: 1
………
    Max. dimension size of 1-D datasets: 1643
………
Dataset filters information:
    Number of datasets with
………
        SZIP filter: 2
………
        NBIT filter: 10
        USER-DEFINED filter: 1

h5stat

• Conclusions:
  • There are a lot of empty groups in the file; it is a good candidate for the compact group feature
  • Some datasets use “user-defined” filters and may not be readable by the HDF5 library
  • SZIP compression is needed to read some datasets

Oh… my application uses buffers of size 1024 to read data… No wonder it crashes on reading…
Do I have all the filters needed to read the data?

h5stat

• Examples of structural metadata information:

Object header size: (total/unused)
    Groups: 1808/72
    Datasets: 15792/832
………
Dataset storage information:
    Total raw data size: 6140688
………
Dataset datatype #3:
    Count (total/named) = (2/0)
    Size (desc./elmt) = (10/65535)
Dataset datatype #4:
    Count (total/named) = (1/0)
    Size (desc./elmt) = (10/32000)

h5stat

• Conclusions
  • File size: 6228197
  • 1.5% overhead (not bad at all!)
  • Some elements are of size 65535 and 32000

Oh… Is it really what I want? Should I use another datatype and take advantage of compression?
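
The overhead figure can be checked from the two numbers on these slides. The sketch below is only an illustration: it assumes "overhead" means every byte in the file that is not raw data, divided by the file size, which is one plausible reading and not necessarily how the slide's figure was derived. The function name is hypothetical.

```python
# Values copied from the h5stat example on the slides.
file_size = 6228197        # total file size in bytes
raw_data_size = 6140688    # "Total raw data size" reported by h5stat


def overhead_fraction(file_size, raw_data_size):
    """Fraction of the file that is not raw data (assumed definition)."""
    return (file_size - raw_data_size) / file_size


overhead_bytes = file_size - raw_data_size
overhead = overhead_fraction(file_size, raw_data_size)
print(f"{overhead_bytes} bytes of overhead, {overhead:.1%} of the file")
```

Under that assumption the result is roughly 1.4%, in line with the approximate figure quoted on the slide.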

Case study: Using HDF5 tools to debug a problem

• My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files OK and shows no differences. What am I doing wrong?

• h5diff good.h5 bad.h5
    Datatype: </Definitions/timespec> and </Definitions/timespec>
    1 differences found

• h5ls -vr good.h5
    /Definitions/timespec Type
        Location: 0:1:0:900

• h5debug good.h5 900
    Message Information:
        Type class: compound
        Size: 8 bytes

• h5debug bad.h5 900
    Message Information:
        Type class: compound
        Size: 16 bytes

Case study: Using HDF5 tools to debug a problem

• Conclusions
  • Compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2 x 8 bytes) and on VS2003 (8 bytes; 2 x 4 bytes)

Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory.
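
The size mismatch itself can be illustrated without HDF5. The sketch below uses Python's struct module to show how a two-field record takes 8 or 16 bytes depending on field width, and what a reader sees when it assumes the wrong width. The integer field types and the sample values are assumptions chosen to match the sizes reported by h5debug; the slides do not list the actual members of "timespec". This is not a substitute for H5Tget_native_type, only a picture of the problem it solves.

```python
import struct

# Hypothetical layouts for a two-field "timespec" record; the field
# widths are assumptions chosen to match the sizes reported by h5debug.
LAYOUT_4_BYTE_FIELDS = "<ii"   # two 32-bit integers -> 8 bytes
LAYOUT_8_BYTE_FIELDS = "<qq"   # two 64-bit integers -> 16 bytes


def record_size(layout):
    """Return the number of bytes one record occupies for a layout."""
    return struct.calcsize(layout)


def decode(raw, layout):
    """Decode the leading bytes of raw using the given layout."""
    return struct.unpack_from(layout, raw)


# One record written with 8-byte fields (sample values 1 and 2).
raw = struct.pack(LAYOUT_8_BYTE_FIELDS, 1, 2)

size_small = record_size(LAYOUT_4_BYTE_FIELDS)
size_large = record_size(LAYOUT_8_BYTE_FIELDS)
wrong = decode(raw, LAYOUT_4_BYTE_FIELDS)   # reader assumes 8-byte records
right = decode(raw, LAYOUT_8_BYTE_FIELDS)   # reader matches what was written

print(size_small, size_large)
print("mismatched reader:", wrong)
print("matching reader:  ", right)
```

The mismatched reader silently returns the wrong second field instead of failing, which is why the memory type should be queried from the file and not hard-coded.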

Where is my data?

• h5ls -var be_data.h5

Opened "be_data.h5" with sec2 driver.
/Array Dataset {5/5, 6/6}
    Location: 0:1:0:792
    Links: 1
    Modified: 2006-04-07 15:08:39 CDT
    Storage: 240 logical bytes, 240 allocated bytes, 100.00% utilization
    Type: IEEE 64-bit big-endian float
    Address: 2048

• 30 8-byte elements can be read from address 2048 by a non-HDF5 application
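
A minimal sketch of such a non-HDF5 reader is shown below. It assumes the dataset is stored contiguously with no filters and that the address printed by h5ls is an absolute byte offset in the file (no user block). The helper name is hypothetical, and because be_data.h5 is not available here, the demo runs against a synthetic file with the same layout: 2048 bytes of padding followed by 30 big-endian doubles.

```python
import os
import struct
import tempfile


def read_raw_doubles(path, address, count):
    """Read `count` big-endian 64-bit floats starting at byte `address`."""
    with open(path, "rb") as f:
        f.seek(address)
        raw = f.read(8 * count)
    if len(raw) != 8 * count:
        raise ValueError("file is too short for the requested elements")
    return list(struct.unpack(">%dd" % count, raw))


# Synthetic stand-in for be_data.h5: padding up to the reported address,
# then 30 big-endian doubles where the raw data would start.
expected = [float(i) for i in range(30)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 2048)
        f.write(struct.pack(">30d", *expected))
    values = read_raw_doubles(path, address=2048, count=30)

# Reshape the flat list into the 5 x 6 dataset shape (row-major order).
rows = [values[r * 6:(r + 1) * 6] for r in range(5)]
print(rows[0])
```

For the real file, the call would be read_raw_doubles("be_data.h5", 2048, 30), provided the assumptions above hold.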

Questions? Comments?

Thank you!