7/23/07 1
What NetCDF users should know about HDF5?
Elena Pourmal
The HDF Group
July 20, 2007
Outline
• The HDF Group and HDF software
• HDF5 Data Model
• Using HDF5 tools to work with NetCDF-4 programs and files
• Performance issues
  • Chunking
  • Variable-length datatypes
  • Parallel performance
• Crash proofing in HDF5
The HDF Group
• Not-for-profit company with a mission to sustain and develop HDF technology, affiliated with University of Illinois
• Spun off from NCSA, University of Illinois, in July 2006
• Located at the U of I Campus South Research Park
• 17 team members, 5 graduate and undergraduate students
• Owns IP for HDF file format and software
• Funded by NASA, DOE, others
HDF5 file format and I/O library
• General: simple data model
• Flexible: stores data of diverse origins, sizes, types; supports complex data structures
• Portable: available for many operating systems and machines
• Scalable: works in high end computing environments; accommodates data of any size or multiplicity
• Efficient: fast access, including parallel I/O; stores big data efficiently
HDF5 file format and I/O library
• File format
  • Complex
  • Object headers, raw data, B-trees, local and global heaps, etc.
• C library
  • 500+ APIs
  • C++, Fortran90 and Java wrappers
  • High-level APIs (images, tables, packets)
[Figure: HDF5 software layers, top to bottom]
• Applications: simulation, visualization, remote sensing… (examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models)
• Common application-specific data models and APIs: NetCDF-4, SAF, hdf5mesh, HDF-EOS, IDL, application-specific APIs (organizations named on the slide: Unidata, LANL, LLNL, SNL, Grids, COTS, NASA)
• HDF5 data model & API
• HDF5 serial & parallel I/O
• HDF5 virtual file layer (I/O drivers): MPI I/O, Split Files, Sec2, Custom, Stream
• Storage (HDF5 format): file, file on parallel file system, split metadata and raw data files, user-defined device, across the network or to/from another application or library
HDF5 file format and I/O library
For NetCDF-4 users, HDF5 complexity is hidden behind NetCDF-4 APIs
HDF5 Tools
• Command line utilities: http://www.hdfgroup.org/hdf5tools.html
  • Readers: h5dump, h5ls
  • Writers: h5repack, h5copy, h5import
  • Miscellaneous: h5diff, h5repart, h5mkgrp, h5stat, h5debug, h5jam/h5unjam
  • Converters: h52gif, gif2h5, h4toh5, h5toh4
• HDFView (Java browser and editor)
Other HDF5 Tools
• HDF Explorer: Windows only, works with NetCDF-4 files (http://www.space-research.org/)
• PyTables
• IDL
• Matlab
• Labview
• Mathematica
• See http://www.hdfgroup.org/tools5app.html
HDF Information
• HDF Information Center: http://hdfgroup.org
• HDF Help email [email protected]
• HDF users mailing lists: [email protected]@hdfgroup.org
NetCDF and HDF5 terminology
| HDF5 | NetCDF |
|---|---|
| HDF5 file | Dataset |
| Dataset | Variable |
| Dimension scale | Coordinate variable |
| Attribute | Attribute |
| Dataspace | Dimensions |
Mesh Example, in HDFView
HDF5 Data Model
HDF5 data model
• HDF5 file – container for scientific data
• Primary objects
  • Groups
  • Datasets
• Additional ways to organize data
  • Attributes
  • Sharable objects
  • Storage and access properties
NetCDF-4 builds from these parts.
HDF5 Dataset
[Figure: an HDF5 dataset consists of data plus metadata]
• Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: time = 32.4, pressure = 987, temp = 56
• Storage info: chunked, compressed, checksum
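As a rough illustration of how the parts on this slide fit together, the sketch below models the dataset's metadata in plain Python, with no HDF5 library involved. The class and field names are hypothetical; only the dimensions, datatype size, attributes, and storage labels come from the slide.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class DatasetDescription:
    """Toy stand-in for the metadata that accompanies an HDF5 dataset."""
    dims: tuple            # dataspace dimensions
    element_size: int      # datatype size in bytes
    attributes: dict = field(default_factory=dict)
    storage: tuple = ()    # storage info labels

    @property
    def rank(self):
        return len(self.dims)

    @property
    def n_elements(self):
        return prod(self.dims)

    @property
    def raw_bytes(self):
        # Size of the raw data before any compression.
        return self.n_elements * self.element_size


# Values taken from the slide: a 4 x 5 x 7 array of 32-bit floats.
example = DatasetDescription(
    dims=(4, 5, 7),
    element_size=4,
    attributes={"time": 32.4, "pressure": 987, "temp": 56},
    storage=("chunked", "compressed", "checksum"),
)
print(example.rank, example.n_elements, example.raw_bytes)
```

The point of the sketch is only that the dataspace and datatype together determine the raw data size; attributes and storage information travel with the dataset as metadata.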
Datatypes
• HDF5 atomic types
  • normal integer & float
  • user-definable (e.g. 13-bit integer)
  • variable length types (e.g. strings, ragged arrays)
  • pointers - references to objects/dataset regions
  • enumeration - names mapped to integers
  • array
  • opaque
• HDF5 compound types
  • Comparable to C structs
  • Members can be atomic or compound types
  • No restriction on complexity
HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype: record with fields int8, int4, int16, and a 2x3x2 array of float32
Groups
• A mechanism for collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
• Objects are identified by a path, e.g. /d/b, /t/a
[Figure: group tree rooted at "/" with members t, d, h, a, b, c, a]
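Path-based identification can be pictured with nested dictionaries. This is a minimal sketch, not HDF5 code: the tree below is a hypothetical layout chosen only to be consistent with the example paths /d/b and /t/a, and `resolve` is an illustrative helper.

```python
def resolve(root, path):
    """Follow an absolute path such as "/d/b" through nested dicts."""
    node = root
    for name in path.strip("/").split("/"):
        if name:  # "/" on its own resolves to the root group
            node = node[name]
    return node


# Hypothetical tree: groups are dicts, datasets are placeholder strings.
root = {
    "t": {"a": "dataset /t/a"},
    "d": {"b": "dataset /d/b", "c": "dataset /d/c"},
}

print(resolve(root, "/d/b"))
print(resolve(root, "/t/a"))
```

As with UNIX directories, the same member name (here `a`) can appear under different groups without conflict, because the full path is what identifies the object.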
Attributes
• Attribute – data of the form “name = value”, attached to an object (group, dataset, named datatype)
• Operations are scaled-down versions of dataset operations
  • Not extendible
  • No compression
  • No partial I/O
• Optional
• Can be overwritten, deleted, added during the “life” of a dataset
• Size under 64K in releases before HDF5 1.8.0
Using HDF5 tools with NetCDF-4 programs and files
Example
• Create netCDF-4 file
• /Users/epourmal/Working/_NetCDF-4
  • s.c creates simple_xy.nc (NetCDF3 file)
  • sh5.c creates simple_xy_h5.nc (NetCDF4 file)
• Use h5cc script to compile both examples
• See contents of simple_xy_h5.nc with ncdump and h5dump
• Useful flags
  • -h to print help menu
  • -b to export data to binary file
  • -H to display metadata information only
• HDF Explorer
NetCDF view: ncdump output
% ncdump -h simple_xy_h5.nc
netcdf simple_xy_h5 {
dimensions:
   x = 6 ;
   y = 12 ;
variables:
   int data(x, y) ;
data:
}
% h5dump -H simple_xy.nc
h5dump error: unable to open file "simple_xy.nc"
This is a NetCDF3 file, so h5dump will not work.
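One way to see why h5dump rejects the NetCDF3 file is to look at the first bytes of each file. The sketch below is an illustration under stated assumptions: it checks only offset 0 (an HDF5 signature may also sit at a later offset when a user block is present), it recognizes only the classic and 64-bit-offset NetCDF variants, and the function names are hypothetical.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def file_kind(header: bytes) -> str:
    """Classify a file from its first bytes (offset 0 only)."""
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5 (includes NetCDF-4)"
    if header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02"):
        return "NetCDF classic (NetCDF-3)"
    return "unknown"


def kind_of_path(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return file_kind(f.read(8))


print(file_kind(b"CDF\x01\x00\x00\x00\x00"))
print(file_kind(HDF5_SIGNATURE))
```

With the files from the example, `kind_of_path("simple_xy.nc")` would be expected to report the classic format and `kind_of_path("simple_xy_h5.nc")` the HDF5 format, which matches the behavior of the two dump tools.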
HDF5 view: h5dump output
% h5dump -H simple_xy_h5.nc
HDF5 "simple_xy_h5.nc" {
GROUP "/" {
   DATASET "data" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 6, 12 ) / ( 6, 12 ) }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE H5T_VLEN { H5T_REFERENCE}
         DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      }
   }
   DATASET "x" {
      DATATYPE H5T_IEEE_F32BE
      DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
      …….
}
HDF Explorer
HDF Explorer
Performance issues
Performance issues
• Choose appropriate HDF5 library features to organize and access data in HDF5 files
• Three examples:
  • Collective vs. independent access in parallel HDF5 library
  • Chunking
  • Variable length data
Layers – parallel example
I/O flows through many layers from application to disk.
[Figure: layers of a parallel computing system (Linux cluster) with several compute nodes, top to bottom]
• NetCDF-4 application
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network/I/O servers
• Disk architecture & layout of data on disk
h5perf
• An I/O performance measurement tool
• Tests 3 file I/O APIs
  • POSIX I/O (open/write/read/close…)
  • MPIO (MPI_File_{open,write,read,close})
  • PHDF5
    • H5Pset_fapl_mpio (using MPI-IO)
    • H5Pset_fapl_mpiposix (using POSIX I/O)
H5perf: Some features
• Check (-c): verify data correctness
• Added 2-D chunk patterns in v1.8
My PHDF5 Application I/O “inhales”
• If my application I/O performance is bad, what can I do?
  • Use larger I/O data sizes
  • Independent vs collective I/O
  • Specific I/O system hints
  • Parallel file system limits
Independent Vs Collective Access
• User reported that independent data transfer was much slower than the collective mode
• Data array was tall and thin: 230,000 rows by 6 columns
Independent vs. Collective write (6 processes, IBM p-690, AIX, GPFS)
| # of Rows | Data Size (MB) | Independent (Sec.) | Collective (Sec.) |
|---|---|---|---|
| 16384 | 0.25 | 8.26 | 1.72 |
| 32768 | 0.50 | 65.12 | 1.80 |
| 65536 | 1.00 | 108.20 | 2.68 |
| 122918 | 1.88 | 276.57 | 3.11 |
| 150000 | 2.29 | 528.15 | 3.63 |
| 180300 | 2.75 | 881.39 | 4.12 |
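To make the gap in the table easier to see, the short sketch below computes how many times slower the independent write was in each row. The numbers are copied from the table as read from the slide; the variable and function names are illustrative.

```python
# (rows, data size in MB, independent seconds, collective seconds)
timings = [
    (16384, 0.25, 8.26, 1.72),
    (32768, 0.50, 65.12, 1.80),
    (65536, 1.00, 108.20, 2.68),
    (122918, 1.88, 276.57, 3.11),
    (150000, 2.29, 528.15, 3.63),
    (180300, 2.75, 881.39, 4.12),
]


def slowdown(independent_s, collective_s):
    """How many times slower the independent write was."""
    return independent_s / collective_s


ratios = [slowdown(ind, col) for _, _, ind, col in timings]

for (rows, size_mb, _, _), ratio in zip(timings, ratios):
    print(f"{rows:>7} rows  {size_mb:4.2f} MB  independent/collective = {ratio:6.1f}x")
```

In these measurements the ratio grows with the data size, so the penalty for independent access gets worse as the tall, thin array gets longer.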
Independent vs Collective write (6 processes, IBM p-690, AIX, GPFS)
[Figure: performance (non-contiguous); time (s), 0 to 1000, against data space size (MB), 0.00 to 3.00, for independent and collective writes]
Some performance results
1. A parallel version of NetCDF-3 from ANL/Northwestern University/University of Chicago (PnetCDF)
2. HDF5 parallel library 1.6.5
3. NetCDF-4 beta1
4. For more details see http://www.hdfgroup.uiuc.edu/papers/papers/ParallelPerformance.pdf
HDF5 and PnetCDF Performance Comparison
• Flash I/O website: http://flash.uchicago.edu/~zingale/flash_benchmark_io/
• Robb Ross et al., "Parallel NetCDF: A Scientific High-Performance I/O Interface"
HDF5 and PnetCDF performance comparison
[Figure: two panels titled "Flash I/O Benchmark (Checkpoint files)", one for Bluesky (Power 4) and one for uP (Power 5); bandwidth (MB/s) against number of processors for PnetCDF and HDF5 independent. One panel spans 0 to 2500 MB/s over 10 to 310 processors, the other 0 to 60 MB/s over 10 to 160 processors.]
HDF5 and PnetCDF performance comparison
[Figure: two panels titled "Flash I/O Benchmark (Checkpoint files)", one for Bluesky (Power 4) and one for uP (Power 5); bandwidth (MB/s) against number of processors for PnetCDF, HDF5 collective, and HDF5 independent. One panel spans 0 to 60 MB/s over 10 to 160 processors, the other 0 to 2500 MB/s over 10 to 310 processors.]
Parallel NetCDF-4 and PnetCDF
• Fixed problem size = 995 MB
• Performance of parallel NetCDF-4 is close to PnetCDF
[Figure: bandwidth (MB/s), 0 to 160, against number of processors, 0 to 144, for PnetCDF from ANL and NetCDF4]
HDF5 chunked dataset
• Dataset is partitioned into fixed-size chunks
• Data can be added along any dimension
• Compression is applied to each chunk
• Datatype conversion is applied to each chunk
• Chunking storage creates additional overhead in a file
• Do not use small chunks
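The warning about small chunks can be made concrete by counting chunks. The sketch below is plain arithmetic rather than HDF5 code; the dataset and chunk shapes are hypothetical, and the helper names are illustrative.

```python
from math import ceil, prod


def chunk_grid(dataset_shape, chunk_shape):
    """Chunks along each dimension; partial edge chunks count as whole chunks."""
    return tuple(ceil(d / c) for d, c in zip(dataset_shape, chunk_shape))


def chunk_count(dataset_shape, chunk_shape):
    """Total number of chunks needed to cover the dataset."""
    return prod(chunk_grid(dataset_shape, chunk_shape))


# Hypothetical 1000 x 1000 dataset with two candidate chunk shapes.
print(chunk_count((1000, 1000), (100, 100)))
print(chunk_count((1000, 1000), (10, 10)))
```

Every chunk has to be tracked in the file, so a hundred times more chunks means a correspondingly larger amount of bookkeeping for the same raw data.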
Writing chunked dataset
• Each chunk is written as a contiguous blob
• Chunks may be scattered all over the file
• Compression is performed when a chunk is evicted from the chunk cache
• Other filters are applied when data goes through the filter pipeline (e.g. encryption)
[Figure: chunks A, B, C of a chunked dataset pass through the chunk cache and the filter pipeline into the file]
Writing chunked datasets
[Figure: application memory holds the metadata cache (Dataset_1 header … Dataset_N header, chunking B-tree nodes) and the chunk cache; default chunk cache size is 1MB]
• Size of chunk cache is set for file
• Each chunked dataset has its own chunk cache
• Chunk may be too big to fit into cache
• Memory may grow if application keeps opening datasets
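Whether a chunk fits into the chunk cache is a simple size comparison. The sketch below assumes the 1MB default from the slide means 2^20 bytes; the chunk shapes and helper names are hypothetical.

```python
from math import prod

# Assumption: the slide's "1MB" default is taken as 1024 * 1024 bytes.
DEFAULT_CHUNK_CACHE_BYTES = 1024 * 1024


def chunk_bytes(chunk_shape, element_size):
    """Uncompressed size of one chunk in bytes."""
    return prod(chunk_shape) * element_size


def fits_in_cache(chunk_shape, element_size, cache_bytes=DEFAULT_CHUNK_CACHE_BYTES):
    """True when a single chunk is no larger than the cache."""
    return chunk_bytes(chunk_shape, element_size) <= cache_bytes


print(fits_in_cache((256, 256), 4))    # 256 KiB chunk of 4-byte elements
print(fits_in_cache((1024, 1024), 4))  # 4 MiB chunk of 4-byte elements
```

Since each open chunked dataset has its own cache, the memory in use scales with the number of open datasets, which is the growth the last bullet warns about.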
Partial I/O for chunked dataset
• Build list of chunks and loop through the list
• For each chunk:
  • Bring chunk into memory
  • Map selection in memory to selection in file
  • Gather elements into conversion buffer and perform conversion
  • Scatter elements back to the chunk
  • Apply filters (compression) when chunk is flushed from chunk cache
• For each element, 3 memcopies are performed
[Figure: dataset divided into four chunks numbered 1 to 4]
Partial I/O for chunked dataset
[Figure: elements participating in I/O are gathered by memcopy from the application buffer in application memory into the corresponding chunk (chunk 3 in the example)]
Partial I/O for chunked dataset
[Figure: within application memory, data from chunk 3 in the chunk cache is gathered into the conversion buffer and scattered back. On eviction from the cache, the chunk is compressed and written to the file.]
Chunking and selections
• Great performance: selection coincides with a chunk
• Poor performance: selection spans all chunks
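The difference between the two cases comes down to how many chunks a selection intersects. This is a minimal sketch for a contiguous hyperslab described by a start and a count per dimension; the 100 x 100 chunk shape and the function name are hypothetical.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by a contiguous hyperslab selection."""
    per_dim = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c            # index of the first chunk hit in this dimension
        last = (s + n - 1) // c   # index of the last chunk hit in this dimension
        per_dim.append(last - first + 1)
    return prod(per_dim)


chunk = (100, 100)  # hypothetical chunk shape; a 200 x 200 dataset has 4 chunks
aligned = chunks_touched((0, 0), (100, 100), chunk)
shifted = chunks_touched((50, 50), (100, 100), chunk)
print(aligned, shifted)
```

Both selections contain the same number of elements, but the shifted one forces every one of the four chunks to be brought into memory and processed.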
Things to remember about HDF5 chunking
• Use appropriate chunk sizes
• Make sure that cache is big enough to contain chunks for partial I/O
• Use hyperslab selections that are aligned with chunks
• Memory may grow when application opens and modifies a lot of chunked datasets
Variable length datasets and I/O
• Examples of variable-length data
  • String
    A[0] “the first string we want to write”
    …………………………………
    A[N-1] “the N-th string we want to write”
  • Each element is a record of variable length
    A[0] (1,1,0,0,0,5,6,7,8,9): length of the first record is 10
    A[1] (0,0,110,2005)
    ………………………..
    A[N] (1,2,3,4,5,6,7,8,9,10,11,12,….,M): length of the (N+1)-th record is M
Variable length datasets and I/O
• Variable length description in HDF5 application
  typedef struct {
      size_t len;
      void *p;
  } hvl_t;
• Base type can be any HDF5 type
  H5Tvlen_create(base_type)
• ~ 20 bytes overhead for each element
• Raw data cannot be compressed
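The in-memory layout of a variable-length element can be mirrored with Python's ctypes module. This is only an illustration of the length-plus-pointer idea: it does not call the HDF5 library, it assumes the field names `len` and `p` used in the HDF5 C header, and `make_vl` and `read_vl` are hypothetical helpers.

```python
import ctypes


class hvl_t(ctypes.Structure):
    """Local mirror of the C struct: element count plus pointer to the data."""
    _fields_ = [("len", ctypes.c_size_t), ("p", ctypes.c_void_p)]


def make_vl(values):
    """Build one variable-length element holding C ints."""
    buf = (ctypes.c_int * len(values))(*values)
    elem = hvl_t(len(values), ctypes.addressof(buf))
    return elem, buf  # the caller must keep buf alive while elem is in use


def read_vl(elem):
    """Read the ints back through the pointer stored in the element."""
    ptr = ctypes.cast(elem.p, ctypes.POINTER(ctypes.c_int))
    return ptr[:elem.len]


elem, buf = make_vl([1, 1, 0, 0, 0, 5, 6, 7, 8, 9])
print(elem.len, read_vl(elem))
```

An array of such elements has a fixed element size even though the records differ in length, because each element carries only a count and a pointer to data stored elsewhere.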
Variable length datasets and I/O
[Figure: elements in the application buffer point to global heaps, where the actual raw data is stored]
VL chunked dataset in a file
[Figure: layout of a VL chunked dataset in a file, showing the dataset header, chunking B-tree, dataset chunks, and raw data]
Variable length datasets and I/O
• Hints
  • Avoid closing/opening a file while writing VL datasets
    • global heap information is lost
    • global heaps may have unused space
  • Avoid writing VL datasets interchangeably
    • data from different datasets will be written to the same heap
  • If maximum length of the record is known, use fixed-length records and compression
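The last hint can be weighed with a rough size estimate. The sketch below uses the approximate 20 bytes of per-element overhead quoted earlier in the talk and compares it with padding every record to the longest one; it ignores compression, which the talk says is available only for the fixed-length layout, and all names and record lengths are hypothetical.

```python
VL_OVERHEAD_BYTES = 20  # approximate per-element overhead quoted in the talk


def vl_bytes(record_lengths, element_size, overhead=VL_OVERHEAD_BYTES):
    """Approximate storage for variable-length records."""
    return sum(n * element_size + overhead for n in record_lengths)


def fixed_bytes(record_lengths, element_size):
    """Uncompressed storage when every record is padded to the longest one."""
    return len(record_lengths) * max(record_lengths) * element_size


similar = [10, 4, 12]   # records of similar length
skewed = [1, 1, 1000]   # one long record among short ones

print(vl_bytes(similar, 4), fixed_bytes(similar, 4))
print(vl_bytes(skewed, 4), fixed_bytes(skewed, 4))
```

When record lengths are similar, padding costs less than the variable-length overhead even before compression; when lengths are heavily skewed, the padded layout is larger uncompressed, and compression of the padding is what can make it competitive.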
Crash-proofing
Why crash proofing?
• HDF5 applications tend to run for a long time (sometimes until the system crashes)
• An application crash may leave an HDF5 file in a corrupted state
• Currently there is no way to recover data
• One of the main obstacles for production codes that use NetCDF-3 to move to NetCDF-4
• Funded by ASC project
• Prototype release is scheduled for the end of 2007
HDF5 Solution
• Journaling
  • Modifications to HDF5 metadata are stored in an external journal file
  • HDF5 will be using asynchronous writes to the journal file for efficiency
• Recovering after crash
  • HDF5 recovery tool will replay the journal and apply all metadata writes, bringing the HDF5 file to a consistent state
  • Raw data will consist of data that made it to disk
• Solution will be applicable for both sequential and parallel modes
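The replay idea can be illustrated with a toy journal. This sketch is not the HDF5 journal format or recovery tool, which the talk describes only as planned work; it assumes a journal is simply an ordered list of (offset, data) writes, and the function name is hypothetical.

```python
def replay(file_image: bytearray, journal):
    """Apply journaled writes, in order, to an in-memory file image.

    journal is a list of (offset, data) entries; later entries win
    where they overlap earlier ones.
    """
    for offset, data in journal:
        end = offset + len(data)
        if end > len(file_image):
            # Grow the image with zero bytes if a write extends past the end.
            file_image.extend(bytes(end - len(file_image)))
        file_image[offset:end] = data
    return file_image


image = bytearray(b"AAAAAAAA")
journal = [(2, b"xy"), (6, b"1234")]
print(bytes(replay(image, journal)))
```

Replaying every recorded write in order brings the image to the state it would have reached had all the writes completed, which is the sense in which the file becomes consistent again.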
Thank you!
Questions?