
04/19/23 The HDF Group 1

HDF Update

Mike Folk

The HDF Group

HDF and HDF-EOS Workshop XI

November 7, 2007


Outline

• What is The HDF Group?
• HDF Software Update
• Other Activities of Interest


What is The HDF Group (THG)?


THG, the Company

• Spun off from the University of Illinois in July 2006
• Non-profit
• 20+ scientific, technology, and professional staff
• Intellectual property:
  - THG owns HDF4 and HDF5
  - HDF formats and libraries to remain open
  - Libraries have a BSD-type license
• Continue ties to U of I and NCSA


The mission of The HDF Group is to ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.


Goals

• Maintain, evolve HDF for sponsors and communities that depend on it

• Do consulting, training, tuning, development, research

• Sustain The HDF Group for long term to assure data access over time


THG Services

• Helpdesk and Mailing Lists
  - Available to all users as a first level of support
• Standard Support
  - Rapid issue resolution support
• Consulting
  - Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
  - Coordinating HDF activities across divisions
• Special Projects
  - Adapting customer applications to HDF
  - New features and tools, with changes normally incorporated into open source product
  - Research and Development
• Training
  - Tutorials and hands-on practical experience


HDF Software Update


HDF4 update


HDF 4.2r2 Released in October


New features and changes

• New APIs added to the SD and GR interfaces:
  - SDreset_maxopenfiles, SDget_maxopenfiles: modify and report the maximum allowable number of files
  - SDget_numopenfiles: gets number of open files
  - SDgetcompinfo, GRgetcompinfo: get compression info
  - SDgetfilename: retrieves name of file, given its ID
  - SDgetnamelen: retrieves length of object name, given its ID
• SZIP compression
  - Now can be invoked by Fortran API
  - Now available for raster images via GR interface
• SDS, Vgroup names no longer limited to 64 characters


New features and changes

• HDF configuration changes
  - --enable-netcdf flag introduced
  - Autotools versions updated
• Many bug fixes made to hrepack and hdiff
• See RELEASE.txt for a full list of changes


Platforms to drop/add next release

• Drop
  - Windows XP with MSVC++ 6.0
  - Linux 2.4
  - IRIX64 6.5
  - SunOS 5.8, 5.9
• Add
  - Windows 64-bit (32- and 64-bit binaries)


Platforms tested

• Systems
  - AIX 5.3 (32-bit, 64-bit)
  - FreeBSD 6.2 (32-bit, 64-bit)*
  - HP-UX B.11.23 (32-bit, 64-bit)*
  - IRIX64 v6.5 (32-bit, 64-bit)
  - Linux 2.4, 2.6*
  - Linux ia64
  - Linux x86_64
  - SunOS 5.8, 5.10* (32-bit, 64-bit)
  - SunOS 5.10 on Intel
  - Windows XP, Vista
  - Mac OS X Intel*

* New platforms. For detailed info, see RELEASE.txt

• Compilers
  - IBM C and Fortran compilers
  - GNU gcc 3.4* and GNU Fortran
  - HPUX C and Fortran compilers
  - GNU gcc 3.4 and 4.*
  - Intel C and Fortran versions 9.1 and 10.00
  - SUN WorkShop C and Fortran
  - Visual Studio .NET and 2005 and Intel Fortran
  - Visual Studio 2005 (no Fortran)
  - GNU gcc 4.0.1 with gfortran and g95


HDF5 Update


HDF5 1.6.6


HDF5 1.6.6 release

• Primarily a bug-fix release
• Some tool changes (see later slide)
• http://hdfgroup.org/HDF5/release/obtain5.html


Platforms dropped

• Operating systems
  - AIX 5.3
  - Solaris 2.8 and 2.9
  - OSF1
  - Windows XP with MSVC++ 6.0
• Compilers
  - PGI 6.5-*

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html


Platforms added

• Systems
  - Alpha Open VMS
  - Mac OS X 10.4 (Intel)
  - Solaris 2.* on Intel
  - Cray XT3
  - Windows 64-bit (32- and 64-bit)
  - BG/L
• Compilers
  - PGI V. 7.*
  - Intel 10.*
  - MPICH 1.2.7
  - MPICH2


HDF5 1.8


HDF5 1.8 new library features

• Datatype and dataspace features
  - Create datatype from text description
  - Integer to float conversions during I/O
  - Compact storage for N-bit datatypes
  - Offset+size storage filter, saving space
  - “Null” dataspace – datasets with no elements
  - Data transformation filter


HDF5 1.8 – new library features

• Group improvements
  - Creation order access
  - Compact groups – small groups take less space
  - Large group storage improvements
  - Intermediate group creation
• Link improvements
  - Unicode names allowed
  - External links – to objects in another file
  - User defined links – create own kinds of links


HDF5 1.8 – new library features

• Attribute improvements
  - Improved storage for large number of attributes
  - Iterate or look up by creation order
  - Unicode names allowed
• Support for Unicode UTF-8 character set
• Shared header information, possibly saving space
• Metadata cache improvements – faster I/O on files with many objects
• Better UNIX/Linux portability


HDF5 1.8 – new APIs

• New extendible error-handling API
• New APIs to copy objects between files quickly
• Dimension scale model and API
• “HDFpacket” API, to read/write packets efficiently


HDF5 1.8 – Backward and Forward Compatibility


HDF5 1.8 and 1.6

• Differences between 1.8 and 1.6.x
  - Some file format changes
  - Several new routines added
  - Old APIs deprecated – may be removed in later release
• Consequences
  - Applications requiring 1.8 format changes will generate objects that cannot be read by 1.6 library
  - To exploit 1.8 changes, applications need to be rewritten


“The art of progress is to preserve order amid change, and to preserve change amid order.”

Alfred North Whitehead


Principle of Maximum File Format Compatibility

Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.

Assures older library versions are forward compatible whenever possible:

Objects in new files can be read with old versions of the library, if the objects are “known” to the old libraries.

New versions of the library can always read objects in files written with older versions.
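
The principle above can be pictured with a small toy model. The sketch below is not HDF5 code: the feature names, the version numbers, and the functions `earliest_format_version` and `readable_by` are all hypothetical, chosen only to show how "earliest version possible" keeps plain objects readable by older libraries while objects that need newer features are not.

```python
# Toy model of "write with the earliest format version that can describe
# the object". Feature names and version numbers are illustrative only and
# do not come from the HDF5 file format specification.
MIN_VERSION_FOR_FEATURE = {
    "plain_group": 0,
    "plain_dataset": 0,
    "creation_order_index": 2,
    "external_link": 2,
}


def earliest_format_version(features, force_latest=False, latest=2):
    """Return the format version a writer would pick for an object."""
    if force_latest:
        # The "instructed otherwise" case: the caller asks for the newest format.
        return latest
    # Otherwise pick the smallest version that covers every feature in use.
    return max((MIN_VERSION_FOR_FEATURE[f] for f in features), default=0)


def readable_by(reader_max_version, object_version):
    """An older reader can handle an object only if it knows that version."""
    return object_version <= reader_max_version


if __name__ == "__main__":
    old_style = earliest_format_version(["plain_group", "plain_dataset"])
    new_style = earliest_format_version(["plain_group", "external_link"])
    print("old-style object:", old_style, readable_by(0, old_style))
    print("new-style object:", new_style, readable_by(0, new_style))
```

In this model a reader limited to version 0 can still read the first object even though a newer writer produced it, which is the forward-compatibility property the slide describes.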


Command Line Tools


New features for existing tools

• -V option for all tools
  - Prints HDF5 library version number used by tool
• h5repack: -L option
  - Use latest version of file format to create objects
• h5dump: dumps groups/attributes in creation or name order
    -q Q, --sort_by=Q    Sort groups and attributes by index Q
    -z Z, --sort_order=Z Sort groups and attributes by order Z


New command line tools

• h5mkgrp
  - Creates new groups and group hierarchies in an HDF5 file
• h5stat
  - Provides statistics regarding the file, such as number of objects per group, sizes of datasets, amount of free space in file
• h5copy
  - Copies an object within a file or across files
• h5check
  - Verifies an HDF5 file against the defined HDF5 File Format Specification
  - Completed for 1.6. In progress for 1.8


Tool work in the pipeline

• Export numeric data formatted in several different ways (such as MS Excel, XML, etc.)
• Import ASCII data that conforms to a certain format
• Use a common text format for h5import and h5dump
• Support NaN in tools such as h5diff. Challenges:
  - NaN is platform specific
  - NaN can have different values for the same machine
  - Checking NaN can be a performance hit
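
The NaN challenges listed above can be seen with a few lines of standard-library Python. This is a minimal sketch, not how h5diff is implemented: `double_bits`, `bits_to_double` and `values_equal` are hypothetical helpers, and the two bit patterns are just two of the many IEEE 754 encodings that represent NaN.

```python
import math
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    """Build a float from a raw 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def values_equal(a, b):
    """Comparison that treats any two NaNs as equal.

    This is one policy a diff tool could choose; it costs an extra
    isnan check per element, which is the performance concern.
    """
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


# Two quiet NaNs that differ only in their payload bits.
quiet_nan = bits_to_double(0x7FF8000000000000)
other_nan = bits_to_double(0x7FF8000000000001)

if __name__ == "__main__":
    print("both are NaN:", math.isnan(quiet_nan), math.isnan(other_nan))
    print("bit patterns:", hex(double_bits(quiet_nan)), hex(double_bits(other_nan)))
    print("plain == on the same NaN:", quiet_nan == quiet_nan)
    print("NaN-aware comparison:", values_equal(quiet_nan, other_nan))
```

A byte-wise comparison would report these two values as different and an ordinary `==` reports even identical NaNs as different, so a tool has to pick an explicit NaN policy.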


HDF Java Products


HDF5 Java is Growing UP


HDFView changes

• HDFView 2.4 released
• Many new features, such as
  - Support for compound datatypes of 2D+ arrays
  - Support for "filtering fill value" in Image Viewer
  - Effective handling of large 3D images
  - Support large fonts in GUI components
  - New autogain algorithm for image Brightness/Contrast
• New platforms
  - Mac Intel
  - Linux 64-bit AMD
  - Solaris 64-bit


Other Java products

• 36 new enhancements and 44 bugs fixed
• Test suite (using JUnit testing framework)
  - Tests all public methods in the object package
  - Added “make check” to run the test suite
• Enhanced documentation
  - All public methods in the object package are fully documented


Future work for Java

• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView with bug fixes/new features with HDF5 1.8 release
• Port HDF5-SRB model to HDF5-iRODS model
• Writing capability for HDF5-iRODS model


Other Activities of Interest


New THG Website



HDF Performance Framework


Goals

• A framework for performance regression testing
• A tool for
  - Testing on multiple platforms
  - Testing different versions
  - Long term regression testing
  - Assistance in debugging


Solution

[Diagram. Components shown: a user’s benchmark, the performance library, HDF5 1.6 and HDF5 1.8, cron, a database, a web server (PHP), and graph/text output on the www.]


Sample Usage

Benchmark code (the timer calls wrap the operation being measured):

    H5Perf_startTimer(&time);
    for (i = 0; i < 1000; i++) {
        /* Add groups */
        H5Gcreate(fileid, group_name, (size_t)0);
    }
    H5Perf_endTimer(&time);
    H5Perf_addInstance(db_host, date, time);

cron entry that runs the benchmark script daily at 21:00:

    00 21 * * * /home/local/hyoklee/src/chicago/test-perf-hdfdap-3.sh

Resulting database record (fields labeled on the slide: Timestamp, Instance Name, Version, Platform, Time):

    | 178820 | 2007-08-17 21:51:14 | 10000 groups | creating 10000 empty groups | 1.8.0 | hdfdap | 0.670198 | 4384 |
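
The time-then-record pattern in this sample can be sketched in standard-library Python. This is an analogue under stated assumptions, not the H5Perf API: `time_operation`, `add_instance`, `create_groups` and the table layout are hypothetical, an in-memory SQLite database stands in for the framework's database host, and the workload builds empty dictionaries rather than HDF5 groups.

```python
import sqlite3
import time


def time_operation(operation):
    """Run operation() once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def add_instance(conn, name, version, platform, seconds):
    """Store one timing record; columns loosely mirror the slide's labels."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instances ("
        "id INTEGER PRIMARY KEY, stamp TEXT DEFAULT CURRENT_TIMESTAMP, "
        "name TEXT, version TEXT, platform TEXT, seconds REAL)"
    )
    conn.execute(
        "INSERT INTO instances (name, version, platform, seconds) "
        "VALUES (?, ?, ?, ?)",
        (name, version, platform, seconds),
    )
    conn.commit()


def create_groups(count=10000):
    """Stand-in workload: builds empty dicts instead of HDF5 groups."""
    return {f"group_{i}": {} for i in range(count)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    elapsed = time_operation(create_groups)
    add_instance(conn, "10000 groups", "1.8.0", "hdfdap", elapsed)
    for row in conn.execute("SELECT stamp, name, version, platform, seconds FROM instances"):
        print(row)
```

Running such a script on a schedule (as the cron entry does) accumulates one row per run, which is what makes long-term regression graphs possible.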


Improved Crash Survivability in the HDF5 Library


Crash Survivability in HDF5

• Problem:
  - Data in HDF5 files susceptible to corruption in the event of an application or system crash.
  - Corruption possible if structural metadata is being written when the crash occurs.
• Initial Objective:
  - Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash.
  - No guarantee on state of raw data – contains whatever made it to disk prior to crash.


Crash Survivability in HDF5

• Approach: Metadata Journaling
  - When a piece of metadata is modified and in a consistent state, make a journal note.
  - If the application crashes, a recovery program can replay the journal by applying in order all metadata writes until the end of the last completed transaction written to the journal file.
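
The replay rule described above (apply writes in order, stop at the end of the last completed transaction) can be sketched as follows. This is an illustration only: the journal record layout, the `replay_journal` function and the sample entries are hypothetical and do not reflect the actual HDF5 journal file format.

```python
def replay_journal(entries):
    """Rebuild metadata from journal entries, ignoring an unfinished tail.

    Each entry is a tuple: ("begin", txn), ("write", txn, key, value),
    or ("end", txn). This record layout is an assumption for the sketch.
    """
    metadata = {}
    pending = {}      # writes of the transaction currently being read
    open_txn = None
    for entry in entries:
        kind, txn = entry[0], entry[1]
        if kind == "begin":
            open_txn, pending = txn, {}
        elif kind == "write" and txn == open_txn:
            pending[entry[2]] = entry[3]
        elif kind == "end" and txn == open_txn:
            # Transaction completed: its writes become part of the state.
            metadata.update(pending)
            open_txn, pending = None, {}
    # Writes still pending belong to a transaction cut off by the crash.
    return metadata


# Transaction 2 never reaches its "end" record, as if the crash hit mid-write.
journal = [
    ("begin", 1),
    ("write", 1, "/g1", "header v1"),
    ("end", 1),
    ("begin", 2),
    ("write", 2, "/g1", "header v2"),
    ("write", 2, "/g2", "header v1"),
]

if __name__ == "__main__":
    print(replay_journal(journal))
```

Only the writes from transaction 1 survive the replay, so the recovered metadata is consistent even though it is not the most recent state.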


Faster HDF5 Data Appends


Fast Data Appends

• Problem: Metadata operations limit the rate at which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
  - Allows constant time extend, shrink and lookup of chunks in datasets with single unlimited dimension
  - # of metadata I/O operations to append to dataset is independent of # of chunks
  - Allows single-writer/multiple-reader access
• Details at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/SkipListChunkIndex/SkipListChunkIndex.html
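
To see why a single unlimited dimension makes constant-time chunk indexing possible, consider the toy index below. It is not the data structure described in the RFC linked above; it is a plain Python list used as a stand-in, and the class name `AppendOnlyChunkIndex`, its methods, and the sample addresses are all hypothetical.

```python
class AppendOnlyChunkIndex:
    """Toy chunk index for a dataset with one unlimited dimension.

    Chunk number n covers elements [n * chunk_len, (n + 1) * chunk_len),
    so a chunk's file address can be found by list position alone.
    """

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len
        self.addresses = []  # addresses[n] = file offset of chunk n

    def extend(self, address):
        """Register a newly written chunk at the end (amortized O(1))."""
        self.addresses.append(address)

    def shrink(self):
        """Drop the last chunk and return its address (O(1))."""
        return self.addresses.pop()

    def lookup(self, element_index):
        """Return the address of the chunk holding element_index (O(1))."""
        return self.addresses[element_index // self.chunk_len]


if __name__ == "__main__":
    index = AppendOnlyChunkIndex(chunk_len=100)
    index.extend(4096)   # chunk 0 written at file offset 4096
    index.extend(8192)   # chunk 1 written at file offset 8192
    print(index.lookup(0), index.lookup(150))
```

Because appending only touches the end of the index, the work per append does not grow with the number of chunks already present, which is the property the slide calls out.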


netCDF-4


netCDF-4 Project

• Enhanced NetCDF-4 Interface to HDF5
  - Combine features of netCDF and HDF5
  - Take advantage of their separate strengths
• Collaboration between NCSA, THG, Unidata
• Currently in beta release
• Will be released after HDF5 1.8


NetCDF-4 Architecture

[Diagram. Components shown: netCDF-3 applications, netCDF-4 applications, and HDF5 applications; the netCDF-3 interface; the netCDF-4 library; the HDF5 library; netCDF files, netCDF-4 HDF5 files, and HDF5 files.]

• Supports access to netCDF files and HDF5 files created through netCDF-4 interface


HDF5 OPeNDAP Project


Project description

• Investigate integrated DAP-aware HDF5 library that can provide seamless access to both local and remote data

• A NASA ROSES NRA project
• See Kent Yang’s talk and poster


NOAA – Science Data Stewardship


• Use HDF5 Archival Information Package (AIP) to archive HDF EOS2 data

• A collaboration between NSIDC and THG
• See Ruth Duerr and Kent Yang’s poster


HDF5 and .NET Framework


Why .NET?

• The Microsoft .NET framework is used by most new applications created for Windows.
  - Makes it easier to develop applications
  - Reduces application vulnerability to security threats

• Supports development in multiple programming languages, in particular C#.

• Increased level of interest in .NET from users of HDF5.


HDF and .NET Status

• Received funding to implement prototype .NET wrapper API for Windows XP
  - Based on HDF5 C API
  - Focus on C# binding
  - Functionality limited to subset of API routines
• If funded, we would like to move beyond the prototype to
  - Create .NET wrappers for all HDF C functions
  - Offer full support for .NET wrappers with HDF5 1.8


Bioinformatics

Managing genomic data


Electron tomography

• 25-80Å resolution
• 4k x 4k x 500 images now
• 8k x 8k x 1k images soon (256 GB)
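
The 256 GB figure can be reproduced with a short calculation. The slide does not state the voxel size or what "k" means, so the sketch below assumes k = 1024 and 4 bytes per voxel, which are the values that make the stated total come out; `volume_bytes` is a hypothetical helper.

```python
def volume_bytes(nx, ny, nz, bytes_per_voxel):
    """Size in bytes of an nx x ny x nz volume."""
    return nx * ny * nz * bytes_per_voxel


K = 1024               # assumption: "k" on the slide means 1024
BYTES_PER_VOXEL = 4    # assumption: inferred from the stated 256 GB total

now = volume_bytes(4 * K, 4 * K, 500, BYTES_PER_VOXEL)
soon = volume_bytes(8 * K, 8 * K, 1 * K, BYTES_PER_VOXEL)

if __name__ == "__main__":
    print("now: ", now / 2**30, "GiB")
    print("soon:", soon / 2**30, "GiB")
```

Under the same assumptions the current 4k x 4k x 500 volumes are roughly an eighth of that size, so the move to 8k x 8k x 1k is about an eightfold jump per dataset.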


Next Generation DNA Sequencing

• Next Gen Sequencing platforms produce ~1500 X more data than CE (Sanger)

• A single Next Gen instrument can produce 20 times more data in a single run than a day’s operation of a genome center with 100 CE instruments


An email on Sept 21…

“… A little background, we're doing genetic association studies, these result in large 2-d matrices (40K x 1M before applying thresholds). Each of the cells in this matrix has ~10 numerical statistics (e.g. some sort of pvalue)… ”

40K x 1M x 10 x 4 = 1,600,000,000,000 (1.6 TB)
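
The estimate above multiplies rows, columns, statistics per cell, and a final factor of 4. The sketch below reads that last factor as 4 bytes per stored value, which is an assumption since the slide does not label it; `matrix_bytes` is a hypothetical helper.

```python
def matrix_bytes(rows, cols, stats_per_cell, bytes_per_value):
    """Total bytes for a rows x cols matrix with several values per cell."""
    return rows * cols * stats_per_cell * bytes_per_value


ROWS = 40_000           # "40K"
COLS = 1_000_000        # "1M"
STATS_PER_CELL = 10     # "~10 numerical statistics"
BYTES_PER_VALUE = 4     # assumption: the trailing "x 4" is bytes per value

total = matrix_bytes(ROWS, COLS, STATS_PER_CELL, BYTES_PER_VALUE)
one_row = matrix_bytes(1, COLS, STATS_PER_CELL, BYTES_PER_VALUE)

if __name__ == "__main__":
    print(f"whole matrix: {total:,} bytes")
    print(f"single row:   {one_row:,} bytes")
```

Even a single row of this matrix is tens of megabytes, which is why such data calls for storage that can read and write sub-blocks rather than whole arrays.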


Product Data

STEP


Product data

• HDF5 proposed to ISO as binary representation for product data representation and exchange

• Would be a binary option to the STEP format
• ISO/NWI-CD 10303-026, STEP Part 26



SQL Server and HDF5

• THG discussing possible project with Microsoft
• Microsoft envisions a dream environment for scientists that would encompass both computing and data management
• Possible SQL Server solution
  - Combine RDBMS and scientific analysis tools in a single integrated system
  - Use HDF5 to manage scientific objects not handled well by traditional database


HDF5 in SQL server

[Diagram. Components shown: .NET languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping) with an HDF5 EDM model; visualization libraries (MATLAB, …); web services (XML, REST, RSS); OLAP and data mining; reporting; SQL Server with HDF5 type, HDF5 index, HDF5 TVFs, and HDF5 FS blobs; HDF5 files.]


Thank You All

and

Thank You NASA!


Acknowledgement

This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.


Questions/comments?


Information Sources

• HDF website: http://hdfgroup.org/

• HDF5 Information Center: http://hdfgroup.org/HDF5/

• HDF [email protected]

• HDF users mailing [email protected]

coming soon: [email protected]