CROSS DISCIPLINARY APPLICATIONS OF MULTIPLEX OBSERVATIONAL AND COMPUTATIONAL DATASETS USING HDF5 FOR ARCHIVING AND HIGH PERFORMANCE PROCESSING
Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol,
Frank Steinbacher and Markus Aufleger
Center for Computation & Technology
ASTRO@UIBK
Outline
• Motivation
• Requirements on a Data Format
• Introduction to HDF5
• F5
  – Introduction
  – Examples of Data Sets
• Application Example:
  – The Hawaiian Geospatial Data Repository
• Conclusion
Motivation
Scientific collaboration between Workgroup A and Workgroup B:
• Workgroup A: Software Tool 1, File Format 1
• Workgroup B: Software Tool 2, File Format 2
• Data exchange between File Format 1 and File Format 2
Motivation
File Format 1, File Format 2, File Format 3, File Format 4, File Format 5, …, File Format N
• Converting directly between every pair of formats: huge implementation effort, O(N²)
• Converting through one common data format: less implementation effort, O(N)
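The scaling argument can be made concrete by counting converters. A minimal sketch, assuming one directed converter per ordered pair of formats in the point-to-point case, and exactly one reader plus one writer per format when everything goes through a common data format (both counting rules are assumptions for illustration):

```python
def pairwise_converters(n: int) -> int:
    """Directed converters when every format is translated directly into
    every other format: n * (n - 1), which grows as O(N^2)."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters when each format only maps to and from one common data
    format (one reader and one writer per format): 2 * n, i.e. O(N)."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 3, 5, 10, 50):
        print(f"N={n:3d}: pairwise={pairwise_converters(n):5d}  "
              f"common format={hub_converters(n):4d}")
```

Under these assumptions the two approaches tie at N = 3 (6 converters each), the common format wins for every N ≥ 4, and at N = 10 it needs 20 converters instead of 90.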
Motivation
Workgroups A, B, C and D (Software Tool 1, Software Tool 2, Software 3, Software 4) exchange data through one common data format:
• Easier collaboration
• More time for science
Requirements on a Data Format
• Easy access
• Fast and efficient
• Huge data (terabytes)
• Huge variety of data
• Self-descriptive
• Well documented, with a user community
• Sustainable (>10 years)
HDF5: Hierarchical Data Format 5
http://www.hdfgroup.org/HDF5
HDF5 - A Few Analogies
• File system (in a file)
• Binary XML file
• PDF for numerical data
• Database (container for array variables)
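HDF5 - Relationships
(Figure: example file with the root group "/", the objects "SimOut" and "City A", a table dataset and two attributes; legend: Group, Dataset, Attribute, Relation)
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
Parameters: 10; 100; 1000
Timestep: 36,000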
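To make the group/dataset/attribute relationships concrete, here is a toy in-memory model in plain Python. It is not the HDF5 library or its API; it only mimics the "file system in a file" idea. Placing the table as dataset `City A` inside group `SimOut`, and attaching `Timestep` to the group and `Parameters` to the dataset, is a hypothetical reading of the figure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Dataset:
    """Named array of records with attributes (small metadata) attached."""
    data: List[Any]
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """Container of named members (groups or datasets), like a directory."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def get(self, path: str) -> Union["Group", Dataset]:
        """Resolve a slash-separated path such as '/SimOut/City A'."""
        node: Union["Group", Dataset] = self
        for name in filter(None, path.split("/")):  # '/' alone resolves to self
            if not isinstance(node, Group):
                raise KeyError(f"{name!r}: parent is not a group")
            node = node.members[name]  # KeyError if the member does not exist
        return node


# Hypothetical arrangement of the objects named in the figure.
root = Group()
sim_out = Group(attrs={"Timestep": 36000})
sim_out.members["City A"] = Dataset(
    data=[(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)],  # (lat, lon, temp) rows
    attrs={"Parameters": [10, 100, 1000]},
)
root.members["SimOut"] = sim_out

print(root.get("/SimOut/City A").attrs["Parameters"])  # [10, 100, 1000]
```

With the real library, the Python package h5py builds the same kind of layout through `create_group`, `create_dataset` and the `.attrs` mapping of groups and datasets.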
HDF5 - What Users Get…
• A multi-platform library and tools built on over 10 years of experience in large data handling from the high performance computing (HPC) community.
• A capability that:
  – Lets them organize large and/or complex collections of data
  – Gives them efficient and scalable data storage and access
  – Lets them integrate a wide variety of types of data and data sources
  – Guarantees long-term data integrity and preservation
HDF5
• Shapefiles: HDF5 as container format
  (Figure: shapefile contents, i.e. vector data, pixel data and attribute data, stored in HDF5 and shown in a browser application)
HDF5 - More Applications
• Billions of elements / dozens of associated values
• Earth Science (Earth Observing System):
  – Terra: CERES, MISR, MODIS, MOPITT
  – Aqua (6/01): CERES, MODIS, AMSR
  – Aura: TES, HIRDLS, MLS, OMI
• Big simulations
• Movie making
• Flight testing
HDF5
• More than a ZIP or TAR archive
• Also allows describing the structure of the contents of a file
F5
• How to store different kinds of data sets consistently in HDF5?
• Based on HDF5
• Inspired by concepts of:
  – Topology
  – Differential Geometry
  – Geometric Algebra
• Separation of geometry (Grids) and data fields (Fields)
(Figure: Grid and Field)
F5
• Hierarchical structure: Fiber Bundle → Time Slice → Grid → Topology → Coordinates → Field
  (Part of this hierarchy is marked "Visible to the end user".)
(Figure: fiber bundle examples with fiber dimensions 0D, 1D, 3D, 6D and base space dimensions 3D, 2D, 1D, 0D)
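Because each level of the hierarchy is one HDF5 group, a field can be addressed by a single path. A minimal sketch of composing and splitting such paths, following the `/1.4/Satellite/VertexRefinement1x1/Cartesian/...` pattern shown on the next slide; the helper names and the use of `str(time)` for the time slice name are assumptions, and the F5 library's own naming may differ.

```python
from typing import Dict

# Levels below the Fiber Bundle (the file root), in nesting order.
F5_LEVELS = ("time_slice", "grid", "topology", "representation", "field")


def f5_field_path(time: float, grid: str, topology: str,
                  representation: str, field_name: str) -> str:
    """Compose a path following the nesting
    Time Slice / Grid / Topology / Representation (Coordinates) / Field."""
    return "/" + "/".join([str(time), grid, topology, representation, field_name])


def parse_f5_path(path: str) -> Dict[str, str]:
    """Split such a path back into its five hierarchy levels."""
    parts = path.strip("/").split("/")
    if len(parts) != len(F5_LEVELS):
        raise ValueError(f"expected {len(F5_LEVELS)} levels, got {len(parts)}: {path!r}")
    return dict(zip(F5_LEVELS, parts))


p = f5_field_path(1.4, "Satellite", "VertexRefinement1x1", "Cartesian", "RGB")
print(p)                             # /1.4/Satellite/VertexRefinement1x1/Cartesian/RGB
print(parse_f5_path(p)["topology"])  # VertexRefinement1x1
```

Using `str(time)` keeps the sketch short but is fragile for computed floats (e.g. `0.1 + 0.2`); a real naming scheme would format time values explicitly.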
F5
• Multi Channel – Multi Resolution Images:

Time | Grid      | Topology            | Representation | Field [Datatype]
-----|-----------|---------------------|----------------|-------------------------
1.4  | Satellite | VertexRefinement1x1 | Cartesian      | Positions [uniform-grid]
     |           |                     |                | RGB [byte,byte,byte]
     |           |                     |                | N-IR [float64]
     |           |                     |                | T-IR [float64]
     |           | VertexRefinement2x2 | Cartesian      | Positions
     |           |                     |                | RGB (ditto)
     |           |                     |                | N-IR
     |           |                     |                | T-IR
1.6  | …         |                     |                |
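The `Positions [uniform-grid]` entry suggests that vertex positions of an image need not be stored point by point. A minimal sketch, assuming a uniform Cartesian grid is fully described by an origin, a per-axis spacing and a shape (an assumption about what such a grid stores, not a description of the F5 file layout); the function names are hypothetical.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]


def uniform_vertex_position(origin: Point2D, spacing: Point2D,
                            i: int, j: int) -> Point2D:
    """Position of vertex (i, j) on a uniform Cartesian grid:
    origin + index * spacing along each axis."""
    return (origin[0] + i * spacing[0], origin[1] + j * spacing[1])


def uniform_vertex_positions(origin: Point2D, spacing: Point2D,
                             shape: Tuple[int, int]) -> List[List[Point2D]]:
    """Expand all vertex positions of an (nx, ny) uniform grid, indexed
    [i][j]. Only needed when an explicit array is required; the grid
    itself is described by origin, spacing and shape alone."""
    nx, ny = shape
    return [[uniform_vertex_position(origin, spacing, i, j) for j in range(ny)]
            for i in range(nx)]


print(uniform_vertex_position((0.0, 0.0), (0.5, 0.25), 2, 3))  # (1.0, 0.75)
```

In such a scheme, a coarser level of the same image could reuse the origin with a larger spacing (twice the spacing when keeping every second vertex); whether that is what `VertexRefinement2x2` denotes is not stated on the slide.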
F5
• Full Waveform LIDAR: Laser Data
(Figure: pulse emission time t_emission and echo times t1, t2, t3)

Time      | Grid  | Topology | Representation  | Field [Datatype]
----------|-------|----------|-----------------|-------------------------
CorseTime | LASER | POINTS   | CartesianCoords | Positions [point3D]
          |       |          |                 | TimeStamp [float64]
          |       |          |                 | Waveform [uint16,uint16]
          |       |          |                 | Reflectance [float32]
          |       | SHOTS    | SHOTSAsPOINTS   | Positions vlen[uint32]
          |       |          |                 | Origin [point3D]
          |       |          |                 | Direction [vector3D]
          |       |          |                 | EmissionTime [float64]
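The laser layout stores each shot once (origin, direction, emission time) and links it to its echo points through a variable-length index list. A minimal sketch of how echo positions follow from that, assuming `TimeStamp` is the per-point echo time on the same clock as `EmissionTime`, that the `vlen[uint32]` entry of a shot lists indices into the point arrays, and that the range comes from two-way travel at the vacuum speed of light (atmospheric refraction ignored). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum; refraction in air is ignored


def normalize(v: Vec3) -> Vec3:
    """Scale a direction vector to unit length."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    if n == 0.0:
        raise ValueError("direction must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def echo_position(origin: Vec3, direction: Vec3,
                  emission_time: float, echo_time: float) -> Vec3:
    """Point hit by the pulse. The measured time is a round trip, so the
    range is c * (echo_time - emission_time) / 2 along the unit direction."""
    d = normalize(direction)
    r = SPEED_OF_LIGHT * (echo_time - emission_time) / 2.0
    return (origin[0] + r * d[0], origin[1] + r * d[1], origin[2] + r * d[2])


def shot_echo_positions(origin: Vec3, direction: Vec3, emission_time: float,
                        point_indices: Sequence[int],
                        point_times: Sequence[float]) -> List[Vec3]:
    """Positions of all echoes of one shot. point_indices plays the role of
    one variable-length SHOTSAsPOINTS entry, indexing the per-point
    echo times in point_times."""
    return [echo_position(origin, direction, emission_time, point_times[k])
            for k in point_indices]
```

For a pulse fired straight down from 1000 m, an echo arriving 2 × 1000 m / c after emission maps to a point at ground level (z ≈ 0).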
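F5
• Full Waveform LIDAR: Airplane Data

Time      | Grid  | Topology | Representation  | Field [Datatype]
----------|-------|----------|-----------------|---------------------
CorseTime | PLANE | POINTS   | CartesianCoords | Positions [point3D]
          |       |          |                 | Rotation [rotor3D]
          |       |          |                 | TimeStamps [float64]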
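Combining laser and airplane data typically requires the aircraft position at an arbitrary instant, such as a shot's emission time. A minimal sketch, assuming the trajectory samples (`TimeStamps`, `Positions`) are sorted by time and that linear interpolation between samples is adequate; the function name is hypothetical, and the `Rotation` field is not handled here.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def plane_position_at(t: float, timestamps: Sequence[float],
                      positions: Sequence[Vec3]) -> Vec3:
    """Linearly interpolate the aircraft position at time t from trajectory
    samples sorted by time. Times outside the sampled range are clamped to
    the first or last sample."""
    if not timestamps or len(timestamps) != len(positions):
        raise ValueError("need matching, non-empty timestamps and positions")
    if t <= timestamps[0]:
        return positions[0]
    if t >= timestamps[-1]:
        return positions[-1]
    k = bisect_right(timestamps, t)  # timestamps[k-1] <= t < timestamps[k]
    t0, t1 = timestamps[k - 1], timestamps[k]
    w = (t - t0) / (t1 - t0)         # interpolation weight in [0, 1)
    p0, p1 = positions[k - 1], positions[k]
    return (p0[0] + w * (p1[0] - p0[0]),
            p0[1] + w * (p1[1] - p0[1]),
            p0[2] + w * (p1[2] - p0[2]))
```

Orientation would need its own interpolation scheme, for example spherical interpolation of the rotation, rather than component-wise linear blending.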
F5
• Bringing together in F5:
  – Satellite data
  – LIDAR
  – Shapefiles
Benefits
• Features of HDF5:
  – Sustainable storage
  – Metadata
  – Compression
  – Parallel I/O
  – Hyperslab access
• Consistent data organization of simple and complex spatio-temporal data
• Easy handling of time series of data
• Makes tools of other disciplines applicable to the geoscience community, such as astrophysical image mosaic tools for satellite data (Montage, http://montage.ipac.caltech.edu)
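Hyperslab access means reading a regular sub-pattern of a dataset without loading the whole array. HDF5 describes a hyperslab per dimension by start, stride, count and block (in the C API via `H5Sselect_hyperslab`). The sketch below only enumerates which elements such a selection covers, in plain Python; it illustrates the selection semantics and is not the HDF5 API, which performs the selection inside the library. The helper names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def hyperslab_indices_1d(start: int, stride: int, count: int, block: int) -> List[int]:
    """Indices selected along one dimension: `count` blocks of `block`
    consecutive elements, block origins `stride` apart, beginning at `start`."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab_indices(start: Sequence[int], stride: Sequence[int],
                      count: Sequence[int], block: Sequence[int]) -> List[Tuple[int, ...]]:
    """N-dimensional selection: Cartesian product of the per-dimension index sets."""
    per_dim = [hyperslab_indices_1d(s, st, c, b)
               for s, st, c, b in zip(start, stride, count, block)]
    return list(product(*per_dim))


def read_hyperslab_2d(array: Sequence[Sequence[float]], start, stride, count, block) -> list:
    """Gather the selected elements of a nested-list 2D array in row-major order."""
    return [array[r][c] for r, c in hyperslab_indices(start, stride, count, block)]


print(hyperslab_indices_1d(start=1, stride=4, count=2, block=2))  # [1, 2, 5, 6]
```

For a 6 × 6 array, start (0, 1), stride (2, 3), count (2, 2) and block (1, 1) select rows 0 and 2 and columns 1 and 4, i.e. four elements out of 36.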
Application Example
HAWAIIAN DATA REPOSITORY
http://www.epscor.hawaii.edu
Goal: Centralized integrative capability to store and manage access to massive (terabytes) research datasets
Users:
  – University of Hawaii research teams
  – Broad statewide research community
Objectives:
  – Collect, store and manage access to data
  – Utilize user portals
  – Utilize and link to the Maui High Performance Computing Center (MHPCC)
  – Discovery, manipulation, fusion and visualization
Mission:
Geospatial Information and Mass Storage
How to manage and store large, complex datasets?
• HDF5
• F5
CONCLUSION
• A common data format eases collaboration and reduces time wasted on data conversions.
• Data formats for sustainable, transparent storage of huge and complex data exist; one just has to use them: HDF5.
• F5 captures observational and simulation data consistently.
• Geoscience repositories, such as the Hawaiian Data Repository, can be built upon this format.
COLLABORATIONS
HAWAIIAN DATA REPOSITORY
THANK YOU
References:
http://www.hdfgroup.org/HDF5
http://www.fiberbundle.net
http://www.epscor.hawaii.edu
http://montage.ipac.caltech.edu
http://sciviz.cct.lsu.edu
http://www.marcel-ritter.com
HDF5 - HDFView
(Screenshot of shapefiles in HDFView)
Geospatial Information and Mass Storage
• Weather station data
• Marine buoy sensor data
• GPS data collection
• Database datasets, Excel files
• Spatial data: imagery, LiDAR, GIS
• Geoweb application services: WMS, WFS, WPC
• Database management
• Data streaming
• Data storage of statewide datasets
• Upload and download capability
• Metadata search capability
• Visualization of spatial and non-spatial datasets
• Access to HPC services
• Real-time modeling and analysis
F5
• Grid – Manifold describing the base space
• Topology
• Refinement level
• Coordinate representation
• Vertex positions in representation