GridWorld Case Study1 Barbara Ericson Georgia Tech [email protected] Jan 2008.
Http:// Jay Lofstead ([email protected]) Input/Output APIs and Data Organization for High...
-
Upload
matthew-dennis -
Category
Documents
-
view
212 -
download
0
Transcript of Http:// Jay Lofstead ([email protected]) Input/Output APIs and Data Organization for High...
1http://www.adiosapi.org Jay Lofstead ([email protected])
Input/Output APIs and Data Organization for HighPerformance Scientific
ComputingNovember 17, 2008
Petascale Data Storage Institute Workshop at Supercomputing 2008
Jay Lofstead (GT), Fang Zheng (GT), Scott Klasky (ORNL), Karsten Schwan (GT)
2http://www.adiosapi.org Jay Lofstead ([email protected])
Overview
• Motivation
• Architecture
• Performance
3http://www.adiosapi.org Jay Lofstead ([email protected])
Motivation
• Many codes write lots of data, but rarely read• TBs of data• different types and sizes
• HDF-5 and pNetCDF used• convenient• tool integration• portable format
4http://www.adiosapi.org Jay Lofstead ([email protected])
Performance/Resilience Challenges
• pNetCDF• “right sized” header• coordination for each data declaration• data stored as logically described
• HDF-5• b-tree format• coordination for each data declaration• single metadata store vulnerable to
corruption.
5http://www.adiosapi.org Jay Lofstead ([email protected])
Architecture (ADIOS)
• Change IO method by changing XML file
• Switch between synchronous and asynchronous
• Hook into other systems like visualization and workflow
ExternalMetadata(XML file)
Scientific Codes
ADIOS API
DART
LIVE/DataTap
MPI-IO
POSIX IO
HD
F-5
pnetCDF
Viz Engines
Others (plug-in)
buffering schedule feedback
6http://www.adiosapi.org Jay Lofstead ([email protected])
Architecture (BP)
• Individual outputs into “process group” segments
• Metadata indices next
• Index offsets and version flag at end
ProcessGroup 1
ProcessGroup 2 ...
ProcessGroup n
ProcessGroup Index
VarsIndex
AttributesIndex
Index Offsetsand Version #
7http://www.adiosapi.org Jay Lofstead ([email protected])
Resilience Features
• Random node failure• timeouts• mark index entries as suspect
• Root node failure• scan file to rebuild index• use local size values to find offsets
8http://www.adiosapi.org Jay Lofstead ([email protected])
Data Characteristics
• Identify file contents efficiently• min/max• local array sizes
• Local-only makes it “free”• no communication
• Indices for summaries/direct access• copies for resilience
9http://www.adiosapi.org Jay Lofstead ([email protected])
Architecture (Strategy)
• ADIOS API for flexibility• Use PHDF-5/PNetCDF during
development for “correctness”• Use POSIX/MPI-IO methods (BP output
format) during production runs for performance
10http://www.adiosapi.org Jay Lofstead ([email protected])
Performance Overview
• Chimera (supernova) (8192 processes)• relatively small writes (~1 MB per process)• 1400 seconds pHDF-5 vs. 1.4 seconds
POSIX (or 10 seconds MPI-IO independent)
• GTC (fusion) (29,000 processes)• 25 GB/sec (out of 40 GB/sec theoretical
max) writing restarts• 3% of wall clock time spent on IO• > 60 TB of total output
11http://www.adiosapi.org Jay Lofstead ([email protected])
Performance Overview
• Collecting characteristics unmeasurable• 10, 50, 100 million entry arrays per
processes• 128-2048 processes• weak scaling
12http://www.adiosapi.org Jay Lofstead ([email protected])
Performance Overview
• Data conversion• Chimera 8192 process run took 117
seconds to convert to HDF-5 (compare 1400 seconds to write directly) on a single process
• Other tests have shown linear conversion performance with size
Parallel conversion will be faster...
13http://www.adiosapi.org Jay Lofstead ([email protected])
Summary
• Use ADIOS API• selectively choose consistency
• BP intermediate format• performance• resilience• later convert to HDF-5/NetCDF
Questions?