Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py...

48
Jialin Liu Data Analytics & Service Group Productivity and High Performance, Can we have both? An Exploration of Parallel- H5py from I/O Perspective -1- May 26, 2017

Transcript of Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py...

Page 1: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Jialin Liu !Data Analytics & Service Group!

Productivity and High Performance, Can we have both? An Exploration of Parallel-H5py from I/O Perspective

-1-May26,2017

Page 2: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Outlines

-2-

Ø HDF5 and H5py Ø Productivity

Ø H5py Internal

Ø Performance

Ø Case Studies

² Warp

² H5Boss

Page 3: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

QuinceyKoziol

HDF5

-3-

lat|lon|temp----|-----|-----12|23|3.115|24|4.217|21|3.6

Ø  HDF5areamongthetop5librariesatNERSC,2015²  750+uniqueusers@NERSC,millionof

usersworldwideØ  1987,NCSA&UIUC.NASAsendHDF-

EOSto2.4millionsendusersØ  HierarchicaldataorganizaWonØ  ParallelI/O

Page 4: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

HDF5

-4-

Integer: 32-bit, LE

HDF5 Datatype

Multi-dimensional array of identically typed data elements

Specifications for single data element and array dimensions

2 Rank Dimensions

Dim[0] = 4 Dim[1] = 5

HDF5 Dataspace

Page 5: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

H5py

-5-

The h5py package is a Pythonic interface to the HDF5 binary data format. Ø  H5py provides easy-to-use high level interface, which

allows you to store huge amounts of numerical data, Ø  Easily manipulate that data from NumPy. Ø  H5py uses straightforward NumPy and Python

metaphors, like dictionary and NumPy array syntax.

Page 6: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-6-

H5py: a Productive HDF5 Interface

Page 7: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Similar & Simpler Interface

-7-

Filename Mode

file_id=H5Fcreate(“output.h5”,H5F_ACC_TRUNC,H5P_DEFAULT,H5P_DEFAULT);

Page 8: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Similar & Simpler Interface

-8-

H5Py HDF5

w-orx H5F_ACC_EXCL

w H5F_ACC_TRUNC

r H5F_RDONLY

r+ H5F_ACC_RDWR

a(default) H5F_ACC_RDWR&H5F_ACC_EXCL

Page 9: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Everything is Object

-9-

FileObject

Page 10: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

One Line to Enable Parallel I/O

-10-

Page 11: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-11-

CollecWveIO²  ReducestheIOcontenWononserverside²  AggregatessmallIOintolargerconWguousIO

IndependentIO CollecWveIO

Two-Phase Collective IO, NERSC Contributions

Jean-Luc.Vay,Remi.Lehe,LBNL

WARP

Page 12: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Looks Like Numpy Arrays

-12-

Pathtothedataset

Field

²  Indices:anythingthatcanbeconvertedtoaPythonlong² Slices(i.e.[:]or[0:10])² Fieldnames,inthecaseofcompounddata² AtmostoneEllipsis(...)object² Limitedfancyslicing,e.g.,dset[1:6,[5,8,9]],usewithcau?on

Slicing

Page 13: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

Beyond Numpy Arrays

-13-

Chunking

Compression

²  Error-detecWon²  Chunking²  Compression

DatasetObject

Checksum

Page 14: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-14-

Coding Efforts

Page 15: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-15-

Coding Efforts: Implicit IO

NoDataIO

Yes

Yes,butparWal

Yes,butparWal

Page 16: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-16-

Exploring Interactively on Notebook

hCps://ipython.nersc.gov

Page 17: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-17-

Learning the Data Easily

HDF5DataLayerinCaffe

moduleloaddeeplearning

Page 18: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-18-

Ø  H5py: Productivity ²  Similar/Simpler Interface ²  Everything is Object ²  One Line to Parallel I/O ²  Beyond Numpy ²  Productive Coding ²  Seamlessly Importable in Notebook, etc

Ø  H5py: Performance ²  ? ²  ?

Productivity --> Performance?

Page 19: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-19-

h5py mpi4py numpy

Challenging: More than Single IO Layer

“H5py performance is slow, parallel IO is not as good as serial IO”

Page 20: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-20-

h5py mpi4py numpy

Views of Performance

VerGcalView:•  Performancepenaltyofpython

layer.e.g.,H5py,Cython

HorizontalView:•  Scalability.e.g.,mpi4py,srun

Page 21: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-21-

H5py Implementation (Vertical View)

HDF5 C API (libhdf5)

hsize_tH5Dget_storage_size(hid_tdset_id)

Cython (h5d.pyx)

cdefclassDatasetID(ObjectID):defget_storage_size(self):returnH5Dget_storage_size(self.id)

Python(_hl/dataset.py)

classDataset(HLObject):@propertydefstoragesize(self):returnself.id.get_storage_size()

DatasetID

Dataset

File

Group

FileID

Page 22: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-22-

OperaGon H5py HDF5 Details RaGo

1KFileCreaWon(s) 4.7 3.0 Createafilethenclose

thefile63.8%

1KObjectScanning(s) 4.5 2.7

Openagroupthenscanallobjects:group,dataset,link,etc

60.0%

H5py Metadata Performance

Page 23: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-23-

H5py vs. HDF5 Single Node Independent I/O

StrongScaling,800MB WeakScaling,800MB/Process

97.8%100%

Page 24: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-24-

H5py vs. HDF5 Multi-node Independent I/O

StrongScaling WeakScaling

100%97.1%

Page 25: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-25-

H5py vs. HDF5 Single Node Collective I/O

StrongScaling,800MB WeakScaling,800MB/Process

100%98.6%

Page 26: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-26-

H5py vs. HDF5 Multi-node Collective I/O

StrongScaling WeakScaling

88%,81%,99%AVG:90%

84%,101%,75%AVG:87%

Page 27: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-27-

H5py vs. HDF5 Performance

SingleNode MulG-nodes

Metadata1kFileCreaWon 63.8%

1kObjectScanning 60.0%

IndependentI/OWeakScaling 97.8% 100%

StrongScaling 100% 97.1%

CollecWveI/OWeakScaling 100% 90%

StrongScaling 98.6% 87%

H5PyPerformance/HDF5Performance

Page 28: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-28-

Case Study I: Warp ²  ParWcle-in-cellsimulaWoncodes²  AlexFriedman,DavidGrote,

1980s²  LBNL,LLNL,andPPPL²  Broadvarietyofintegrated

physicsmodelsandextensivediagnosWcs

²  Laser-wakefieldRemiLehe,LBNL,GTC2017

Page 29: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-29-

Case Study I: Warp ²  Laser-wakefield

RemiLehe,LBNL,GTC2017

500metersà9cm

Page 30: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-30-

Case Study I: Warp IO with H5py

iteraWon0

iteraWon1

Page 31: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-31-

Case Study I: Warp IO with H5py

²  172-600MBperfile

² Withparallel_output=False,thesimulaWonfinishedinlessthan20min,soittook

lessthan20mintowriteallthesefiles.

² Withparallel_output=True,thesimulaWononlyhadWmetowritethefirst2files(out

of80!)

Page 32: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-32-

Case Study I: Warp IO with H5py

32bit 64bit

“LettherebeParallelI/O”

“Sorry,youbrokemyrules”

Page 33: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-33-

Case Study I: Warp IO with H5py

²  NumpyarrayinWarpisusing64bits²  H5pydatasetinWarpiscreatedwith32bits

float,dtype=‘f’²  HDF5internallychecksthetypeconsistency²  RefusestousecollecWveI/Oincaseof

inconsistency

AlexSim,CRD/LBNL

²  BeforeFix:527seconds²  AwerFix:1.3seconds

Page 34: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-34-

Ø  BOSSBaryonOscillaWonSpectroscopicSurvey–fromSDSS

Ø  Performtypicalrandomlygeneratedquerytoextractsmallamountofstars/galaxiesfrommillions

Ø  RunonfinalreleaseofSDSS-IIIcompleteBOSSdataset

Ø  H5Boss:AH5pybasedpythonpackagefor:²  ReformayngFitstoHDF5files²  Querying/SubseyngFiberdatasets

Baryon acousWc oscillaWons in earlyuniverse, sWll can be seen in survey likeBOSS, (courtesy of Chris Blake and SamMoorfield)

Case Study II: H5Boss

JialinLiu,DebbieBard,QuinceyKoziol,StephenBailey,Prabhat,“H5Boss:AHDF5basedPythonPackageforBOSSSpectroscopicSurveyData”,InSubmission

Page 35: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-35-

H5boss.subset

User`cat1`islookingfor1millionfibers

DataAnalysis

....

ConvertedFitsFilesinH5

Read

Que

ry

plate mjd fiber

4322 55532 32

5892 55598 25

6298 55133 11

Case Study II: H5Boss IO

Write

SubsetFile

PostAnalysis

Share

Page 36: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-36-

BeforeopWmizaWon,with1kquery,strongscaling:²  Scalableonsinglenode²  NotscalableonmulWplenodes

Case Study II: H5Boss IO

Page 37: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-37-

1

2

Query

Read1 2

From1nodeto32nodes

1

2

Case Study II: H5Boss IO

Page 38: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-38-

EachProcess:Query1.  Filesopen2.  Plate/mjd/fiberscanning/searching3.  Key-valueconstrucWon,neededforcreaWngthesharedfile

AllProcesses:CommunicaWon4.AlltoallreducWontoformaglobalsharedk-vlist

Checklist:1.  Theoverheadinallreduce2.  Thek,vstructure

Case Study II: H5Boss IO

Write

SubsetFile

PostAnalysis

Share

e.g.,Groups,Datasets

Page 39: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-39-

² Whympi4py’sallreducecouldbeanissue?

1.Allreducevsallreduce

•  Lowercase:genericPythonobjects,send(),recv(),etc•  Upper-case:bufferlikeobject,Send(),Recv(),etc

2.pickling&unpickling

sender receiver

pickling unpickling

dispatching allocaWon packing allocaWon translaWon unpacking

direct

Cmpi

Case Study II: H5Boss IO

Page 40: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-40-

K:•  b[‘3666/55159/599/coadd’]V:•  (([((WAVE, ‘<f4’), (FLUX, ‘<f4’), (IVAR, ‘<f4’),(AND_MAKS, ‘<i4’),

(OR_MASK, ‘<i4’), (‘WAVEDISP’, ‘<f4’), (SKY,‘<f4’), (MODEL,‘<f4’)]),•  (4619,) •  ‘/global/cscratch1/sd/jialin/h5boss/3666-55159.hdf5’)

Key:PathtoHDF5datasetValue:(type,shape,pathtofile)

Key:PathtoHDF5datasetValue:shape

K:(str)“3666/55159/599/coadd”V:(int)“4619”

²  Re-designthekey,valuepairtobebufferlike

Case Study II: H5Boss IO

Page 41: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-41-

WithopWmized(k,v)structure,19Xfaster

Case Study II: H5Boss IO

Page 42: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-42-

WithopWmized(k,v)structureWeakscalingto1.6millionfiberqueryand1600processes

Case Study II: H5Boss IO

Page 43: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-43-

Ø  H5py: Productivity ²  Similar/simpler interface ²  .... ²  Seamlessly importable in notebook, …

Ø  H5py: Performance ²  H5py often reaches 90% of HDF5 performance in benchmarking ²  In practice, case by case:

²  Type consistency ²  Object vs. Buffer ²  …

Productivity --> Performance

Page 44: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-44-

1. Optimal HDF5 file creation

2.25X

Choosethemostmodernformat[opWonal]

H5py other Best Practice

Page 45: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-45-

2. Use low-level API in H5py

GetclosertotheHDF5Clibrary,finetuning

H5py other Best Practice

Page 46: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

H5py at NERSC

-46-

SerialH5py

Anacondaincludesh5pypackage²  H5py2.6.0²  Built-inhdf5library,1.8.17²  Easyusewithotherpackages²  Noparallelsupport

WorksonbothEdisonandCori

Page 47: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

H5py at NERSC

-47-

H5py-parallel@NERSCØ  H5py2.6.0Ø  Compiledwithcray-hdf5-parallel/1.8.16Ø Noconflictwithanaconda’sserialh5py

²  Importh5py(perfectlyfine)²  Canusetogetherwithanaconda

Ø Uptodatefeatures

RollinThomas

Page 48: Productivity and High Performance, Can we have both? An ...Anaconda includes h5py package ² H5py 2.6.0 ² Built-in hdf5 library, 1.8.17 ² Easy use with other packages ² No ...

-48-

High Performance H5py with Sample Codes h~p://www.nersc.gov/users/data-analyWcs/data-management/i-o-libraries/hdf5-2/h5py/

Thanks

H5py at NERSC