Scaling Up Parallel I/O on the SP
David Skinner, NERSC Division, Berkeley Lab


Transcript of "Scaling Up Parallel I/O on the SP" (David Skinner, NERSC Division, Berkeley Lab)

Page 1: Scaling Up Parallel I/O on the SP

David Skinner, NERSC Division, Berkeley Lab

Page 2: Motivation

• NERSC uses GPFS for $HOME and $SCRATCH

• Local disk filesystems on seaborg (/tmp) are tiny

• Growing data sizes and concurrencies often outpace I/O methodologies

Page 3: Seaborg.nersc.gov

Page 4: Case Study: Data Intensive Computing at NERSC

• Binary black hole collisions
• Finite differencing on a 1024x768x768x200 grid
• Run on 64 NH2 nodes with 32 GB RAM each (2 TB total)
• Need to save regular snapshots of the full grid (a rough size estimate follows below)
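For scale, a rough back-of-the-envelope estimate (mine, not from the slides): one double-precision field on the 1024x768x768 spatial grid takes 1024 x 768 x 768 x 8 bytes, about 4.8 GB, so a full-grid snapshot holding a couple dozen evolved grid functions lands on the order of the 100 GB per dump that is timed on the next slide.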

The first full 3D calculation of inward-spiraling black holes, done at NERSC by Ed Seidel, Gabrielle Allen, Denis Pollney, and Peter Diener (Scientific American, April 2002).

Page 5: Problems

• The binary black hole collision uses a modified version of the Cactus code to solve Einstein's equations. Its choices for I/O are serial and MPI-I/O
• CPU utilization suffers as time is lost to I/O
• Variation in write times can be severe

[Chart: time to write 100 GB per iteration; time (s) on the y-axis from 0 to 500, iterations 1 through 4 on the x-axis]

Page 6: Finding solutions

• The data pattern is a common one
• Survey strategies to determine the write rate and the variation in that rate (a timing sketch follows below)
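The slides do not show the survey harness itself; what follows is a minimal sketch of how such a survey could be timed (file names and sizes are made up, and a plain one-file-per-task write stands in for whichever strategy is under test):

/* Sketch of a write-rate survey (illustrative, not the actual NERSC harness). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NITER 4                                /* iterations, as in the plot above */

int main(int argc, char **argv)
{
    int rank, ntasks, it;
    size_t nbyte = 64UL * 1024 * 1024;         /* 64 MB per task (example size) */
    char *data, fname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    data = malloc(nbyte);
    memset(data, rank & 0xff, nbyte);
    snprintf(fname, sizeof fname, "survey.%05d", rank);    /* hypothetical name */

    for (it = 0; it < NITER; it++) {
        double t0, dt, tmax;
        FILE *fp;

        MPI_Barrier(MPI_COMM_WORLD);           /* start every task together */
        t0 = MPI_Wtime();

        fp = fopen(fname, "w");                /* stand-in for the strategy under test */
        fwrite(data, nbyte, 1, fp);
        fclose(fp);

        dt = MPI_Wtime() - t0;
        MPI_Reduce(&dt, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)                         /* the slowest task sets the effective rate */
            printf("iter %d: %.1f s, %.1f MB/s aggregate\n",
                   it, tmax, (double)ntasks * nbyte / tmax / 1e6);
    }

    free(data);
    MPI_Finalize();
    return 0;
}

Reporting the slowest task per iteration is deliberate: as the 100 GB write times above show, the spread across iterations and tasks can matter as much as the mean rate.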

Page 7: [figure]

Page 8: Parallel I/O Strategies

Page 9: Multiple File I/O

if (private_dir) rank_dir(1, rank);
fp = fopen(fname_r, "w");
fwrite(data, nbyte, 1, fp);
fclose(fp);
if (private_dir) rank_dir(0, rank);
MPI_Barrier(MPI_COMM_WORLD);
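The snippet's helpers (private_dir, rank_dir, fname_r) are not defined on the slide. Assuming rank_dir simply moves a task into and back out of a rank-private subdirectory, a self-contained version of the same one-file-per-task pattern, with made-up names, might look like:

/* One file per MPI task, optionally inside a rank-private subdirectory
 * (a sketch with made-up names, not the benchmark's actual code). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    int rank, private_dir = 1;                 /* toggle the private-directory variant */
    size_t nbyte = 16UL * 1024 * 1024;         /* 16 MB per task (example size) */
    char dname[64], fname_r[128];
    char *data;
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    data = malloc(nbyte);
    memset(data, rank & 0xff, nbyte);

    if (private_dir) {
        /* One directory per rank spreads metadata load when thousands of
         * files are created at the same time. */
        snprintf(dname, sizeof dname, "rank.%05d", rank);
        mkdir(dname, 0700);
        snprintf(fname_r, sizeof fname_r, "%s/data.%05d", dname, rank);
    } else {
        snprintf(fname_r, sizeof fname_r, "data.%05d", rank);
    }

    fp = fopen(fname_r, "w");
    fwrite(data, nbyte, 1, fp);
    fclose(fp);

    MPI_Barrier(MPI_COMM_WORLD);               /* all files on disk before proceeding */

    free(data);
    MPI_Finalize();
    return 0;
}

Each task's write is simple and independent, but, as the "Bottlenecks to scaling" slide notes later, creating thousands of files per dump becomes a filesystem problem of its own.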

Page 10: Single File I/O

fd = open(fname, O_CREAT | O_RDWR, S_IRUSR);
lseek(fd, (off_t)rank * (off_t)nbyte, SEEK_SET);
write(fd, data, nbyte);
close(fd);
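For comparison, here is a self-contained sketch of the shared-file approach (again illustrative, not the benchmark source), using pwrite so each task's byte offset is computed in 64 bits:

/* Every task writes a disjoint, contiguous chunk of one shared file. */
#define _FILE_OFFSET_BITS 64    /* make off_t 64-bit even on 32-bit builds */
#include <mpi.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, fd;
    size_t nbyte = 16UL * 1024 * 1024;         /* 16 MB per task (example size) */
    char *data;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    data = malloc(nbyte);
    memset(data, rank & 0xff, nbyte);

    fd = open("shared.dat", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    /* Cast before multiplying so the byte offset cannot overflow an int. */
    pwrite(fd, data, nbyte, (off_t)rank * (off_t)nbyte);
    close(fd);

    MPI_Barrier(MPI_COMM_WORLD);

    free(data);
    MPI_Finalize();
    return 0;
}

A single file is far easier to manage than thousands, but whether the per-task writes proceed concurrently or serialize inside GPFS is exactly what the "Scaling of single file I/O" results examine.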

Page 11: MPI-I/O

MPI_Info_set(mpiio_file_hints, MPIIO_FILE_HINT0);
MPI_File_open(MPI_COMM_WORLD, fname, MPI_MODE_CREATE | MPI_MODE_RDWR,
              mpiio_file_hints, &fh);
MPI_File_set_view(fh, (off_t)rank*(off_t)nbyte, MPI_DOUBLE, MPI_DOUBLE,
                  "native", mpiio_file_hints);
MPI_File_write_all(fh, data, ndata, MPI_DOUBLE, &status);
MPI_File_close(&fh);
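The slide does not show how mpiio_file_hints or MPIIO_FILE_HINT0 are defined. Assuming the hint object is an ordinary MPI_Info carrying the IBM_largeblock_io hint discussed on the "Large block I/O" slides, a self-contained version of this strategy (illustrative names and sizes) would look roughly like:

/* Self-contained sketch of the MPI-I/O strategy with the IBM_largeblock_io
 * hint (names and sizes are illustrative, not the benchmark's actual code). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, ndata = 2 * 1024 * 1024;       /* doubles per task (example size) */
    MPI_Offset nbyte;
    double *data;
    MPI_Info mpiio_file_hints;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    nbyte = (MPI_Offset)ndata * (MPI_Offset)sizeof(double);
    data = malloc((size_t)nbyte);
    for (i = 0; i < ndata; i++) data[i] = rank;

    /* GPFS hint on the SP: large-block I/O (this also turns off data shipping). */
    MPI_Info_create(&mpiio_file_hints);
    MPI_Info_set(mpiio_file_hints, "IBM_largeblock_io", "true");

    MPI_File_open(MPI_COMM_WORLD, "shared_mpiio.dat",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, mpiio_file_hints, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * nbyte, MPI_DOUBLE, MPI_DOUBLE,
                      "native", mpiio_file_hints);
    MPI_File_write_all(fh, data, ndata, MPI_DOUBLE, &status);  /* collective write */
    MPI_File_close(&fh);

    MPI_Info_free(&mpiio_file_hints);
    free(data);
    MPI_Finalize();
    return 0;
}

MPI_File_write_all is the collective form; the summary chart later distinguishes it from independent MPI IO writes.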

Page 12: Results

Page 13: Scaling of single file I/O

Page 14: Scaling of multiple file and MPI I/O

Page 15: Large block I/O

• MPI-I/O on the SP includes the file hint IBM_largeblock_io
• IBM_largeblock_io=true was used throughout; with the default setting, write times show large variation
• IBM_largeblock_io=true also turns off data shipping

Page 16: Large block I/O = false

• MPI-I/O on the SP includes the file hint IBM_largeblock_io
• Except for the results above (taken with IBM_largeblock_io=false), IBM_largeblock_io=true was used throughout
• IBM_largeblock_io=true also turns off data shipping

Page 17: Bottlenecks to scaling

• Single file I/O has a tendency to serialize
• Scaling up with multiple files creates filesystem problems
• Akin to data shipping, consider the intermediate case: aggregate within each SMP node before writing (a sketch follows below)
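The aggregation slides that follow carry no code, so here is a minimal sketch of one way to do SMP (node-level) aggregation, under the assumption of a fixed, block-wise placement of TASKS_PER_NODE ranks per node; names and sizes are made up:

/* Node-level aggregation sketch (illustrative, not the benchmarked code):
 * gather each SMP node's buffers onto one task, which does one large write. */
#define _FILE_OFFSET_BITS 64
#include <mpi.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define TASKS_PER_NODE 16              /* assumed MPI tasks per SMP node */

int main(int argc, char **argv)
{
    int rank, node, local, fd;
    size_t nbyte = 16UL * 1024 * 1024;          /* per-task payload (example size) */
    char *data, *nodebuf = NULL;
    MPI_Comm nodecomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    data = malloc(nbyte);
    memset(data, rank & 0xff, nbyte);

    /* Group tasks by node, assuming ranks are placed in blocks of TASKS_PER_NODE. */
    node  = rank / TASKS_PER_NODE;
    local = rank % TASKS_PER_NODE;
    MPI_Comm_split(MPI_COMM_WORLD, node, local, &nodecomm);

    if (local == 0)
        nodebuf = malloc((size_t)TASKS_PER_NODE * nbyte);

    /* Ship every task's buffer to the node's aggregator (local rank 0). */
    MPI_Gather(data, (int)nbyte, MPI_BYTE,
               nodebuf, (int)nbyte, MPI_BYTE, 0, nodecomm);

    if (local == 0) {
        /* One large, contiguous write per node into the shared file. */
        fd = open("shared.dat", O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
        pwrite(fd, nodebuf, (size_t)TASKS_PER_NODE * nbyte,
               (off_t)node * (off_t)TASKS_PER_NODE * (off_t)nbyte);
        close(fd);
        free(nodebuf);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Comm_free(&nodecomm);
    free(data);
    MPI_Finalize();
    return 0;
}

Fewer, larger writes reach GPFS this way, at the cost of an extra gather inside each node, which is the trade-off the 32-task and 512-task aggregation plots explore.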

Page 18: Parallel IO with SMP aggregation (32 tasks)

Page 19: Parallel IO with SMP aggregation (512 tasks)

Page 20: Summary

[Summary chart relating concurrency (16 to 2048 tasks) and aggregate data size (1 MB to 100 GB) to the strategies compared: Serial, Multiple File, Multiple File mod n, MPI IO, and MPI IO collective]

Page 21: Future Work

• Testing NERSC port of NetCDF to MPI-I/O

• Comparison with GPFS on Linux/Intel: the NERSC/LBL Alvarez cluster (84 2-way SMP Pentium nodes, Myrinet 2000 fiber optic interconnect)

• Testing GUPFS technologies as they become available