IBM ATS Deep Computing
© 2007 IBM Corporation
High Performance IO
HPC Workshop – University of Kentucky
May 9, 2007 – May 10, 2007
Andrew Komornicki, Ph.D.
Balaji Veeraraghavan, Ph.D.
Agenda
Introduction
General IO performance
Results of some small tests.
Modular IO libraries, Linux and AIX
I/O Optimization
Analyze the IO pattern
Determine optimization method
Optimize in user space
Minimize source code changes
Possibly relink with libtkio.so
General I/O Performance
C: Do not use fopen(), fread(), or fwrite()
• These are inefficient due to small (4KB) IO blocks and extra memory copies.
Use instead:
• POSIX open(), read(), write()
• Direct (raw) IO will eliminate an additional memory copy
FORTRAN: Use unformatted IO
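The C advice above can be sketched as a file copy that calls POSIX open(), read(), and write() directly with one large user buffer. This is a minimal illustration, not code from the workshop: the function names copy_file_posix and write_all and the block_size parameter are assumptions chosen for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy src to dst using read()/write() with a caller-chosen block size.
   Returns the number of bytes copied, or -1 on error. */
long long copy_file_posix(const char *src, const char *dst, size_t block_size)
{
    long long total = 0;
    char *buf;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    buf = malloc(block_size);
    if (buf == NULL) {
        close(in);
        close(out);
        return -1;
    }
    for (;;) {
        ssize_t n = read(in, buf, block_size);
        if (n == 0)
            break;                      /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (write_all(out, buf, (size_t)n) != 0) {
            total = -1;
            break;
        }
        total += n;
    }
    free(buf);
    close(in);
    close(out);
    return total;
}
```

A call such as copy_file_posix("in.dat", "out.dat", 1 << 20) would move the data in 1 MB requests; the best block size depends on the file system and should be measured.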
Asynchronous IO, an example
Non-blocking IO
• aio_read(), aio_write(), aio_return()
Completion notification
• Polling with aio_error()
• Block until complete with aio_suspend()
Cancellation of IO requests
• aio_cancel()
Large file enabled
• Removes the 2GB file size limitation
POSIX conforming
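The calls listed above fit together roughly as follows. This is a minimal sketch of one asynchronous read using the POSIX AIO interface; the function name aio_read_block is an assumption for the example, and on some systems the program must be linked with -lrt.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read nbytes at offset from path with one asynchronous request.
   Returns the number of bytes read, or -1 on error. */
ssize_t aio_read_block(const char *path, void *buf, size_t nbytes, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *const list[1] = { &cb };
    ssize_t n;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = nbytes;
    cb.aio_offset = offset;

    /* Queue the request; the call returns without waiting for the data. */
    if (aio_read(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* Useful computation could overlap with the IO here. */

    /* Poll with aio_error() and block in aio_suspend() until done.
       If aio_suspend() is interrupted, the loop simply checks again. */
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    /* aio_return() is called once per request, after it has completed;
       it yields the byte count, or -1 if the request failed. */
    n = aio_return(&cb);
    close(fd);
    return n;
}
```

The benefit comes from the work done between aio_read() and aio_suspend(); with nothing to overlap, this behaves like an ordinary blocking read.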
Results of Bonnie IO test
Run on Blade system in San Mateo Lab
System memory: 5 Gbytes
File systems: ext2 and ext3
All tests done in five stages:
• Writing with putc()...done
• Rewriting...done
• Writing intelligently...done
• Reading with getc()...done
• Reading intelligently...done
Results of Bonnie IO test, Block IO performance
Size (MB)   Write (Kbytes/sec)   Read (Kbytes/sec)
__________________________________________________
 2000       841,524              2,233,282
 4000        83,237              1,658,013
 8000        56,599                 50,974
16000        49,656                 50,677
Results of Bonnie IO test
Results for ext2 file system, time in seconds
Size (MB)   User     System   Elapsed
_____________________________________
 2000        48.7     10.8      84.3
 4000        97.6     23.9     252.6
 8000       194.9     56.4    1009.2
16000       388.4    111.8    2088.3
Results of Bonnie IO test
Results for ext3 file system, time in seconds
Size (MB)   User     System   Elapsed
_____________________________________
 2000        48.7     19.2      96.3
 4000        97.7     45.5     265.8
 8000       194.6     90.1    1016.9
16000       396.9    201.8    2058.3
Modular I/O (MIO)
Familiar and flexible runtime interface
MIO modules
• mio
• trace
• pf
MIO available on both Linux and AIX
MIO user code interface
open MIO_open
read MIO_read
write MIO_write
close MIO_close
lseek MIO_lseek
fcntl MIO_fcntl
ftruncate MIO_ftruncate
MIO run time interface
MIO_STATS="file name"
MIO_FILES=" *.dat* [trace|pf ] *.inp [aix]"
MIO_DEBUG="ALL"
MIO_DEFAULTS="trace/mbytes , pf/cache=10m"
trace module
summary of file activity
binary events file
low cpu overhead
typical options
• /stats
• /mbytes /gbytes /tbytes
• /events=mio.evt
pf module
User selectable cache size
User selectable page size
User selectable prefetch depth
Direct or system buffered IO
Global or private cache
Usage summary
pf module
detects sequential I/O
user memory buffering
options
• /global
• /cache_size=10m
• /page_size=1m
• /prefetch=1
• /stride=1
• /direct
• /stats
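To make "detects sequential I/O" concrete, here is one simple way such detection could work: remember where the previous request ended and count how many requests in a row start exactly there. This is an illustrative sketch only. The slides do not describe the pf module's actual algorithm, and the names seq_detector, seq_init, and seq_observe and the threshold parameter are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical detector state; not MIO's actual data structure. */
struct seq_detector {
    off_t next_expected;  /* offset just past the previous request */
    int   run_length;     /* consecutive sequential requests seen so far */
    int   threshold;      /* run length at which prefetching would start */
};

void seq_init(struct seq_detector *d, int threshold)
{
    d->next_expected = -1;   /* no request seen yet */
    d->run_length = 0;
    d->threshold = threshold;
}

/* Record one request. Returns 1 when the run of back-to-back requests
   has reached the threshold, 0 otherwise. */
int seq_observe(struct seq_detector *d, off_t offset, size_t length)
{
    if (offset == d->next_expected)
        d->run_length++;
    else
        d->run_length = 0;   /* a seek elsewhere resets the run */
    d->next_expected = offset + (off_t)length;
    return d->run_length >= d->threshold;
}
```

A caching layer would call seq_observe() on every read and, when it returns 1, start fetching the following pages into its cache before the program asks for them.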
Relink with libtkio.a
libtkio.a has shared object members
• tkio.so 32 bit and 64 bit
Entry points for
• open, open64, close, read, write, lseek, lseek64
• fcntl, ffinfo, fstat, fstat64, fstatfs, fsync
• ftruncate, ftruncate64
• unlink, aio_...
Default tkio behavior
Uses dlopen and dlsym for runtime linking
tkio entry    calls
open64        libc(shr.o) open64
close         libc(shr.o) close
read          libc(shr.o) read
write         libc(shr.o) write
lseek64       libc(shr.o) lseek64
fsync         libc(shr.o) fsync
…             …
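The runtime-linking step can be illustrated with a small table of function pointers filled in by dlopen() and dlsym(). This is a sketch of the general technique, not tkio's code: the structure io_entry_points and the function load_io_entry_points are hypothetical names, and on some systems the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical table of I/O entry points; tkio's real structure is not
   shown in the slides. */
struct io_entry_points {
    ssize_t (*read_fn)(int, void *, size_t);
    ssize_t (*write_fn)(int, const void *, size_t);
    int     (*close_fn)(int);
};

/* Resolve read, write, and close at run time.
   lib_path == NULL searches the running program and the libraries it
   already has loaded (normally including libc).
   Returns 0 on success, -1 if the library or a symbol cannot be found. */
int load_io_entry_points(const char *lib_path, struct io_entry_points *tbl)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL)
        return -1;

    /* dlsym() returns void *; storing through a void ** avoids a direct
       cast between object and function pointer types. */
    *(void **)(&tbl->read_fn)  = dlsym(handle, "read");
    *(void **)(&tbl->write_fn) = dlsym(handle, "write");
    *(void **)(&tbl->close_fn) = dlsym(handle, "close");

    if (tbl->read_fn == NULL || tbl->write_fn == NULL || tbl->close_fn == NULL)
        return -1;

    /* The handle is deliberately left open so the pointers stay valid. */
    return 0;
}
```

Calling through the table, for example tbl.write_fn(fd, buf, n), is what lets the target of each entry point be swapped for a different library without changing the calling code.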
tkio runtime interface
setenv TKIO_ALTLIB so_name/print/abort
export TKIO_ALTLIB=so_name/print/abort
so_name is the name of a shared library
• Either name.so or libname.a(name.so)
tkio calls a function in so_name that returns a structure filled with I/O entry points to replace the default entry points
/print option writes a message to stderr indicating success of the load
/abort issues exit(-1) if the load is not successful
tkio using MIO
setenv TKIO_ALTLIB get_mio_ptrs_64.so
tkio entry    calls
open64        libmio(mio.o) MIO_open64
close         libmio(mio.o) MIO_close
read          libmio(mio.o) MIO_read
write         libmio(mio.o) MIO_write
lseek64       libmio(mio.o) MIO_lseek64
fsync         libmio(mio.o) MIO_fsync
…             …
[Diagram, demonstration only: layers Application (Fortran I/O, stdio), libtkio, libc, libmio, kernel]
stdio: fopen, fwrite, fread, fclose
tkio entry points: open64, write, read, lseek64, close
libc: ->open64, ->write, ->read, ->lseek64, ->close
libmio: ->MIO_open64, ->MIO_write, ->MIO_read, ->MIO_lseek64, ->MIO_close
[Diagram: layers libtkio, libc, libmio, kernel]
tkio entry points: open64, write, read, lseek64, close
libc: ->open64, ->write, ->read, ->lseek64, ->close
libmio: ->MIO_open64, ->MIO_write, ->MIO_read, ->MIO_lseek64, ->MIO_close
MIO modules: trace, pf, aix
System buffered Data Movement
[Diagram labels: user space, kernel, system buffers, MIO space, 256kb]
pf cached Data Movement
[Diagram labels: user space, kernel, system buffers, MIO space, 256kb, 5 x 2mb]
O_DIRECT Data Movement
[Diagram labels: user space, kernel, O_DIRECT, system buffers, MIO space, 256kb, 5 x 2mb]
Asynchronous Data Movement
[Diagram labels: user space, kernel, O_DIRECT, system buffers, MIO space, 256kb, 5 x 2mb]
MSC.NASTRAN trace output from program <-> pf
Trace close : program <-> pf : /bmwfs/cdh108.T20536_13.SCR300 : (281946/2162.61)=130.37 mbytes/s
current size=0 max_size=16277
mode =0777 sector size=4096
oflags =0x302=RDWR CREAT TRUNC
open   1 0.01
write  478193 462.10 59774 59774 131072 131072
read   1777376 1700.48 222172 222172 131072 131072
seek   911572 2.83
fcntl  3 0.00
trunc  16 0.40
close  1 0.03
size   127787
Callouts on the write and read lines:
• Number of occurrences
• Mbytes requested and Mbytes delivered
• Min/Max request size in bytes
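The rate in the "Trace close" line can be checked by hand. The 281946 in the numerator equals the write and read Mbytes added together (59774 + 222172), and dividing by 2162.61 gives the reported 130.37 mbytes/s. A minimal sketch of that arithmetic follows; the function name mbytes_per_sec is an assumption for the example, and treating 2162.61 as seconds follows from the mbytes/s unit in the trace.

```c
/* Transfer rate in mbytes/s; returns 0 when no time was recorded. */
double mbytes_per_sec(double mbytes, double seconds)
{
    return seconds > 0.0 ? mbytes / seconds : 0.0;
}
```

For example, mbytes_per_sec(59774 + 222172, 2162.61) evaluates to about 130.37.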
MSC.NASTRAN trace output
Trace close : pf <-> aix : /bmwfs/cdh108.T20536_13.SCR300 : (276645/1460.73)=189.39 mbytes/s
current size=0 max_size=16276
mode =0777 sector size=4096
oflags =0x8000302=RDWR CREAT TRUNC DIRECT
open     1 0.01
write    4382 154.86 684 684 131072 2097152
awrite   33390 1.42 58491 58491 131072 2097152
suspend  33390 240.00 242.27 mbytes/s
read     5178 272.71 10354 10354 1048576 2097152
aread    103560 5.70 207115 207115 524288 2097152
suspend  103560 786.04 261.59 mbytes/s
seek     136950 0.00
fcntl    3 0.00
trunc    16 0.40
close    1 0.00
size     11013
pages    138477
MSC.NASTRAN pf output
pf close for /bmwfs/cdh108.T20536_13.SCR300
global cache 0: 150 pages of 2097152 bytes
29739/29749 pages not preread for write
138316/139754 prefetches : prefetch=3
29576 write behinds
478193 writes
1777376 reads
page writes 37772/33124
mbytes transferred
program --> 59774 --> pf --> 59176 --> aix
program <-- 222172 <-- pf <-- 217469 <-- aix
DataView file activity plot
[Plot axes: time (seconds), file position (bytes)]
DataView file activity plot
[Plot axes: time (seconds), file position (bytes)]
Asynchronous I/O plotting
[Plot axes: time (seconds), file position (bytes); annotations: suspend time, hidden time, queuing time]
cache page activity
[Plot axes: time (seconds), file position (bytes)]
MSC.Nastran performance gains
16 cpu 32GB NH2 node
2.2M dof, 767GB I/O, 8 copies
2GB memory per copy
114MB/sec 198MB/sec
8 SSA, 16 loops, 4 disk/loop
MIO Summary
Demonstrated performance gains
Simple to implement
Flexible run time interface
Delivered as a shared object library
Contact: bauerj@us.ibm.com