NUG Meeting
File and Data Conversion
Jonathan Carter
NERSC User Services
jcarter@nersc.gov
510-486-7514

Introduction

Converting files and data for use on the IBM SP

    IBM uses IEEE data representation

    Industry-standard Fortran unformatted file structure

Tools available on the Cray systems

Tools available on the IBM SP

Demand for File Conversion

Currently, CTSS text files

    ctou, rlib will be available on the IBM SP

After decommissioning the Cray systems in October 2002

    Cray Fortran unformatted files

    Cray C binary files

Tools on the Cray Systems - FFIO

Flexible File I/O: a general system for specifying how data should be written or read

    Can be used without recompiling or linking (Fortran)

    Can be changed at runtime

    Various layers available to convert both file structure and data

    Controlled via the assign command

assign Command

Can specify how I/O is done

    On a Fortran unit basis: assign -F f77 u:10

    On a filename basis: assign -F f77 f:filename

Common options

    Clear assigns: assign -R

    See current assigns in effect: assign -V

Fortran Unformatted Sequential-access Files

Cray uses a vendor-specific format called COS blocked, or simply blocked

IBM (and most Unix vendors) use f77 blocking

Use the -F f77 option to have the FFIO f77 blocking layer used instead of the default COS blocking:

    assign -F f77 u:10

T3E already uses IEEE arithmetic, so -F f77 is sufficient

    Note that default real and integer data types on the T3E are 64 bit

SV1 data needs to be converted, so an IEEE conversion layer is needed

    -N ieee performs basic conversion:

    assign -F f77 -N ieee f:filename
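
To make the f77 blocking layout concrete, here is a minimal Python sketch. It assumes the common convention in which each record is framed by a 4-byte length marker before and after the data, stored big-endian as on the IBM SP. The marker size, the byte order, and the function names are assumptions for illustration; check the compiler documentation for the format actually in use.

```python
import io
import struct

MARKER = ">i"  # assumed: 4-byte big-endian record length marker


def write_f77_record(stream, payload):
    """Write one record: length marker, data, then the same marker again."""
    marker = struct.pack(MARKER, len(payload))
    stream.write(marker + payload + marker)


def read_f77_records(stream):
    """Yield the payload of each record in an f77-blocked stream."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) != 4:
            raise ValueError("truncated record marker")
        (length,) = struct.unpack(MARKER, head)
        payload = stream.read(length)
        tail = stream.read(4)
        if len(payload) != length or tail != head:
            raise ValueError("record markers do not match")
        yield payload


# Round trip: one record of three 64-bit IEEE reals, one of two 32-bit integers.
buf = io.BytesIO()
write_f77_record(buf, struct.pack(">3d", 1.0, 2.5, -4.0))
write_f77_record(buf, struct.pack(">2i", 7, 42))
buf.seek(0)
records = list(read_f77_records(buf))
reals = struct.unpack(">3d", records[0])
ints = struct.unpack(">2i", records[1])
print(reals, ints)
```

A COS blocked file has a different framing, which is why a file written on the Cray without -F f77 cannot be read this way.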

Fortran Unformatted Direct-access Files

Files are not blocked on Cray or IBM

Data conversion layers can be used as in sequential-access files for the SV1 machines

    assign -N ieee u:20

T3E files don't need any conversion
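
Because direct-access files are unblocked, record k of a file with fixed record length recl simply starts at byte offset (k - 1) * recl. A minimal Python sketch of that addressing, assuming 64-bit big-endian IEEE reals; the function name and the in-memory sample data are hypothetical:

```python
import io
import struct


def read_direct_record(stream, recno, recl):
    """Return the raw bytes of record `recno` (1-based) of length `recl`."""
    stream.seek((recno - 1) * recl)
    data = stream.read(recl)
    if len(data) != recl:
        raise ValueError("record %d is past the end of the file" % recno)
    return data


# Hypothetical file: three records of four reals each (recl = 32 bytes).
recl = 4 * 8
rows = [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0), (9.0, 10.0, 11.0, 12.0)]
stream = io.BytesIO(b"".join(struct.pack(">4d", *row) for row in rows))

second = struct.unpack(">4d", read_direct_record(stream, 2, recl))
print(second)
```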

C Binary Files

Files are not blocked on Cray or IBM

FFIO conversion layer not easy to use

Use library routines such as cry2cri

Using FFIO to Convert a File

Isolate the I/O statements for the file from the program to make a simple conversion program

Pair each read with a write

Use assign to have all written data converted, or use data conversion routines

Tools on the IBM SP - NCARU Library

Library developed by the SCD at NCAR

    Reads COS blocked files

    Converts Cray data to IEEE data

Does not use the Fortran API, so program modification is required

    Basic calls are crayopen, crayread, crayrew, crayback, crayclose

Calls to crayread can convert data if the record is composed of one data type only; otherwise the user must handle conversion explicitly

    Conversion routines are ctodpf, ctospf, ctospi

Cray Fortran I/O sometimes inserts padding, which the user must handle explicitly

Using the NCARU Library

To use:

    module load ncaru
    xlf -o a.out b.f $NCARU

Limitations

    2 GB limit for unblocked files

    Currently no 64-bit address space support

    Not thread-safe

    No support for 128-bit data

Dealing with Different Files

Open using blocked option to crayopen for Fortran unformatted sequential access, open with unblocked option for Fortran unformatted direct access

If written on the SV1 use conversion option on read, or call conversion routines directly

C binary files can be read by the unblocked I/O calls or by usual C I/O followed by data conversion routines

Records with Mixed Data Types

Read into a buffer and convert items one by one

    real x(50)
    integer n(50)
    real*8 buffer(100)

    ! open in blocked mode
    ifc = crayopen('filename',10,0)
    ! read record without converting
    nwds = crayread(ifc,buffer,100,0)
    ! convert data
    call ctospf(buffer,x,50)
    call ctospi(buffer(51),n,50)
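
For readers who want to see what the conversion step involves, the following Python sketch mimics the buffer-then-convert pattern for a record written on the SV1. It is not the NCARU implementation. It assumes the classic Cray 64-bit floating-point layout (sign bit, 15-bit exponent biased by 16384, 48-bit mantissa with no hidden bit) and 64-bit two's-complement integers; the function names are hypothetical, and overflow and other special cases are ignored.

```python
import math
import struct


def cray_to_float(word):
    """Convert one 64-bit Cray floating-point word to a Python float."""
    sign = -1.0 if word >> 63 else 1.0
    exponent = (word >> 48) & 0x7FFF   # biased by 16384
    mantissa = word & 0xFFFFFFFFFFFF   # 48 bits, value is 0.mantissa
    if mantissa == 0:
        return 0.0
    return sign * math.ldexp(mantissa, exponent - 16384 - 48)


def cray_to_int(word):
    """Interpret a 64-bit word as a two's-complement integer."""
    return word - (1 << 64) if word >> 63 else word


def convert_mixed_record(raw, nreal, nint):
    """Split a record of 8-byte words into `nreal` reals then `nint` integers."""
    words = struct.unpack(">%dQ" % (nreal + nint), raw)
    x = [cray_to_float(w) for w in words[:nreal]]
    n = [cray_to_int(w) for w in words[nreal:]]
    return x, n


# Hand-built record of two reals and two integers: 1.0, -2.5, 3, -1.
raw = struct.pack(
    ">4Q",
    0x4001800000000000,  # 1.0  =  0.5   * 2**1
    0xC002A00000000000,  # -2.5 = -0.625 * 2**2
    3,
    0xFFFFFFFFFFFFFFFF,  # -1 in two's complement
)
x, n = convert_mixed_record(raw, 2, 2)
print(x, n)
```

As in the Fortran fragment, the record is read as untyped 8-byte words first, and each region of the buffer is then converted according to the type that was written there.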

Data Padding

With Cray Fortran I/O, extra bytes are sometimes inserted into the user data.

In cases where padding occurs, bytes are inserted so that every 8-byte datum starts at a byte offset, measured from the beginning of the record, that is a multiple of 8. The end of the record is then padded so that the whole record length is a multiple of 8 bytes.

Padding only occurs if you have used character variables whose lengths are not a multiple of 8, or have used real*4 or integer*4 data on the T3E (on the SV1 systems, these types occupy 8 bytes).

Example

A Fortran record is written on an SV1:

    real a(50)
    integer n(50)
    character*17 label

    write(50) n, a, label

Each element of n and a is 8 bytes long, and label is 17 bytes long. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding is at the end of the record, where 7 bytes are added to bring the total record length from 817 to 824 bytes, which is a multiple of 8.

Example

A Fortran record is written on an SV1:

    real a(50)
    integer n(50)
    character*17 label

    write(50) label, n, a

Without padding, the alignments are label at offset 0, n at offset 17, and a at offset 417. Since n has elements of length 8 bytes, it must be written at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of n. In the record that is written to the file, the alignments are label at offset 0, n at offset 24, and a at offset 424.

Example

A Fortran record is written on the T3E:

    real a(40), b(40)
    integer*4 n(13), m(13)
    character*12 label

    write(50) label, n, a, m, b

The data has lengths: label 12 bytes, n and m 52 bytes, and a and b both 320 bytes. Without padding, the alignments are label at offset 0, n at offset 12, a at offset 64, m at offset 384, and b at offset 436. a and b need to be at offsets that are a multiple of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b, so that it starts at offset 440.
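
The offsets in the T3E example can be reproduced with a few lines of Python. This is a sketch of the padding rule as stated on the Data Padding slide (8-byte data aligned to a multiple of 8, record end padded to a multiple of 8), not code taken from any Cray or NERSC tool; the function name and the (name, element size, count) description of each item are hypothetical.

```python
def padded_layout(items):
    """Return ({name: offset}, total_length) for one Fortran record.

    `items` is a list of (name, element_size, count) in write order.
    Only 8-byte elements are aligned; the record end is padded to 8.
    """
    offsets = {}
    pos = 0
    for name, size, count in items:
        if size == 8 and pos % 8:
            pos += 8 - pos % 8  # pad up to the next multiple of 8
        offsets[name] = pos
        pos += size * count
    if pos % 8:
        pos += 8 - pos % 8  # pad the end of the record
    return offsets, pos


# T3E example: write(50) label, n, a, m, b
t3e = [("label", 1, 12), ("n", 4, 13), ("a", 8, 40), ("m", 4, 13), ("b", 8, 40)]
offsets, total = padded_layout(t3e)
print(offsets, total)
```

A character variable is described as a run of 1-byte elements, so it is never aligned itself but can push the following 8-byte data onto a padded offset.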

crayconv Utility

crayconv automatically converts files written on the SV1 to IBM-compatible format

    Basic Fortran data types only

    Sequential-access unformatted files only

    Possible problem if compiler option -Onofastint is used, or integer*8 is explicitly declared and written: integers over 2^46 are not correctly interpreted

    Pad data not removed

Extension to T3E data and direct-access unformatted files planned

More Information

http://hpcf.nersc.gov/computers/SP/ffio.html (by Mike Stewart)

http://hpcf.nersc.gov/computers/crayretire.html

man ncaru
