Oceanographic data formats


1. 1. Data formats

  • Data formats are highly heterogeneous.
  • Data format = the physical or electronic form in which data are stored.
  • A piece of paper with handwritten text is also a data format.
  • The focus here, however, is on electronic data formats, in particular the commonly used ones.

2. 1. Data formats: Why use which format?

  • Historical reasons:
      • Old data are mostly in text-based list formats.
      • Software and technology accompany certain formats. Example: XML has only been in use since its invention.
  • Other reasons:
      • The data generator: machine-generated data are mostly in ASCII format.
      • Worldwide agreed formats exist for certain types of data.
      • Common formats facilitate the exchange of data packages.

3. 1. Data formats: Exchange of data formats

  • Most formats can be converted into each other.
  • Conversion is mostly top down: from relational structure to spreadsheet to text-based.

4. Data formats: different classifications

  • Physical types: ASCII, binary
  • Format types: 15 commonly used data types

5. Data format: ASCII format (1)

  • ASCII: American Standard Code for Information Interchange.
  • ASCII data are encoded so that the human reader can see and understand the values, because they are displayed as normal integers and real numbers. The file therefore stores the printable characters that represent each value, not the machine representation of the value itself.
  • Benefit: the user can see, understand and edit the file contents directly.
  • Downside: the data files are much larger.

6. Data format: ASCII format (2)

  • Combination of letters and numbers
  • Readable by any computer
  • No complex software required

7. Data format: Binary data

  • Binary data are numeric data whose values are stored directly as bits and bytes instead of as human-readable ASCII characters.
  • Number values can be stored in much smaller files and read more rapidly (by machines).
  • Binary is the usual method for large data files, especially gridded data.
  • Binary data are less easy to use: interpretation steps are required.

8. Data format: Binary data

  • The contents and structure of binary files may vary.
  • Type of data stored:
      • Bit (0 to 1): 1 bit
      • Byte (0 to 255): 8 bits
      • Short integer (-32,768 to 32,767): 16 bits
  • An interpreter or translator is required.

9. Data formats: 15 commonly used types

  • Text files
  • ASCII/binary
  • Spreadsheets
  • Relational structures
  • Others
  • Images
  • Maps

10. 1 & 2: Auxiliary formats

  • Auxiliary formats: information about data files. These are not really "data" files, but are included here for completeness.
  • 1 Header formats: information about the format, location or geo-referencing; usually very short.
  • 2 Metadata formats: see also metadata.

11. 3. Document

  • Digital data in proprietary formats (or sometimes just simple ASCII) designed for visual inspection, but not for data processing.
  • Examples: ASCII, MS Word DOC, WordPerfect, HTML, PDF (Adobe Acrobat), PS/EPS (PostScript/Encapsulated PostScript), desktop publishing programs (all proprietary), ...

12. 3. Document

  • Advantages: very polished appearance; powerful editors available; compatibility with other major document editing software.
  • Disadvantages (hard to use in data mining):
      • ASCII text must be extracted for the sections of interest.
      • Embedded images must be converted to more easily used GIF, JPG or BMP formats.
      • PDF and PS/EPS are very tricky to convert to other formats.

13. 4. Gridded data

  • File formats:
      • ASCII: for example SURFER (*.GRD), with "DSAA" header lines.
      • Binary: plain binary grids (byte, short integer, long integer, single-precision or double-precision), with or without ASCII header files (see earlier).

14. 4. Gridded data

  • Creation of the grid: the gridded data file is created from scattered data points in the real world by a process called "gridding".
  • Mathematical methods are used to create the grid.
  • Algorithms are available to examine the data points.

15. 4. Gridded data

  • Gridded data files commonly contain more than a single grid.
  • Data are mostly available for different parameters, using sequences of XYZ dimensions and parameter dimensions.
  • There is no "correct" way to construct files of multiple data grids.
  • It is extremely important to document the sequence in which the dimensions (XYZ location, time, parameters) are "read".
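As an illustration of documenting the read order, here is a minimal sketch in Python. It is not taken from the slides: the header keywords (ORDER, NPARAM, NROWS, NCOLS, TYPE), the toy values and the function names are all hypothetical. It writes two small grids as 16-bit integers behind a one-line ASCII header, reads them back with nested loops in the stored order, and compares the binary payload with an equivalent ASCII XYZ listing.

```python
import io
import struct

# Hypothetical toy dataset: 2 parameters on a 3-row by 4-column grid.
NPARAM, NROWS, NCOLS = 2, 3, 4
grids = [
    [[100 * p + 10 * r + c for c in range(NCOLS)] for r in range(NROWS)]
    for p in range(NPARAM)
]

def write_grids(stream, grids):
    """Write an ASCII header line, then the values as little-endian 16-bit integers."""
    nparam, nrows, ncols = len(grids), len(grids[0]), len(grids[0][0])
    # The header documents the read order: parameter varies slowest, column fastest.
    header = (f"ORDER=param,row,col NPARAM={nparam} NROWS={nrows} "
              f"NCOLS={ncols} TYPE=int16\n")
    stream.write(header.encode("ascii"))
    for grid in grids:
        for row in grid:
            stream.write(struct.pack(f"<{ncols}h", *row))

def read_grids(stream):
    """Parse the header, then rebuild the grids with loops in the documented order."""
    header = stream.readline().decode("ascii")
    fields = dict(item.split("=") for item in header.split())
    nparam, nrows, ncols = (int(fields[k]) for k in ("NPARAM", "NROWS", "NCOLS"))
    result = []
    for _ in range(nparam):
        grid = []
        for _ in range(nrows):
            raw = stream.read(2 * ncols)  # 2 bytes per 16-bit value
            grid.append(list(struct.unpack(f"<{ncols}h", raw)))
        result.append(grid)
    return result

buf = io.BytesIO()
write_grids(buf, grids)
buf.seek(0)
restored = read_grids(buf)

# Size of the binary data after the header, in bytes.
binary_payload = NPARAM * NROWS * NCOLS * 2

# Equivalent ASCII XYZ listing: one "x y value" line per grid point and parameter.
xyz_text = "".join(
    f"{c} {r} {grids[p][r][c]}\n"
    for p in range(NPARAM) for r in range(NROWS) for c in range(NCOLS)
)

print(restored == grids, binary_payload, len(xyz_text))
```

Without the ORDER entry in the header, a reader of the file could not tell whether the values run parameter by parameter or point by point. That is the ambiguity the slide warns about.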
Vector grids: to represent vectors (literally arrows showing the direction of flow) in ocean and meteorological datasets, two methods have been devised: provide the U and V components of the vector, or provide the direction and magnitude of the arrow. Both methods have been adapted to grids, for instance for vector results from gridded models. The grids can be contained in separate files or listed sequentially in the same file.

16. 4. Gridded data

  • Advantages:
      • Saves storage space compared with XYZ storage, which requires 3 values per grid point.
      • Binary takes much less space than ASCII.
      • Reading the data is usually a very straightforward DO loop routine (or nest of routines) that follows the order in which the data were stored.
  • Disadvantages: binary data are not liked by those who want "to see" their data at all times.

17. 5. Hard copy

  • Older hard copy datasets are a necessary evil: some older (pre-60s) ocean data have never been digitized.
  • These datasets range from technical reports to handwritten log sheets and lab sheets.
  • Reports usually contain enough information to be successfully digitized.
  • Manuscript holdings often require tedious collation and cross-referencing in order to assemble all the needed parts.
  • Datasets with missing critical parts (e.g. station data) exist, as well as analysis and synthesis reports containing statistics, graphs and tables, but no data.

18. 5. Hard copy

  • Examples:
      • Lab sheets
      • Journal articles
      • Technical reports
      • 80-character punch cards (included here because many locations lack the facilities to read them)
      • Hand-annotated charts/graphs
      • Specimen identification cards
      • Diaries
      • Ship logs

19. 5. Hard copy

  • Risk of data loss. Rule in many data centres:
      • No paper data should be mailed or shipped unless photocopied.
      • All ORIGINAL paper data should be gathered by the data manager immediately after the relevant cruise and grouped into named folios whose contents are indexed.
      • All paper data should be submitted to supervised digitization as soon as possible.
  • Example: heritage library
  • Metadata of hard copy data should fully describe the folios: number of pages, color of the front page, other identifying characteristics.
  • Advantages: they still exist.
  • Disadvantages: cannot be used in modern digital analysis. Digital capture is very labor intensive. Access is a tricky political issue in some institutions.
  • Compatibilities: published papers in good condition can be scanned and converted to ASCII text with many commercial packages (OCR techniques). The result must be checked afterwards.

20. 5. Hard copy: From hard copy to digital copy

  • The technique used depends on the aim and the type of data.
  • Often the data are just transformed into a document format.
  • Conversion to other formats is often done by hand.
  • In many cases going back to the hard copy is the only way to work (due to lack of metadata, file versions, ...).

21. 6. Simple images

  • Graphics file without earth mapping information
  • Interpretation is done purely by a human
  • Very variable
  • Many file formats: TIFF, GIF, JPG, BMP
  • RAW versus compressed:
      • RAW: all image information is stored without compression.
      • Compressed (JPG/GIF): information is compressed by approximation or by reducing the number of colors, giving smaller files but a loss of information.

22. 6. Simple images

  • Some images have added artistic borders outside the geographic grid that obscure the pixel-to-coordinates relationship.
  • Advantages: quick visualization of data that may originally have been extremely complex; subjective analyses that do not require positional accuracy.
  • Disadvantages: quantification is difficult; synthesis is nearly impossible unless the pictures were derived in exactly the same fashion.
  • Compatibilities: nearly all graphic picture formats are interchangeable using editor programs.

23. 7. Geo-referenced images

A geo-referenced image is a graphics file with ancillary mapping information, showing one or more parameters of the earth's system in a rectilinear grid, usually derived by processing and decimation of very high-density information from aerial or space sensors. The coordinates of each pixel correspond to an XY geocoordinate.
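The pixel-to-coordinate correspondence can be sketched as an affine mapping. The Python example below is an illustration only. It assumes the six-term layout commonly used by ESRI-style world files (pixel sizes, two rotation terms, and the coordinates of the center of the upper left pixel). The sample values (a 0.5-degree pixel with its origin at 10°W, 60°N) and the names `WorldFile`, `parse_world_file` and `pixel_to_geo` are hypothetical.

```python
from typing import NamedTuple, Tuple

class WorldFile(NamedTuple):
    """The six terms, in the order they are listed in the file (one per line)."""
    a: float  # line 1: X-pixel size (delta X)
    d: float  # line 2: rotation term (normally zero)
    b: float  # line 3: rotation term (normally zero)
    e: float  # line 4: Y-pixel size (delta Y), negative when row 0 is the top row
    c: float  # line 5: X-coordinate of the center of the upper left pixel
    f: float  # line 6: Y-coordinate of the center of the upper left pixel

def parse_world_file(text: str) -> WorldFile:
    """Parse the six numeric lines of a world file given as a string."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, got {len(values)}")
    return WorldFile(*values)

def pixel_to_geo(wf: WorldFile, col: int, row: int) -> Tuple[float, float]:
    """Map a pixel (column, row) to the geocoordinate of its center."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y

# Hypothetical world file contents for illustration.
sample = "0.5\n0.0\n0.0\n-0.5\n-10.0\n60.0\n"
wf = parse_world_file(sample)

print(pixel_to_geo(wf, 0, 0))  # center of the upper left pixel
print(pixel_to_geo(wf, 4, 2))  # four columns east, two rows south
```

The Y-pixel size is negative in this sketch because image rows are counted downward from the top while latitude increases upward.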
The color of each pixel represents a parameter.

24. 7. Geo-referenced images

  • TIF files can be made into geo-referenced image files by the addition of internal geographic tags, which require exact knowledge of the image dimensions and its proper location on the earth's surface.
  • JPG, TIF and BMP can be made into geo-referenced image formats by the addition of header "world files", which require exact knowledge of the image dimensions and its proper location on the earth's surface.
  • A world file is a simple ASCII file with the following contents:
      • X-pixel size (delta X)
      • Rotation term for row (normally zero)
      • Rotation term for column (normally zero)
      • Y-pixel size (delta Y)
      • X-coordinate of center of upper left pixel
      • Y-coordinate of center of upper left pixel
  • World files for TIF have the extension TFW; world files for JPG have the extension JPW; world files for BMP have the extension BPW.

25. 7. Geo-referenced images

26. 8-9-10. Mapping data

  • Mapping: mapping data consist of digital representations of individual objects (points, lines, polygons, etc.).
  • 8 XY: mapping line objects, in X (usually longitude) and Y (usually latitude) coordinates only.
  • 9 List: mapping objects (points, lines, symbols, text, etc.) without topology or descriptive attributes.
  • 10 Geographic Information System (GIS): mapping objects (points, lines, polygons, etc.) on the earth incorporated into robust data assemblages that contain additional detailed information about the properties and topologies of the objects. [NOTE: Most GIS systems can also accommodate gridded, geo-referenced image, relational and spreadsheet formats.]

27. 8. XY data

  • Description: simplest kind of geographic information: lines specified by their ordered X and Y coordinates. countr