Open Earth Framework
Dealing with file formats, data semantics, and other gotchas
Dave Nadeau, John Moreland
Want to...
•Connect lots of tools together
• Data collection + processing + visualization
• No one tool does it all
•Data exchange conventions are needed
Want to...
•Plot earthquakes as dots on a map
•Paste a gravity map atop a terrain
•Slice 3D tomography below a terrain
•Do this all at once
Have to...
•Work with lots of data types
• Lists, grids, geometry, time series, hierarchical, overlapping, multichannel, ...
•Stitch together software
• Ideally use open source libraries
• But usually have to write new code too
Semantics & formats
•Define:
• Semantics are the meaning & structure of data
• File formats store it
•Both must be standardized
• So software & users can depend upon it
•But...
Lots of gotchas...
A few of our pet peeves
Lack of standards
•Sometimes no suitable standard exists
•Have to resort to ad hoc standards
• Text files
• Custom software
• READMEs and code comments
/*
 * Column 1 = latitude (degrees)
 * Column 2 = longitude (radians)
 * Column 3 = depth (furlongs)
 * Column 4 = age (dog years)
 */
Incomplete standards
•Often not enough semantics in files
• CSV has no coordinate space or standard columns
• GXF has no field units
• NetCDF has no geoscience conventions
•Still need READMEs
Lat   Lon   Flork  Yeem  Snorf  Wiffle  Bloop
23.5  43.1  -18.5  A37   $✔#!   ☛☀☻     Ω☢☂
23.2  44.8  -27.5  Ö8⅓   ❸✸✠    ✂☎✇     ✈℅©™
24.1  45.7  -8.9   ß4½   §➥ξ    ❃‡₪⊗    ∞
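One way to carry some of the missing semantics in a plain text table is to declare units in the column headers. The sketch below assumes a hypothetical `name [unit]` header convention; the convention, the helper names `parse_header` and `read_table`, and the column labels in the sample are illustrative assumptions, not an existing standard.

```python
import csv
import io
import re

# Hypothetical convention (an assumption for this sketch):
# every header cell is written as "name [unit]".
HEADER_CELL = re.compile(r"^\s*(?P<name>[^\[\]]+?)\s*\[(?P<unit>[^\[\]]+)\]\s*$")


def parse_header(cells):
    """Return a list of (name, unit) pairs; reject columns without a unit."""
    columns = []
    for cell in cells:
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"column {cell!r} does not declare a unit")
        columns.append((match.group("name"), match.group("unit")))
    return columns


def read_table(text):
    """Parse CSV text into (columns, rows), converting every value to float."""
    reader = csv.reader(io.StringIO(text))
    columns = parse_header(next(reader))
    rows = [[float(value) for value in row] for row in reader if row]
    return columns, rows


# Sample data; the column names and units are made up for illustration.
SAMPLE = """latitude [degrees],longitude [degrees],depth [km]
23.5,43.1,-18.5
23.2,44.8,-27.5
24.1,45.7,-8.9
"""

columns, rows = read_table(SAMPLE)
print(columns)
print(rows)
```

A header such as `Flork` with no unit is rejected by `parse_header`, which is the point: the file either describes itself or fails loudly.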
Incomplete standards
•Usually missing provenance
• Author? Contact? Creation date?
• How collected and processed?
•Need to track these to understand source, value, and give credit
Author: Joe’s pizza and geodata emporium
Credit: Joe and his brother Zeek
Date: last wednesday after lunch
Methods: Counted Zeek’s paces (size 12 shoe)
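A provenance check can be automated. The following is a minimal sketch, assuming metadata arrives as a dictionary; the required keys, the ISO 8601 date rule, and the function name `provenance_problems` are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical minimum set of provenance fields (an assumption for this sketch).
REQUIRED = ("author", "contact", "created", "methods")


def provenance_problems(metadata):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    for key in REQUIRED:
        value = metadata.get(key)
        if value is None or not str(value).strip():
            problems.append(f"missing {key}")
    created = str(metadata.get("created") or "").strip()
    if created:
        try:
            date.fromisoformat(created)  # expects YYYY-MM-DD
        except ValueError:
            problems.append("created is not an ISO 8601 date (YYYY-MM-DD)")
    return problems


joes = {
    "author": "Joe's pizza and geodata emporium",
    "created": "last wednesday after lunch",
    "methods": "Counted Zeek's paces (size 12 shoe)",
}
print(provenance_problems(joes))
```

Running such a check when a file is written, rather than when it is read years later, is what keeps records like Joe's from entering an archive.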
Proprietary standards
•Rarely publicly documented
• Few “official” ESRI format specs
• Mostly reverse engineered
•Can change at any time
• Format supports tool, not community
•Changes “encourage” buying new software
version 3.0 + = version 3.0.0.1
Misused standards
•Improperly used features
• Odd field names and units
• Z scale/up/down assumptions != format’s defaults
• Non-descriptive titles, descriptions, & authors
•Missing optional features
• Missing field names, units, scales
• Missing coordinate system
Misused standards
•GeoTIFF is misunderstood
• Stores geolocated data, not just an image
[Figure: two panels labeled "Bad" and "Good"]
Complex standards
•GeoSciML does “everything”
• Hard to use just a small part of it
Inefficient standards
•Text: 2x to 3x expansion of binary data
• 4 Gbytes becomes 8 to 12 Gbytes
•XML: another 2x to 3x expansion
• 4 Gbytes becomes 16 to 36 Gbytes
[Figure labels: Binary, XML, text]
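The size ranges above follow from compounding the two expansion factors. A quick check, assuming the 2x and 3x factors apply as simple multipliers (the helper name `expanded_range` is hypothetical):

```python
def expanded_range(size_gb, low=2, high=3):
    """Return the (smallest, largest) size after a 2x to 3x expansion."""
    return size_gb * low, size_gb * high


binary_gb = 4

# Binary to text: one expansion step.
text_low, text_high = expanded_range(binary_gb)

# Text to XML: a second expansion applied to each end of the text range.
xml_low = expanded_range(text_low)[0]
xml_high = expanded_range(text_high)[1]

print(f"text: {text_low} to {text_high} Gbytes")
print(f"XML:  {xml_low} to {xml_high} Gbytes")
```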
Un-indexed standards
•Missing table of contents
•Needed to:
• Find data you want, skipping the rest
• Do so repeatedly and efficiently
• Do so when whole data set won’t fit in memory
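To illustrate what a table of contents buys, here is a toy container, not any real format: a length-prefixed JSON index followed by raw little-endian doubles. The layout and the names `write_indexed` and `read_one` are assumptions for this sketch; NetCDF and HDF provide this kind of indexed access properly.

```python
import io
import json
import struct


def write_indexed(stream, arrays):
    """Write named lists of floats, preceded by a table of contents.

    Layout (hypothetical): uint32 TOC length, TOC as JSON, then the data.
    Offsets in the TOC are relative to the start of the data section.
    """
    toc = {}
    payload = bytearray()
    for name, values in arrays.items():
        toc[name] = {"offset": len(payload), "count": len(values)}
        payload += struct.pack(f"<{len(values)}d", *values)
    toc_bytes = json.dumps(toc).encode("utf-8")
    stream.write(struct.pack("<I", len(toc_bytes)))
    stream.write(toc_bytes)
    stream.write(payload)


def read_one(stream, name):
    """Read a single named array by seeking to it, skipping all other data."""
    stream.seek(0)
    (toc_len,) = struct.unpack("<I", stream.read(4))
    toc = json.loads(stream.read(toc_len).decode("utf-8"))
    entry = toc[name]
    stream.seek(4 + toc_len + entry["offset"])
    raw = stream.read(8 * entry["count"])  # 8 bytes per float64
    return list(struct.unpack(f"<{entry['count']}d", raw))


buffer = io.BytesIO()
write_indexed(buffer, {
    "depth": [-18.5, -27.5, -8.9],
    "latitude": [23.5, 23.2, 24.1],
})
print(read_one(buffer, "latitude"))
```

The reader touches only the index and the requested array, so the cost of a lookup does not grow with the size of the data it skips. This toy writer still builds the whole payload in memory; a real one would stream it.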
Suggestions...
A few of our thoughts
Use existing standards
•Lots to choose from
• NetCDF or HDF for lists and 2D/3D grids
• GeoTIFF for image overlays
•Use them if suitable
• Already documented
• Already debugged
• Often widely supported
Prefer open standards
•Free
•Community-driven
•No vendor lock-in
•Lots of software already
Follow conventions
•Add all the metadata at creation time
• Too easy to forget to add it later
•No more READMEs needed
• They’re easily lost or out of date
Create conventions
•Make new conventions if necessary
• Standard names for fields
• “Height”, “Depth”, “Elevation”, or “Z”?
• Standard field units
• Meters or kilometers?
• Standard field use
• Positive Z up or down?
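Once a convention is chosen, incoming data can be normalized to it. The sketch below assumes one possible convention (vertical values stored as elevation in meters, positive up); the alias table, the unit table, and the function name `to_elevation_m` are all illustrative assumptions. In particular, treating "Z" as positive up is a guess that a real convention would have to state explicitly.

```python
# Hypothetical convention for this sketch:
# store vertical position as elevation in meters, positive up.
ALIASES = {
    "height": "elevation",
    "elevation": "elevation",
    "z": "elevation",   # assumption: Z is positive up
    "depth": "depth",   # depth is positive down
}
TO_METERS = {"m": 1.0, "km": 1000.0}


def to_elevation_m(field_name, unit, value):
    """Convert a vertical value to elevation in meters, positive up."""
    kind = ALIASES.get(field_name.strip().lower())
    if kind is None:
        raise ValueError(f"unknown vertical field name: {field_name!r}")
    if unit not in TO_METERS:
        raise ValueError(f"unknown unit: {unit!r}")
    meters = value * TO_METERS[unit]
    # Depth increases downward, so flip the sign to get elevation.
    return -meters if kind == "depth" else meters


print(to_elevation_m("Depth", "km", 18.5))
print(to_elevation_m("Height", "m", 120.0))
```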
Simplify standards
•Use subsets of standards
• GeoSciML packages
• NetCDF conventions
Use efficient standards
•Prefer binary
• Such as NetCDF, HDF, GeoTIFF
•Prefer those with a table of contents
• Such as NetCDF, HDF, GeoTIFF
What we’re doing...
•Developing a software framework for
• Data handling
• Processing
• Visualization
Open Earth Framework
•Open source, modular, portable
•Java, threaded, 3D accelerated
•Applications & construction kits
•Interactive & batch tools
•Integrates with other software
• Doesn’t assume one “right” way to do anything
Open Earth Framework
•Collection of construction kits (libraries)
•Few dependencies
•Combine with other software
•Build your own applications, batch tools, and web services
•Several pre-built applications
Open Earth Framework
•Data handling construction kits
• Standard & common formats
• Format conversion
• File “completion” to restore missing info
•Web service construction kits
• Standard & custom protocols
•Mix, match, and extend
Open Earth Framework
•Visualization construction kits
• 2D & 3D
• Lots of layers
• Common and new geoscience visual representations
• Control over colors, shapes, etc.
• Interactive
•Mix, match, and extend
Open Earth Framework
•User interface construction kits
• 3D canvases
• Control panels
• Common menus, toolbars, dialogs
•Mix, match, and extend
Open Earth Framework
• Terrains and overlays
• Data draped over terrain
• Color, shading, & transparency control
Open Earth Framework
•Tomography
• Sectioning planes & isosurfaces
• Color, shading, & transparency control
Open Earth Framework
•Dots, lines, and shapes
• Above, atop, or below terrain
• Color, size, & transparency control
Open Earth Framework
•All at once
Open Earth Framework
•Ongoing development
• Alpha release... soon?
•We want your input...
• Formats, vis techniques, processing, interaction, etc.