Open Earth Framework Dealing with file formats, data semantics, and other gotchas Dave Nadeau John Moreland

Open Earth Framework

Dealing with file formats, data semantics, and other gotchas

Dave Nadeau, John Moreland

Want to...

•Connect lots of tools together

• Data collection + processing + visualization

• No one tool does it all

•Data exchange conventions are needed

Want to...

•Plot earthquakes as dots on a map

•Paste a gravity map atop a terrain

•Slice 3D tomography below a terrain

•Do this all at once

Have to...

•Work with lots of data types

• Lists, grids, geometry, time series, hierarchical, overlapping, multichannel, ...

•Stitch together software

• Ideally use open source libraries

• But usually have to write new code too

Semantics & formats

•Define:

• Semantics are the meaning & structure of data

• File formats store it

•Both must be standardized

• So software & users can depend upon it

•But...

Lots of gotchas...

A few of our pet peeves

Lack of standards

•Sometimes no suitable standard exists

•Have to resort to ad hoc standards

• Text files

• Custom software

• READMEs and code comments

/*
 * Column 1 = latitude (degrees)
 * Column 2 = longitude (radians)
 * Column 3 = depth (furlongs)
 * Column 4 = age (dog years)
 */

Incomplete standards

•Often not enough semantics in files

• CSV has no coordinate space or standard columns

• GXF has no field units

• NetCDF has no geoscience conventions

•Still need READMEs

Lat    Lon    Flork   Yeem   Snorf   Wiffle   Bloop
23.5   43.1   -18.5   A37    $✔#!    ☛☀☻      Ω☢☂
23.2   44.8   -27.5   Ö8⅓    ❸✸✠     ✂☎✇      ✈℅©™
24.1   45.7   -8.9    ß4½    §➥ξ     ❃‡₪⊗     ∞
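
One way to carry a little more meaning in a plain CSV file is to put the units in the column headers. The sketch below assumes a hypothetical `name [unit]` header convention, which is not part of any CSV standard, and rejects files whose columns do not declare a unit. The function names and sample values are made up for illustration.

```python
import csv
import io
import re

# Hypothetical convention: every header cell is written as "name [unit]".
HEADER_CELL = re.compile(r"^\s*(?P<name>[^\[\]]+?)\s*\[(?P<unit>[^\[\]]*)\]\s*$")


def parse_header(cells):
    """Split "name [unit]" header cells into (name, unit) pairs.

    Raises ValueError for any column without a unit annotation, so a file
    with missing semantics is rejected instead of silently guessed at.
    """
    columns = []
    for cell in cells:
        match = HEADER_CELL.match(cell)
        if match is None:
            raise ValueError(f"column {cell!r} has no unit annotation")
        columns.append((match.group("name"), match.group("unit")))
    return columns


def read_table(text):
    """Return (columns, rows); every row is a list of floats."""
    reader = csv.reader(io.StringIO(text))
    columns = parse_header(next(reader))
    rows = [[float(value) for value in row] for row in reader if row]
    return columns, rows


# Made-up sample data for illustration only.
SAMPLE = "lat [degrees],lon [degrees],depth [km]\n23.5,43.1,-18.5\n23.2,44.8,-27.5\n"
columns, rows = read_table(SAMPLE)
```

A header such as `Flork` with no unit fails immediately, which is the point: the reader does not have to hunt for a README to learn what the column means.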

Incomplete standards

•Usually missing provenance

• Author? Contact? Creation date?

• How collected and processed?

•Need to track these to understand source, value, and give credit

Author: Joe’s pizza and geodata emporium

Credit: Joe and his brother Zeek

Date: last wednesday after lunch

Methods: Counted Zeek’s paces (size 12 shoe)
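
A provenance record can be checked at the moment it is written. The sketch below is only an illustration: the required field list is an assumption drawn from the questions on this slide, and the function names and sample values are hypothetical.

```python
import json
from datetime import date

# Hypothetical required fields, based on the questions on the slide.
REQUIRED = ("author", "contact", "created", "methods")


def make_provenance(author, contact, created, methods):
    """Build a provenance record, rejecting blank or vague entries."""
    record = {
        "author": author.strip(),
        "contact": contact.strip(),
        # date.fromisoformat raises ValueError for text such as
        # "last wednesday after lunch".
        "created": date.fromisoformat(created).isoformat(),
        "methods": methods.strip(),
    }
    missing = [key for key in REQUIRED if not record[key]]
    if missing:
        raise ValueError(f"missing provenance fields: {missing}")
    return record


def to_json(record):
    """Serialize the record so it can be stored alongside the data."""
    return json.dumps(record, indent=2, sort_keys=True)


# Made-up values for illustration only.
record = make_provenance(
    author="A. Geologist",
    contact="a.geologist@example.org",
    created="2009-03-04",
    methods="GPS survey, gridded at 1 km",
)
as_json = to_json(record)
```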

Proprietary standards

•Rarely publicly documented

• Few “official” ESRI format specs

• Mostly reverse engineered

•Can change at any time

• Format supports tool, not community

•Changes “encourage” buying new software

version 3.0 + = version 3.0.0.1

Misused standards

•Improperly used features

• Odd field names and units

• Z scale/up/down assumptions != format’s defaults

• Non-descriptive titles, descriptions, & authors

•Missing optional features

• Missing field names, units, scales

• Missing coordinate system

Misused standards

•GeoTIFF is misunderstood

• Stores geolocated data, not just an image

[Figure: side-by-side examples labeled “Bad” and “Good”]
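
What "geolocated" means can be shown with a little arithmetic. GeoTIFF can tie one pixel to a map coordinate (the ModelTiepoint tag) and give the size of a pixel in map units (the ModelPixelScale tag). The sketch below is a simplified stand-in for those two tags, assuming a north-up image with no rotation; the class and field names are hypothetical, and real files should be read with a GeoTIFF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoReference:
    """Simplified stand-in for a tiepoint plus a pixel scale.

    Assumes a north-up, unrotated image: x grows with the column index
    and y shrinks as the row index grows.
    """

    tie_col: float  # pixel column of the tiepoint
    tie_row: float  # pixel row of the tiepoint
    tie_x: float    # map x (for example, longitude) at the tiepoint
    tie_y: float    # map y (for example, latitude) at the tiepoint
    scale_x: float  # map units per pixel, horizontally
    scale_y: float  # map units per pixel, vertically

    def pixel_to_world(self, col, row):
        """Convert a pixel position to map coordinates."""
        x = self.tie_x + (col - self.tie_col) * self.scale_x
        y = self.tie_y - (row - self.tie_row) * self.scale_y
        return x, y


# Made-up georeference: the top-left pixel sits at (-120.0, 40.0) and each
# pixel covers half a degree.
ref = GeoReference(0.0, 0.0, -120.0, 40.0, 0.5, 0.5)
corner = ref.pixel_to_world(0, 0)
inside = ref.pixel_to_world(4, 2)
```

Without this information a file is just a picture, and any tool that overlays it on a terrain has to guess where it belongs.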

Complex standards

•GeoSciML does “everything”

• Hard to use it just a little bit

Inefficient standards

•Text: 2x to 3x expansion of binary data

• 4 Gbytes becomes 8 to 12 Gbytes

•XML: another 2x to 3x expansion

• 4 Gbytes becomes 16 to 36 Gbytes

[Figure: size comparison labeled “Binary”, “XML”, and “text”]
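
The figures on this slide chain the two factors: 4 Gbytes times 2 to 3 gives 8 to 12 Gbytes as text, and a further factor of 2 to 3 gives 16 to 36 Gbytes as XML. The sketch below measures the same effect on a small random sample. It assumes 8-byte doubles, full-precision decimal text, and a hypothetical `<value>` tag, so the exact ratios it prints depend on those choices.

```python
import random
import struct


def sample_values(count, seed=0):
    """Generate reproducible made-up values in a latitude-like range."""
    rng = random.Random(seed)
    return [rng.uniform(-90.0, 90.0) for _ in range(count)]


def binary_size(values):
    """Bytes needed as packed little-endian 8-byte doubles."""
    return len(struct.pack(f"<{len(values)}d", *values))


def text_size(values):
    """Bytes needed as one full-precision decimal number per line."""
    return len("".join(f"{v!r}\n" for v in values).encode("ascii"))


def xml_size(values):
    """Bytes needed with each number wrapped in a hypothetical <value> tag."""
    return len("".join(f"<value>{v!r}</value>\n" for v in values).encode("ascii"))


values = sample_values(10_000)
sizes = {
    "binary": binary_size(values),
    "text": text_size(values),
    "xml": xml_size(values),
}
for name, size in sizes.items():
    print(f"{name:>6}: {size:>8} bytes ({size / sizes['binary']:.1f}x binary)")
```

Shorter tag names or fewer printed digits shrink the text and XML sizes, but the ordering stays the same: binary is smallest, then text, then XML.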

Un-indexed standards

•Missing table of contents

•Needed to:

• Find data you want, skipping the rest

• Do so repeatedly and efficiently

• Do so when whole data set won’t fit in memory
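
A table of contents can be as simple as a list of names, offsets, and lengths stored ahead of the data. The sketch below uses a made-up layout (a 4-byte length, a JSON index, then packed doubles) purely to show the idea; it is not the layout of NetCDF, HDF, or any other real format.

```python
import io
import json
import struct


def write_indexed(stream, arrays):
    """Write named float arrays preceded by a JSON table of contents."""
    toc = {}
    payload = bytearray()
    for name, values in arrays.items():
        # Offsets are measured from the start of the payload.
        toc[name] = {"offset": len(payload), "count": len(values)}
        payload += struct.pack(f"<{len(values)}d", *values)
    toc_bytes = json.dumps(toc).encode("utf-8")
    stream.write(struct.pack("<I", len(toc_bytes)))
    stream.write(toc_bytes)
    stream.write(payload)


def read_one(stream, name):
    """Read a single named array, seeking past everything else."""
    stream.seek(0)
    (toc_len,) = struct.unpack("<I", stream.read(4))
    toc = json.loads(stream.read(toc_len).decode("utf-8"))
    entry = toc[name]
    stream.seek(4 + toc_len + entry["offset"])
    data = stream.read(8 * entry["count"])
    return list(struct.unpack(f"<{entry['count']}d", data))


# Made-up sample data; an in-memory buffer stands in for a file on disk.
buffer = io.BytesIO()
write_indexed(buffer, {"lat": [23.5, 23.2, 24.1], "depth": [-18.5, -27.5, -8.9]})
depth = read_one(buffer, "depth")
```

Only the index and the requested array are read, so the cost of fetching one field does not grow with the size of the rest of the file.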

Suggestions...

A few of our thoughts

Use existing standards

•Lots to choose from

• NetCDF or HDF for lists and 2D/3D grids

• GeoTIFF for image overlays

•Use them if suitable

• Already documented

• Already debugged

• Often widely supported

Prefer open standards

•Free

•Community-driven

•No vendor lock-in

•Lots of software already

Follow conventions

•Add all the metadata at creation time

• Too easy to forget to add it later

•No more READMEs needed

• They’re easily lost or out of date

Create conventions

•Make new conventions if necessary

• Standard names for fields

• “Height”, “Depth”, “Elevation”, or “Z”?

• Standard field units

• Meters or kilometers?

• Standard field use

• Positive Z up or down?
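
Once a convention is chosen, incoming data can be normalized to it in one place. The sketch below assumes a hypothetical convention in which the vertical field is an elevation in meters, positive upward. Treating "Z" as positive up is itself an assumption of this sketch, which is exactly the kind of choice a convention has to pin down.

```python
# Hypothetical convention: elevation, in meters, positive upward.
TO_METERS = {"m": 1.0, "km": 1000.0}

# Field names this sketch treats as positive up or positive down.
POSITIVE_UP = {"height", "elevation", "z"}
POSITIVE_DOWN = {"depth"}


def to_elevation_m(name, unit, values):
    """Convert a vertical field to elevation in meters, positive up."""
    key = name.strip().lower()
    if unit not in TO_METERS:
        raise ValueError(f"unknown unit {unit!r}")
    if key in POSITIVE_UP:
        sign = 1.0
    elif key in POSITIVE_DOWN:
        sign = -1.0
    else:
        raise ValueError(f"unknown vertical field {name!r}")
    factor = sign * TO_METERS[unit]
    return [value * factor for value in values]


# Made-up sample values for illustration only.
from_depth = to_elevation_m("Depth", "km", [1.5, 0.25])
from_height = to_elevation_m("Height", "m", [12.0])
```

Unknown names and units raise an error rather than passing through, so a file that ignores the convention is caught at load time.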

Simplify standards

•Use subsets of standards

• GeoSciML packages

• NetCDF conventions

Use efficient standards

•Prefer binary

• Such as NetCDF, HDF, GeoTIFF

•Prefer those with a table of contents

• Such as NetCDF, HDF, GeoTIFF

What we’re doing...

•Developing a software framework for

• Data handling

• Processing

• Visualization

Open Earth Framework

•Open source, modular, portable

•Java, threaded, 3D accelerated

•Applications & construction kits

•Interactive & batch tools

•Integrates with other software

• Doesn’t assume one “right” way to do anything

Open Earth Framework

•Collection of construction kits (libraries)

•Few dependencies

•Combine with other software

•Build your own applications, batch tools, and web services

•Several pre-built applications

Open Earth Framework

•Data handling construction kits

• Standard & common formats

• Format conversion

• File “completion” to restore missing info

•Web service construction kits

• Standard & custom protocols

•Mix, match, and extend

Open Earth Framework

•Visualization construction kits

• 2D & 3D

• Lots of layers

• Common and new geoscience visual representations

• Control over colors, shapes, etc.

• Interactive

•Mix, match, and extend

Open Earth Framework

•User interface construction kits

• 3D canvases

• Control panels

• Common menus, toolbars, dialogs

•Mix, match, and extend

Open Earth Framework

•Terrains and overlays

• Data draped over terrain

• Color, shading, & transparency control

Open Earth Framework

•Tomography

• Sectioning planes & isosurfaces

• Color, shading, & transparency control

Open Earth Framework

•Dots, lines, and shapes

• Above, atop, or below terrain

• Color, size, & transparency control

Open Earth Framework

•All at once

Open Earth Framework

•Ongoing development

• Alpha release... soon?

•We want your input...

• Formats, vis techniques, processing, interaction, etc.