Sharing and publishing data using CUAHSI HIS
Outline
• HIS data publication system
• WaterML and WaterOneFlow web services
• Observations data model (ODM)
• Data loading
• Data editing and quality control
• Controlled vocabularies
• HIS central registration and tagging
HIS Data Publication System
[Figure: components of the HIS data publication system. Data collection: sensors, telemetry network, base station computer(s). Loading: ODM Data Loader (Excel, text) and Streaming Data Loader into the ODM database; query, visualize, and edit data using ODM Tools. Publication: WaterOneFlow web service (GetSites, GetSiteInfo, GetVariableInfo, GetValues) returning WaterML. HIS Central: Water Metadata Catalog, Harvester, Service Registry, Hydrotagger; contribute your ODM. Discovery: Hydroseek. Access: HydroExcel, HydroGet, HydroLink, HydroObjects. Analysis: GIS, Matlab, Splus, R, IDL, Java, C++, VB.]
Steps in publishing data
1. Establish an HIS Server
2. Load observations into an ODM database
3. Provide access to data through web services (http://<your-server>/<your-network>/cuahsi_1_0.asmx?WSDL)
4. Index the resulting water data service at HIS Central (http://hiscentral.cuahsi.org)
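As a small illustration of step 3, the sketch below fills in the service address template shown above. The host and network names are hypothetical placeholders; only the `cuahsi_1_0.asmx?WSDL` suffix comes from the slide.

```python
def wsdl_url(server: str, network: str) -> str:
    """Fill in the WaterOneFlow address template from step 3."""
    if not server or not network:
        raise ValueError("server and network must both be non-empty")
    return f"http://{server}/{network}/cuahsi_1_0.asmx?WSDL"


if __name__ == "__main__":
    # Hypothetical host and network names, for illustration only.
    print(wsdl_url("his.example.edu", "littlebearriver"))
```

The resulting address is what would be registered at HIS Central in step 4.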
Establishing an HIS Server
• Windows server platform
• Base Software: Microsoft SQL Server and ArcGIS Server
• HIS Server applications
– WaterOneFlow web services
– ODM + tools
– DASH
• HIS Data
http://his.cuahsi.org/hisserver.html
Load Observations into an ODM Database
[Figure: soil moisture data, streamflow, flux tower data, groundwater levels, water quality, and precipitation and climate data loaded into ODM.]
WaterML and WaterOneFlow
[Figure: data from the TCEQ, UT, and USGS data repositories passes through extract, transform, load into a WaterOneFlow web service. A client queries locations, variables, and time through GetSiteInfo, GetVariableInfo, and GetValues and receives WaterML.]
WaterML is an XML language for communicating water data
WaterOneFlow is a set of web services based on WaterML
Slide from David Valentine
WaterOneFlow Web Services
[Figure: Web Services Library; Web Application: Data Portal; Internet; Simple Object Access Protocol.]
Your application
• Excel, ArcGIS, Matlab
• Fortran, C/C++, Visual Basic
• Hydrologic model
• …
Your operating system
• Windows, Unix, Linux, Mac
Slide from David Valentine
WaterOneFlow
• Set of query functions
• Returns data in WaterML
NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, USGS SNOTEL, ODM (multiple sites)
Slide from David Valentine
WaterML design principles
• Goal - capture semantics of hydrologic observations discovery and retrieval
• Role - exchange schema for CUAHSI web services
• Driven by
– Hydrologists (community review)
– ODM
– USGS NWIS, EPA STORET, Academic Sources
• Conformance with Open Geospatial Consortium standards. http://www.opengeospatial.org/
• For XSD pros, the WaterML schema is at http://his.cuahsi.org/wofws.html
Slide from David Valentine
Point Observations Information Model
[Figure: hierarchy of Data Source, Network, Sites, Variables, Values {Value, Time, Qualifier, Offset}. Example: Utah State University; Little Bear River; Little Bear River at Mendon Rd; Dissolved Oxygen; 9.78 mg/L, 1 October 2007, 6PM. Web service methods shown: GetSites, GetSiteInfo, GetVariableInfo, GetValues. Information returned: Sites, Variables, TimeSeries.]
• A data source operates and provides data to an observation network
• A network is a set of observation sites (stored in a single ODM instance)
• A site is a point location where one or more variables are measured
• A variable is a measured property (e.g. describing the flow or quality of water)
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
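The hierarchy in the point observations information model can be sketched as nested records. This is only an illustration of the definitions on the slide: the class and field names are hypothetical and are not the WaterML or ODM schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None   # symbol with additional information
    offset: Optional[float] = None    # e.g. depth in water


@dataclass
class Variable:
    """A measured property at a site, with its observed values."""
    name: str
    units: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites."""
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    """An organization that operates one or more networks."""
    name: str
    networks: List[Network] = field(default_factory=list)


def build_example() -> DataSource:
    """Build the example shown on the slide."""
    oxygen = Variable("Dissolved Oxygen", "mg/L",
                      [Value(9.78, datetime(2007, 10, 1, 18, 0))])
    site = Site("Little Bear River at Mendon Rd", [oxygen])
    network = Network("Little Bear River", [site])
    return DataSource("Utah State University", [network])
```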
Building Blocks of WaterML Responses
• Response Types
– GetSites
– GetSiteInfo
– GetVariableInfo
– GetValues
• Key Elements
– site
– sourceInfo
– seriesCatalog
– variable
– value
– queryInfo
Slide from David Valentine
Sites response
• queryInfo
• site
– name
– code
– location
• seriesCatalog
– variables
– series: how many, when (TimePeriodType)
Slide from David Valentine
VariablesResponseType
• variable – same as in series element
• Code, name, units
Slide from David Valentine
GetValues response - timeSeries
• queryInfo
• timeSeries
– sourceInfo – “where”
– variable – “what”
– values
Slide from David Valentine
Values
• Each time series value recorded in value element
• Timestamp, plus metadata for the value, recorded in element’s attributes
[Figure: annotated example with labels ISO Time, value, qualifier.]
Slide from David Valentine
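A minimal sketch of reading value elements follows, with the timestamp and qualifier taken from attributes as described above. The sample document is a simplified, made-up fragment: the attribute names `dateTime` and `qualifiers`, the qualifier code `P`, the second data value, and the omission of XML namespaces are all assumptions for illustration, so check the WaterML schema before relying on them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, hypothetical fragment; a real response carries namespaces
# and more metadata.
SAMPLE = """
<values>
  <value dateTime="2007-10-01T18:00:00" qualifiers="P">9.78</value>
  <value dateTime="2007-10-01T18:30:00">9.81</value>
</values>
"""


def parse_values(xml_text):
    """Return (timestamp, value, qualifier) tuples, one per value element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter("value"):
        timestamp = datetime.fromisoformat(elem.attrib["dateTime"])
        qualifier = elem.attrib.get("qualifiers")  # None when absent
        rows.append((timestamp, float(elem.text), qualifier))
    return rows


if __name__ == "__main__":
    for row in parse_values(SAMPLE):
        print(row)
```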
Why an Observations Data Model
• Syntactic heterogeneity (File types and formats)
• Semantic heterogeneity
– Language for observation attributes (structural)
– Language to encode observation attribute values (contextual)
• Publishing and sharing research data
• Metadata to facilitate unambiguous interpretation
• Enhance analysis capability
Scope
• Focus on Hydrologic Observations made at a point
• Exclude Remote sensing or grid data. These are part of a digital watershed but not suitable for an atomic database model and individual value queries
• Primarily store raw observations and simple derived information to get data into its most usable form.
• Limit inclusion of extensively synthesized information and model outputs at this stage.
What are the basic attributes to be associated with each single data value and how can these best be organized?
Value
DateTime
Variable
Location
Units
Interval (support)
Accuracy
Offset
OffsetType/ Reference Point
Source/Organization
Censoring
Data Qualifying Comments
Method
Quality Control Level
Sample Medium
Value Type
Data Type
CUAHSI Observations Data Model
[Figure: streamflow, flux tower data, precipitation and climate, groundwater levels, water quality, and soil moisture data stored in ODM.]
• A relational database at the single observation level (atomic model)
• Stores observation data made at points
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Standard format for data sharing
• Cross dimension retrieval and analysis
[Figure: data cube with axes Space (S), Time (T), and Variables (V). A data value vi(s,t) is located by “Where” (s), “When” (t), and “What” (Vi).]
CUAHSI Observations Data Model
http://www.cuahsi.org/his/odm.html
Site Attributes
• SiteCode, e.g. NWIS:10109000
• SiteName, e.g. Logan River Near Logan, UT
• Latitude, Longitude: Geographic coordinates of site
• LatLongDatum: Spatial reference system of latitude and longitude
• Elevation_m: Elevation of the site
• VerticalDatum: Datum of the site elevation
• Local X, Local Y: Local coordinates of site
• LocalProjection: Spatial reference system of local coordinates
• PosAccuracy_m: Positional Accuracy
• State, e.g. Utah
• County, e.g. Cache
Independent of, but can be coupled to Geographic Representation
[Figure: the ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, …) shown beside the Arc Hydro feature classes Waterbody, HydroPoint, Watershed, HydroEdge (Flowline, Shoreline), HydroJunction, and HydroNetwork. Sites are related to Arc Hydro features either one to one or through a CouplingTable (SiteID, HydroID).]
Variable attributes
• VariableName, e.g. discharge
• VariableCode, e.g. NWIS:0060
• SampleMedium, e.g. water
• ValueType, e.g. field observation, laboratory sample
• IsRegular, e.g. Yes for regular or No for intermittent
• TimeSupport (averaging interval for observation)
• DataType, e.g. Continuous, Instantaneous, Categorical
• GeneralCategory, e.g. Climate, Water Quality
• NoDataValue, e.g. -9999
• Units, e.g. m3/s, Flow, Cubic meters per second
Scale issues in the interpretation of data
The scale triplet
[Figure: three panels plotting quantity against length or time: a) Extent, b) Spacing, c) Support.]
From: Blöschl, G., (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
The effect of sampling for measurement scales not commensurate with the process scale
[Figure: three panels: (a) spacing too large – noise (aliasing); (b) extent too small – trend; (c) support too large – smoothing out.]
From: Blöschl, G., (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
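The three sampling effects can be reproduced with a toy signal. The sketch below is not taken from Blöschl's text: the sine wave with period 1, the sampling spacings, and the helper names are all assumptions chosen to make each effect easy to see.

```python
import math


def wave(t):
    """Hypothetical process: a sine wave with period 1."""
    return math.sin(2 * math.pi * t)


def sample(signal, start, spacing, count):
    """Take `count` point samples of `signal`, `spacing` apart."""
    return [signal(start + i * spacing) for i in range(count)]


def block_average(values, support):
    """Average consecutive blocks of `support` samples (a larger support)."""
    return [sum(values[i:i + support]) / support
            for i in range(0, len(values) - support + 1, support)]


if __name__ == "__main__":
    fine = sample(wave, 0.0, 0.05, 40)      # resolves the wave
    coarse = sample(wave, 0.0, 1.0, 5)      # spacing too large: wave is missed
    short = sample(wave, 0.0, 0.05, 5)      # extent too small: looks like a trend
    smoothed = block_average(fine, 20)      # support too large: wave averaged out
    print(max(fine), coarse, short, smoothed)
```

With spacing equal to the period, every sample lands at the same phase and the oscillation disappears; with a short extent, only the rising limb is seen; with support equal to the period, the averages are close to zero.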
Discharge, Stage, Concentration and Daily Average Example
Data Types
• Continuous (Frequent sampling - fine spacing)
• Sporadic (Spot sampling - coarse spacing)
• Cumulative: $V(t) = \int_0^t Q(\tau)\,d\tau$
• Incremental: $\Delta V(t) = \int_t^{t+\Delta t} Q(\tau)\,d\tau$
• Average: $\bar{Q}(t) = \Delta V(t) / \Delta t$
• Maximum
• Minimum
• Constant over Interval
• Categorical
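The cumulative, incremental, and average data types can be illustrated numerically. The sketch below assumes the rate Q is constant over each time step of length `dt` and that each interval starts at its timestamp. The discharge numbers are made up, and the function names are hypothetical.

```python
def cumulative(q, dt):
    """V(t): running integral of Q from time 0 (Q constant over each step)."""
    total, out = 0.0, []
    for rate in q:
        total += rate * dt
        out.append(total)
    return out


def incremental(q, dt, steps):
    """Delta V: volume accumulated over consecutive blocks of `steps` steps.

    Assumes len(q) is a multiple of `steps`.
    """
    return [sum(q[i:i + steps]) * dt for i in range(0, len(q), steps)]


def average(q, dt, steps):
    """Q-bar: incremental volume divided by the interval length."""
    return [dv / (steps * dt) for dv in incremental(q, dt, steps)]


if __name__ == "__main__":
    discharge = [2.0, 4.0, 6.0, 8.0]   # made-up rates in m3/s
    dt = 900.0                         # 15-minute steps, in seconds
    print(cumulative(discharge, dt))       # volumes in m3
    print(incremental(discharge, dt, 2))   # 30-minute volumes in m3
    print(average(discharge, dt, 2))       # 30-minute mean rates in m3/s
```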
Incomplete or Inexact daily total occurring. Value is not a true 24-hour amount. One or more periods are missing and/or an accumulated amount has begun but not ended during the daily period.
15 min Precipitation from NCDC
Irregularly sampled groundwater level
Offset
OffsetValue is the distance from a datum or control point at which an observation was made
OffsetType defines the type of offset, e.g. distance below water level, distance above ground surface, or distance from bank of river
Water Chemistry from a profile in a lake
Groups and Derived From Associations
Stage and Streamflow Example
Daily Average Discharge Example
Daily Average Discharge Derived from 15 Minute Discharge Data
Methods and Samples
Method specifies the method whereby an observation is measured, e.g. Streamflow using a V notch weir, TDS using a Hydrolab, sample collected in auto-sampler
SampleID is used for observations based on the laboratory analysis of a physical sample and identifies the sample from which the observation was derived. This keys to a unique LabSampleID (e.g. bottle number) and name and description of the analytical method used by a processing lab.
Water Chemistry from Laboratory Sample
ValueAccuracy
A numeric value that quantifies measurement accuracy, defined as the nearness of a measurement to the standard or true value. This may be quantified as an average or root mean square error relative to the true value. Since the true value is not known, this should be estimated based on knowledge of the method and measurement instrument. Accuracy is distinct from precision, which quantifies reproducibility but does not refer to the standard or true value.
[Figure: panels labeled Accurate; Low Accuracy, but precise; Low Accuracy.]
Data Quality
Qualifier Code and Description provides qualifying information about the observations, e.g. Estimated, Provisional, Derived, Holding time for analysis exceeded
QualityControlLevel records the level of quality control that the data has been subjected to.
- Level 0. Raw Data
- Level 1. Quality Controlled Data
- Level 2. Derived Products
- Level 3. Interpreted Products
- Level 4. Knowledge Products
Series of Observations
A “Data Series” is a set of all the observations of a particular variable at a site.
The SeriesCatalog is programmatically generated to provide users with the ability to do data discovery (i.e. what data is available and where) without formulating complex queries or hitting the DataValues table which can get very large.
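One way such a catalog could be generated is a single grouped query over the values table. The sketch below uses SQLite in place of SQL Server and a heavily simplified, assumed schema with made-up rows. It groups only by site and variable, following the definition of a data series given above, and should not be read as the actual ODM procedure.

```python
import sqlite3


def build_series_catalog(conn):
    """Summarize DataValues into one row per (site, variable) series."""
    conn.execute("DROP TABLE IF EXISTS SeriesCatalog")
    conn.execute("""
        CREATE TABLE SeriesCatalog AS
        SELECT SiteID,
               VariableID,
               MIN(LocalDateTime) AS BeginDateTime,
               MAX(LocalDateTime) AS EndDateTime,
               COUNT(*)           AS ValueCount
        FROM DataValues
        GROUP BY SiteID, VariableID
    """)


def demo():
    """Load three made-up observations and return the catalog rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE DataValues (
            SiteID INTEGER, VariableID INTEGER,
            LocalDateTime TEXT, DataValue REAL)
    """)
    conn.executemany(
        "INSERT INTO DataValues VALUES (?, ?, ?, ?)",
        [
            (1, 1, "2007-10-01 18:00:00", 9.78),
            (1, 1, "2007-10-01 18:30:00", 9.81),
            (1, 2, "2007-10-01 18:00:00", 12.4),
        ],
    )
    build_series_catalog(conn)
    rows = conn.execute("""
        SELECT SiteID, VariableID, BeginDateTime, EndDateTime, ValueCount
        FROM SeriesCatalog ORDER BY SiteID, VariableID
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Discovery queries can then read the small catalog table instead of scanning every stored value.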
Loading data into ODM
• Interactive OD Data Loader (OD Loader)
– Loads data from spreadsheets and comma separated tables in simple format
• Scheduled Data Loader (SDL)
– Loads data from datalogger files on a prescribed schedule.
– Interactive configuration
• SQL Server Integration Services (SSIS)
– Microsoft application accompanying SQL Server useful for programming complex loading or data management functions
[Figure: OD Data Loader, SDL, and SSIS each load data into the Observations Database (ODM).]
[Figure: remote monitoring sites in a sensor network, radio repeaters, base station computer, ODM Streaming Data Loader, central observations database, Internet, applications. Data discovery, visualization, and analysis through Internet enabled applications.]
From Jeff Horsburgh
ODM Streaming Data Loader
Loading the Little Bear Sensor Data Into ODM
[Figure: streaming data text files on the base station computer(s); ODM SDL Mapping Wizard; XML config file; ODM SDL Import Application; ODM.]
ODM SDL manages the periodic insertion of the streaming data into the ODM database using the mappings stored in the XML configuration file.
• Automate the data loading process via scheduled updates
• Map datalogger files to the ODM schema and controlled vocabularies
From Jeff Horsburgh
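The idea of mapping datalogger columns onto ODM fields can be sketched in a few lines. This is not the ODM SDL implementation or its XML configuration format: the column names, variable codes, site code, record keys, and sample rows below are hypothetical, and a plain dictionary stands in for the stored mapping.

```python
import csv
import io

# Hypothetical mapping: datalogger column name -> variable code.
COLUMN_MAP = {"DO_mgL": "DissolvedOxygen", "WaterTemp_C": "Temperature"}

# Made-up datalogger text file content.
SAMPLE = (
    "TIMESTAMP,DO_mgL,WaterTemp_C\n"
    "2007-10-01 18:00:00,9.78,11.2\n"
    "2007-10-01 18:30:00,,11.0\n"
)


def map_logger_rows(text, time_column, column_map, site_code):
    """Turn wide datalogger rows into one record per observed value."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, variable_code in column_map.items():
            raw = row.get(column)
            if not raw:  # skip empty or missing readings
                continue
            records.append({
                "SiteCode": site_code,
                "VariableCode": variable_code,
                "LocalDateTime": row[time_column],
                "DataValue": float(raw),
            })
    return records


if __name__ == "__main__":
    for record in map_logger_rows(SAMPLE, "TIMESTAMP", COLUMN_MAP, "SITE1"):
        print(record)
```

Running the same mapping on a schedule against newly appended rows is the part the loader automates.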
CUAHSI Observations Data Model
http://www.cuahsi.org/his/odm.html
[Figure: ODM schema diagram annotated with loading steps 1 to 7. Work from Out to In. At last … And don’t forget …]
Managing Data Within ODM - ODM Tools
• Query and export – export data series and metadata
• Visualize – plot and summarize data series
• Edit – delete, modify, adjust, interpolate, average, etc.
Syntactic Heterogeneity
Multiple Data Sources With Multiple Formats
[Figure: Excel files, Access files, text files, and data logger files loaded into an ODM observations database.]
From Jeff Horsburgh
Semantic Heterogeneity
General Description of Attribute | USGS NWIS (a) | EPA STORET (b)
Structural Heterogeneity
Code for location at which data are collected | "site_no" | "Station ID"
Name of location at which data are collected | "Site" OR "Gage" | "Station Name"
Code for measured variable | "Parameter" | ? (c)
Name of measured variable | "Description" | "Characteristic Name"
Time at which the observation was made | "datetime" | "Activity Start"
Code that identifies the agency that collected the data | "agency_cd" | "Org ID"
Contextual Semantic Heterogeneity
Name of measured variable | "Discharge" | "Flow"
Units of measured variable | "cubic feet per second" | "cfs"
Time at which the observation was made | "2008-01-01" | "2006-04-04 00:00:00"
Latitude of location at which data are collected | "41°44'36"" | "41.7188889"
Type of monitoring site | "Spring, Estuary, Lake, Surface Water" | "River/Stream"
(a) United States Geological Survey National Water Information System (http://waterdata.usgs.gov/nwis/).
(b) United States Environmental Protection Agency Storage and Retrieval System (http://www.epa.gov/storet/).
(c) An equivalent to the USGS parameter code does not exist in data retrieved from EPA STORET.
From Jeff Horsburgh
Overcoming Semantic Heterogeneity
• ODM Controlled Vocabulary System
– ODM CV central database
– Online submission and editing of CV terms
– Web services for broadcasting CVs
Variable Name
Investigator 1: “Temperature, water”
Investigator 2: “Water Temperature”
Investigator 3: “Temperature”
Investigator 4: “Temp.”
ODM VariableNameCV
Term
…
Sunshine duration
Temperature
Turbidity
…
From Jeff Horsburgh
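The effect of a controlled vocabulary can be sketched as a lookup that maps each investigator's wording onto one approved term. The three terms come from the slide's excerpt of the VariableNameCV; the synonym table and the function name are hypothetical, and in practice the mapping is made by the person loading the data rather than by an automatic table.

```python
# Excerpt of approved terms shown on the slide.
CONTROLLED_TERMS = {"Sunshine duration", "Temperature", "Turbidity"}

# Hypothetical synonym table, keyed by lower-cased investigator wording.
SYNONYMS = {
    "temperature, water": "Temperature",
    "water temperature": "Temperature",
    "temp.": "Temperature",
}


def to_cv_term(name):
    """Return the controlled term for `name`, or None if it needs review."""
    cleaned = " ".join(name.split()).lower()
    for term in CONTROLLED_TERMS:
        if term.lower() == cleaned:
            return term
    return SYNONYMS.get(cleaned)


if __name__ == "__main__":
    for raw in ["Temperature, water", "Water Temperature", "Temperature", "Temp."]:
        print(raw, "->", to_cv_term(raw))
```

A name that maps to nothing would be a candidate for online submission as a new CV term.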
Dynamic controlled vocabulary moderation system
[Figure: the ODM Controlled Vocabulary Moderator maintains the Master ODM Controlled Vocabulary through the ODM Website. ODM Controlled Vocabulary Web Services deliver it as XML to the local server, where the ODM Data Manager uses ODM Tools to update the local ODM database.]
http://his.cuahsi.org/mastercvreg.html
From Jeff Horsburgh
Registering Web Services with HIS Central
• Listing of all public data services
• Enables applications like Hydroseek to discover data
Tagging Variables for Data Discovery Through a Metadata Catalog
Ontology: A hierarchy of concepts
Each Variable in your data is connected to a corresponding Concept
From Michael Piasecki
Department of Civil, Architectural & Environmental Engineering04/20/23 Department of Civil, Architectural & Environmental Engineering 56
Tagging variables in Ontology
WATERS Network Information System
Steps
1. The WSDL for a set of ODM web services is registered in the WSDL Registry
2. The “harvester” jumps into action and trawls through the web services at the WSDL to find and identify new variables
3. It returns i) data updating information and ii) variable names used and compares these to those used by HydroSeek.
From Michael Piasecki
Mapping onto Ontology
Steps contd.
4. New variables are manually mapped onto appropriate ontology concept.
5. HydroSeek catalogue is updated.
Test-Bed | VarName | Site exist? | VarName? | Content | Action
CCBay | DOConcSuf | Y | Y | new data | update Cat (Time)
CCBay | DOConcBot | Y | N | new variable | place in TaggerBin => DO
CCBay | DOConcMid | N | Y | new data | update Cat (Site+Time)
SRBHOS | DO_Water | Y | Y | new data | update Cat (Time)
Minnehaha | TempSurf | Y | N | new variable | place in TaggerBin => Temp
Minnehaha | StreamDOCon | Y | N | new variable | place in TaggerBin => DO
SantaFe | WaterDOCon | Y | N | new variable | place in TaggerBin => DO
SantaFe | GoldConc | Y | N | new var/no conc | place in TaggerBin => ??
From Michael Piasecki
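The decision pattern in the harvester table can be written as a small function. This is a reading of the table rows, not the harvester's code. The table has no row for an unknown variable at a new site, so sending that case to the TaggerBin is an assumption.

```python
def harvest_action(site_exists, variable_known):
    """Choose the catalog action for one harvested (site, variable) pair."""
    if not variable_known:
        # New variable: needs manual mapping onto an ontology concept.
        # (Assumed to apply whether or not the site already exists.)
        return "place in TaggerBin"
    if site_exists:
        return "update Cat (Time)"
    return "update Cat (Site+Time)"


if __name__ == "__main__":
    # Rows from the table: (variable name, site exists?, variable known?)
    for name, site, var in [("DOConcSuf", True, True),
                            ("DOConcBot", True, False),
                            ("DOConcMid", False, True)]:
        print(name, "->", harvest_action(site, var))
```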
Hydroseek
http://www.hydroseek.org
Supports search by location and type of data across multiple observation networks including NWIS, STORET, and university data
Summary
• Generic method for publishing observational data
– Supports many types of point observational data
– Overcomes syntactic and semantic heterogeneity using a standard data model and controlled vocabularies
– Supports a national network of observatory test beds but can grow!
• Web services provide programmatic machine access to data
– Work with the data in your data analysis software of choice
• Internet-based applications provide user interfaces for the data and geographic context for monitoring sites