Web Services and Water Markup Language for Distributed Hydrologic Data Access
Ilya Zaslavsky
San Diego Supercomputer Center, UCSD
CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System
NSF-supported Collaborative Project: UT Austin + SDSC + Drexel + Duke + Utah State
www.cuahsi.org/his/
The Grid is becoming the backbone for collaborative science and data sharing
Cyberinfrastructure (CI) is about RE-USING data and research resources!
Cyberinfrastructure for hydrology (in the U.S.)
• Hydrologic observations:
– Reliance on federally-organized data collection (NWIS, STORET, NCDC, etc.) with huge and complex nomenclatures: simplifying access to federal repositories, relatively lower emphasis on data ownership
– Handling time in both UTC and local
– Various spatial offsets
– Multiple data types: time series, fields, spatial data
• Integrative discipline:
– Interoperation with atmospheric, ocean, soils, geomorphology, social datasets and services…
• Community:
– Organized by “natural boundaries”: networks of relatively autonomous self-managed data nodes
– Partnership with public sector water management
– 96% use Windows for research; Excel, ArcGIS, Matlab are the most popular tools
Mix of standards, software licensing models, vocabularies; leveraging tools developed in other CI projects.
WaterOneFlow Web Services
[Diagram: data access through web services (downloads) and data storage through web services (uploads).
Servers: observatory servers, workgroup HIS, SDSC HIS servers, 3rd party servers (e.g. USGS, NCDC).
Clients behind the web services interface: GIS, Matlab, IDL, Splus, R, D2K, I2K, programming (Fortran, C, VB).
Protocols: HTML - XML, WSDL - SOAP.]
DASH: Data Access System for Hydrology
• Information input, display, query and output services
• Preliminary data exploration and discovery: see what is available and perform exploratory analyses
Hydrologic Information System Service Oriented Architecture
The CUAHSI Community, HIS and WATERS
• CUAHSI: 116 universities (Nov. 2006)
• HIS Team: Texas, SDSC, Utah, Drexel, Duke
• CUAHSI HIS; WATERS Testbed; WATERS Network Information System
• Supercomputer centers: NCSA, TACC
• Domain sciences: Unidata, NCAR, LTER, GEON
• Government: USGS, EPA, NCDC, USDA
• Industry: ESRI, Kisters, OpenMI
CUAHSI HIS as a mediator across multiple agency and PI data
• Keeps identifiers for sites, variables, etc. across observation networks
• Manages and publishes controlled vocabularies, and provides vocabulary/ontology management and update tools
• Provides common structural definitions for data interchange
• Provides a sample protocol implementation
• Governance framework: a consortium of universities, MOUs with federal agencies, collaboration with key commercial partners, led by renowned hydrologists, and NSF support for core development and test beds
Main Components
• Hydrologic Observations Data Model, ODM databases and site catalogs
• Web services for accessing hydrologic repositories and data in ODMs
• Clients: Online Data Access System + multiple desktop application add-ons
• Network of CUAHSI HIS servers, deployed at hydrologic observatories and integrated with other observing systems and sensor data collection
[Diagram: CUAHSI Web Services connect data sources (NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux) with client applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++).]
[Diagram, repeated for four nodes: Remote CUAHSI HIS Node (Windows). Data; technologies: IIS Web Server, ASP.NET, SQL Server, ArcGIS; HDAS, HODM; web services and web service proxies.]
Point Observations Information Model
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• An observation series is an array of observations at a given site, for a given variable, with start time and end time
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
Example, by level of the model:
• Data Source: USGS
• Network: Streamflow gages
• Sites: Neuse River near Clayton, NC
• ObservationSeries: Discharge, stage, start, end (daily or instantaneous)
• Values {Value, Time, Qualifier}: 206 cfs, 13 August 2006
Return network information, and variable information within the network
Return site information, including a series catalog of variables measured at a site with their periods of record
Return time series of values
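The information model above can be sketched as nested data structures. This is a minimal illustration in Python, not the ODM schema: all class and field names are assumptions chosen to mirror the terms on the slide, and the example reuses the single value shown above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Value:
    """One observation of a variable at a particular time."""
    value: float
    time: date
    qualifier: Optional[str] = None  # e.g. a "less than" flag


@dataclass
class ObservationSeries:
    """All observations of one variable at one site."""
    variable: str
    units: str
    values: List[Value] = field(default_factory=list)

    def period_of_record(self) -> Optional[Tuple[date, date]]:
        """Return (start, end) of the series, or None when it is empty."""
        if not self.values:
            return None
        times = [v.time for v in self.values]
        return min(times), max(times)


@dataclass
class Site:
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide, expressed with the hypothetical classes above.
discharge = ObservationSeries("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
clayton = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [clayton])])
```

The three service calls described above map naturally onto this nesting: network and variable listings read the top levels, site information reads one `Site` with the period of record of each series, and a values request reads one `ObservationSeries`.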
Challenges… (1/2)
• Sites
– STORET has stations, and measurement points, at various offsets…
– Site metadata lacking and inconsistent (e.g. 2/3 no HUC info, 1/3 no state/county info); agency site files need to be upgraded to ODM…
– A groundwater site is different from a stream gauge…
• Censored values
– Values have qualifiers, such as “less than”, “censored”, etc., per value. Sometimes mixed data types…
• Units
– There are multiple renditions of the same units, even within one repository
– There may be several units for the same parameter code (STORET)
– If no value is recorded, are there no units??
• Unit multipliers
– E.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
• Sources
– STORET requires organization IDs (which collected data for STORET) in addition to site IDs
• Time stamps: ISO 8601
– A service to determine UTC offsets given lat/lon and date??
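Three of the challenges above (per-value qualifiers, integer storage with a multiplier, and local versus UTC time stamps) can be illustrated with one small normalization step. This is a sketch under stated assumptions: the `<` prefix convention, the function name, and the idea that the UTC offset is already known for the site are all hypothetical, and no agency format is implied.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    value: Optional[float]     # None when nothing was recorded
    qualifier: Optional[str]   # e.g. "less than" for censored values
    time_utc: datetime         # time stamp converted to UTC


def normalize(raw: str, multiplier: float, local_time: str,
              utc_offset_hours: float) -> Observation:
    """Turn one raw reading into a value, a qualifier and a UTC time.

    raw              -- stored text, e.g. "12" or "<5" (assumed convention)
    multiplier       -- per-variable scale factor for integer storage
    local_time       -- ISO 8601 local time without an offset
    utc_offset_hours -- offset of the site's local time from UTC
    """
    raw = raw.strip()
    qualifier = None
    if raw.startswith("<"):
        qualifier = "less than"
        raw = raw[1:].strip()
    value = float(raw) * multiplier if raw else None
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromisoformat(local_time).replace(tzinfo=local_zone)
    return Observation(value, qualifier, local.astimezone(timezone.utc))


# Hypothetical reading: stored as "<5" with a multiplier of 0.1, at UTC-5.
obs = normalize("<5", 0.1, "2006-08-13T12:00:00", -5)
```

Keeping the qualifier next to each value, rather than encoding it in the number, is what allows censored readings to stay distinguishable after scaling.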
Challenges… (2/2)
• Values retrieval
– USGS: by site, variable, time range
– EPA: by organization-site, variable, medium, units, time range
– NCDC: fewer variables, period of record applies to site, not to seriesCatalog
• Variable semantics
– Variable names and measurement methods don’t match
– E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’; the Kjeldahl method is used for determination but not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
– One-to-one mapping not always possible
– E.g. NWIS: ‘bed sediment’ and ‘suspended sediment’ medium types vs. STORET’s ‘sediment’.
Ontology tagging, semantic mediation
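The two semantic examples above can be written down as a lookup from each repository's native term to a shared concept. This is only a sketch of the idea: the concept names and the table layout are hypothetical, and a real mediation layer would be driven by a managed vocabulary rather than hard-coded dictionaries.

```python
from typing import Dict, List, Optional, Tuple

# (repository, native term) -> shared concept. Concept names are illustrative.
ConceptTable = Dict[Tuple[str, str], str]

VARIABLE_CONCEPTS: ConceptTable = {
    ("NWIS", "625"): "kjeldahl_nitrogen",
    ("STORET", "Kjeldahl Nitrogen"): "kjeldahl_nitrogen",
}

MEDIUM_CONCEPTS: ConceptTable = {
    ("NWIS", "bed sediment"): "sediment",
    ("NWIS", "suspended sediment"): "sediment",
    ("STORET", "sediment"): "sediment",
}


def to_concept(table: ConceptTable, source: str, term: str) -> Optional[str]:
    """Map a native term to the shared concept, or None if it is untagged."""
    return table.get((source, term))


def native_terms(table: ConceptTable, source: str, concept: str) -> List[str]:
    """List every native term in one repository that maps to a concept."""
    return sorted(t for (s, t), c in table.items() if s == source and c == concept)


# Two NWIS medium types collapse onto one STORET term, so the mapping
# from the shared concept back to NWIS is one-to-many.
nwis_sediment = native_terms(MEDIUM_CONCEPTS, "NWIS", "sediment")
```

Going from a native term to a concept is a plain lookup, while the reverse direction returns a list, which is exactly where a one-to-one mapping breaks down.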
- From different database structures, data collection procedures, quality control, access mechanisms to uniform signatures … Water Markup Language
- Tested in different environments
- Standards-based
- Can support advanced interfaces via harvested catalogs
- Accessible to community
- Templates for development of new services
- Optimized, error handling, memory management, versioning, run from fast servers
- Working with agencies on setting up services and updating site files
NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, ODM
WaterOneFlow API, v. 1.0
• GetValues
– Returns a TimeSeries
• GetSiteInfo
– Station information, including a period of record
• GetVariableInfo
– Returns variable/parameter information
• Also: GetSites, GetVariables
• Object and string output
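Since the services are exposed over WSDL/SOAP, a GetValues call is in the end an XML message. The sketch below builds such a message with the Python standard library only. The service namespace, the parameter names (`location`, `variable`, `startDate`, `endDate`, `authToken`) and the site and variable codes are assumptions for illustration; the WSDL of an actual service is the authority on all of them.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
WOF_NS = "http://www.cuahsi.org/his/1.0/ws/"           # assumed service namespace


def build_get_values_request(location: str, variable: str,
                             start_date: str, end_date: str,
                             auth_token: str = "") -> str:
    """Return a SOAP envelope (as text) asking for one time series."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{WOF_NS}}}GetValues")
    # Parameter names below are assumed, not taken from the slides.
    parameters = (
        ("location", location),
        ("variable", variable),
        ("startDate", start_date),
        ("endDate", end_date),
        ("authToken", auth_token),
    )
    for name, text in parameters:
        ET.SubElement(call, f"{{{WOF_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="unicode")


# Illustrative codes only; real codes come from GetSites and GetVariables.
request_xml = build_get_values_request(
    "NWIS:02087500", "NWIS:00060", "2006-08-01", "2006-08-31"
)
```

In practice a client would more likely generate a proxy from the WSDL (as the .Net and Java clients do) than build envelopes by hand; the point here is only that the same small signature applies to every repository.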
WaterML design principles
• Driven largely by hydrologists; the goal is to capture the semantics of hydrologic observations discovery and retrieval
• Relies to a large extent on the information model as in ODM (Observations Data Model), and terms are aligned as much as possible
• Several community reviews since 2005
• Driven by data served by USGS NWIS, EPA STORET, multiple individual PI-collected observations
• Is no more than an exchange schema for CUAHSI web services
• The lowest barrier to adoption by hydrologists
• A fairly simple and rigid schema tuned to the current implementation
• Conformance with OGC specs not in the initial scope
WaterML key elements
• Response Types
– SiteInfo
– Variables
– TimeSeries
• Key Elements
– site
– sourceInfo
– seriesCatalog
– variable
– timeSeries (with values)
– queryInfo
[Diagram: the response returned by each call, with 1-to-many nesting.
GetValues returns timeSeriesResponse: queryInfo (criteria, queryURL); timeSeries (sourceInfo, variable, values).
GetVariableInfo returns variablesResponse: variables (1 to many variable).
GetSiteInfo returns sitesResponse: queryInfo (criteria, queryURL); site (siteInfo; seriesCatalog with 1 to many series, each with variable and variableTimeInterval).]
Structure of responses
• SiteInfo response
– queryInfo
– site: name, code, location
– seriesCatalog: variables (what), how many, when (TimePeriodType)
• TimeSeries response
– queryInfo
– location
– variable
– values
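To make the TimeSeries response structure concrete, the sketch below parses a heavily simplified, namespace-free document shaped like the outline above. The element names follow the slides where they appear there (`timeSeriesResponse`, `queryInfo`, `timeSeries`, `sourceInfo`, `variable`, `values`); the child elements `siteName`, `variableName`, `units`, `value` and the `dateTime` and `qualifiers` attributes are assumptions, and the sample is not a valid WaterML instance.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<timeSeriesResponse>
  <queryInfo>
    <criteria>discharge at one site, one day</criteria>
  </queryInfo>
  <timeSeries>
    <sourceInfo>
      <siteName>Neuse River near Clayton, NC</siteName>
    </sourceInfo>
    <variable>
      <variableName>Discharge</variableName>
      <units>cfs</units>
    </variable>
    <values>
      <value dateTime="2006-08-13T00:00:00">206</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""


def read_time_series(xml_text: str) -> dict:
    """Extract site, variable, units and values from the simplified response."""
    root = ET.fromstring(xml_text)
    series = root.find("timeSeries")
    values = [
        (v.get("dateTime"), float(v.text), v.get("qualifiers"))
        for v in series.iter("value")
    ]
    return {
        "site": series.findtext("sourceInfo/siteName"),
        "variable": series.findtext("variable/variableName"),
        "units": series.findtext("variable/units"),
        "values": values,
    }


parsed = read_time_series(SAMPLE)
```

Because every repository answers with the same envelope, a client needs one reader like this instead of one per agency format.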
Clients
• Tested with .Net and Java
• Desktop clients: Excel, Matlab, ArcGIS, VB.NET, more being written
• Web client: DASH (Data Access System for Hydrology): http://river.sdsc.edu/DASH (beta)
Current Deployment Architecture
[Diagram: Windows 2003 Server (quad core CPU, 4 GB RAM, 1 TB disk) running IIS, SQL Server, VS 2005, ArcGIS 9.2 and AGS Server; hosting DASH, WaterOneFlow Web Services, a GIS data MXD service, ODM, ODM Loader and ODM tools.]
Workgroup HIS Server Organization: Steps for Registering Observation Data
[Diagram; the original figure numbers six registration steps, 1 to 6.]
• SQL Server (direct DB connection): ODMs and catalogs. All instances exposed as ODM (i.e. have standard ODM tables or views: Sites, Variables, SeriesCatalog, etc.)
– NWIS-IID, NWIS-DV, ASOS, STORET, TCEQ, BearRiver, . . ., My new ODM, more databases
• Spatial store: geodatabase or collection of shapefiles, or both
– NWIS-IID points, NWIS-DV points, ASOS points, STORET points, TCEQ points, BearRiver points, . . ., My new points, more synced layers
– Background layers (can be in the same or separate spatial store)
• WOF services: web services from a common template
– NWIS-IID WS, NWIS-DV WS, ASOS WS, STORET WS, TCEQ WS, BearRiver WS, . . ., My new WS, more WS from ODM-WS template
– Upstream agency sources: USGS, NCDC, EPA, TCEQ
• DASH Web Application
– Web configuration file: stores information about registered networks (WSDLs, web service URLs, connection strings)
– MXD: stores information about layers (layer info, symbology, etc.)
• ODM DataLoader
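The requirement that every instance be exposed as ODM "tables or views" can be illustrated with a series catalog defined as a view over the values table. This is a sketch using an in-memory SQLite database rather than SQL Server; the column names are a simplified subset chosen for the example, the site and variable codes are placeholders, and the inserted rows are made-up demo data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT, SiteName TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableCode TEXT,
                        VariableName TEXT);
CREATE TABLE DataValues (SiteID INTEGER, VariableID INTEGER,
                         DateTimeUTC TEXT, DataValue REAL);

-- One row per (site, variable) pair with its period of record.
CREATE VIEW SeriesCatalog AS
SELECT s.SiteCode,
       v.VariableCode,
       MIN(d.DateTimeUTC) AS BeginDateTimeUTC,
       MAX(d.DateTimeUTC) AS EndDateTimeUTC,
       COUNT(*)           AS ValueCount
FROM DataValues d
JOIN Sites s ON s.SiteID = d.SiteID
JOIN Variables v ON v.VariableID = d.VariableID
GROUP BY s.SiteCode, v.VariableCode;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the simplified schema and load two made-up values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Sites VALUES (1, 'SITE-A', 'Example site')")
    conn.execute("INSERT INTO Variables VALUES (1, 'Q', 'Discharge')")
    conn.executemany(
        "INSERT INTO DataValues VALUES (1, 1, ?, ?)",
        [("2006-08-12T00:00:00", 198.0), ("2006-08-13T00:00:00", 206.0)],
    )
    return conn


conn = build_demo_db()
catalog = conn.execute("SELECT * FROM SeriesCatalog").fetchall()
```

A repository whose native tables look nothing like ODM can still register, as long as it offers views with these names; the web service template then only ever queries the common shape.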
HIS Scalability
• Adding…
– …data types and datasets; processing models and services; servers; users and roles
– …shall not create unmanageable bottlenecks that require system re-engineering
• Designing for scalability:
– Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
– Using ODM as a common generic format for time series data, for ease of coding and uniform search interfaces
– DASH GUI design to abstract specifics of disparate repositories
– Leveraging common CI components developed in GEON
– Working with agencies to remove web service bottlenecks
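The first design point, a generic set of service signatures, is what lets new repositories be added without touching clients. A minimal sketch of that idea follows; the class names, method names and the in-memory stand-in are all hypothetical, and only two of the calls are modeled.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[str, float]  # (ISO 8601 time stamp, value)


class ObservationService(ABC):
    """The common signature every repository wrapper is expected to offer."""

    @abstractmethod
    def get_sites(self) -> List[str]:
        ...

    @abstractmethod
    def get_values(self, site: str, variable: str,
                   start: str, end: str) -> List[Reading]:
        ...


class InMemoryService(ObservationService):
    """Stand-in for a repository wrapper, backed by a dictionary."""

    def __init__(self, data: Dict[Tuple[str, str], List[Reading]]):
        self._data = data

    def get_sites(self) -> List[str]:
        return sorted({site for site, _ in self._data})

    def get_values(self, site, variable, start, end):
        # Same-format ISO 8601 strings sort chronologically.
        series = self._data.get((site, variable), [])
        return [(t, v) for t, v in series if start <= t <= end]


def count_values(services: Iterable[ObservationService],
                 variable: str, start: str, end: str) -> int:
    """Client code that works for any number of registered services."""
    return sum(
        len(svc.get_values(site, variable, start, end))
        for svc in services
        for site in svc.get_sites()
    )


# Two made-up repositories registered behind the same signature.
first = InMemoryService({("A", "Q"): [("2006-08-12", 1.0), ("2006-08-13", 2.0)]})
second = InMemoryService({("B", "Q"): [("2006-08-13", 3.0)]})
total = count_values([first, second], "Q", "2006-08-13", "2006-08-31")
```

Adding a third repository means writing one more wrapper class; `count_values` and any other client code stay unchanged.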
Near future
• Deployment at the 11 WATERS test beds, and beyond
– And documenting experience
– Organizing HIS support
• Working with federal and state agencies on web services
– NCDC, USGS, EPA, state agencies (e.g. TCEQ)
– Analysis services for site catalogs and ODMs (see next slide)
• OGC connections: WaterML is an OGC Discussion Paper (approved at April 2007 TC Meeting)
– Needs to be reviewed further, based on initial implementation
– Internationalization (with CSIRO WRON, European WISE, H2OML)
– Carry CUAHSI WaterML messages over O&M, as an O&M profile
• Towards WaterML and web services 1.1
[Map: US Map of USGS Observations, with insets for Alaska, Hawaii, Puerto Rico and Antarctica]
[Map: US Map of USGS Observations, by Mean Period of Record]
[Chart: Different types of nutrients by decade (available data, total)]
[Chart: Some physical properties by decade (available data, total)]
[Chart: Same without discharge, gage height, temperature and precipitation, the four most common, in that order (available data, total)]
SDSC Spatial Information Systems Lab
Research and system development
• Services-based spatial information integration infrastructure
• Mediation services for spatial data, query processing, map assembly services
• Long-term spatial data preservation
• Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
• Support of spatial data projects at SDSC and beyond
[Diagram: Grid services for map integration (Mediator, Legend Generator, Map Assembler, Ontology, …) over spatial web services published as WSDL/WS by federal agencies, ESRI (the Geography Network), CA state, county spatial data and toxicant information, Telesis and other local non-profits.]
Spatial web services in use:
• In Geosciences (GEON, CUAHSI, CBEO, …)
• In regional development (NIEHS SBRP, Katrina)
• In Neurosciences (BIRN, CCDB)
• Student projects
• The CHI ME Model
http://scirad.sdsc.edu/datatech/si.html
Links and Acknowledgments
• The CUAHSI HIS project:
– http://www.cuahsi.org/his/ (main site)
– http://water.sdsc.edu (central development server)
• Many thanks to Microsoft Research for partly sponsoring this trip