Collaborative Research, Development and Demonstration
Ecoinformatics International Technical Collaboration
Copenhagen, Denmark
March 23, 2009
Bruce Bargmeyer
Lawrence Berkeley National Laboratory
and
Berkeley Water Center
University of California, Berkeley
Tel: +1 [email protected]
Collaborative Research, Development and Demonstration
SciScope
– Microsoft, LBNL, Berkeley Water Center, EPA, USGS, EEA?
– Involves UCB researchers, data cubes for water data
Accomplished
– Demo using terminology connected to metadata and data to access STORET and NWIS water data
– SciScope running on computers at LBNL
Current effort
– Extending to include some things that were “hardwired” in demo
– Documenting code, creating SDK
– Install at new site to validate June milestone
To discuss
– Further work on terminology and linkage to metadata
– Extensions: Citizen Observatories, Citizen Science
– Extensions: Social computing
– Hosting and Governance
  – Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI)
SciScope
STORET has 758 sites in Texas, TCEQ has 8407.
STORET has 47,602 sites in Florida, NWIS has 27,906.
NWIS has 121,545 sites in Minnesota, STORET has 22,260.
TCEQ data from David Maidment
Source: Bora Beran, Microsoft Research
Citizen Observatories & Citizen Science: Quick Examples
Weather Underground
Community Collaborative Rain, Hail and Snow Network (CoCoRaHS)
Microsoft World Telescope
Building Blocks
Terminology and Ontology
Metadata Registration
Collaborative authoring and Wikis
Information and Data Modeling
The Wiki Way
Collaborative authoring and Wikis
– The ability for any member of the community to contribute to the resource
– A history mechanism
– The ability to add links, tags, keywords and classifications
– A rapid, "organic" evolution cycle
– The ability to leave information un (or under) specified
– Federation – virtual "wiki space"
– Multilingual
Semantic Wiki
– The ability to add "meaning" to links, tags, keywords and classifications
– The ability to import and export more formalized knowledge, e.g., ontology descriptions
– The ability to represent the same information across the "formalization spectrum" (!)
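One way to picture adding "meaning" to links is to store each wiki link as a typed statement (page, relation, target) that can also be exported in a more formal syntax. The sketch below is illustrative only: the page names, the relation names, and the `example.org` namespace are assumptions made for this example and do not come from any wiki product named in these slides.

```python
# Illustrative sketch: wiki links that carry a "meaning" (a relation type).
# Page names, relation names and the base URI are hypothetical.
from collections import namedtuple

TypedLink = namedtuple("TypedLink", ["subject", "relation", "target"])

BASE = "http://example.org/wiki/"  # assumed namespace, not a real service

links = [
    TypedLink("Mercury", "is_a", "Chemical_contamination"),
    TypedLink("Chemical_contamination", "is_a", "Contamination"),
    TypedLink("Hg", "measures", "Mercury"),
]


def export_ntriples(typed_links, base=BASE):
    """Export typed links as N-Triples-style lines, one statement per line."""
    return [
        f"<{base}{link.subject}> <{base}{link.relation}> <{base}{link.target}> ."
        for link in typed_links
    ]


def links_from(page, typed_links):
    """Informal wiki view: pages that `page` links to, ignoring link meaning."""
    return [link.target for link in typed_links if link.subject == page]


for line in export_ntriples(links):
    print(line)
```

The same list of links serves both ends of the "formalization spectrum": `links_from` treats them as ordinary wiki links, while `export_ntriples` emits them as formal statements.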
Ecoinformatics Challenge: Draw Together Concept Systems, Metadata & Data
Facilitate discovery, access, use and understanding

Data:
ID  Date      Temp  Hg
A   06-09-13  4.4   4
B   06-09-13  9.3   2
X   06-09-13  6.7   78

Metadata:
Name  Datatype  Definition                     Units
ID    text      Monitoring Station Identifier  not applicable
Date  date      Date                           yy-mm-dd
Temp  number    Temperature (to 0.1 degree C)  degrees Celsius
Hg    number    Mercury contamination          micrograms per liter

Concept system:
Contamination
  Biological
  Chemical
    lead
    cadmium
    mercury
  Radioactive
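The idea of drawing together data, metadata and a concept system can be sketched in a few lines. The values below are copied from the slide; the hierarchy places lead, cadmium and mercury under Chemical, as read from the slide's diagram. The explicit link from the `Hg` column to the concept "mercury", and every function name, are assumptions made for illustration.

```python
# Illustrative sketch: data rows, column metadata, and a concept hierarchy.
# The "concept" link on Hg and all function names are hypothetical.

data = [
    {"ID": "A", "Date": "06-09-13", "Temp": 4.4, "Hg": 4},
    {"ID": "B", "Date": "06-09-13", "Temp": 9.3, "Hg": 2},
    {"ID": "X", "Date": "06-09-13", "Temp": 6.7, "Hg": 78},
]

metadata = {
    "ID": {"datatype": "text", "definition": "Monitoring Station Identifier",
           "units": None},
    "Date": {"datatype": "date", "definition": "Date", "units": "yy-mm-dd"},
    "Temp": {"datatype": "number", "definition": "Temperature (to 0.1 degree C)",
             "units": "degrees Celsius"},
    "Hg": {"datatype": "number", "definition": "Mercury contamination",
           "units": "micrograms per liter", "concept": "mercury"},
}

# Concept system stored as child -> parent.
concept_parent = {
    "Biological": "Contamination",
    "Chemical": "Contamination",
    "Radioactive": "Contamination",
    "lead": "Chemical",
    "cadmium": "Chemical",
    "mercury": "Chemical",
}


def ancestors(concept):
    """Return the broader concepts above `concept`, nearest first."""
    path = []
    while concept in concept_parent:
        concept = concept_parent[concept]
        path.append(concept)
    return path


def columns_about(concept):
    """Discovery: columns linked to `concept` or to any narrower concept."""
    found = []
    for name, meta in metadata.items():
        linked = meta.get("concept")
        if linked and (linked == concept or concept in ancestors(linked)):
            found.append(name)
    return found


def describe(row, column):
    """Understanding: render one value together with its definition and units."""
    meta = metadata[column]
    units = f" {meta['units']}" if meta["units"] else ""
    return f"{meta['definition']} at station {row['ID']}: {row[column]}{units}"


print(columns_about("Chemical"))
print(describe(data[2], "Hg"))
```

A query for the broad concept "Chemical" finds the `Hg` column even though the column name never mentions chemistry, which is the kind of discovery the concept system is meant to support.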
Ecoinformatics Research
Web 2.0 Semantic Technology - the tools for creating, disseminating and using terminological resources such as dictionaries, classification schemes and ontologies have now reached the point that it is possible to maintain a centralized terminology that can be used to describe data resources in a queryable and interoperable fashion.
Metadata Repositories - ISO/IEC 11179 Edition 3 provides a common model to record, manage and disseminate semantic annotation of information resources. This provides a framework for acquiring, integrating and sharing catalog content.
Semantic Wiki - wiki technology has demonstrated the viability of community generated content. Semantic wiki has demonstrated that community generated content can exist across a continuum of formality. This provides the ability to collect information about catalog content in a format and level of formality that suits each user's needs and to gradually transform user input into a shared structure and semantics for distribution, integration and sharing.
Data Modeling and Model Driven Architecture - it is now possible to automatically upload and integrate the contents of most modern SQL catalogs, information models, schemas, spreadsheet templates, etc. This technology makes it possible to create an inventory of physical data resources very quickly.
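As a rough illustration of inventorying physical data resources from a SQL catalog, the sketch below reads table and column definitions out of a database's own catalog. It uses an in-memory SQLite database purely as a stand-in; the table and column names are hypothetical, and other database systems expose their catalogs differently (for example through `information_schema` views).

```python
# Illustrative sketch: build an inventory of tables and columns from a SQL catalog.
# SQLite is a stand-in; the schema below is hypothetical.
import sqlite3


def inventory(conn):
    """Return {table: [(column_name, declared_type), ...]} from the SQLite catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    result = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        result[table] = [(col[1], col[2]) for col in columns]
    return result


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation (station_id TEXT, obs_date TEXT, temp REAL, hg REAL)")
conn.execute("CREATE TABLE station (station_id TEXT PRIMARY KEY, name TEXT)")

for table, columns in inventory(conn).items():
    print(table, columns)
```

An inventory like this records only names and declared types; definitions, units and links to terminology would still have to be added in a metadata registry.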
Strategic Discussion
Major Results
XMDR ISO/IEC Standard Prototype - Open Source Software
EPA System of Registries
In-house DOD
National Cancer Institute caBIG
– caDSR is key part of caCORE
Now demonstrating for DOE Nuclear Non-Proliferation
Model Integration
caGrid Data Description Infrastructure
Client and service APIs are object oriented, and operate over well-defined and curated data types
Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described
XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)
[Diagram: caGrid data description infrastructure. A Grid Client (Client API) uses a Grid Service (Service API); the Service Definition is given in WSDL and the Data Type Definitions in XSD. Objects serialize to XML, which validates against XSD registered in the Global Model Exchange (GME). Object definitions are registered in the Cancer Data Standards Repository and semantically described in the Enterprise Vocabulary Services. These registries are the Core Services.]
Source: National Cancer Institute, caBIG
Ecoinformatics Activities & Research
Two kinds of activities:
Advances/activities as part of current operations with internal agency resources
– Ecoinformatics result: primarily technology transfer by sharing ideas.
Activities requiring additional resources (contracts, research grants, …)
– Ecoinformatics result: technology transfer of ideas, research results, and tools/infrastructure.
Expressed Intent – Coordinate R&D in Ecoinformatics
Share cost and benefits through coordination of US & EU (& Asia?) ecoinformatics R&D
Identify key advances needed at the core of ecoinformatics
– Semantics management, semantics services, semantic computing
– Terminology web services
– IT support for indicators, …
Demonstrate in ecoinformatics “Test Bed”
Develop an “architecture” of advanced ecoinformatics technologies?
Research, Development and Demonstration projects ranging from improvements in operations to strategic breakthroughs
How to Share Tasks & Results of Ecoinformatics R&D
Conduct the international R&D in separate projects that are funded separately (but aware of others)
Conduct the R&D with interlocking workpackages/tasks/deliverables. International R&D with integrated results
Coordinating International Ecoinformatics R&D
Who funds whom?
US NSF and EU 7th FP do not fund internal government agency activities.
– R&D funds to academe, Govt. labs, and private industry.
Environmental agencies have operational and “R&D” funding used to fund outside R&D organizations.
Government staff participate on their agency’s own dime. Contract/award staff participate under project funding.
Coordinating International Ecoinformatics R&D
How to coordinate?
Government agencies can collectively coordinate priority areas of international ecoinformatics R&D. (EEA, EC DGs, EPA, USGS, NSF, …)
Funding agencies can declare that international joint R&D efforts are encouraged. (But will all partners get funded?)
Government agencies can combine funding.
– However, it is difficult for funds to cross oceans (or major political boundaries).
Coordinating International Ecoinformatics R&D
Getting down to brass tacks:
How do R&D organizations submit international proposals with interlocking workpackages/deliverables?
How do R&D organizations submit proposals in which all of the international participants get funded?
R&D proposals can be funded with expectation of future international linkage.
Funding agencies can establish international linkage of proposal review process.
– More difficult as more agencies are involved
Different proposal & funding processes can be utilized, outside of the usual “call” or “RFP” process.
– Some kind of incremental process?