Post on 13-Dec-2015
1
eXtended Metadata Registry (XMDR)
International Ecoinformatics Technical Collaboration
Berkeley, California
October 24, 2006
Bruce Bargmeyer, Lawrence Berkeley National Laboratory
University of California
Tel: +1 510-495-2905
bebargmeyer@lbl.gov
2
Topics
Challenges to address
A brief tutorial on Semantics and semantic computing
Where XMDR fits
Semantic computing technologies
Traditional Data Administration
XMDR project
Test Bed demonstrations
3
The Internet Revolution
A world wide web of diverse content: The information glut is nothing new. The access to it is astonishing.
4
Challenge: Find and process non-explicit data
[Figure: concept hierarchy with nodes Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Acetaminophen, Nonsteroidal Antiinflammatory Drug, and the brand names Datril, Anacin-3, and Tylenol]
For example…
Patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …);
however, we want to study patients taking analgesic agents.
5
Challenge: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem
An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
6
Challenge: Combine Data, Metadata & Concept Systems
Data:
ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78

Metadata:
Name   Datatype   Definition                      Units
ID     text       Monitoring Station Identifier   not applicable
Date   date       Date                            yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)   degrees Celsius
Hg     number     Mercury contamination           micrograms per liter

Concept system:
Contamination
  Chemical: lead, cadmium, mercury
  Biological
  Radioactive

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”
8
Challenge: Use data from systems that record the same facts with different terms
Common Content
OASIS/ebXMLRegistries
Common Content
ISO 11179Registries
Common Content
OntologicalRegistries
Common Content
CASE ToolRepositories
Common Content
UDDIRegistries
CountryIdentifier
DataElement
XML Tag
TermHierarchy
Attribute
BusinessSpecification
TableColumn
SoftwareComponentRegistries
Common Content
Common Content
DatabaseCatalogs
BusinessObject
DublinCore
Registries
Common Content
Coverage
9
Same Fact, Different Terms
Data Element Concept: Country Identifiers (Unique ID: 5769)
Attributes: Name, Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Conceptual domain: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
Data Elements (attributes shown for Unique ID: 4572)
Attributes: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Value domains (ISO 3166):
English Name   French Name   2-Alpha Code   3-Alpha Code   3-Numeric Code
Algeria        L'Algérie     DZ             DZA            012
Belgium        Belgique      BE             BEL            056
China          Chine         CN             CHN            156
Denmark        Danemark      DK             DNK            208
Egypt          Egypte        EG             EGY            818
France         La France     FR             FRA            250
. . .
Zimbabwe       Zimbabwe      ZW             ZWE            716
10
Challenge: Draw information together from a broad range of studies, databases, reports, etc.
11
Challenge: Gain Common Understanding of meaning between Data Creators and Data Users
[Figure: data creators (EEA, USGS, DoD, EPA, and others) produce tables of text and numeric data on themes such as environment, agriculture, climate, human health, industry, tourism, soil, water, and air (some tables carry the same themes labeled in Spanish); users and information systems consume this data]
A common interpretation of what the data represents
12
Semantic Computing and XMDR
We are laying the foundation to make a quantum leap toward a substantially new way of computing: Semantic Computing
How can we make use of semantic computing for the environment and health?
What do environmental agencies need to do to prepare for and stimulate semantic computing?
What are the ecoinformatics challenges?
13
Coming: A Semantic Revolution
Searching and ranking
Pattern analysis
Knowledge discovery
Question answering
Reasoning
Semi-automated decision making
14
The Nub of It
Processing that takes “meaning” into account
Processing based on the relations between things, not just computing about the things themselves
Processing that takes people out of the processing, reducing the human toil: data access, extraction, mapping, translation, formatting, validation, inferencing, …
Delivering higher-level results that are more helpful for the user’s thought and action
15
XMDR & ISO/IEC 11179
Managing, harmonizing, and vetting semantics is essential to enable semantic computing
Managing, harmonizing, and vetting semantics is important for traditional data management. In the past we covered just the basics. We want to maintain compatibility with previous MDR purposes (data administration, data provenance, data design, …)
Ecoinformatics Test Bed demonstrations of XMDR should show more than incremental improvements of current applications for metadata registries
16
A Brief Tutorial on Semantics
What is meaning?
What are concepts?
What are relations?
What are concept systems?
What is “reasoning”?
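17
Meaning: The Semiotic Triangle
[Figure: triangle linking Thought or Reference (Concept), Symbol, and Referent; the symbol symbolises the thought, the thought refers to the referent, and the symbol stands for the referent; labels “Rose”, “ClipArt”]
C. K. Ogden and I. A. Richards, The Meaning of Meaning.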
18
Semiotic Triangle: Concepts, Definitions and Signs
[Figure: the semiotic triangle with a Concept (associated with a Definition), a Sign, and a Referent; relations Symbolizes, Refers To, and Stands For; labels “Rose”, “ClipArt”]
22
Definitions in the EPA Environmental Data Registry
Mailing Address: http://www.epa.gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box
State USPS Code: http://www.epa.gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada
Mailing Address State Name: http://www.epa.gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered
24
SNOMED – Terms Defined by Relations
28
Computable Meaning
[Figure: the semiotic triangle (Concept, Referent, and Symbol labeled “Rose”, “ClipArt”; relations Refers To, Symbolizes, Stands For), with concepts related through rdfs:subClassOf, owl:equivalentClass, and owl:disjointWith]
If “rose” is owl:disjointWith “daffodil”, then a computer can determine that an assertion is invalid if it states that a rose is also a daffodil (e.g., in a knowledge base).
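The disjointness check above can be sketched in a few lines of Python. This is a toy illustration, not an OWL reasoner: the class names beyond Rose and Daffodil, the subclass table, and the function names are hypothetical, and only rdfs:subClassOf and owl:disjointWith are modeled.

```python
# Hypothetical toy ontology: class -> direct superclasses (rdfs:subClassOf)
SUBCLASS_OF = {
    "Rose": {"Flower"},
    "Daffodil": {"Flower"},
    "TeaRose": {"Rose"},
}

# owl:disjointWith axioms, stored as unordered pairs because the relation is symmetric
DISJOINT = {frozenset({"Rose", "Daffodil"})}


def all_types(cls):
    """Return cls together with every class it is transitively a subclass of."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def disjointness_violations(assertions):
    """assertions maps an individual to its asserted classes.

    Returns (individual, class_a, class_b) for every disjoint pair the
    individual would belong to after subclass inference.
    """
    violations = []
    for individual, classes in assertions.items():
        types = set()
        for cls in classes:
            types |= all_types(cls)
        for pair in DISJOINT:
            if pair <= types:
                a, b = sorted(pair)
                violations.append((individual, a, b))
    return violations


kb = {"flower1": {"TeaRose", "Daffodil"}, "flower2": {"Rose"}}
print(disjointness_violations(kb))  # [('flower1', 'Daffodil', 'Rose')]
```

Note that the clash for flower1 is only found after inference: it was asserted to be a TeaRose, and TeaRose is a subclass of Rose.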
29
What are Relations?
[Figure: water-body concepts (Fletcher Creek, Merced Lake, Merced River, Lake, WaterBody) connected by isA relations]
Concepts and relations can be represented as nodes and edges in formal graph structures, e.g., “is-a” hierarchies.
30
Concept Systems have Nodes and may have Relations
[Figure: small example graph with nodes labeled A, a, b, c, d, 1, and 2]
Nodes represent concepts.
Lines (arcs) represent relations.
Concept systems are concepts and the relations between them. Concept systems can be represented and queried as graphs.
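A minimal sketch of a concept system stored as a set of (concept, relation, concept) edges and queried as a graph. The edges are hypothetical, loosely based on the water-body example, and `match` is an illustrative helper rather than any particular triple-store API.

```python
# A concept system as a set of (concept, relation, concept) edges.
# Hypothetical edges for illustration only.
EDGES = {
    ("Merced Lake", "isA", "WaterBody"),
    ("Fletcher Creek", "isA", "WaterBody"),
    ("Merced River", "isA", "WaterBody"),
}


def nodes(edges):
    """Every concept that appears at either end of an edge."""
    return {n for s, _, o in edges for n in (s, o)}


def match(edges, subject=None, relation=None, obj=None):
    """Return edges matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, r, o) for s, r, o in edges
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    )


# Which concepts are water bodies?
print([s for s, _, _ in match(EDGES, relation="isA", obj="WaterBody")])
# ['Fletcher Creek', 'Merced Lake', 'Merced River']
```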
31
A More Complex Concept Graph
From Supervaluation Semantics for an Inland Water Feature OntologyPaulo Santos and Brandon Bennett http://ijcai.org/papers/1187.pdf#search=%22terminology%20water%20ontology%22
Concept lattice of inland water features
[Figure: lattice combining attributes (linear, non-linear, large, small, deep, shallow, natural, artificial, flowing, stagnant) with the feature types River, Stream, Canal, Reservoir, Lake, Marsh, and Pond]
33
Types of Concept System Graph Structures
[Figure: examples of a tree, partial order tree, ordered tree, faceted classification, directed acyclic graph, partial order graph, powerset of a 3-element set, bipartite graph, clique, and compound graph]
34
Graph Taxonomy
[Figure: taxonomy of graph types: Graph, Directed Graph, Undirected Graph, Directed Acyclic Graph, Bipartite Graph, Partial Order Graph, Faceted Classification, Clique, Partial Order Tree, Tree, Lattice, Ordered Tree]
Note: not all bipartite graphs are undirected.
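To make the distinctions in this taxonomy concrete, here is a rough sketch that tells apart three of the structures for a directed concept graph: a graph with a cycle, a directed acyclic graph, and a tree. It is an assumption-laden illustration (edges point parent to child, every edge endpoint must be listed in `nodes`) and does not detect lattices, bipartite graphs, or the other types in the figure.

```python
from collections import defaultdict


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node
    can be removed in topological order."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return removed == len(nodes)


def classify(nodes, edges):
    """Coarse classification of a directed concept graph (edges: parent -> child)."""
    if not is_acyclic(nodes, edges):
        return "directed graph (has a cycle)"
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    roots = [n for n in nodes if indegree[n] == 0]
    # Acyclic, exactly one root, and at most one parent per node => a tree
    if len(roots) == 1 and all(d <= 1 for d in indegree.values()):
        return "tree"
    return "directed acyclic graph"


concepts = {"Disease", "Infectious", "Chronic", "Polio"}
print(classify(concepts, [("Disease", "Infectious"), ("Disease", "Chronic"),
                          ("Infectious", "Polio")]))  # tree
```

Giving a node a second parent (for example, also linking Chronic to Polio) turns the tree into a directed acyclic graph, which is why concept systems with multiple inheritance cannot be treated as simple trees.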
35
What Kind of Relations Are There? Lots!
Relationship class: A particular type of connection existing between people related to or having dealings with each other.
acquaintanceOf - A person having more than slight or superficial knowledge of this person but short of friendship.
ambivalentOf - A person towards whom this person has mixed feelings or emotions.
ancestorOf - A person who is a descendant of this person.
antagonistOf - A person who opposes and contends against this person.
apprenticeTo - A person to whom this person serves as a trusted counselor or teacher.
childOf - A person who was given birth to or nurtured and raised by this person.
closeFriendOf - A person who shares a close mutual friendship with this person.
collaboratesWith - A person who works towards a common goal with this person.
…
36
Example of relations in a food web in an Arctic ecosystem
An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.
Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)
37
Ontologies are a type of Concept System
Ontology: explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)
An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them.
Why would someone want to develop an ontology? Some of the reasons are:
To share common understanding of the structure of information among people or software agents
To enable reuse of domain knowledge
To make domain assumptions explicit
To separate domain knowledge from the operational knowledge
To analyze domain knowledge
http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html
38
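What is Reasoning? Inference
[Figure: Polio and Smallpox is-a Infectious Disease; Diabetes and Heart disease is-a Chronic Disease; Infectious Disease and Chronic Disease is-a Disease. A separate arrow style signifies inferred is-a relationships (such as Polio is-a Disease).]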
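The inferred is-a relationships in the figure follow from transitivity. A minimal sketch, assuming the explicit edges are as read from the figure; a naive fixpoint loop stands in for a real reasoner.

```python
# Explicit is-a edges (child, parent) as read from the figure
IS_A = {
    ("Polio", "Infectious Disease"),
    ("Smallpox", "Infectious Disease"),
    ("Diabetes", "Chronic Disease"),
    ("Heart disease", "Chronic Disease"),
    ("Infectious Disease", "Disease"),
    ("Chronic Disease", "Disease"),
}


def transitive_closure(pairs):
    """Repeatedly join (a, b) with (b, d) until no new (a, d) pairs appear."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


inferred = sorted(transitive_closure(IS_A) - IS_A)
for child, ancestor in inferred:
    print(f"{child} is-a {ancestor}  (inferred)")
```

Each pass joins every pair with every other pair, which is fine for a slide-sized example; larger concept systems would use a graph traversal per node instead.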
39
Reasoning: Taxonomies & partonomies can be used to support inference queries
[Figure: Oakland and Berkeley are part-of Alameda County; Santa Clara and San Jose are part-of Santa Clara County; both counties are part-of California]
E.g., if a database contains information on events by city, we could query that database for events that happened in a particular county or state, even though the event data does not contain explicit state or county codes.
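The roll-up described above can be sketched as follows. The part-of edges come from the figure; the event records and the helper names are hypothetical, and because this partonomy is a simple hierarchy each place is given a single container.

```python
# part-of edges from the figure: place -> containing place
PART_OF = {
    "Oakland": "Alameda County",
    "Berkeley": "Alameda County",
    "Santa Clara": "Santa Clara County",
    "San Jose": "Santa Clara County",
    "Alameda County": "California",
    "Santa Clara County": "California",
}

# Hypothetical event records that store only a city, with no county or state code
EVENTS = [
    ("event-1", "Oakland"),
    ("event-2", "San Jose"),
    ("event-3", "Berkeley"),
]


def containers(place):
    """All places that transitively contain `place`, including itself."""
    result = [place]
    while place in PART_OF:
        place = PART_OF[place]
        result.append(place)
    return result


def events_in(region):
    """Events whose city lies (transitively) within `region`."""
    return [eid for eid, city in EVENTS if region in containers(city)]


print(events_in("Alameda County"))  # ['event-1', 'event-3']
print(events_in("California"))      # ['event-1', 'event-2', 'event-3']
```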
40
Reasoning: Relationship metadata can be used to infer non-explicit data
[Figure: concept hierarchy with nodes Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Acetaminophen, Nonsteroidal Antiinflammatory Drug, and the brand names Datril, Anacin-3, and Tylenol]
For example:
(1) patient data on drugs currently being taken contains brand names (e.g., Tylenol, Anacin-3, Datril, …);
(2) a concept system connects different drug types and names with one another (via is-a, part-of, and other relationships);
(3) so patient data can be linked and searched by inferred terms like “acetaminophen” and “analgesic”, as well as by the trade names explicitly stored as text strings in the database.
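Steps (1) to (3) can be sketched as a search over patient records by inferred terms. The concept system below is a simplified, hypothetical chain (it is not the actual NCI Thesaurus structure), and the patient records and function names are invented for illustration.

```python
# Simplified, hypothetical concept system: term -> broader terms
BROADER = {
    "Tylenol": ["Acetaminophen"],
    "Anacin-3": ["Acetaminophen"],
    "Datril": ["Acetaminophen"],
    "Acetaminophen": ["Non-Narcotic Analgesic"],
    "Non-Narcotic Analgesic": ["Analgesic Agent"],
}

# Hypothetical patient records that store only drug names as text
PATIENT_DRUGS = {
    "patient-01": ["Tylenol"],
    "patient-02": ["Datril"],
    "patient-03": ["Insulin"],
}


def broader_closure(term):
    """The term plus every broader term reachable from it."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(BROADER.get(t, []))
    return seen


def patients_taking(concept):
    """Patients with at least one recorded drug that falls under `concept`."""
    return sorted(
        pid for pid, drugs in PATIENT_DRUGS.items()
        if any(concept in broader_closure(d) for d in drugs)
    )


print(patients_taking("Analgesic Agent"))  # ['patient-01', 'patient-02']
```

The records never mention “analgesic”; the match comes entirely from the relationships in the concept system.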
41
Reasoning: Least Common Ancestor Query
[Figure: NCI Thesaurus fragment with nodes Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Acetaminophen, Nonsteroidal Antiinflammatory Drug, Opioid, Opiate, Morphine Sulfate, and Codeine Phosphate]
What is the least common ancestor concept in the NCI Thesaurus for Acetaminophen and Morphine Sulfate? (Answer: Analgesic Agent)
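A least common ancestor query can be sketched with a breadth-first search upward from each concept. The parent links below are a simplified, hypothetical arrangement of the nodes in the figure, not the actual NCI Thesaurus edges. Because a concept graph may have several common ancestors, this sketch picks the one with the smallest combined distance (ties broken by name), which is one convention among several.

```python
from collections import deque

# Simplified, hypothetical fragment (child -> parents); not the actual NCI Thesaurus edges
PARENTS = {
    "Acetaminophen": ["Analgesic and Antipyretic"],
    "Analgesic and Antipyretic": ["Non-Narcotic Analgesic"],
    "Nonsteroidal Antiinflammatory Drug": ["Non-Narcotic Analgesic"],
    "Non-Narcotic Analgesic": ["Analgesic Agent"],
    "Morphine Sulfate": ["Opiate"],
    "Codeine Phosphate": ["Opiate"],
    "Opiate": ["Opioid"],
    "Opioid": ["Analgesic Agent"],
}


def ancestor_distances(node):
    """Map the node and each of its ancestors to its upward distance."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def least_common_ancestor(a, b):
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = set(da) & set(db)
    if not common:
        return None
    return min(common, key=lambda n: (da[n] + db[n], n))


print(least_common_ancestor("Acetaminophen", "Morphine Sulfate"))      # Analgesic Agent
print(least_common_ancestor("Morphine Sulfate", "Codeine Phosphate"))  # Opiate
```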
42
Reasoning: Example “sibling” queries: concepts that share a common ancestor
Environmental: “siblings” of Wetland (in the NASA SWEET ontology)
Health: querying the siblings of ERK1 finds all 700+ other kinase enzymes; querying the siblings of Novastatin finds all other statins
11179 Metadata: sibling values in an enumerated value domain
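A sibling query returns the concepts that share a parent with a given concept. The sketch below uses a hypothetical toy hierarchy of three statins; the function and variable names are illustrative. Because it unions children over every parent, it also handles concepts with more than one parent.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parents)
PARENTS = {
    "Atorvastatin": ["Statin"],
    "Lovastatin": ["Statin"],
    "Simvastatin": ["Statin"],
}


def children_index(parents):
    """Invert child -> parents into parent -> children."""
    index = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            index[p].add(child)
    return index


def siblings(concept, parents=PARENTS):
    """Concepts sharing at least one parent with `concept`, excluding itself."""
    index = children_index(parents)
    result = set()
    for p in parents.get(concept, []):
        result |= index[p]
    result.discard(concept)
    return sorted(result)


print(siblings("Simvastatin"))  # ['Atorvastatin', 'Lovastatin']
```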
43
Reasoning: More complex “sibling” queries: concepts with multiple ancestors
Health: Find all the siblings of Breast Neoplasm
Environmental: Find all chemicals that are a carcinogen (cause cancer), a toxin (are poisonous), and teratogenic (cause birth defects)
[Figure: Breast Neoplasm has two parents, site neoplasms and breast disorders; other concepts under these parents include Respiratory System Neoplasm, Eye Neoplasm, and Non-Neoplastic Breast Disorder]
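The environmental query is an intersection: a chemical qualifies only if it sits under all three parent concepts. A minimal sketch, assuming a hypothetical classification with placeholder chemical names (no real toxicological data is implied).

```python
# Hypothetical classification: parent concept -> member chemicals (placeholder names)
MEMBERS = {
    "carcinogen": {"chemical_A", "chemical_B", "chemical_C"},
    "toxin": {"chemical_A", "chemical_C", "chemical_D"},
    "teratogen": {"chemical_A", "chemical_C", "chemical_E"},
}


def members_of_all(*concepts):
    """Chemicals classified under every one of the given parent concepts."""
    sets = [MEMBERS[c] for c in concepts]
    return sorted(set.intersection(*sets)) if sets else []


print(members_of_all("carcinogen", "toxin", "teratogen"))  # ['chemical_A', 'chemical_C']
```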
44
End of Tutorial about concept systems
Where does ISO/IEC 11179 fit?
45
Data Generation and Use: Cost vs. Coordination
[Figure: cost ($) of data creation plotted against the level of coordination: Autonomous, Reporting, Community of Interest, Full Control]
46
Data Generation and Use: Cost vs. Coordination
[Figure: cost ($) of data creation and of data use plotted against the level of coordination: Autonomous, Reporting, Community of Interest, Full Control]
47
ISO/IEC 11179 Metadata Registries Reduce Cost of Data Creation and Use
[Figure: cost ($) of data creation and of data use plotted against the level of coordination: Autonomous, Reporting, Community of Interest, Full Control]
48
Metadata Registries Increase the Benefit from Data (Strategic Effectiveness)
[Figure: benefit plotted against the level of coordination (Autonomous, Reporting, Community of Interest, Full Control), with a curve labeled MDR]
49
What Can ISO/IEC 11179 MDR Do?
Traditional Data Management (11179 Edition 2)
Register metadata that describes data: in databases, applications, XML Schemas, data models, flat files, paper
Assist in harmonizing, standardizing, and vetting metadata
Assist data engineering
Provide a source of well-formed data designs for system designers
Record reporting requirements
Assist data generation by describing the meaning of data entry fields and the potential valid values
Register provenance information that can be provided to end users of data
Assist with information discovery by pointing to systems where particular data is maintained
50
Traditional MDR: Manage Code Sets
Data Element Concept: Country Identifiers (Unique ID: 5769)
Attributes: Name, Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Conceptual domain: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
Data Elements (attributes shown for Unique ID: 4572)
Attributes: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Value domains (ISO 3166):
English Name   French Name   2-Alpha Code   3-Alpha Code   3-Numeric Code
Algeria        L'Algérie     DZ             DZA            012
Belgium        Belgique      BE             BEL            056
China          Chine         CN             CHN            156
Denmark        Danemark      DK             DNK            208
Egypt          Egypte        EG             EGY            818
France         La France     FR             FRA            250
. . .
Zimbabwe       Zimbabwe      ZW             ZWE            716
51
What Can XMDR Do?
Support a new generation of semantic computing:
Concept system management
Harmonizing and vetting concept systems
Linkage of concept systems to data
Interrelation of multiple concept systems
Grounding ontologies and RDF in agreed-upon semantics
Reasoning across XMDR content
Provision of Semantic Services
52
Coming: A Semantic Revolution
[Figure: levels of coordination: Autonomous, Reporting, Community of Interest, Full Control]
Searching and ranking
Pattern analysis
Knowledge discovery
Question answering
Reasoning
Semi-automated decision making
53
We are trying to manage semantics in an increasingly complex content space
Structured data
Semi-structured data
Unstructured data
Text
Pictographic
Graphics
Multimedia
Voice, video
54
11179-3 (E3) Increases MDR Benefit
[Figure: benefit plotted against the level of coordination (Autonomous, Reporting, Community of Interest, Full Control), with a curve labeled MDR]
When communities create information according to a common vocabulary the value of the resulting information increases dramatically.
55
Example
Combining Concept Systems, Data, and Metadata to answer queries.
56
Linking Concepts: Text Document
§ 141.62 Maximum contaminant levels for inorganic contaminants.
(a) [Reserved]
(b) The maximum contaminant levels for inorganic contaminants specified in paragraphs (b)(2)–(6), (b)(10), and (b)(11)–(16) of this section apply to community water systems and non-transient, non-community water systems. The maximum contaminant level specified in paragraph (b)(1) of this section only applies to community water systems. The maximum contaminant levels specified in (b)(7), (b)(8), and (b)(9) of this section apply to community water systems; non-transient, non-community water systems; and transient non-community water systems.
Contaminant     MCL (mg/l)
(1) Fluoride    4.0
(2) Asbestos    7 Million Fibers/liter (longer than 10 μm)
(3) Barium      2
(4) Cadmium     0.005
(5) Chromium    0.1
(6) Mercury     0.002
(7) Nitrate     10 (as Nitrogen)
§ 141.62 40 CFR Ch. I (7–1–02 Edition)
Title 40--Protection of Environment
CHAPTER I--ENVIRONMENTAL PROTECTION AGENCY PART 141--NATIONAL PRIMARY DRINKING WATER REGULATIONS
57
Thesaurus Concept System (From GEMET)
Chemical Contamination
Definition The addition or presence of chemicals to, or in, another substance to such a degree as to render it unfit for its intended purpose.
Broader Term contamination
Narrower Terms cadmium contamination, lead contamination, mercury contamination
Related Terms chemical pollutant, chemical pollution
Deutsch: Chemische Verunreinigung
English (US): chemical contamination
Español: contaminación química
SOURCE General Multi-Lingual Environmental Thesaurus (GEMET)
58
Concept System (Thesaurus)
[Figure: Contamination with narrower terms Chemical, Biological, and Radioactive; Chemical with narrower terms cadmium, lead, and mercury, and related terms chemical pollutant and chemical pollution]
59
Chemicals in EPA Environmental Data Registry
Acalypha ostryifolia: Biological Organism
Mercury: Chemical, CAS Number 7439-97-6
Mercury, bis(acetato-.kappa.O)(benzenamine)-: Chemical, CAS Number 63549-47-3
Mercury, (acetato-.kappa.O)phenyl-, mixt. with phenylmercuric propionate: Chemical, no CAS Number
Other identifier columns shown: TSN (28189), ICTV, EPA ID (E17113275, E965269)
60
Data
Monitoring Stations
Name   Latitude   Longitude   Location
A      41.45 N    125.99 W    Merced Lake
B      43.23 N    120.50 W    Merced River
X      39.45 N    118.12 W    Fletcher Creek

Measurements
ID   Date         Temp   Hg
A    2006-09-13   4.4    4
B    2006-09-13   9.3    2
X    2006-09-15   5.2    3
X    2006-09-13   6.7    78

[Map: station A on Merced Lake, station B on Merced River, station X on Fletcher Creek]
61
Metadata
System                Data Element   Definition                         Units                  Precision
Measurements          ID             Monitoring Station Identifier      not applicable         not applicable
Measurements          Date           Date sample was collected          not applicable         not applicable
Measurements          Temp           Temperature                        degrees Celsius        0.1
Measurements          Hg             Mercury contamination              micrograms per liter   0.004
Monitoring Stations   Name           Monitoring Station Identifier
Monitoring Stations   Latitude       Latitude where sample was taken
Monitoring Stations   Longitude      Longitude where sample was taken
Monitoring Stations   Location       Body of water monitored
Contaminants          Contaminant    Name of contaminant
Contaminants          Threshold      Acceptable threshold value

Metadata
Contaminants
Contaminant   Threshold
mercury       5
lead          42?
cadmium       250?
62
Relations among Inland Bodies of Water
[Figure: Fletcher Creek feeds into Merced Lake, and Merced Lake feeds into Merced River; the same links are also shown with the inverse relation “fed from”]
63
Combining Data, Metadata & Concept Systems
Data:
ID   Date       Temp   Hg
A    06-09-13   4.4    4
B    06-09-13   9.3    2
X    06-09-13   6.7    78

Metadata:
Name   Datatype   Definition                      Units
ID     text       Monitoring Station Identifier   not applicable
Date   date       Date                            yy-mm-dd
Temp   number     Temperature (to 0.1 degree C)   degrees Celsius
Hg     number     Mercury contamination           micrograms per liter

Concept system:
Contamination
  Chemical: lead, cadmium, mercury
  Biological
  Radioactive

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 2 parts per billion between December 2001 and March 2003”
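A sketch of how the query could combine the three layers: the data (stations and measurements), the metadata (the Hg column measures mercury in micrograms per liter), and the concept systems (narrower terms of chemical contamination, plus the “feeds into” relation). Several assumptions are made here: the feeds-into chain is our reading of the figure, 1 part per billion is treated as 1 microgram per liter (the usual approximation for dilute water samples), and the date window passed in the example is chosen to cover the sample rows. All function names are hypothetical.

```python
from datetime import date

# Data: station -> water body, and measurement rows
STATIONS = {"A": "Merced Lake", "B": "Merced River", "X": "Fletcher Creek"}
MEASUREMENTS = [
    ("A", date(2006, 9, 13), {"Temp": 4.4, "Hg": 4}),
    ("B", date(2006, 9, 13), {"Temp": 9.3, "Hg": 2}),
    ("X", date(2006, 9, 15), {"Temp": 5.2, "Hg": 3}),
    ("X", date(2006, 9, 13), {"Temp": 6.7, "Hg": 78}),
]

# Metadata: the concept each column measures, and its units
COLUMN_CONCEPT = {"Hg": ("mercury", "micrograms per liter")}

# Concept systems: narrower terms, and "feeds into" among water bodies
NARROWER = {
    "contamination": ["chemical contamination", "biological contamination",
                      "radioactive contamination"],
    "chemical contamination": ["cadmium", "lead", "mercury"],
}
FEEDS_INTO = {"Fletcher Creek": ["Merced Lake"], "Merced Lake": ["Merced River"]}


def closure(start, edges):
    """All nodes reachable from start by following edges (start excluded)."""
    seen, stack = set(), list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


def find_water_bodies(source, concept, threshold_ug_per_l, start, end):
    downstream = closure(source, FEEDS_INTO)
    concepts = {concept} | closure(concept, NARROWER)
    # Metadata tells us which columns carry a matching concept in compatible units
    columns = [col for col, (c, units) in COLUMN_CONCEPT.items()
               if c in concepts and units == "micrograms per liter"]
    hits = set()
    for station, day, values in MEASUREMENTS:
        body = STATIONS[station]
        if body in downstream and start <= day <= end:
            if any(values.get(col, 0) > threshold_ug_per_l for col in columns):
                hits.add(body)
    return sorted(hits)


print(find_water_bodies("Fletcher Creek", "chemical contamination", 2,
                        date(2006, 9, 1), date(2006, 9, 30)))  # ['Merced Lake']
```

Station X reads 78 but is excluded because Fletcher Creek itself is not downstream of Fletcher Creek; the match on Merced Lake comes only through the concept system, since the data never mentions “chemical contamination”.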
64
Example – Environmental Text Corpus
Idea: Develop an environmental research corpus that could attract R&D efforts.
Include the reports and other material from over $1B of EPA-sponsored research.
Prepare the corpus and make it available:
Research results from years of ORD R&D
Publish associated metadata and concept systems in XMDR
Use open source software for EPA testing
65
Information Extraction & Semantic Computing
[Figure: processing stages Segment, Classify, Associate, Normalize, Deduplicate, Discover patterns, Select models, Fit parameters, Inference, and Report results, leading to Actionable Information and Decision Support; components shown: Extraction Engine, 11179-3 (E3), XMDR]
67
Metadata Registries are Useful
Registered semantics for “training” extraction engines:
The “Normalize” function can make use of standard code sets that have mappings between representation forms.
The “Classify” function can interact with pre-established concept systems.
Provenance:
High precision for proper nouns, less precision (e.g., 70%) for other concepts; this impacts downstream processing, so precision needs to be tracked.
68
Normalize – Need Registered and Mapped Concepts/Code Sets
Data Element Concept: Country Identifiers (Unique ID: 5769)
Attributes: Name, Context, Definition, Conceptual Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Conceptual domain: Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe
Data Elements (attributes shown for Unique ID: 4572)
Attributes: Name, Context, Definition, Value Domain, Maintenance Org., Steward, Classification, Registration Authority, Others
Value domains (ISO 3166):
English Name   French Name   2-Alpha Code   3-Alpha Code   3-Numeric Code
Algeria        L'Algérie     DZ             DZA            012
Belgium        Belgique      BE             BEL            056
China          Chine         CN             CHN            156
Denmark        Danemark      DK             DNK            208
Egypt          Egypte        EG             EGY            818
France         La France     FR             FRA            250
. . .
Zimbabwe       Zimbabwe      ZW             ZWE            716
69
Example – 11179-3 (E3) Support Semantic Web Applications
The address state code is “AB”. This can be expressed as a directed graph, e.g., an RDF statement:
[Figure: node Address (subject) connected by the edge State Code (predicate) to node AB (object), shown both as a graph and as RDF]
XMDR may be used to “ground” the semantics of an RDF statement.
70
Example: Grounding RDF nodes and relations: URIs Reference a Metadata Registry
@prefix dbA: <http://www.epa.gov/databaseA> .
@prefix ai: <http://www.epa.gov/edr/sw/AdministeredItem#> .
[Figure: node dbA:e0139 linked by ai:MailingAddress to node dbA:ma344, which is linked by ai:StateUSPSCode to the typed literal “AB”^^ai:StateCode]
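A standard-library sketch of the grounding idea: prefixed names are expanded to full URIs, written out as N-Triples, and the predicate URI is then looked up in registry content to recover its registered definition. Assumptions to flag: the two triples are our reading of the figure, the trailing “#” on the dbA namespace is added purely for illustration, and the `REGISTRY` dictionary is a hypothetical stand-in for a metadata registry lookup, not an XMDR API.

```python
# Prefixes from the slide; the trailing "#" on dbA is an illustrative assumption
PREFIXES = {
    "dbA": "http://www.epa.gov/databaseA#",
    "ai": "http://www.epa.gov/edr/sw/AdministeredItem#",
}

# Hypothetical slice of registry content: predicate URI -> registered definition
REGISTRY = {
    PREFIXES["ai"] + "StateUSPSCode": "The U.S. Postal Service (USPS) abbreviation that "
    "represents a state or state equivalent for the U.S. or Canada",
}

TRIPLES = [
    ("dbA:e0139", "ai:MailingAddress", "dbA:ma344"),
    ("dbA:ma344", "ai:StateUSPSCode", ("AB", "ai:StateCode")),  # typed literal
]


def expand(qname):
    """Turn a prefixed name such as ai:StateCode into a full URI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples):
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):  # (lexical value, datatype qname)
            value, datatype = o
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"^^<{expand(datatype)}>'
        else:
            obj = f"<{expand(o)}>"
        lines.append(f"<{expand(s)}> <{expand(p)}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(TRIPLES))
# Grounding: what does the predicate mean? Ask the registry.
print(REGISTRY.get(expand("ai:StateUSPSCode")))
```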
71
Definitions in the EPA Environmental Data Registry
Mailing Address: http://www.epa.gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box
State USPS Code: http://www.epa.gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada
Mailing Address State Name: http://www.epa.gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered
73
Ontologies for Data Mapping
[Figure: ontology fragment with concepts Geographic Area, Geographic Sub-Area, Country, and Country Identifier; Country Name (Short Name, Long Name) and Country Code (ISO 3166 2-Character Code, ISO 3166 3-Character Code, ISO 3166 3-Numeric Code, FIPS Code); and the data elements Distributor Country Name and Mailing Address Country Name]
Ontologies can help to capture and express semantics
74
Example: Content Mapping Service
Collect data from many sources – files contain data that has the same facts represented by different terms. E.g., one system responds with Danemark, DK, another with DNK, another with 208; map all to Denmark.
XMDR could accept XML files with the data from different code sets and return a result mapped to a single code set.
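A minimal sketch of such a content mapping service, using the ISO 3166 rows shown on the code-set slides. Every representation (English name, French name as shown on the slide, 2-alpha, 3-alpha, 3-numeric) is indexed back to its row, so any of them can be mapped to a single target code set. The XML layout (`<country>` elements) and the function names are hypothetical; a real service would draw its code sets from the registry rather than a hard-coded list.

```python
import xml.etree.ElementTree as ET

# Rows as shown on the code-set slides:
# (English name, French name, 2-alpha, 3-alpha, 3-numeric)
ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]
COLUMNS = ("english", "french", "alpha2", "alpha3", "numeric")

# Index every representation (case-insensitive) back to its row
INDEX = {value.casefold(): row for row in ROWS for value in row}


def map_value(value, target="english"):
    """Map one value from any known code set to the target code set (None if unknown)."""
    row = INDEX.get(value.strip().casefold())
    return None if row is None else row[COLUMNS.index(target)]


def map_xml(xml_text, target="english"):
    """Map the text of every <country> element to a single target code set."""
    root = ET.fromstring(xml_text)
    return [map_value(el.text or "", target) for el in root.iter("country")]


doc = ("<countries><country>Danemark</country><country>DK</country>"
       "<country>DNK</country><country>208</country></countries>")
print(map_xml(doc))  # ['Denmark', 'Denmark', 'Denmark', 'Denmark']
```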
75
Ecoinformatics: Actions to Manage Semantics
Define data, concepts, and relations
Harmonize and vet data and concept systems
Ground semantics for RDF, concept systems, ontologies
Provide semantics services
76
Ecoinformatics: Concept System Store
[Figure: a metadata registry with components labeled Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, and Structured Metadata, serving users]
Concept systems: keywords, controlled vocabularies, thesauri, taxonomies, ontologies, axiomatized ontologies
(Essentially graphs: node-relation-node + axioms)
77
Ecoinformatics: Management of Concept Systems
[Figure: a metadata registry with components labeled Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, and Structured Metadata, serving users]
Concept system: registration, harmonization, standardization, acceptance (vetting), mapping (correspondences)
78
Ecoinformatics: Life Cycle Management
[Figure: a metadata registry with components labeled Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, and Structured Metadata, serving users]
Life cycle management: data and concept systems (ontologies)
79
Ecoinformatics: Grounding Semantics
[Figure: a metadata registry with components labeled Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, and Structured Metadata, serving users; metadata registries linked to the Semantic Web and to ontologies]
RDF triples: Subject (node URI), Verb (relation URI), Object (node URI)
80
XMDR Project Collaboration
Collaborative, interagency effort: EPA, USGS, NCI, Mayo Clinic, DOD, LBNL, … and others
Draws on and contributes to interagency/international cooperation on ecoinformatics
Involves Ecoterm and international, national, state, and local government agencies, plus other organizations, as content providers and potential users
Interacts with many organizations around the world through ISO/IEC standards committees
Only loosely aligned with Ecoinformatics Cooperation
81
XMDR Project
High-risk R&D; the sponsor expected a likelihood of failure
Targeted toward leading-edge semantics applications in a highly strategic environment
Conceptualization of new capabilities, creation of designs (expressed as standards), development of a software architecture and prototype system for demonstrating capabilities and testing designs
Reasoning, inference, linkage of concepts to data, …
Demonstration of fundamental semantic management capabilities for metadata registries, understanding the potential applications that could be built in-house
82
Results to Date
Completed the first version of designs for next-generation metadata registries, expressed as figures in a UML model that is proposed for the next edition of the ISO/IEC 11179 standard
Developed XMDR Prototype -- available as open source software
Content loaded in prototype: broad range of traditional metadata and concept systems
Designs and prototype being explored and used in several locations. Potential for facilitating development and sharing of content by wide diversity of users.
Starting the next version of designs, taking on more challenging content and capabilities
83
Status of Project
NSF has funded a three-year project, providing a funding base
Strong emphasis on the computer science R&D results and collaboration with the EU and Asia
Limited staffing
Proposing further high-risk R&D
Developing proposals for collaborative efforts to demonstrate capabilities, especially in the area of water
Opportunity to collaborate with JRC and projects under the European Commission 7th Framework Program
84
Ecoinformatics Test Bed
Proposed in Brussels in September 2004
Project direction and statement developed
Purpose:
Research and technical informatics to investigate metadata management techniques
Practical experiment for testing usability
Initial focus:
Use metadata and semantic technologies for air quality (transportation) health effects
Potential for extension to other areas
Need for engaging ongoing operations and/or indicators
85
Ecoinformatics Test Bed
Extend original charter to water
Use water as example content: metadata, concept systems
Look for opportunities to coordinate with EU projects: WISE, EC 7th Framework Program
Identify and propose possible demonstrations