eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

eXtended Metadata Registry (XMDR), International Ecoinformatics Technical Collaboration, Berkeley, California, October 24, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905. [email protected]


Page 1: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

1

eXtended Metadata Registry (XMDR)

International Ecoinformatics Technical Collaboration
Berkeley, California

October 24, 2006

Bruce Bargmeyer, Lawrence Berkeley National Laboratory
University of California
Tel: +1 510-495-2905
[email protected]

Page 2: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

2

Topics

Challenges to address
A brief tutorial on Semantics and semantic computing: where XMDR fits
    Semantic computing technologies
    Traditional Data Administration
XMDR project
Test Bed demonstrations

Page 3: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

3

The Internet Revolution

A world wide web of diverse content: The information glut is nothing new. The access to it is astonishing.

Page 4: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

4

Challenge: Find and process non-explicit data

[Diagram: concept hierarchy with the nodes Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug and Acetaminophen, and the brand names Datril, Anacin-3 and Tylenol.]

For example…

Patient data on drugs contains brand names (e.g., Tylenol, Anacin-3, Datril, …).

However, we want to study patients taking analgesic agents.

Page 5: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

5

Challenge: Specify and compute across Relations, e.g., within a food web in an Arctic ecosystem

An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.

Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

Page 6: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

6

Challenge: Combine Data, Metadata & Concept Systems

Data:

ID  Date      Temp  Hg
A   06-09-13  4.4   4
B   06-09-13  9.3   2
X   06-09-13  6.7   78

Metadata:

Name  Datatype  Definition                     Units
ID    text      Monitoring Station Identifier  not applicable
Date  date      Date                           yy-mm-dd
Temp  number    Temperature (to 0.1 degree C)  degrees Celsius
Hg    number    Mercury contamination          micrograms per liter

Concept system:

Contamination
    Biological
    Chemical
        lead
        cadmium
        mercury
    Radioactive

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 10 micrograms per liter between December 2001 and March 2003”

Page 7: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

8

Challenge: Use data from systems that record the same facts with different terms

[Diagram: many kinds of registries and repositories, each holding common content: OASIS/ebXML registries, ISO 11179 registries, ontological registries, CASE tool repositories, UDDI registries, software component registries, database catalogs and Dublin Core registries. The same item, for example a country identifier, is recorded in them under different terms: data element, XML tag, term hierarchy, attribute, business specification, table column, business object, coverage.]

Page 8: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

10

Challenge: Draw information together from a broad range of studies, databases, reports, etc.

Page 9: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

11

Challenge: Gain Common Understanding of meaning between Data Creators and Data Users

[Diagram: data creators and users (EEA, USGS, DoD, EPA, others . . .) feed and draw on separate information systems that hold text and data. Each system labels its content by topic, some in English (environ, agriculture, climate, human health, industry, tourism, soil, water, air) and some in Spanish (ambiente, agricultura, tiempo, salud humana, industria, turismo, tierra, agua, aire), alongside columns of raw numeric values.]

A common interpretation of what the data represents

Page 10: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

12

Semantic Computing and XMDR

We are laying the foundation to make a quantum leap toward a substantially new way of computing: Semantic Computing

How can we make use of semantic computing for the environment and health?

What do environmental agencies need to do to prepare for and stimulate semantic computing?

What are the ecoinformatics challenges?

Page 11: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

13

Coming: A Semantic Revolution

Searching and ranking
Pattern analysis
Knowledge discovery
Question answering
Reasoning
Semi-automated decision making

Page 12: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

14

The Nub of It

Processing that takes “meaning” into account
Processing based on the relations between things, not just computing about the things themselves
Processing that takes people out of the processing, reducing the human toil
    Data access, extraction, mapping, translation, formatting, validation, inferencing, …
Delivering higher-level results that are more helpful for the user’s thought and action

Page 13: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

16

A Brief Tutorial on Semantics

What is meaning?
What are concepts?
What are relations?
What are concept systems?
What is “reasoning”?

Page 14: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

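17

Meaning: The Semiotic Triangle

[Diagram: a triangle with the corners Thought or Reference (Concept), Symbol and Referent. The symbol symbolises the concept, the concept refers to the referent, and the symbol stands for the referent. Labels shown: “Rose”, “ClipArt”.]

C. K. Ogden and I. A. Richards, The Meaning of Meaning.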

Page 15: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

18

Semiotic Triangle: Concepts, Definitions and Signs

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

Definition

Sign

Page 16: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

20

Forms of Definitions

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

Definition - Define by:
    Essence & Differentia
    Relations
    Axioms

Sign

Page 17: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

21

Definition of Concept - Rose: Dictionary - Essence & Differentia

1. any of the wild or cultivated, usually prickly-stemmed, pinnate-leaved, showy-flowered shrubs of the genus Rosa. Cf. rose family.
2. any of various related or similar plants.
3. the flower of any such shrub, of a red, pink, white, or yellow color.

--Random House Webster’s Unabridged Dictionary (2003)

Page 18: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

22

Definitions in the EPA Environmental Data Registry

Mailing Address:
http://www.epa.gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box

State USPS Code:
http://www.epa.gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada

Mailing Address State Name:
http://www.epa.gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered

Page 19: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

23

Definition of Concept - Rose: Relations to Other Concepts

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

LoveRomanceMarriage

Page 20: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

24

SNOMED – Terms Defined by Relations

Page 21: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

25

Definition of Concept - Rose: Defined by Axioms in OWL

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

rdfs:subClassOf owl:equivalentClass owl:disjointWith

Page 22: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

26

Class Axiom (Definitions): Class Description is Building Block of Class Axiom

A class description is the term used in this document (and in the OWL Semantics and Abstract Syntax) for the basic building blocks of class axioms (informally called class definitions in the Overview and Guide documents). A class description describes an OWL class, either by a class name or by specifying the class extension of an unnamed anonymous class.

OWL distinguishes six types of class descriptions:

1. a class identifier (a URI reference)
2. an exhaustive enumeration of individuals that together form the instances of a class
3. a property restriction
4. the intersection of two or more class descriptions
5. the union of two or more class descriptions
6. the complement of a class description

The first type is special in the sense that it describes a class through a class name (syntactically represented as a URI reference). The other five types of class descriptions describe an anonymous class by placing constraints on the class extension.

Class descriptions of type 2-6 describe, respectively, a class that contains exactly the enumerated individuals (2nd type), a class of all individuals which satisfy a particular property restriction (3rd type), or a class that satisfies boolean combinations of class descriptions (4th, 5th and 6th type). Intersection, union and complement can be respectively seen as the logical AND, OR and NOT operators. The four latter types of class descriptions lead to nested class descriptions and can thus in theory lead to arbitrarily complex class descriptions. In practice, the level of nesting is usually limited.
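The boolean class descriptions (intersection, union, complement) can be illustrated with ordinary set operations on class extensions. This is only a sketch: the individuals and class names below are invented for illustration, and a closed toy universe is assumed so that a complement can be computed, which is not how an OWL reasoner treats unknown individuals.

```python
# Toy universe of individuals; every name here is illustrative only.
UNIVERSE = {"rose1", "rose2", "daffodil1", "oak1"}

# Class extensions: the set of individuals that belong to each class.
flower = {"rose1", "rose2", "daffodil1"}
red_thing = {"rose1"}
tree = {"oak1"}

# Intersection of class descriptions behaves like logical AND.
red_flower = flower & red_thing

# Union of class descriptions behaves like logical OR.
plant = flower | tree

# Complement behaves like logical NOT, taken relative to the universe.
non_flower = UNIVERSE - flower

# Nested description: plants that are NOT red flowers.
plant_not_red_flower = plant & (UNIVERSE - red_flower)
```

Nesting one description inside another, as in the last line, is what lets class descriptions grow arbitrarily complex.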

Page 23: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

27

Class Descriptions -> Class Axiom

Class descriptions form the building blocks for defining classes through class axioms. The simplest form of a class axiom is a class description of type 1. It just states the existence of a class, using owl:Class with a class identifier.

For example, the following class axiom declares the URI reference #Human to be the name of an OWL class:

<owl:Class rdf:ID="Human"/>

This is correct OWL, but does not tell us very much about the class Human. Class axioms typically contain additional components that state necessary and/or sufficient characteristics of a class. OWL contains three language constructs for combining class descriptions into class axioms:

rdfs:subClassOf allows one to say that the class extension of a class description is a subset of the class extension of another class description.

owl:equivalentClass allows one to say that a class description has exactly the same class extension as another class description.

owl:disjointWith allows one to say that the class extension of a class description has no members in common with the class extension of another class description.
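Read in terms of class extensions, the three constructs correspond to subset, equality and empty intersection. The sketch below checks those relations on hypothetical, fully known extensions; the individuals and the helper function names are assumptions made for illustration, and a real OWL reasoner works under an open-world assumption rather than on closed sets like these.

```python
# Hypothetical class extensions (sets of individuals).
human = {"alice", "bob"}
person = {"alice", "bob"}
animal = {"alice", "bob", "rex"}
plant = {"fern1"}


def is_subclass_of(a, b):
    """rdfs:subClassOf: the extension of a is a subset of the extension of b."""
    return a <= b


def is_equivalent_class(a, b):
    """owl:equivalentClass: both classes have exactly the same extension."""
    return a == b


def is_disjoint_with(a, b):
    """owl:disjointWith: the two extensions have no members in common."""
    return a.isdisjoint(b)
```

For these sets, `is_subclass_of(human, animal)` and `is_disjoint_with(human, plant)` both hold, while `is_equivalent_class(human, animal)` does not.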

Page 24: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

28

Computable Meaning

CONCEPT

Referent

Refers To Symbolizes

Stands For

“Rose”,“ClipArt”

rdfs:subClassOf owl:equivalentClass owl:disjointWith

If “rose” is owl:disjointWith “daffodil”, then a computer can determine that an assertion is invalid if it states that a rose is also a daffodil (e.g., in a knowledge base).
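A minimal sketch of that check, assuming a knowledge base stored as (individual, class) pairs and disjointness declared as unordered class pairs. The individuals `flower42` and `flower7` and the function name are hypothetical.

```python
# Declared disjointness axioms, each an unordered pair of class names.
DISJOINT = {frozenset({"Rose", "Daffodil"})}


def find_violations(type_assertions, disjoint_pairs):
    """Return (individual, class_a, class_b) for every individual that is
    asserted to belong to two classes declared disjoint."""
    classes_of = {}
    for individual, cls in type_assertions:
        classes_of.setdefault(individual, set()).add(cls)

    violations = []
    for individual, classes in sorted(classes_of.items()):
        for pair in disjoint_pairs:
            if pair <= classes:  # both disjoint classes are asserted
                class_a, class_b = sorted(pair)
                violations.append((individual, class_a, class_b))
    return violations


# Hypothetical knowledge base: flower42 is asserted to be both.
assertions = [
    ("flower42", "Rose"),
    ("flower42", "Daffodil"),
    ("flower7", "Rose"),
]
```

Calling `find_violations(assertions, DISJOINT)` flags only `flower42`, the one individual asserted to be both a rose and a daffodil.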

Page 25: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

29

What are Relations?

[Diagram: the concepts Merced Lake, Merced River and Fletcher Creek are linked to the concept WaterBody by isA relations.]

Concepts and relations can be represented as nodes and edges in formal graph structures, e.g., “is-a” hierarchies.

Page 26: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

30

[Diagram: a small graph whose nodes are labelled A, 1, 2, a, b, c and d.]

Nodes represent concepts

Lines (arcs) represent relations

Concept Systems have Nodes and may have Relations

Concept systems can be represented & queried as graphs
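As a sketch of what "represented and queried as graphs" can mean in practice, a concept system can be held as a list of labelled edges and queried by filtering. The water-body concepts come from the earlier slide. That all three are linked to WaterBody by isA is an assumed reading of that diagram, and the function names are illustrative.

```python
# Each edge is (source concept, relation, target concept).
EDGES = [
    ("Merced Lake", "isA", "WaterBody"),
    ("Merced River", "isA", "WaterBody"),
    ("Fletcher Creek", "isA", "WaterBody"),
]


def nodes(edges):
    """All concepts that appear at either end of an edge."""
    return {node for source, _, target in edges for node in (source, target)}


def sources(edges, relation, target):
    """Concepts linked to `target` by `relation`, in sorted order."""
    return sorted(s for s, r, t in edges if r == relation and t == target)
```

Here `sources(EDGES, "isA", "WaterBody")` answers the query "which concepts are water bodies?".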

Page 27: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

31

A More Complex Concept Graph

From “Supervaluation Semantics for an Inland Water Feature Ontology”, Paulo Santos and Brandon Bennett, http://ijcai.org/papers/1187.pdf#search=%22terminology%20water%20ontology%22

[Figure: concept lattice of inland water features. Attribute nodes: Linear, Non-linear, Large, Large linear, Small linear, Small non-linear, Deep, Shallow, Natural, Artificial, Flowing, Stagnant. Feature nodes: River, Stream, Canal, Reservoir, Lake, Marsh, Pond.]

Page 28: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

32

Types of Concept System Graph Structures

Trees
Partially Ordered Trees
Ordered Trees
Faceted Classifications
Directed Acyclic Graphs
Partially Ordered Graphs
Lattices
Bipartite Graphs
Directed Graphs
Cliques
Compound Graphs

Page 29: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

33

Types of Concept System Graph Structures

[Figure: example drawings of each structure, with panels titled Tree, Partial Order Tree, Ordered Tree, Faceted Classification, Directed Acyclic Graph, Partial Order Graph, Powerset of 3 element set, Bipartite Graph, Clique and Compound Graph.]

Page 30: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

34

Graph Taxonomy

Directed Graph

Directed Acyclic Graph

Graph

Undirected Graph

Bipartite Graph

Partial Order Graph

Faceted Classification

Clique

Partial Order Tree

Tree

Lattice

Ordered Tree

Note: not all bipartite graphs are undirected.

Page 31: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

35

What Kind of Relations are There? Lots!

Relationship class: A particular type of connection existing between people related to or having dealings with each other.

acquaintanceOf - A person having more than slight or superficial knowledge of this person but short of friendship.

ambivalentOf - A person towards whom this person has mixed feelings or emotions.

ancestorOf - A person who is a descendant of this person.

antagonistOf - A person who opposes and contends against this person.

apprenticeTo - A person to whom this person serves as a trusted counselor or teacher.

childOf - A person who was given birth to or nurtured and raised by this person.

closeFriendOf - A person who shares a close mutual friendship with this person.

collaboratesWith - A person who works towards a common goal with this person.

Page 32: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

36

Example of relations in a food web in an Arctic ecosystem

An organism is connected to another organism for which it is a source of food energy and material by an arrow representing the direction of biomass transfer.

Source: http://en.wikipedia.org/wiki/Food_web#Food_web (from SPIRE)

Page 33: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

37

Ontologies are a type of Concept System

Ontology: explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)

An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them.

Why would someone want to develop an ontology? Some of the reasons are:
    To share common understanding of the structure of information among people or software agents
    To enable reuse of domain knowledge
    To make domain assumptions explicit
    To separate domain knowledge from the operational knowledge
    To analyze domain knowledge

http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

Page 34: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

38

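What is Reasoning? Inference

[Diagram: Polio and Smallpox are each linked by is-a to Infectious Disease; Diabetes and Heart disease are each linked by is-a to Chronic Disease; Infectious Disease and Chronic Disease are each linked by is-a to Disease. A separate line style signifies an inferred is-a relationship.]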
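Inferred is-a relationships of this kind follow from the transitivity of is-a. A minimal sketch, assuming each concept has a single stated parent as in the disease diagram, with illustrative function names:

```python
# Stated (explicit) is-a links: child concept -> parent concept.
IS_A = {
    "Polio": "Infectious Disease",
    "Smallpox": "Infectious Disease",
    "Diabetes": "Chronic Disease",
    "Heart disease": "Chronic Disease",
    "Infectious Disease": "Disease",
    "Chronic Disease": "Disease",
}


def ancestors(concept, is_a):
    """Follow is-a links upward and return every ancestor, nearest first."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain


def inferred_is_a(concept, is_a):
    """Ancestors that are not stated directly, only reachable by transitivity."""
    return ancestors(concept, is_a)[1:]
```

For example, `inferred_is_a("Polio", IS_A)` yields `["Disease"]`: the link from Polio to Disease is not stored anywhere but can be derived.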

Page 35: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

39

Reasoning: Taxonomies & partonomies can be used to support inference queries

[Diagram: Oakland and Berkeley are each part-of Alameda County; Santa Clara and San Jose are each part-of Santa Clara County; Alameda County and Santa Clara County are each part-of California.]

E.g., if a database contains information on events by city, we could query that database for events that happened in a particular county or state, even though the event data does not contain explicit state or county codes.
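The county and state query described above can be sketched by walking part-of links upward from each event's city. The place hierarchy is taken from the diagram. The event records and function names are hypothetical.

```python
# part-of links from the diagram: place -> the larger place that contains it.
PART_OF = {
    "Oakland": "Alameda County",
    "Berkeley": "Alameda County",
    "San Jose": "Santa Clara County",
    "Santa Clara": "Santa Clara County",
    "Alameda County": "California",
    "Santa Clara County": "California",
}

# Hypothetical event data that records only the city.
EVENTS = [
    ("street fair", "Oakland"),
    ("lecture", "Berkeley"),
    ("concert", "San Jose"),
]


def is_within(place, region, part_of):
    """True if `place` equals `region` or is transitively part-of it."""
    while place is not None:
        if place == region:
            return True
        place = part_of.get(place)
    return False


def events_in(region, events, part_of):
    """Events whose city lies inside `region`, in their original order."""
    return [name for name, city in events if is_within(city, region, part_of)]
```

`events_in("Alameda County", EVENTS, PART_OF)` returns the Oakland and Berkeley events even though no event row mentions a county.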

Page 36: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

40

Reasoning: Relationship metadata can be used to infer non-explicit data

[Diagram: concept hierarchy with the nodes Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Nonsteroidal Antiinflammatory Drug and Acetaminophen, and the brand names Datril, Anacin-3 and Tylenol.]

For example…

(1) patient data on drugs currently being taken contains brand names (e.g., Tylenol, Anacin-3, Datril, …);

(2) a concept system connects different drug types and names with one another (via is-a, part-of, etc. relationships);

(3) so… patient data can be linked and searched by inferred terms like “acetaminophen” and “analgesic” as well as trade names explicitly stored as text strings in the database.
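One way to realise step (3) is query expansion: expand the search concept to all of its descendants in the concept system, then match those terms against the stored strings. The sketch below assumes the brand names sit under Acetaminophen, which sits under Analgesic and Antipyretic, Non-Narcotic Analgesic and Analgesic Agent in turn. That nesting is an assumed reading of the diagram, and the patient records are hypothetical.

```python
from collections import defaultdict

# Assumed is-a links: child term -> parent concept.
IS_A = {
    "Tylenol": "Acetaminophen",
    "Anacin-3": "Acetaminophen",
    "Datril": "Acetaminophen",
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Nonsteroidal Antiinflammatory Drug": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
}

# Hypothetical patient records: (patient id, drug name as stored).
PATIENT_DRUGS = [("p1", "Tylenol"), ("p2", "Datril"), ("p3", "SomeOtherDrug")]


def descendants(concept, is_a):
    """All terms that are transitively is-a `concept`."""
    children = defaultdict(list)
    for child, parent in is_a.items():
        children[parent].append(child)
    found, stack = set(), [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def patients_taking(concept, records, is_a):
    """Patients whose stored drug name is the concept or one of its descendants."""
    terms = descendants(concept, is_a) | {concept}
    return sorted(pid for pid, drug in records if drug in terms)
```

With these records, a search for "Analgesic Agent" finds p1 and p2 although neither record contains that string.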

Page 37: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

41

Reasoning: Least Common Ancestor Query

[Diagram: part of the NCI Thesaurus hierarchy with the concepts Analgesic Agent, Non-Narcotic Analgesic, Analgesic and Antipyretic, Acetaminophen, Nonsteroidal Antiinflammatory Drug, Opioid, Opiate, Morphine Sulfate and Codeine Phosphate.]

What is the least common ancestor concept in the NCI Thesaurus for Acetaminophen and Morphine Sulfate? (answer = Analgesic Agent)
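A least common ancestor query can be sketched by collecting one concept's path to the root and then walking up from the other concept until the paths meet. The parent links below are an assumed reading of the diagram, simplified to one parent per concept. The real NCI Thesaurus allows multiple parents, which this sketch does not handle.

```python
# Assumed parent links (child -> parent), one parent per concept.
PARENT = {
    "Acetaminophen": "Analgesic and Antipyretic",
    "Analgesic and Antipyretic": "Non-Narcotic Analgesic",
    "Nonsteroidal Antiinflammatory Drug": "Non-Narcotic Analgesic",
    "Non-Narcotic Analgesic": "Analgesic Agent",
    "Morphine Sulfate": "Opiate",
    "Codeine Phosphate": "Opiate",
    "Opiate": "Opioid",
    "Opioid": "Analgesic Agent",
}


def path_to_root(concept, parent):
    """The concept followed by each of its ancestors, nearest first."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def least_common_ancestor(a, b, parent):
    """First concept on b's path to the root that also lies on a's path."""
    on_a_path = set(path_to_root(a, parent))
    for concept in path_to_root(b, parent):
        if concept in on_a_path:
            return concept
    return None
```

Under these assumed links, `least_common_ancestor("Acetaminophen", "Morphine Sulfate", PARENT)` gives "Analgesic Agent", matching the answer stated on the slide.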

Page 38: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

42

Reasoning: Example “sibling” queries: concepts that share a common ancestor

Environmental: "siblings" of Wetland (in NASA SWEET ontology)

Health
    Siblings of ERK1 finds all 700+ other kinase enzymes
    Siblings of Novastatin finds all other statins

11179 Metadata
    Sibling values in an enumerated value domain

Page 39: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

43

Reasoning: More complex “sibling” queries: concepts with multiple ancestors

Health
    Find all the siblings of Breast Neoplasm

Environmental
    Find all chemicals that are a carcinogen (cause cancer) and toxin (are poisonous) and teratogenic (cause birth defects)

[Diagram: Breast neoplasm sits under both site neoplasms and breast disorders. Other nodes shown: Respiratory System neoplasm, Eye neoplasm and Non-Neoplastic Breast Disorder.]
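A sibling query over a concept system with multiple parents can be sketched as "every other concept that shares at least one parent". The parent assignments below are an assumed reading of the diagram (only Breast neoplasm is taken to have two parents), and the function name is illustrative.

```python
# Assumed parent sets: concept -> set of direct parents.
PARENTS = {
    "Breast neoplasm": {"site neoplasms", "breast disorders"},
    "Respiratory System neoplasm": {"site neoplasms"},
    "Eye neoplasm": {"site neoplasms"},
    "Non-Neoplastic Breast Disorder": {"breast disorders"},
}


def siblings(concept, parents):
    """Other concepts sharing at least one direct parent with `concept`."""
    mine = parents.get(concept, set())
    return sorted(
        other
        for other, theirs in parents.items()
        if other != concept and theirs & mine
    )
```

Because Breast neoplasm has two parents, its siblings come from both branches, while Eye neoplasm only sees the concepts under site neoplasms.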

Page 40: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

44

End of Tutorial about concept systems

Where does ISO/IEC 11179 fit?

Page 41: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

45

Data Generation and Use: Cost vs. Coordination

[Chart: cost ($) against degree of coordination, from Autonomous through Reporting and Community of Interest to Full Control. Curve shown: Data Creation.]

Page 42: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

46

Data Generation and Use: Cost vs. Coordination

[Chart: cost ($) against degree of coordination, from Autonomous through Reporting and Community of Interest to Full Control. Curves shown: Data Creation and Data Use.]

Page 43: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

47

ISO/IEC 11179 Metadata Registries Reduce Cost of Data Creation and Use

[Chart: cost ($) against degree of coordination, from Autonomous through Reporting and Community of Interest to Full Control. Curves shown: Data Creation and Data Use.]

Page 44: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

48

Metadata Registries Increase the Benefit from Data (Strategic Effectiveness)

[Chart: benefit against degree of coordination (Autonomous, Reporting, Community of Interest, Full Control), with a curve labelled MDR.]

Page 45: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

49

What Can ISO/IEC 11179 MDR Do?

Traditional Data Management (11179 Edition 2)
    Register metadata which describes data in databases, applications, XML Schemas, data models, flat files, paper
    Assist in harmonizing, standardizing, and vetting metadata
    Assist data engineering
    Provide a source of well formed data designs for system designers
    Record reporting requirements
    Assist data generation, by describing the meaning of data entry fields and the potential valid values
    Register provenance information that can be provided to end users of data
    Assist with information discovery by pointing to systems where particular data is maintained.

Page 46: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

50

Traditional MDR: Manage Code Sets

Data Element Concept
Name: Country Identifiers
Context:
Definition:
Unique ID: 5769
Conceptual Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others

Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe

Data Elements
Name:
Context:
Definition:
Unique ID: 4572
Value Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others

ISO 3166      ISO 3166      ISO 3166        ISO 3166      ISO 3166
2-Alpha Code  3-Alpha Code  3-Numeric Code  English Name  French Name
DZ            DZA           012             Algeria       L'Algérie
BE            BEL           056             Belgium       Belgique
CN            CHN           156             China         Chine
DK            DNK           208             Denmark       Danemark
EG            EGY           818             Egypt         Egypte
FR            FRA           250             France        La France
. . .         . . .         . . .           . . .         . . .
ZW            ZWE           716             Zimbabwe      Zimbabwe

Page 47: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

51

What Can XMDR Do?

Support a new generation of semantic computing
    Concept system management
    Harmonizing and vetting concept systems
    Linkage of concept systems to data
    Interrelation of multiple concept systems
    Grounding ontologies and RDF in agreed upon semantics
    Reasoning across XMDR content
    Provision of Semantic Services

Page 48: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

52

Coming: A Semantic Revolution

Autonomous

Reporting

Community of Interest

Full Control

Searching and ranking
Pattern analysis
Knowledge discovery
Question answering
Reasoning
Semi-automated decision making

Page 49: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

53

We are trying to manage semantics in an increasingly complex content space

Structured data
Semi-structured data
Unstructured data
Text
Pictographic
Graphics
Multimedia
Voice
Video

Page 50: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

54

11179-3 (E3) Increases MDR Benefit

[Chart: benefit against degree of coordination (Autonomous, Reporting, Community of Interest, Full Control), with a curve labelled MDR.]

When communities create information according to a common vocabulary the value of the resulting information increases dramatically.

Page 51: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

55

Example

Combining Concept Systems, Data, and Metadata to answer queries.

Page 52: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

56

Linking Concepts: Text Document

§ 141.62 Maximum contaminant levels for inorganic contaminants.

(a) [Reserved]

(b) The maximum contaminant levels for inorganic contaminants specified in paragraphs (b)(2)–(6), (b)(10), and (b)(11)–(16) of this section apply to community water systems and non-transient, non-community water systems. The maximum contaminant level specified in paragraph (b)(1) of this section only applies to community water systems. The maximum contaminant levels specified in (b)(7), (b)(8), and (b)(9) of this section apply to community water systems; non-transient, non-community water systems; and transient non-community water systems.

Contaminant     MCL (mg/l)
(1) Fluoride    4.0
(2) Asbestos    7 Million Fibers/liter (longer than 10 μm)
(3) Barium      2
(4) Cadmium     0.005
(5) Chromium    0.1
(6) Mercury     0.002
(7) Nitrate     10 (as Nitrogen)

§ 141.62 40 CFR Ch. I (7–1–02 Edition)

Title 40--Protection of Environment

CHAPTER I--ENVIRONMENTAL PROTECTION AGENCY
PART 141--NATIONAL PRIMARY DRINKING WATER REGULATIONS

Page 53: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

57

Thesaurus Concept System (From GEMET)

Chemical Contamination

Definition The addition or presence of chemicals to, or in, another substance to such a degree as to render it unfit for its intended purpose.

Broader Term contamination

Narrower Terms cadmium contamination, lead contamination, mercury contamination

Related Terms chemical pollutant, chemical pollution

Deutsch: Chemische Verunreinigung

English (US): chemical contamination

Español: contaminación química

SOURCE General Multi-Lingual Environmental Thesaurus (GEMET)

Page 54: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

58

Concept System (Thesaurus)

Contamination
    Biological
    Chemical
        cadmium
        lead
        mercury
    Radioactive

Related terms shown: chemical pollutant, chemical pollution

Page 55: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

59

Environmental Data Registry

Chemicals in EPA Environmental Data Registry

Name: Acalypha ostryifolia; Mercury; Mercury, bis(acetato-.kappa.O)(benzenamine)-; Mercury, (acetato-.kappa.O)phenyl-, mixt. with phenylmercuric propionate
Type: Biological Organism; Chemical; Chemical; Chemical
CAS Number: 7439-97-6; 63549-47-3; No CAS Number
TSN: 28189
ICTV:
EPA ID: E17113275; E965269

Page 56: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

60

Data

Monitoring Stations

Name  Latitude  Longitude  Location
A     41.45 N   125.99 W   Merced Lake
B     43.23 N   120.50 W   Merced River
X     39.45 N   118.12 W   Fletcher Creek

Measurements

ID  Date        Temp  Hg
A   2006-09-13  4.4   4
B   2006-09-13  9.3   2
X   2006-09-15  5.2   3
X   2006-09-13  6.7   78

[Map: stations A, B and X shown on Merced Lake, Merced River and Fletcher Creek.]

Page 57: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

61

Metadata

System               Data Element  Definition                        Units                 Precision
Measurements         ID            Monitoring Station Identifier     not applicable        not applicable
Measurements         Date          Date sample was collected         not applicable        not applicable
Measurements         Temp          Temperature                       degrees Celsius       0.1
Measurements         Hg            Mercury contamination             micrograms per liter  0.004
Monitoring Stations  Name          Monitoring Station Identifier
Monitoring Stations  Latitude      Latitude where sample was taken
Monitoring Stations  Longitude     Longitude where sample was taken
Monitoring Stations  Location      Body of water monitored
Contaminants         Contaminant   Name of contaminant
Contaminants         Threshold     Acceptable threshold value

Contaminants

Contaminant  Threshold
mercury      5
lead         42?
cadmium      250?

Page 58: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

62

Relations among Inland Bodies of Water

[Diagram: Fletcher Creek, Merced Lake and Merced River connected by “feeds into” relations, shown a second time with the inverse relation “fed from”.]

Page 59: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

63

Combining Data, Metadata & Concept Systems

Data

ID  Date      Temp  Hg
A   06-09-13  4.4   4
B   06-09-13  9.3   2
X   06-09-13  6.7   78

Metadata

Name  Datatype  Definition                     Units
ID    text      Monitoring Station Identifier  not applicable
Date  date      Date                           yy-mm-dd
Temp  number    Temperature (to 0.1 degree C)  degrees Celsius
Hg    number    Mercury contamination          micrograms per liter

Concept system

Contamination
    Biological
    Chemical
        lead
        cadmium
        mercury
    Radioactive

Inference Search Query: “find water bodies downstream from Fletcher Creek where chemical contamination was over 2 parts per billion between December 2001 and March 2003”
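A rough sketch of how such a query might combine the three sources. The data rows and station locations come from the earlier Data slide, the metadata ties the Hg column to mercury, and the concept system says mercury is a kind of chemical contamination. Several things are assumptions for illustration: the flow direction (Fletcher Creek feeds Merced Lake, which feeds Merced River), treating micrograms per liter as directly comparable to parts per billion, and all function and variable names.

```python
from datetime import date

# Data: station locations and measurements, from the Data slide.
STATIONS = {"A": "Merced Lake", "B": "Merced River", "X": "Fletcher Creek"}
MEASUREMENTS = [  # (station id, sample date, temperature, Hg)
    ("A", date(2006, 9, 13), 4.4, 4),
    ("B", date(2006, 9, 13), 9.3, 2),
    ("X", date(2006, 9, 15), 5.2, 3),
    ("X", date(2006, 9, 13), 6.7, 78),
]

# Metadata: what each measured column means.
COLUMN_METADATA = {"Hg": {"contaminant": "mercury", "units": "micrograms per liter"}}

# Concept system: narrower terms of each kind of contamination.
NARROWER = {"chemical": {"lead", "cadmium", "mercury"}}

# Assumed flow relation: water body -> the body it feeds into.
FEEDS_INTO = {"Fletcher Creek": "Merced Lake", "Merced Lake": "Merced River"}


def downstream_of(body, feeds_into):
    """Water bodies reached by following feeds-into links from `body`."""
    reached = []
    while body in feeds_into:
        body = feeds_into[body]
        reached.append(body)
    return reached


def contaminated_downstream(start, kind, threshold, first_day, last_day):
    """Downstream bodies with any reading of the given kind over the threshold."""
    targets = set(downstream_of(start, FEEDS_INTO))
    # Use metadata plus the concept system to pick the relevant columns.
    columns = [
        name
        for name, meta in COLUMN_METADATA.items()
        if meta["contaminant"] in NARROWER[kind]
    ]
    hits = set()
    for station, day, _temp, hg in MEASUREMENTS:
        values = {"Hg": hg}
        in_scope = STATIONS[station] in targets and first_day <= day <= last_day
        if in_scope and any(values[c] > threshold for c in columns):
            hits.add(STATIONS[station])
    return sorted(hits)
```

Note that the sample readings are dated September 2006, so the slide's own date range (December 2001 to March 2003) returns nothing. Widening the range to September 2006 returns Merced Lake under the assumed flow direction.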

Page 60: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

64

Example – Environmental Text Corpus

Idea: Develop an environmental research corpus that could attract R&D efforts. Include the reports and other material from over $1b EPA sponsored research.

Prepare the corpus and make it available
    Research results from years of ORD R&D
    Publish associated metadata and concept systems in XMDR
    Use open source software for EPA testing

Page 61: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

65

Extraction Engines

Find concepts and relations between concepts in text, tables, data, audio, video, …

Produce databases (relational tables, graph structures), and other output

Functions:
    Segment – find text snippets (boundaries important)
    Classify – determines database field for text segment
    Association – which text segments belong together
    Normalization – put information into standard form
    Deduplication – collapse redundant information
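The five functions can be illustrated with a deliberately tiny pipeline. This is a toy sketch, not how a real extraction engine works. The vocabularies, the sample text used in the note below and all function names are hypothetical, and classification is reduced to dictionary lookup.

```python
import re

# Hypothetical registered vocabularies: surface form -> standard form.
CONTAMINANTS = {"mercury": "mercury", "hg": "mercury", "lead": "lead"}
WATER_BODIES = {"merced lake": "Merced Lake", "fletcher creek": "Fletcher Creek"}


def segment(text):
    """Segment: split the text into sentence-like snippets."""
    return [part.strip() for part in re.split(r"[.;]\s*", text) if part.strip()]


def classify_and_normalize(snippet):
    """Classify: decide which field each mention belongs to.
    Normalize: store the standard form rather than the surface form.
    If a snippet names several contaminants, the last match wins."""
    lowered = snippet.lower()
    record = {}
    for surface, standard in CONTAMINANTS.items():
        if re.search(rf"\b{re.escape(surface)}\b", lowered):
            record["contaminant"] = standard
    for surface, standard in WATER_BODIES.items():
        if surface in lowered:
            record["water_body"] = standard
    return record


def extract(text):
    """Associate fields found in the same snippet, then deduplicate."""
    records = []
    for snippet in segment(text):
        record = classify_and_normalize(snippet)
        # Association: keep only snippets where both fields co-occur.
        # Deduplication: skip records that were already produced.
        if len(record) == 2 and record not in records:
            records.append(record)
    return records
```

Given the text "Hg was detected in Fletcher Creek. Mercury was found in fletcher creek; Lead levels rose in Merced Lake.", the first two snippets normalize to the same record and collapse into one.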

Page 62: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

66

Metadata Registries are Useful

Registered semantics
    For “training” extraction engines
    The “Normalize” function can make use of standard code sets that have mapping between representation forms.
    The “Classify” function can interact with pre-established concept systems.

Provenance
    High precision for proper nouns, less precision (e.g., 70%) for other concepts -> impacts downstream processing
    Need to track precision

Page 63: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

67

Normalize – Need Registered and Mapped Concepts/Code Sets

Data Element Concept
Name: Country Identifiers
Context:
Definition:
Unique ID: 5769
Conceptual Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others

Algeria, Belgium, China, Denmark, Egypt, France, . . ., Zimbabwe

Data Elements
Name:
Context:
Definition:
Unique ID: 4572
Value Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others

ISO 3166      ISO 3166      ISO 3166        ISO 3166      ISO 3166
2-Alpha Code  3-Alpha Code  3-Numeric Code  English Name  French Name
DZ            DZA           012             Algeria       L'Algérie
BE            BEL           056             Belgium       Belgique
CN            CHN           156             China         Chine
DK            DNK           208             Denmark       Danemark
EG            EGY           818             Egypt         Egypte
FR            FRA           250             France        La France
. . .         . . .         . . .           . . .         . . .
ZW            ZWE           716             Zimbabwe      Zimbabwe
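With the code sets registered and mapped to one another, normalization reduces to a lookup from any representation to a chosen standard one. A minimal sketch using only the rows shown on the slide. The choice of the 2-alpha code as the target form and the function names are assumptions for illustration.

```python
# Rows from the slide: (2-alpha, 3-alpha, 3-numeric, English name, French name).
COUNTRY_ROWS = [
    ("DZ", "DZA", "012", "Algeria", "L'Algérie"),
    ("BE", "BEL", "056", "Belgium", "Belgique"),
    ("CN", "CHN", "156", "China", "Chine"),
    ("DK", "DNK", "208", "Denmark", "Danemark"),
    ("EG", "EGY", "818", "Egypt", "Egypte"),
    ("FR", "FRA", "250", "France", "La France"),
    ("ZW", "ZWE", "716", "Zimbabwe", "Zimbabwe"),
]
FORMS = ("alpha2", "alpha3", "numeric", "english", "french")


def build_index(rows, target_form):
    """Map every representation (case-insensitively) to the target form."""
    target = FORMS.index(target_form)
    index = {}
    for row in rows:
        for value in row:
            index[value.casefold()] = row[target]
    return index


def normalize(value, index):
    """Return the standard form for `value`, or None if it is not registered."""
    return index.get(value.strip().casefold())


TO_ALPHA2 = build_index(COUNTRY_ROWS, "alpha2")
```

Once values are normalized, deduplication becomes simple: "France", "FRA", "La France" and "250" all collapse to the single code "FR".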

Page 64: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

68

Information Extraction & Semantic Computing

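[Diagram: a processing pipeline. Extraction Engine steps: Segment, Classify, Associate, Normalize, Deduplicate. Subsequent steps: Discover patterns, Select models, Fit parameters, Inference, Report results. Outputs: Actionable Information, Decision Support. Supporting component: 11179-3 (E3) XMDR.]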

Page 65: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

69

Example – 11179-3 (E3) Support Semantic Web Applications

The address state code is “AB”. This can be expressed as a directed graph, e.g., an RDF statement:

         Address    State Code   AB
Graph    Node       Edge         Node
RDF      Subject    Predicate    Object

XMDR may be used to “ground” the semantics of an RDF statement.
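The statement on this slide can be written down as a subject-predicate-object triple and read as a two-node directed graph. The sketch below uses plain Python data structures rather than an RDF library; the `Triple` type and `to_graph` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # graph node
    predicate: str  # graph edge
    obj: str        # graph node

# "The address state code is AB"
statement = Triple("Address", "State Code", "AB")

def to_graph(triples):
    """Build an adjacency map: node -> list of (edge label, target node)."""
    graph = {}
    for t in triples:
        graph.setdefault(t.subject, []).append((t.predicate, t.obj))
        graph.setdefault(t.obj, [])  # make sure object nodes appear too
    return graph

graph = to_graph([statement])
print(graph)
```

At this point the labels are bare strings, so nothing in the graph says what "State Code" means.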

Page 66: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

70

Example: Grounding RDF nodes and relations: URIs Reference a Metadata Registry

@prefix dbA: “http://www.epa.gov/databaseA”
@prefix ai: “http://www.epa.gov/edr/sw/AdministeredItem#”

Nodes: dbA:e0139, dbA:ma344, “AB”^^ai:StateCode
Relations: ai:MailingAddress, ai:StateUSPSCode

Page 67: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

71

Definitions in the EPA Environmental Data Registry

Mailing Address:
http://www.epa.gov/edr/sw/AdministeredItem#MailingAddress
The exact address where a mail piece is intended to be delivered, including urban-style address, rural route, and PO Box

State USPS Code:
http://www.epa.gov/edr/sw/AdministeredItem#StateUSPSCode
The U.S. Postal Service (USPS) abbreviation that represents a state or state equivalent for the U.S. or Canada

Mailing Address State Name:
http://www.epa.gov/edr/sw/AdministeredItem#StateName
The name of the state where mail is delivered
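A rough sketch of what "grounding" could look like in code: a prefixed relation name is expanded to a full URI, and that URI is looked up in a registry to retrieve its definition. The dictionary below merely stands in for the EPA Environmental Data Registry. It assumes the definition URIs live under the `ai` prefix namespace from the previous slide, and the way the sample triples connect the nodes is an assumption about the diagram. `expand` and `ground` are hypothetical helper names.

```python
AI = "http://www.epa.gov/edr/sw/AdministeredItem#"
PREFIXES = {"ai": AI}

# Stand-in for the registry: URI -> registered definition (text from the slide).
REGISTRY = {
    AI + "MailingAddress": "The exact address where a mail piece is intended to be "
                           "delivered, including urban-style address, rural route, and PO Box",
    AI + "StateUSPSCode": "The U.S. Postal Service (USPS) abbreviation that represents "
                          "a state or state equivalent for the U.S. or Canada",
    AI + "StateName": "The name of the state where mail is delivered",
}

def expand(qname):
    """Expand a prefixed name such as 'ai:StateUSPSCode' to a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local

def ground(triple):
    """Return (relation URI, registered definition or None) for a triple."""
    _subject, predicate, _obj = triple
    uri = expand(predicate)
    return uri, REGISTRY.get(uri)

# Assumed wiring of the nodes shown on the earlier slide.
triples = [
    ("dbA:e0139", "ai:MailingAddress", "dbA:ma344"),
    ("dbA:ma344", "ai:StateUSPSCode", "AB"),
]
for t in triples:
    uri, definition = ground(t)
    print(uri, "->", definition)
```

A relation whose URI is not registered comes back with `None`, which is one simple way to flag statements whose semantics are not grounded.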

Page 68: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

73

Ontologies for Data Mapping

Concepts shown in the diagram: Geographic Area, Geographic Sub-Area, Country, Country Identifier, Country Name, Country Code, Short Name, Long Name, ISO 3166 2-Character Code, ISO 3166 3-Character Code, ISO 3166 3-Numeric Code, FIPS Code, Distributor Country Name, Mailing Address Country Name

Ontologies can help to capture and express semantics

Page 69: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

74

Example: Content Mapping Service

Collect data from many sources – files contain data that has the same facts represented by different terms. E.g., one system responds with Danemark, DK, another with DNK, another with 208; map all to Denmark.

XMDR could accept XML files with the data from different code sets and return a result mapped to a single code set.
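A minimal sketch of such a mapping step, using Python's standard `xml.etree.ElementTree`. The XML layout (`records`, `record`, `country`), the `original` and `unmapped` attributes, and the function name are all hypothetical; the slide does not specify a file format or a service interface.

```python
import xml.etree.ElementTree as ET

# Mapping from several code sets to the English name (values from the slide's example).
TO_ENGLISH = {"Danemark": "Denmark", "DK": "Denmark", "DNK": "Denmark",
              "208": "Denmark", "Denmark": "Denmark"}

SAMPLE = """<records>
  <record source="A"><country>Danemark</country></record>
  <record source="B"><country>DK</country></record>
  <record source="C"><country>DNK</country></record>
  <record source="D"><country>208</country></record>
</records>"""

def map_to_single_code_set(xml_text, mapping=TO_ENGLISH):
    """Rewrite every <country> value to the target code set, keeping the original."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("country"):
        raw = (elem.text or "").strip()
        if raw in mapping:
            elem.set("original", raw)      # keep provenance of the source value
            elem.text = mapping[raw]
        else:
            elem.set("unmapped", "true")   # flag values the mapping does not cover
    return ET.tostring(root, encoding="unicode")

print(map_to_single_code_set(SAMPLE))
```

Keeping the source value in an attribute is a design choice in this sketch, so that a consumer can still see which code set each record arrived in.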

Page 70: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

75

Ecoinformatics: Concept System Store

Diagram labels: Metadata Registry, Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, Structured Metadata, Users

Concept systems: Keywords, Controlled Vocabularies, Thesauri, Taxonomies, Ontologies, Axiomatized Ontologies
(Essentially graphs: node-relation-node + axioms)
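To make "node-relation-node + axioms" concrete, here is a small sketch that stores a concept system as a set of edges and applies one axiom, transitivity, to derive new edges. The concept names come from the "Ontologies for Data Mapping" slide; the relation name `narrower`, the direction of the edges, and the `transitive_closure` helper are assumptions made for illustration.

```python
# (node, relation, node) edges; the "narrower" relation is an assumed label.
EDGES = {
    ("Geographic Area", "narrower", "Geographic Sub-Area"),
    ("Geographic Sub-Area", "narrower", "Country"),
}

def transitive_closure(edges, relation):
    """Apply the axiom: relation(a, b) and relation(b, c) implies relation(a, c)."""
    pairs = {(s, o) for s, r, o in edges if r == relation}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs

closure = transitive_closure(EDGES, "narrower")
print(sorted(closure))
```

A keyword list or thesaurus would stop at the stored edges; the inferred "Geographic Area narrower Country" pair is what the axiom adds.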

Page 71: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

76

Ecoinformatics: Management of Concept Systems

Diagram labels: Metadata Registry, Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, Structured Metadata, Users

Concept system: Registration, Harmonization, Standardization, Acceptance (vetting), Mapping (correspondences)

Page 72: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

77

Ecoinformatics: Life Cycle Management

Diagram labels: Metadata Registry, Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, Structured Metadata, Users

Life cycle management: data and concept systems (ontologies)

Page 73: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

78

Ecoinformatics: Grounding Semantics

Diagram labels: Metadata Registry, Concept System, Thesaurus, Themes, Data Standards, Ontology, GEMET, Structured Metadata, Users

Metadata Registries, Semantic Web
RDF Triples: Subject (node URI), Verb (relation URI), Object (node URI)
Ontologies

Page 74: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

79

XMDR Project Collaboration

Collaborative, interagency effort
  EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … & others
Draws on and contributes to interagency/International Cooperation on Ecoinformatics
  Involves Ecoterm, international, national, state, local government agencies, other organizations as content providers and potential users
Interacts with many organizations around the world through ISO/IEC standards committees
Only loosely aligned with Ecoinformatics Cooperation

Page 75: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

80

XMDR Project

High risk R&D, sponsor expected likelihood of failure
Targeted toward leading-edge semantics applications in a highly strategic environment
Conceptualization of new capabilities, creation of designs (expressed as standards), development of a software architecture and prototype system for demonstrating capabilities and testing designs
  Reasoning, inference, linkage of concepts to data, ….
Demonstration of fundamental semantic management capabilities for metadata registries, understanding the potential applications that could be built in-house

Page 76: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

81

Results to Date

Completed the first version of designs for next generation metadata registries—expressed as figures in a UML model that is proposed for next edition of the ISO/IEC 11179 standard

Developed XMDR Prototype -- available as open source software

Content loaded in prototype: broad range of traditional metadata and concept systems

Designs and prototype being explored and used in several locations. Potential for facilitating development and sharing of content by wide diversity of users.

Starting the next version of designs, taking on more challenging content and capabilities

Page 77: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

82

Status of Project

NSF has funded a three-year project, providing a funding base
  Strong emphasis on the computer science R&D results and collaboration with EU and Asia
  Limited staffing
Proposing further high risk R&D
  Developing proposals for collaborative efforts to demonstrate capabilities, especially in the area of water.
  Opportunity to collaborate with JRC and projects under the European Commission 7th Framework Program

Page 78: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

83

Ecoinformatics Test Bed

Proposed in Brussels in September 2004
  Project direction and statement developed
Purpose
  Research and technical informatics to investigate metadata management techniques.
  Practical experiment for testing usability.
Initial Focus
  Use metadata and semantic technologies for air quality (transportation) health effects
  Potential for extension to other areas
  Need for engaging ongoing operations and/or indicators

Bruce the unready

Page 79: eXtended Metadata Registry (XMDR) International Ecoinformatics Technical Collaboration

84

Ecoinformatics Test Bed

Extend original charter to Water
Use Water as example content
  Metadata, concept systems
Look for opportunities to coordinate with EU projects
  WISE, EC 7th Framework program
Identify and propose possible demonstrations