CEOS IDN Task Team
May 9, 2002
IDN Agenda – 9 May 2002
IDN Minutes from Darmstadt and IDN Profile are at http://idn.ceos.org
Data Policy
IDN Metrics (from GCMD node)
IDN Content History
Content Strategies
IDN Keywords
Authoring Tools
MD8 Status
Break
MD8 Status – continued
MD8 Software Waiting List
IDN Collaborations
Collaborations: Operational Portals
IDN’s Use of ZOPE for Communications
MD9 and ISO 19115
Lorant Czaran on ISO 19115 Issues/Concerns
Data Policy
Data Policy Issues
Global Change Research Policy Statements from the Executive Office of the President - OSTP in 1991
The U.S. Global Change Research Program requires an early and continuing commitment to the establishment, maintenance, validation, description, accessibility and distribution of high-quality, long-term data sets.
Full and open sharing of the full suite of global data sets for all global change researchers is a fundamental objective.
Preservation of data needed for long-term global change research is required.
Data archives must include easily accessible information about the data holdings, including quality assessments, supporting ancillary information, and guidance and aids for locating and obtaining data.
National and international standards should be used to the greatest extent possible for media and for processing and communication of global data sets.
Data should be provided at the lowest possible cost to global change researchers in the interest of full and open access to data.
For those programs in which selected principal investigators have initial periods of exclusive data use, data should be made openly available as soon as they become widely useful.
Data Policy Issues
National Academy of Sciences (U.S.) National Research Council
U.S. National Committee for CODATA
CODATA 2002 - Frontiers of Scientific and Technical Data (29 September - 3 October)
CGED – Dr. Anne Linn
Dr. Bernard Minster, Chairman
Upcoming workshop on Carbon Cycle data.
International Policies
Combined NSC, DPC, NEC Climate Change Policy Panel (Program Review)
Committee on Climate Change Science and Technology Integration
  Chair: Secretary of Commerce; Vice-Chair: Secretary of Energy
  Executive Director: OSTP Director
Interagency Working Group on Climate Change Science and Technology
  Chair: Deputy/Under Secretary of DOE; Vice Chair: Deputy/Under Secretary of DOC
  Secretary: OSTP AD for Climate Science & Technology
Climate Change Science Program Office
  Director: Commerce Detailee
Climate Change Technology Program
  Department of Energy
IDN Metrics (from GCMD node)
Ten Years of DIFs May 1992-March 2002
[Chart: number of DIFs (y-axis, 0 to 12000) by date (x-axis, quarterly from May-92 to Feb-02).]
New DIFs
[Chart: number of DIFs (y-axis, 8000 to 11500) by date (x-axis, monthly from Jan-01 to Mar-02).]
DIFs by TOPIC
AGRICULTURE              6%
ATMOSPHERE              19%
BIOSPHERE               16%
CRYOSPHERE               3%
HUMAN DIMENSIONS        10%
HYDROSPHERE              6%
LAND SURFACE            12%
OCEANS                  14%
PALEOCLIMATE             1%
RADIANCE OR IMAGERY      7%
SUN-EARTH INTERACTIONS   1%
SOLID EARTH              5%
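GCMD Population by Node, March 2002
[Pie chart. Slice labels, in extraction order: 41%, 14%, 0%, 1%, 3%, 0%, 0%, 5%, 2%, 1%, 6%, 0%, 0%, 0%, 1%, 1%, 0%, 5%, 17%, 0%. Legend, in extraction order: NASA, NOAA, CNES, CIESIN, USGS, CONAE, INPE, CCRS, ESA/ESRIN, DLR, USDA, NEONET, PNRA, RAS, NASDA, JST, CSIRO, UNEP/GRID, AMD, G3OS.]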
Unique Hosts
[Chart: number of unique hosts (y-axis, 0 to 25000) by month (x-axis, Jan-01 to Mar-02).]
2000-2001 Web Usage by Domain
[Chart: number of unique hosts (y-axis, 0 to 60000) by domain type (x-axis: .gov, .edu, .org, .com, .net, .mil, numeric, foreign, .us); series: 2000 Total, 2001 Total.]
Controlled Keyword Search
Agriculture              9%
Atmosphere              12%
Biosphere                7%
Cryosphere               3%
Human Dimensions         6%
Hydrosphere              6%
Land Surface             7%
Oceans                  14%
Paleoclimate             3%
Radiance or Imagery      6%
Sun-Earth Interactions   2%
Solid Earth              7%
Location                11%
Source                   3%
Sensor                   4%
2000-2001 Parameter Searches
[Chart: number of searches (y-axis, 0 to 16000) by parameter topic (x-axis: Agriculture, Atmosphere, Biosphere, Cryosphere, Human Dimensions, Hydrosphere, Land Surface, Oceans, Paleoclimate, Radiance or Imagery, Sun-Earth Interactions, Solid Earth); series: Total 2000, Total 2001.]
Total DIF Retrievals
[Chart: number of DIFs (y-axis, 0 to 80000) by month (x-axis, Jan-01 to Mar-02).]
Total DIF Retrievals
[Chart: total DIF retrievals (y-axis, 0 to 80000) by calendar month (x-axis, Jan to Dec); series: 2000, 2001, 2002.]
Redirects to Other Data
Top redirects from DIFs                 2001   2000
1. NASA data/web pages                   631    244
2. NOAA data/web pages                   355    389
3. EOSDIS DAAC data/web pages            349    290
4. USGS data/web pages                   329    972
5. CDIAC data/web pages                   40     90
6. CCRS/CEONET/GeoConnections             34    n/a
7. International data/web pages          118    193
8. Other data/web pages (various)        466   2185
# Unique Hosts Since Jan 01
[Chart: number of unique users (y-axis, 0 to 25000) by month (x-axis, Nov-00 to Apr-02).]
# Web Page Hits Since Jan 01
[Chart: number of hits (y-axis, 0 to 600000) by month (x-axis, Nov-00 to Apr-02).]
Decline in Usage?
GCMD web usage has tended to be flat over the past year.
Prior to 9-11, usage was showing a +1.6% increase for the year.
Since 9-11, usage has declined. Overall GCMD usage has declined by 3% from the past year.
Numeric domains have increased by 18% over 2000, but .gov domains have declined by almost 47% since 2000.
FGDC Clearinghouse changed filtering of Isite queries.
Is decline due to increased information available (information saturation), 9-11, decline in interest on climate issues, other factors?
Decline in Usage?
Web page hits have increased since Jan 2001, while # unique hosts has decreased. Possible reasons:
Domain contraction: more users on fewer hosting domains. AOL has 13.58% of global ISP market; more gov agencies using single domain (e.g., usgs.gov).
More users behind firewalls.
More hits are by robots? (We block them from DIFs but not web pages.)
Fewer users are making more hits.
Who Links to the GCMD?
GCMD is #1 on Google search for “global change” (Week of April 1, 2002)
Top 10 sites that link to GCMD (from Google)
PODAAC
NSIDC
WWW Virtual Library – Meteorology
AADC Metadata page
LBNL Energy Crossroads Climate Change Page
WHOI COFDL Laboratory
The Weather Pointers Page
NOAA/PFEL
Yahoo! Environment and Pollution (French)
NASADAACS page
Quaternary Web Resources (Colby College)
Who Links to the GCMD?
Google’s top 10 sites that link to GCMD (pt 1)(Week of April 15, 2002)
Google-ranked sites that are most often linked with links to GCMD
GES DAAC Direct Links to MODIS Data http://acdisx.gsfc.nasa.gov/data/dataset/MODIS/nofrills.html
GES DAAC MODIS Overview http://daac.gsfc.nasa.gov/MODIS/overview.shtml
BakerHughes Industry Links-Labs, Research, Gov http://www.bakerhughes.com/bakerhughes/resources/labs.htm
PCI Geomatics Industry Links http://www.pcigeomatics.com/corpinfo/ind_links.html
VGL Data Links http://www.umich.edu/~vgl/booksdata/data.html
SeaWiFS Evaluation Products http://daac.gsfc.nasa.gov/data/dataset/SEAWIFS/06_New_Products/
Who Links to the GCMD?
Google’s top 10 sites that link to GCMD (pt 2)
DAAC Alliance Products & Services Page http://nasadaacs.eos.nasa.gov/data/path12.html
Harvard University Environment and Sustainable Development http://www.cid.harvard.edu/esd/esdlinks/esdlinks.html
Blackwell Publishers - Geospatial Datasets http://www.blackwellpublishers.co.uk/geog/data.asp
RPI/Rensselaer Research Libraries http://www.lib.rpi.edu/dept/library/html/resources/subjects/science/earth.html
RSMAS Library Internet Resources http://www.rsmas.miami.edu/support/lib/library_links.html
IDN Content History
Content Strategies
Past Content Struggle
Non-existent to poor authoring tools.
Inadequate operations facility for interacting with the database.
Science Coordinators had to gather all data set info through intensive, laborious process.
Little interest or cooperation by data centers or data set producers.
Result: Prior to 1994 - 3 DIFs/month/coordinator were written.
Present Content Strategy
Make improved authoring tools available.
Provide effective operations facility for validation/loading entries into database.
Provide capability to update “on the spot”.
Unsolicited entries now arriving - from data set producers, data center personnel, portal representatives, other international and interagency groups.
(Although still time-consuming to gather all information.)
Result: 35.3 DIFs/coordinator/month (April 2001 - March 2002)
Future Content Strategy
Make further improvements of authoring tools.
Further enhance Operations Client and QA facility for quality control and loading of entries.
Provide ownership of entries through portals and distributed nodes, and thus expect more contributions from partners.
Distribute final validation QA function beyond GCMD node - providing even more sense of ownership and responsibility.
Increase interest - sometimes initially by software developers and later by content providers.
Reasons for MD8 Operations Client – A Client User’s Perspective
One person performed all database administration tasks
Increased interest by partners to write and share metadata
Clumsy text-based interface introduced errors and increased maintenance
How MD8 Has Changed Our Mode of Operation for the Better
Database administration tasks shared by science coordinators.
Decreased time between submission of metadata and its entry into the database.
Allows users to perform tasks that previously required knowledge of command line Oracle SQL.
Eases the process of managing personnel and valids.
Graphical User Interface.
Operations (OPS): Loading Metadata Records
Operations (OPS): Extracting content from the database
Content Strategy - Using the QA Ops
You gotta’ know when to hold ‘em, Know when to load ‘em, Know when to walk away, Know when to run.
You never count your DIFs when they’re only in the table...
There’ll be time enough for countin’ when the loadin’s done.
You gotta’ know when to bold ‘em, Know when to fold ‘em, ...
IDN Keywords
Data Center Bucket Revision
Original list created for HCIL Interface.
Buckets not adequate for Science Keyword Interface.
Overlapping Buckets
Minimal Quality Control of Original Buckets
Science coordinators created new bucket list.
Staff is in the process of matching each Data Center valid to a new bucket.
Data Center Bucket Revision
NEW Buckets
  Academic
  Commercial
  Consortia/Institutions
  Multinational
  Non-Government Agencies
  Non-US Government
  US Federal Agencies
  US State and Regional Agencies
Old Buckets
  Commercial
  DOC
  DOD
  DOE
  DOI
  EPA
  Federal Agencies
  Institutions
  International
  International Agencies
  NASA
  NOAA
  Non-Profit Organizations
  NSF
  Regional Agencies
  Universities
  USDA
  USGS
  World Data Centers
Data Center Bucket Revision
US Federal Agencies: DOC, DOD, DOE, DOI, DOT, EPA, NASA, NSF, USDA, USGS
Keyword Changes
Guiding Principles: Follow the Rules!
Earth science parameters are a 4-tier controlled vocabulary for indexing and retrieving metadata.
Parameter hierarchy includes a 5th level uncontrolled “detailed variable”.
CATEGORY > TOPIC > TERM > VARIABLE > detailed variable
Example: EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering
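The hierarchy above can be illustrated with a small parser. This is only a sketch: the `Keyword` type and `parse_keyword` function are hypothetical names introduced here, and it assumes a path always carries all four controlled levels, optionally followed by the uncontrolled detailed variable.

```python
from typing import NamedTuple, Optional


class Keyword(NamedTuple):
    """One science keyword: four controlled levels plus an optional fifth."""
    category: str
    topic: str
    term: str
    variable: str
    detailed_variable: Optional[str] = None  # fifth level, uncontrolled


def parse_keyword(path: str) -> Keyword:
    """Split 'CATEGORY > TOPIC > TERM > VARIABLE [> detailed variable]'."""
    parts = [part.strip() for part in path.split(">")]
    if not 4 <= len(parts) <= 5 or not all(parts):
        raise ValueError(f"expected 4 or 5 non-empty levels, got {parts!r}")
    return Keyword(*parts)


kw = parse_keyword("EARTH SCIENCE > Solid Earth > Geochemistry > Chemical Weathering")
print(kw.topic, "/", kw.variable)
```

Keeping the detailed variable as an optional trailing field mirrors the slide's distinction between the controlled tiers and the free-text fifth level.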
Keyword Process
Keywords requiring modification can usually be modified through database operations so that all DIFs affected are modified at the same time.
New keywords are simply added to the database and to the list of controlled keywords available in tools and interfaces.
Usually manual process to ensure existing DIFs are indexed with the new keyword.
Summary of Science Keyword Changes
Added 54 new Variables and 4 new Terms
Modified 39 Variables and 2 Terms
Currently: 1199 Variables in GCMD
Modified Marine Geophysics and Bathymetry Terms and Variables
  Many keywords were not being used or could be re-classified under better Terms
Modified Terrestrial Ecosystem Variables from singular to plural (e.g., forest to forests)
Modified Marine Sediments Variables
  Suggested by C. Moore at NOAA/NGDC/MGG
Changed Term Solar-Terrestrial Interactions to Sun-Earth Interactions
  Sun-Earth was more recognizable Term
  More compatible with home page redesign - took up less “real estate” in keyword hierarchy.
Keywords Added
Added Marine Biology, Marine Geochemistry, Marine Tectonics, Marine Volcanism, and Sea Surface Topography Terms and Variables to Oceans
Added Land Use/Land Cover Term and Variables to Human Dimensions
Added Geomorphology Term and Variables to Solid Earth
Added Natural Hazards Term and Variables to Human Dimensions
Added Aquatic Habitat and Demersal Habitat Variables to Biosphere
Added Forest Science/Conservation Variables to Biosphere (Canada)
Added Snow Chemistry (NSIDC)
Who Suggested Keyword Changes in 2001?
GCMD Staff
EOSDIS DAAC/DAAC Alliance data providers
  MSFC/GHRC
  NSIDC DAAC
  GSFC DAAC
  SEDAC
  ORNL DAAC
ECS Science Office
NOAA/NGDC (marine geophysics)
Canada/CCRS (forest science)
IODE (marine biology, oceans)
Community Usage of GCMD Keywords
CEOS Interoperability Protocol (CIP) uses Category > Topic > Term
EOSDIS Data Gateway (EDG) uses Topic > Term > Variable
EOSDIS Core System (ECS) uses all 5 levels, including detailed variable
Other Communities using GCMD Keywords
  FGDC (although not required); many agencies using FGDC metadata use GCMD keywords as “theme thesaurus”
  Canada and GeoConnections
  Mercury
  U. Cal. Natural Reserve System
  NOAA
  Semantic web
  NASA’s Visible Earth (part of Earth Observatory)
  DODS
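The level subsets used by CIP, EDG, and ECS can be sketched as a projection over one keyword. The dictionary layout and the `project` helper are assumptions made for illustration; only the level subsets themselves come from the slide.

```python
LEVELS = ["category", "topic", "term", "variable", "detailed_variable"]

# Level subsets as stated on the slide; the keys are just short labels.
CLIENT_LEVELS = {
    "CIP": ["category", "topic", "term"],
    "EDG": ["topic", "term", "variable"],
    "ECS": LEVELS,
}


def project(keyword: dict, client: str) -> list:
    """Return the levels a client uses, in hierarchy order, skipping empty ones."""
    return [keyword[level] for level in CLIENT_LEVELS[client] if keyword.get(level)]


keyword = {
    "category": "EARTH SCIENCE",
    "topic": "Solid Earth",
    "term": "Geochemistry",
    "variable": "Chemical Weathering",
    "detailed_variable": None,  # uncontrolled level left empty in this example
}

for client in CLIENT_LEVELS:
    print(client, "->", " > ".join(project(keyword, client)))
```

A change to an existing Term would therefore be visible to all three clients, while a change to a Variable would not affect CIP.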
Keyword Process
ECS and EDG Notification Policy
ECS and EDG are notified of GCMD-approved science keyword changes prior to implementation.
Process gives ECS and EDG time to notify science and data teams as to potential software changes.
New keywords added to the GCMD are usually not a problem.
Modification of existing keywords is more problematic.
Authoring Tools
Current Authoring Tools include:
  DIFbuilder
  DIFbuildlet
  ModDIFbuilder
  SERFbuilder
  ModSERFbuilder
  ESIP DIFbuilder
  JCADM DIFbuilder
Usage of the Authoring Tools has increased from outside partners (DAACs, GLOBEC and AMD)
DIF Authoring Tools CY01 - Present
DIFBuilder       68%
ModDIFBuilder    29%
DIFBuildlet       3%
SERF Authoring Tools CY01 - Present
SERFBuilder      75%
ModSERFBuilder   25%
Monthly DIFBuilder Usage
[Chart: monthly usage (y-axis, 0 to 250), Jan-01 to Mar-02.]
ModDIF Builder Monthly Usage
[Chart: monthly usage (y-axis, 0 to 70), Jan-01 to Mar-02.]
DIFBuildlet Monthly Usage
[Chart: monthly usage (y-axis, 0 to 14), Jan-01 to Mar-02.]
SERFBuilder Monthly Usage
[Chart: monthly usage (y-axis, 0 to 10), Jan-01 to Mar-02.]
ModSERFBuilder Monthly Usage
[Chart: monthly usage (y-axis, 0 to 7), Jan-01 to Mar-02.]
MD8 Status
3 tiers: Client, Server, and Database
+ Local Database Agents
Status of MD8 Server
Functional Since Late 2001
MDServer
Provides a mechanism for remote clients to interoperate with GCMD using RMI protocol.
Document API: Create, modify, and remove Documents.
Query API: Retrieve documents based on entry identifier, object identifier, or query expression.
Valids API: Insert, modify, remove, retrieve valids.
Personnel API: Insert, modify, remove, retrieve and merge personnel.
Incoming Queue API: Insert, modify, retrieve Incoming Items.
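To make the division of labour between the Document API and the Query API concrete, here is a purely illustrative in-memory stand-in. It is not the MD8 interface: the class name, method names, document fields, and sample records are all hypothetical, and the real server is reached over RMI rather than called in process.

```python
class InMemoryMDServer:
    """Hypothetical stand-in; names are illustrative, not MD8's actual API."""

    def __init__(self):
        self._docs = {}  # entry identifier -> document (a plain dict here)

    # Document API: create, modify, and remove documents.
    def create(self, entry_id, doc):
        if entry_id in self._docs:
            raise KeyError(f"{entry_id} already exists")
        self._docs[entry_id] = dict(doc)

    def modify(self, entry_id, **changes):
        self._docs[entry_id].update(changes)

    def remove(self, entry_id):
        del self._docs[entry_id]

    # Query API: retrieve by entry identifier or by a query expression.
    def get(self, entry_id):
        return self._docs[entry_id]

    def query(self, predicate):
        """Return the sorted entry identifiers whose document matches."""
        return sorted(eid for eid, doc in self._docs.items() if predicate(doc))


server = InMemoryMDServer()
# Made-up sample records, for illustration only.
server.create("DIF1", {"title": "Sea ice extent", "topic": "Cryosphere"})
server.create("DIF2", {"title": "Ocean colour", "topic": "Oceans"})
server.modify("DIF2", title="Ocean color")
print(server.query(lambda doc: doc["topic"] == "Oceans"))
```

The query expression is modelled as a Python predicate for brevity; the slides do not say what form MD8 query expressions take.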
Application Programmers Interface (API)
get_entry_ids.py Retrieve a set of entry identifiers given a specified query.
getdif.py Retrieve a DIF given its entry identifier or object identifier.
getdifs.py Retrieve a set of DIFs given a specified query.
get_valids.py Retrieve a set of valids given its type.
getdifs_by_personnel.py Retrieve a set of personnel given the first, middle, and last name.
Servlets
HTTP Protocol
Provides a mechanism for remote clients to interoperate with GCMD using HTTP protocol.
Improvements After Beta
Further restructuring of Operations (OPS)
Quality Control (QA GUI)
Operations Help written in HTML.
Better support for distributed loading
Metrics
Improvements After MD8 Active
Load testing revealed:
  Refinement slower than expected.
  Title display slower than expected.
  Robots accessing DIFs and each display in the DIF.
  Servlets never die.
  Tomcat 4.0 performed poorly under heavy load.
Improvements:
  Added another level of caching to the servlets.
  Improved/optimized some algorithms.
  Added META tag to DIFs.
  Files, streams, sockets, etc. MUST be closed.
  Reverted back to Tomcat 3.3.
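Two of the improvements listed, an extra caching level and strict closing of files and streams, can be sketched in a few lines. The MD8 servlets are Java; this Python version is only an analogy, and the `load_title` function, file name, and file contents are made up for the demonstration.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=1024)
def load_title(path: str) -> str:
    """Read the first line of a file; repeated calls are served from the cache."""
    # The with-block closes the file even if reading raises an exception.
    with open(path, encoding="utf-8") as handle:
        return handle.readline().strip()


# Demonstration with a throwaway file (name and contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_dif.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Example data set title\n")
    first = load_title(path)
    second = load_title(path)  # cache hit: the file is not reopened

info = load_title.cache_info()
print(first, "| hits:", info.hits, "| misses:", info.misses)
```

In a long-running server process, a handle that is never closed is never reclaimed, which is why the slide treats closing as mandatory rather than optional.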
MD8 Database Extension: Local Database Agent
What is the “Local Database Agent” (LDA)
A major component of MD8 that links Earth science databases around the world
Captures content updates to the local database using triggers and shares content
Peer-to-peer connectivity to other databases
Minimal impact to the local DB activities
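The trigger-based capture described above can be sketched with SQLite from the Python standard library. This swaps in SQLite for the Oracle database the slides mention, and the table, column, and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dif (entry_id TEXT PRIMARY KEY, title TEXT);

    -- Log of local changes waiting to be shared with other nodes.
    CREATE TABLE change_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        entry_id TEXT,
        action TEXT
    );

    CREATE TRIGGER dif_insert AFTER INSERT ON dif
    BEGIN
        INSERT INTO change_log (entry_id, action) VALUES (NEW.entry_id, 'INSERT');
    END;

    CREATE TRIGGER dif_update AFTER UPDATE ON dif
    BEGIN
        INSERT INTO change_log (entry_id, action) VALUES (NEW.entry_id, 'UPDATE');
    END;
""")

# Ordinary local activity; the application code never touches change_log.
conn.execute("INSERT INTO dif VALUES ('DIF1', 'Draft title')")
conn.execute("UPDATE dif SET title = 'Final title' WHERE entry_id = 'DIF1'")

pending = conn.execute(
    "SELECT entry_id, action FROM change_log ORDER BY seq"
).fetchall()
print(pending)
```

Because the triggers do the bookkeeping inside the database, existing loaders need no changes, which is one plausible reading of "minimal impact to the local DB activities".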
Reasons for a Local Database Agent
Enable distributed input from CEOS partners
Easily share metadata information among nodes
Facilitate and manage the metadata population
Reduce maintenance related to data exchange among nodes
Build a sense of community among the CEOS partners by linking them together
Data Sharing Network
[Diagram: nodes shown are GCMD, JCADM, and UNEP.]
Local Database Agent
[Diagram labels: Local DB, New Content, MD Server, Network, LDA Server, Announcer, Scheduler. Tables: Schedule, Incoming Queue, Mirrors, Trigger.]
[Diagram: GCMD Node, UNEP Node, JCADM Node, and NASDA Node. Labels repeated per node: DB, MD Server, Local Database Agent.]
Data Ingest
[Diagram labels: Researcher, Coordinator, DifBuilder, MyDIF.xml, Metadata@gcmd (ProcMail), OPS, Incoming Queue Manager, DIF DB, LDA, Incoming Items from other LDA’s, NewDIF.xml, NewDIF2.xml, Personnel, Parameter Valid.]
Sample Incoming Queue contents shown in the diagram:
DIF1   GCMD    05/01/02   Accepted
DIF2   UNEP    05/01/02   Accepted
DIF3   JCADM   05/01/02   Unknown
DIF4   GCMD    05/01/01   Rejected
DIF5   NASDA   05/01/01   Accepted
LDA Announcer/Scheduler
[Flowchart labels: Trigger Table, Schedule Table, Scheduler, Announcer, Incoming Queue, Incoming Queue Manager, DIF Tables, Mirror List (NASDA, CSIRO, JCADM, UNEP). Decision: Is item from a remote node? YES: DIF sent from UNEP is loaded. NO: Announce to other LDA's.]
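The decision in the flowchart can be sketched as follows. This is one reading of a diagram that did not survive extraction intact: it assumes that items arriving from a remote node are loaded without being re-announced, and that locally originated items are announced to every node on the mirror list. The `Node` class and its attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mirrors: list = field(default_factory=list)  # names of the other nodes
    loaded: list = field(default_factory=list)   # items loaded locally
    outbox: list = field(default_factory=list)   # (destination, item id) pairs

    def handle(self, item_id: str, origin: str) -> None:
        if origin != self.name:
            # Item came from a remote node: load it, do not re-announce it.
            self.loaded.append(item_id)
        else:
            # Item originated locally: announce it to every mirror.
            for mirror in self.mirrors:
                self.outbox.append((mirror, item_id))


gcmd = Node("GCMD", mirrors=["NASDA", "CSIRO", "JCADM", "UNEP"])
gcmd.handle("DIF2", origin="UNEP")  # remote item is loaded
gcmd.handle("DIF1", origin="GCMD")  # local item is announced
print(gcmd.loaded)
print(gcmd.outbox)
```

Not re-announcing remote items is what keeps an update from bouncing between nodes indefinitely.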
Evolution of LDA Architecture Options
Distributed peer-to-peer with auto-commit: All nodes talk to all other nodes.
Downside:
- Could get into a sticky situation when multiple nodes are down at different times.
- Initialisation and synchronization are very complex.
- QA must be performed after propagation to all nodes.
GCMD centralized with auto-commit: All updates from other LDA's go to GCMD where they are automatically committed.
Downside:
- GCMD is single point of failure.
- If GCMD trumps, GCMD will need to update all the nodes.
- QA must be performed after propagation to all nodes.
GCMD centralized as QA maintainer: All updates from other LDA's go to GCMD where they are validated, QA'd and then broadcast to the other nodes.
Downside:
- GCMD is a single point of failure for propagating content between nodes.
- Latency in DIF propagation due to QA process.
Advantages of the Final LDA Architecture
Nodes can still update their own system while validation is pending thus maintaining autonomy of their local system.
Most closely models the current mode of GCMD operation.
Cleanest method of quality control from a code/design point of view.
It makes concurrency of updates easier to deal with by limiting possible “race conditions” and deadlock issues because GCMD can now manage these issues.
Evolution of the LDA Design
Replaced Java serialized objects with XML messages
Removed the requirement for an auto-commit
Improved ease of synchronization
Removed InstantDB and replaced with Oracle
Improved “componentization” and modularity
Improved network fault tolerance by threading the Announcer
All remote database updates now propagate to GCMD
Improved node registration process
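The move from serialized objects to XML messages can be illustrated with a round trip through a small message. The element and attribute names below are invented for the sketch; the slides do not show the actual LDA message format.

```python
import xml.etree.ElementTree as ET


def build_message(origin: str, action: str, entry_id: str) -> str:
    """Serialize one update notice as XML text (hypothetical layout)."""
    root = ET.Element("update", attrib={"origin": origin, "action": action})
    ET.SubElement(root, "entry_id").text = entry_id
    return ET.tostring(root, encoding="unicode")


def parse_message(text: str) -> dict:
    """Read the same fields back on the receiving node."""
    root = ET.fromstring(text)
    return {
        "origin": root.get("origin"),
        "action": root.get("action"),
        "entry_id": root.findtext("entry_id"),
    }


message = build_message("UNEP", "load", "DIF2")
print(message)
print(parse_message(message))
```

A text format like this can be read by a node regardless of its language or class versions, which serialized Java objects cannot guarantee.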
LDA Test Plan
Create/Read/Update/Delete DIFs, SERFs, and Valids
Boundary conditions
Initialization
Synchronization
Load testing
Load and delete the same DIF before scheduler runs
Merge Personnel
Simultaneous loading (2 cases: Remote nodes and local node)
Network/Server Failure Scenarios
OPS-LDA Demo
Update a DIF
Load the DIF
Watch it propagate to another node
Load the same DIF on the remote node
MD8 Software Waiting List
In order to prioritize the installation of MD8, an MD8 Waiting List was created.
Sites were divided by MD8-Oracle/MD8-Isite and by resources available at the site.
Current sites installing MD8-Oracle: AADC (JCADM) and UNEP Nairobi.
Current sites installing MD8-Isite: CONAE and CNES.
Wait List
Priority IDN Nodes:
  Antarctic Coordinating Node
  UNEP/GRID Budapest
  Asian Coordinating Node - NASDA
  ESA Coordinating Node
  Australian Cooperating Node - CSIRO
  NOAA Cooperating Node
  Dutch Cooperating Node: NEOnet
  Argentina’s Cooperating Node: CONAE
Wait List
Priority IDN Nodes (continued):
  French Cooperating Node (CNES)
  Brazilian Cooperating Node (INPE)
  German Cooperating Node (DLR)
  Canadian Cooperating Node - no request
  San Diego Supercomputing Center ?
  Israel’s Cooperating Node
  Russian Cooperating Node (Space Research Institute)
  UNEP/GRID, Nairobi
Wait List
Others Interested:
  AWI (Alfred Wegener Institute for Polar and Marine Research: Manfred Reinke)
  Australian Institute of Marine Science (AIMS)
  Goddard’s Data Assimilation Office (DAO)
  The EPA (Ross Lunetta at Research Triangle)
  Korean Oceanographic Data Center
  World Data Center, Sydney (Michael Wang)
IDN Collaborations
Please give your report on your individual Node status.
Collaborations: Operational Portals
Federation of Earth Science Information Partners (ESIP)
GCMD population of ESIP products and services
                      DIFs   SERFs
ESIP Type 1:          1935      41
ESIP Type 2:           700       8
ESIP Type 3:            17       6
Total:                2652      55
% of GCMD holdings:    24%     18%
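As a quick consistency check on the figures above, the per-type counts can be summed and the percentages inverted to estimate total GCMD holdings. The counts come from the slide; the implied totals are only approximate because the percentages are rounded.

```python
# (DIFs, SERFs) per ESIP type, as listed on the slide.
counts = {"Type 1": (1935, 41), "Type 2": (700, 8), "Type 3": (17, 6)}

total_difs = sum(difs for difs, _ in counts.values())
total_serfs = sum(serfs for _, serfs in counts.values())
print("Totals:", total_difs, "DIFs,", total_serfs, "SERFs")

# Implied size of GCMD holdings if those totals are 24% and 18% of the whole.
implied_difs = round(total_difs / 0.24)
implied_serfs = round(total_serfs / 0.18)
print("Implied holdings:", implied_difs, "DIFs,", implied_serfs, "SERFs")
```

The implied DIF total is in line with the statement elsewhere in the deck that the database holds over 11,000 entries.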
GCMD Federation Interoperability
[Diagram labels: MD Server, LDA, Local Database, DIF Peer, Loader, Obj, DOM, Persistent Component, DIF Component, SERF Component, Supplemental Component, XML Colon Parser, XML DocBuilder, XML DIFs, Mercury Extractor, Mercury.]
DAAC Alliance Metrics
At request of V. Griffin, GCMD became responsible for tracking publicly-available DAAC products.
GCMD works closely with SPSO to track products.
At end of FY01, there were 1,553 DAAC DIFs representing 1,656 products.
As of 3/31/02, there were 1,573 DAAC DIFs representing 1,676 products (accounting for deletions and replacement DIFs).
World Data Center Portal
Request received from Dave Clark (NOAA/NGDC) to create a World Data Center portal.
Portal prototype was quickly prepared and was presented by Dave Clark at WDC Task team meeting August 2001.
WDC-related DIFs need to be modified and new WDC metadata created with assistance from WDC.
DODS portal
http://gcmd.gsfc.nasa.gov/Data/portals/dods/
The DODS portal is 1 of 12 portals created as a virtual subset of GCMD’s content.
There are a total of 4 ocean related portals within GCMD: DODS, GLOBEC, GOSIC (GOOS), and RSMAS
Each ocean portal contains a subset of the 3648+ ocean records held in the GCMD database based on their project.
No. of DIFs
DODS Portal: 194 DIFs
GLOBEC, GOSIC, RSMAS: 150
All Ocean Portals: 344
Total Ocean DIFs: 3648
Total Ocean DIFs: 3513
DODS Portal: Usage Statistics
GCMD recently provided a means of tracking statistics for the DODS portal by enhancing the link used to retrieve datasets
• Currently statistics show increasing usage of the portal
• DODS provides a link to the DODS Portal from their website (http://www.unidata.ucar.edu/packages/dods/index.html)
DODS Portal
DODS portal users can search for data via the Keyword Search or the Free text search
http://gcmd.gsfc.nasa.gov/Data/portals/dods/
http://gcmd.gsfc.nasa.gov/Data/portals/dods/freetext/ft_search.html
Within the keyword search results page, an abbreviated form of the URL_Content_Type follows the title of each dataset in the DODS portal.
http://gcmd.nasa.gov/Data/portals/dods/
The DODS portal uses a similar method to identify the URLs within the DODS data set menu list to ensure that users will be able to locate data set content using either website.
DODS Portal: Search results
http://www.unidata.ucar.edu/cgi-bin/dods/datasets/datasets.cgi?xmlfilename=datasets.xml
DODS/GCMD Future
Collaborate with DODS in their effort to create a new client application that provides an interface using the GCMD servlets.
Continue to populate the DODS portal with new datasets.
Encourage the DODS community to write new dataset descriptions through the use of a specialized DIFBuilder tool.
Goddard DAAC recently expressed interest in populating the DODS portal.
RSMAS
http://gcmd.nasa.gov/Data/portals/rsmas/
University of Miami’s Rosenstiel School of Marine & Atmospheric Science is currently working with GCMD to create new records
Currently there are 25 records within this portal
Future plans have been discussed with RSMAS to incorporate all ocean related data sets within the GCMD.
GLOBEC Portal
http://gcmd.gsfc.nasa.gov/Data/portals/globec/
[Screenshots: GLOBEC Portal home page; GLOBEC Portal Enhanced Search page]
GLOBEC Portal
GLOBEC data managers use the GCMD as “a secure way of preserving a record of the results and achievements of the GLOBEC program”
GLOBEC’s data policy includes the adoption of the DIF format as the recommended format for all data set descriptions.
Over 112 records have been loaded into the GLOBEC portal (via DIFBuilder, metadata creation tool)
GLOBEC Portal: Usage Statistics
2002 MEDI Subcommittee Meeting: Brief Summary
Presentation to MEDI subcommittee by Monica Holland included:
An overview of Metadata Tools reviewed/frequently used by GCMD (GCMD Builder tools MATT, SMMS, and MEDI)
GCMD Ocean keywords and Body of Water Location keywords
Review of GCMD Contributions to MEDI since 1st Meeting (Oostende, Belgium)
MEDI Software Tool Evaluation
GCMD MD8 Portals (Ocean related portals)
Presentation by Lola Olsen included:
Current Status of MD8
ISO field requirements
Future GCMD plans: LDA, Zope, MD9
Potential collaboration from MEDI subcommittee member from KODC
Requested GCMD presentation(s) and introduction slides about GCMD for a development meeting for Korea Ocean Science Information Inventory System (KOSI).
Received a request for ocean related valids from Greg Reed (MEDI Chairman): Source, Sensor, Project, Locations, Keywords, and Data Centers
Received 3 updated NOAA Supplementals (D.Collins–National Oceanographic Data Center, MEDI subcommittee member)
Also noted during the meeting:
Decided to add new keyword to Oceans: EARTH SCIENCE > Oceans > Agricultural Aquatic Sciences > Aquaculture
Noticed GCMD sensor valids should be reviewed
Additional information about MEDI available online: http://ioc.unesco.org/iode/contents.php?id=24
2002 MEDI Subcommittee Meeting: Feedback
Global Observing System (G3OS) and CLIVAR
Created portal for each component of G3OS:
  Global Ocean Observing System (GOOS)
  Global Terrestrial Observing System (GTOS)
  Global Climate Observing System (GCOS)
Free-text G3OS search has option to search across all G3OS components through GOSIC portal.
CLIVAR portal created at request of K. Bouton in anticipation of additional data sets.
Portal Experiences
Partners want customized keyword interfaces that only show keywords relevant to their discipline.
Some partners have need for specialized map projections such as polar or orthographic projections.
Some partners have expressed need for an option to search the entire GCMD database (e.g. a “toggle” option was added for GLOBEC portal).
BRD Collaboration
Funded since 1996 to create metadata
– 1996-1999 - National Biological Information Infrastructure (NBII)
– 1999-present - Biological Data Profile
– Assist in sessions to train future metadata creators such as at Smithsonian and NASA
– Help scientists create their own metadata or do it for them (Biological Resources Division, National Park Service, The Nature Conservancy)
– Put metadata into DIF format
BRD Collaboration
16 US EPA Columbia River Basin Biota Database Abstracts
16 US EPA Columbia River Basin Sediment Database Abstracts
1 Prediction of Thistle Infected Areas in Badlands National Park using a GIS model
30 Oregon District datasets from http://oregon.usgs.gov/pubs_dir/online_list.htm
BRD Collaboration
8 Databases including Global Invasive Species Database
18 Species 2000 databases
16 Expert Taxonomic Identification CDs
16 Food and Agriculture Organization databases
7 BRD datasets from archive of funded projects
1 Design and Implementation of Metadata for Indian Fungi
BRD Collaboration
1 Detroit River Candidate Sites for Habitat Protection and Remediation
11 TOXNET Cluster of Databases
1 Amphibian Research and Monitoring Initiative: Lower Mississippi River Basin
1 Multiscale Habitat Evaluation of Amphibians in the Lower Mississippi River Alluvial Valley
BRD Collaboration
16 Northern Prairie Wildlife Research Center bibliographies and datasets from http://www.npwrc.usgs.gov/resource/research.htm
15 Bor Forest Island fire ecology datasets
1 E.V. Komarek Fire Ecology Database
16 Patuxent Wildlife Research Center software products
IDN’s Use of Zope for Communications
What Is ZOPE?
ZOPE is an open-source web application server developed by Digital Creations.
Uses DTML – Document Template Markup Language (server side scripting language).
Instead of DTML ZOPE will use ZPT – Zope Page Template.
ZOPE objects include DTML documents, DTML Methods, images, files, folders, page templates…
ZOPE Products can be imported to enhance a website.
Zope Management Interface
Interface used to customize Zope (browser).
Interface can be used to: add users, set site permissions, import/export Zope objects from other machine, add/modify/delete objects.
Objects being used for header and footer consistency: standard_html_header and standard_html_footer (DTML), or (ZPT).
Example of Content Management Interface
Content Management Framework
CMF is a Zope Product used to quickly create websites – ‘portals’.
CMF allows site to have:
  User log-in, but anonymous user has site access.
  Allows site management so that submitted content from user can be reviewed by manager.
  Site customizable with ‘skins’. (demo in ZPT).
Example of Website Using CMF
Example of IDN Demo Site Using CMF
Customization For Website
Use log-in option? Allow anyone to register or hide the Join link?
Layout and colors of IDN homepage and site? Leave default layout?
Use on site?
  From CMF: News, site search.
  Other Zope products: portal forum (ZDiscussion and ZDBase); polls and surveys (PMPSurvey); sitemaps.
IDN CMF Website Users
Anonymous user view of site.
Logged-in user view of site.
  Log-in and go to MyStuff and add a document.
  Add a News item.
  Change preferences to change view of site (skins).
Logged-in as content manager.
  Go to ‘Folder contents’ and click on an object’s title to view the display status. Click on ‘Publish’ link if the object is in “Private” status.
Logged-in User
Content Manager
MD9
Future Challenges
Balance New Development With Initiating New Nodes Into Distributed Network And Assuring Their Proper Functioning
- Installations of MD8 and LDAs at Nodes.
- Test Functionality of LDAs.
- Test Scalability of LDAs.
Reasons for an MD9/10
Too Many Data Set Descriptions? No way.
Build in additional refinement criteria as population increases to better limit the result set.
GCMD database now holds over 11,000 entries. Reaching critical mass for effective searching.
Current refinement implementation is OK and is widely used, but…
Need better refinement criteria to:
Refine by Temporal Resolution
Refine by Geospatial Resolution
Refine by multiple keywords.
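The refinement criteria listed above can be pictured as successive filters on a result set. A minimal sketch in Python, assuming each data set description is a dictionary with hypothetical `temporal_resolution`, `geospatial_resolution`, and `keywords` entries (illustrative names and values, not actual DIF field names or GCMD code):

```python
def refine(results, temporal_resolution=None, geospatial_resolution=None, keywords=()):
    """Narrow a result set; criteria that are left unset are ignored."""
    wanted = {k.upper() for k in keywords}
    refined = []
    for entry in results:
        if temporal_resolution and entry.get("temporal_resolution") != temporal_resolution:
            continue
        if geospatial_resolution and entry.get("geospatial_resolution") != geospatial_resolution:
            continue
        # Multiple-keyword refinement: every requested keyword must be present.
        if not wanted <= {k.upper() for k in entry.get("keywords", [])}:
            continue
        refined.append(entry)
    return refined


# Hypothetical entries, for illustration only.
entries = [
    {"id": "A", "temporal_resolution": "daily", "geospatial_resolution": "1 km",
     "keywords": ["ATMOSPHERE", "WINDS"]},
    {"id": "B", "temporal_resolution": "monthly", "geospatial_resolution": "1 km",
     "keywords": ["ATMOSPHERE"]},
    {"id": "C", "temporal_resolution": "daily", "geospatial_resolution": "10 km",
     "keywords": ["OCEANS", "WINDS"]},
]

daily_winds = refine(entries, temporal_resolution="daily", keywords=["winds"])
print([e["id"] for e in daily_winds])
```

Each criterion is optional, so the same function covers refining by one criterion or by several at once.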
Reasons for an MD9/10
ISO 19115 (geospatial) and ISO 19119 (services) Metadata
“Core” mandatory ISO 19115 metadata fields mapped to existing DIF fields
ISO                        MD9 DIF
Citation: Title            Dataset_Citation: Dataset Title
Citation: Date             Dataset_Citation: Dataset_Release_Date
Dataset language           Data_Set_Language*
Dataset topic category     *N/A
Abstract                   Summary (R)
Metadata Contact           Personnel: DIF Author
Metadata date stamp        DIF_Creation_Date
Mandatory ISO fields that are optional in DIF
Dataset Release Date
Metadata Author
Dataset Language
Metadata Creation Date
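The mapping of ISO 19115 core fields to DIF fields can be written down as a lookup table and used to see which ISO-mandatory fields a given DIF cannot yet supply. A minimal sketch in Python, assuming a DIF is represented as a flat dictionary keyed by the DIF field labels from the mapping (this representation and the sample record are hypothetical):

```python
# ISO 19115 core field -> DIF field, copied from the mapping on the slide.
# "Dataset topic category" has no DIF counterpart (N/A), so it maps to None.
ISO_TO_DIF = {
    "Citation: Title": "Dataset_Citation: Dataset Title",
    "Citation: Date": "Dataset_Citation: Dataset_Release_Date",
    "Dataset language": "Data_Set_Language",
    "Dataset topic category": None,
    "Abstract": "Summary",
    "Metadata Contact": "Personnel: DIF Author",
    "Metadata date stamp": "DIF_Creation_Date",
}


def missing_iso_core(dif_record):
    """Return the ISO core fields that this DIF record cannot supply."""
    missing = []
    for iso_field, dif_field in ISO_TO_DIF.items():
        if dif_field is None or not dif_record.get(dif_field):
            missing.append(iso_field)
    return missing


# Hypothetical record with only the title and summary filled in.
record = {
    "Dataset_Citation: Dataset Title": "Example data set",
    "Summary": "Example summary text.",
}
print(missing_iso_core(record))
```

A record like this one, with only its title and summary filled in, would be reported as lacking the release date, language, author, and creation date, which are the fields the slide lists as mandatory in ISO but optional in DIF, plus the unmapped topic category.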
DIF Fields not in ISO
Publication
Publication Place
Publisher
Sensor (Instrument)
Source (Platform, like satellite)
Minimum/Maximum Altitude and Depth
DIF Fields not in ISO
Temporal Resolution
Project (Campaign)
Originating Center
Data_Center_URL
DIF fields not in ISO
Multimedia Sample URL
Multimedia Caption
Related_URL
IDN_Node
DIF Revision History
Future_DIF_Review_Date
ISO 19119 - Services Metadata
Not yet an ISO International Standard; the document is still in review (as DIS, Draft International Standard).
Many of the same required fields as ISO 19115.
ISO                               MD9 SERF
Service Type                      Service_Citation: Title
Service reference date            Service_citation: Release_date
Service language                  Service_language
Provider Name                     Service Provider
Service Contact                   Personnel: SERF Author, Technical Contact
Distributed Computing Platforms   Use Constraints
Reasons for an MD9/10
Population, Accuracy, and Currency
DIFs/SERFs: Improved authoring tools will lower the barrier for creation by external users.
DIFs/SERFs/Supplementals.
Supplemental descriptions: Needs update capabilities within display. Widely used, but needs attention in population, accuracy and currency.
Nodes: Need local/stand-alone customized tools.
Earth science links: Access to links needs improvement; add Thunderstone search; improve categories.
Increase Population: DOCbuilder
Feature: Use object-oriented architecture.
Reasoning: Code reuse.
Feature: Rewrite current Perl code in Java/Jython.
Reasoning: Platform independence and maintenance reduction.
Feature: Support XML, but make transparent to user.
Reasoning: Extensibility and easier information exchange (transportability).
Feature: Create three versions: Stand-alone application, Web application, Java applet.
Reasoning: Support multiple environments where such a tool could be used.
Increase Population: DOCbuilder
Feature: Integrate with MD8 components (e.g., validator).
Reasoning: Code reuse, added functionality.
Feature: Support multiple document types (DIF, SERF, Supp), as well as different look and feels (DIF, ISO, FGDC, etc.).
Reasoning: Code reuse, flexibility.
Feature: Allow for easy customization in terms of look and feel.
Reasoning: Tight integration with Portals.
Reasons for an MD9/10
Moving from DTD to XML Schema
Defines the legal building blocks of an XML document.
Reasons for replacing DTD with XML Schema:
Written in XML, allowing the use of tools like DOM and XSL
Extensible to future additions
Supports more data types (comparable to those in databases, programming languages)
Specifies occurrences and requirements more precisely
Supports namespaces (can include >1 schema in XML doc)
Specifies the model of the document more closely to the actual representation
DIF Schema already written; however, not yet implemented.
Reasons for an MD9/10
Improved Geographic Search
Use SOAP offerings? Clients can make requests for service.
Use MEDI’s SVG tool? The SVG tool is customized as a part of the MEDI package that is compatible with the DIF metadata format.
Use Polar Projection Search Applets? Modify existing code from Global Land Information System (USGS/GLIS) to meet our requirements.
MEDI Tool SVG Graph
Spatial Polygon types
• Box
• Polygon
• Line
• Circle
• Point
Scalable Vector Graphics (SVG) is an XML-based language for Web graphics from the World Wide Web Consortium (W3C).
Currently the SVG Adobe plug-in 3.0 is only supported by Internet Explorer (does not function correctly with Netscape).
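Because SVG is plain XML, each of the spatial types above can be drawn with a standard SVG element. A minimal sketch in Python, using only the standard library; the coordinates are arbitrary, the Point is drawn as a very small circle since SVG has no point primitive, and none of this is taken from the MEDI tool itself:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def spatial_shapes_svg():
    """Build an SVG document with one element per spatial type."""
    svg = ET.Element("svg", {"xmlns": SVG_NS, "width": "360", "height": "180"})
    outline = {"fill": "none", "stroke": "black"}
    # Box
    ET.SubElement(svg, "rect", {"x": "10", "y": "10", "width": "80", "height": "40", **outline})
    # Polygon
    ET.SubElement(svg, "polygon", {"points": "120,20 180,30 160,70 110,60", **outline})
    # Line
    ET.SubElement(svg, "line", {"x1": "200", "y1": "20", "x2": "260", "y2": "80", **outline})
    # Circle
    ET.SubElement(svg, "circle", {"cx": "300", "cy": "50", "r": "25", **outline})
    # Point (assumption: rendered as a small filled circle)
    ET.SubElement(svg, "circle", {"cx": "40", "cy": "120", "r": "2", "fill": "black"})
    return ET.tostring(svg, encoding="unicode")


print(spatial_shapes_svg())
```

The printed text can be saved as a `.svg` file and opened in any SVG-capable viewer.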
Polar Projection Search Applets
USGS gave permission to use code.
Downloaded tool.
Unable to install for effective use.
In contact with tool’s developer, but not promising.
Start from scratch?
Reasons for an MD9/10
Better Search Engines
Are there better text search engines than Isite?
Isite allows only simplified searching compared to most Internet search engines.
Isite allows only AND, OR Boolean operations that must be explicitly typed in the search box.
No advanced features are implemented. Refinements are not possible.
Pros: Isite is freely available. No license problems with distribution as part of MD#. Useful for FGDC participation. Implements Z39.50 protocol.
Reasons for an MD9/10
Better Search Engines
Google Search Appliance: Hardware/software; costly.
Compusult: Commercial software. Z39.50 compliant; used by GeoConnections.
Blue Angel: Commercial software. Z39.50 compliant; used by Mercury.
Many search tools are available: http://www.searchtools.com/tools/tools.html
XML text search engines: http://www.searchtools.com/info/xml-resources.html
Z39.50 and metasearch engines: http://www.searchtools.com/info/metasearch.html
Issues: How would a COTS free-text search engine affect our IDN partners? Are the above search engines better than Isite?
Google Search Appliance
Same Effective Algorithm Used for Text Searches
But, How Can It Be Distributed to the Nodes? (Isite is Open Source)
Package Includes Software and Hardware and 2 Years Total Support
Hardware Installation May Present Security Issues
Cost: $20K (up to $40K)
Reasons for an MD9/10
XPath is now a Standard.
XQuery Embeds XPath.
Manages 2 levels down.
Replace GCMD’s Query Language With XPath
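To give a feel for what an XPath-based query looks like, here is a minimal sketch in Python using the limited XPath subset in the standard library's `xml.etree.ElementTree`. The XML is a simplified, hypothetical DIF-like catalog (the `Catalog` wrapper, the entry IDs, and the reduced `Parameters` structure are assumptions for illustration, not the actual DIF schema or GCMD query code):

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical DIF-like records; not the actual DIF schema.
CATALOG = """
<Catalog>
  <DIF>
    <Entry_ID>EXAMPLE_1</Entry_ID>
    <Parameters><Topic>ATMOSPHERE</Topic></Parameters>
  </DIF>
  <DIF>
    <Entry_ID>EXAMPLE_2</Entry_ID>
    <Parameters><Topic>OCEANS</Topic></Parameters>
  </DIF>
</Catalog>
"""


def entry_ids_for_topic(root, topic):
    """Return Entry_IDs of records having a Parameters element with the given Topic."""
    ids = []
    for dif in root.findall("./DIF"):
        # XPath predicate: a Parameters child whose Topic child has this text.
        if dif.find(f"./Parameters[Topic='{topic}']") is not None:
            ids.append(dif.findtext("Entry_ID"))
    return ids


root = ET.fromstring(CATALOG)
print(entry_ids_for_topic(root, "ATMOSPHERE"))
```

A full XPath or XQuery engine would express the same selection in a single path expression; the standard-library subset is used here only to keep the sketch self-contained.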
Reasons for an MD9/10
Re-evaluate Parent-Child Implementation
Users would like to get back to parents from children.
Free-text implementation needs to be improved.
Take “Country” Out of Address
Reasons for an MD9/10
Explore better ways to combine free-text and controlled keyword searches.
Currently, users can only search using free-text or controlled keywords from the home page - not both.
Users can combine free-text with a TOPIC search (e.g. free-text and “ATMOSPHERE”) - but users cannot combine or refine VARIABLE queries by free-text.
Reasons for an MD9
Free-text enhancements
Free-text searches cannot retrieve parent DIF/SERF when a child DIF/SERF is found (the Parent_DIF or Parent_SERF field is not linked).
Cannot navigate through DIF/SERF display returned through free-text (as can be done in keyword search).
Fields within DIF/SERF (e.g., Parameters) are not linked when retrieved through free-text like they are in keyword search.
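The parent-linking enhancement can be illustrated with a small sketch: after a free-text search finds a child record, its Parent_DIF value is followed so the parent is offered alongside it. This is a minimal Python illustration under stated assumptions (records held in a dictionary keyed by entry ID, a title-only text match, and made-up sample records), not the MD8 implementation:

```python
# Hypothetical records keyed by entry ID; Parent_DIF names the parent's entry ID.
RECORDS = {
    "PARENT_1": {"title": "Parent collection", "Parent_DIF": None},
    "CHILD_1": {"title": "Child data set, wind speed", "Parent_DIF": "PARENT_1"},
    "CHILD_2": {"title": "Child data set, temperature", "Parent_DIF": "PARENT_1"},
}


def free_text_search(records, text):
    """Return entry IDs whose title contains the text (case-insensitive)."""
    needle = text.lower()
    return [eid for eid, rec in records.items() if needle in rec["title"].lower()]


def with_parents(records, hits):
    """Append each hit's parent (once) so a user can navigate up from a child."""
    linked = list(hits)
    for eid in hits:
        parent = records[eid].get("Parent_DIF")
        if parent and parent in records and parent not in linked:
            linked.append(parent)
    return linked


hits = free_text_search(RECORDS, "wind")
print(with_parents(RECORDS, hits))
```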
Reasons for an MD9/10
Direct Access to Data and Resources
Web services - the programmatic interfaces made available for application to application communication.
Use SOAP (Simple Object Access Protocol) to access Web services.
XML/HTTP-based protocol for accessing services, objects and servers in a platform-independent manner.
Allows clients to make requests to services.
Libraries available for many programming languages.
GCMD applications can work in conjunction with Web services to gain additional functionality (ex: get a lat/long bounding box from a country name).
GCMD can be a Web service in its own right.
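As an illustration of the bounding-box example, a SOAP request is an XML envelope that a client would send to the service over HTTP. A minimal sketch in Python that only builds the envelope with the standard library; the service namespace `urn:example:gazetteer`, the `GetBoundingBox` operation, and its `country` parameter are hypothetical, and no real service is contacted:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:gazetteer"  # hypothetical service namespace


def bounding_box_request(country):
    """Serialize a SOAP request asking a hypothetical service for a country's bounding box."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical operation and parameter names.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetBoundingBox")
    ET.SubElement(call, f"{{{SERVICE_NS}}}country").text = country
    return ET.tostring(envelope, encoding="unicode")


print(bounding_box_request("Germany"))
```

In practice a SOAP library would generate this envelope and parse the reply; building it by hand just shows what travels over the wire.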
Overview
• There is a set of tools and services (e.g. IDN, DIAL)
[Diagram label: WGISS Data & Information Products. Agglomeration of tools and services held and operated by CEOS organisations.]
• These are being collected into a “WGISS Menu” which will offer a single place to find them.
[Diagram label: WGISS Menu. Open reference list of WGISS products & services, updated on the basis of the results of the test facility, linked on-line to the products and services themselves.]
• A partnership of programs/projects will be able to draw from this menu
[Diagram label: Partnership projects (Disaster Management, Oceans, GOFC). Define the data & information systems and services requirements.]
• A Nakodo process will allow the creation of a WGISS Test Facility for the projects.
[Diagram label: WGISS Test Facility. Selected WGISS products made into a coherent, open, modular system by the partners to address each project’s needs (can include tools dev’d by the project), dev’d & tested against the project’s requirements.]
• The Test Facility can use Products and Services from the WGISS Menu and from the Projects. And can feed back to the Projects!
• An improved set of Products and Services should be one result of this process.
[Diagram label: Improved Products & Services, based on the test facility results.]
• Projects and the WTF also have links to the “outside”. E.g. WTF can be deployed for long term operation.
[Diagram labels: Long term operation; Non-WGISS requirements; Tools, data; Links to data suppliers.]
• The cycle is completed as these Products and Services get taken up by WGISS into the WGISS Menu.
A Proposed WGISS Test Environment
Version Sept 12, 2000
Capitalize on Projects Winning External Funding with Proposals Based on the GCMD
Thesaurus Integration Semantic Web
[Diagram: example ontology.
Classes (attributes): Event (date, place); Document (title); Journal (name, volume); Proceedings; Publication (publishing_house, year); Conference (name); Article (pages); Communication (section, pages); Book; Editor; Author; Person (name, email, affiliation, site); Reference.
Relations: has_editor, authored_by, author_of, contains, points_to, is_presented_in, from, appeared_in, is_published_in.]
GCMD Keywords and the Semantic Web
• GCMD keywords to be used as a basis for developing an ontology for the Earth science disciplines.
SWEET Architecture
[Diagram: SWEET architecture.
Components: XML; RDF / DAML+OIL; Ontologies; Annotated web pages; Data Repositories; Ontology Tools; User Interfaces; Users; Agent Tools.
Example content: dataset1, dataset2, dataset3; granule100, granule200, granule300; winds, el nino, hurricane Maria.
Case Study: PODAAC. Case Study: GSFC GES DAAC.]
Issues/Concerns
Requesting Newsletter articles for next meeting
Latest newsletter was sent to IDN April 19, 2002. Articles included:
UWG meeting
Next CEOS meeting
AADC Node status
Synchronization with Catalogue Interoperability Protocol (CIP)
MD8 Operations Client
Proposed MD9
Write-A-DIF
Add New Fields Metadata Limits