Scientific Data Discovery with XMC Cat
Pushing Back on the Data Deluge: Advancements in Metadata, Archival and Workflows
Scott Jensen, PhD
Senior Researcher
XMC Cat
• Need to capture detailed metadata for discovery and re-use
• Must be able to capture domain-specific metadata
– Metadata standards implemented in XML
– Adaptable to XML schemata from different scientific communities
– Able to communicate results based on community schema
• Detailed data discovery search capabilities
• Describe data products in a broader experiment context
• Capture metadata incrementally and early in the scientific process
– Concept-based partitioning of metadata schema
– Incremental and asynchronous metadata capture
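Standalone Front-end Metadata Catalog to Backend Storage Repositories
[Figure: XMC Cat as a front-end catalog over backend storage repositories (Fedora, OPeNDAP, iRODS). Labels: XMC Cat, name resolver, Logical ID, Data, XMC Cat Workspace, Query Data, Data Objects, Location Transparency, Metadata]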
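The location transparency shown on this slide can be illustrated with a minimal sketch: the catalog stores only a logical ID, and a name resolver maps that ID to whichever backend currently holds the data. The class names, the ID format, and the addresses below are all hypothetical and are not taken from XMC Cat itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Where a data object physically lives (hypothetical structure)."""
    repository: str  # e.g. "fedora", "opendap", "irods"
    address: str     # repository-specific address; format is illustrative


class NameResolver:
    """Maps catalog logical IDs to backend locations (illustrative only)."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_id, location):
        # Re-registering an ID overwrites its previous location.
        self._locations[logical_id] = location

    def resolve(self, logical_id):
        try:
            return self._locations[logical_id]
        except KeyError:
            raise LookupError(f"no location registered for {logical_id!r}") from None


resolver = NameResolver()
logical_id = "urn:example:run42/output.nc"  # made-up logical ID
resolver.register(logical_id, Location("opendap", "example-host/run42/output.nc"))
# Moving the data changes only the resolver entry; the catalog's logical ID is unchanged.
resolver.register(logical_id, Location("irods", "/exampleZone/archive/run42/output.nc"))
current = resolver.resolve(logical_id)
```

Because clients only ever hold the logical ID, a data object can move between repositories without any change to the metadata that describes it.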
XMC Cat: Scientific Metadata Captured as Concepts
[Figure: a metadata schema tree partitioned into metadata concepts, each holding the elements within that concept. Labels recovered from the slide:]
Metadata
– Identification
  – Citation: Originator, Publication Date, Publication Time, Title, ...
    – Publication Info: Publication Place, Publisher
    – Larger Work Citation ...
  – Keywords
    – Theme: Thesaurus, Theme Keyword
    – Temporal: Thesaurus, Temporal Keyword
  – ...
– Entity and Attribute
  – Detailed Desc
    – Entity Type: Type Label, Type Definition, Definition Source
    – Attribute: Attribute Label, Definition, Definition Source, Domain Values
– ...
– Distribution: Distributor, Standard Order Process
Legend: Metadata Concepts; Elements Within a Concept
Concepts Enable:
• Incremental capture
• Detailed discovery
• Fast response
Detailed Relational Search + XML Concept CLOBs
Configurable to Varied Scientific Domains
[Figure: schema tree with rows "metadata", "identification spatial data distribution", "citation description keywords ...", "contact ... order process", partitioned into concepts. Component labels: Domain Configuration, Domain Schema, Concept Shredding, XML Beans]
• Schema is partitioned based on concepts
• Concepts stored as XML
• Metadata "shredded" to relational tables
• Complex search
• Build response from concepts
• Query result based on community XML schema
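The combination of shredded relational rows and concept CLOBs can be sketched as follows. This is a minimal illustration using SQLite and the Python standard library, not the XMC Cat implementation (which the slides describe as using XML Beans); the table layout, function names, and sample XML are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample concept document; element names and values are illustrative.
CONCEPT_XML = """<citation>
  <origin>Example Lab</origin>
  <pubdate>2010-06-01</pubdate>
  <title>Example forecast run</title>
</citation>"""


def create_tables(conn):
    # One row per concept instance, keeping the original XML as a CLOB.
    conn.execute(
        "CREATE TABLE concept (id INTEGER PRIMARY KEY, object_id TEXT, "
        "name TEXT, xml_clob TEXT)"
    )
    # One row per leaf element, used only for searching.
    conn.execute(
        "CREATE TABLE element (concept_id INTEGER REFERENCES concept(id), "
        "name TEXT, value TEXT)"
    )


def shred_concept(conn, object_id, concept_xml):
    """Store the concept XML whole and shred its leaf elements into rows."""
    root = ET.fromstring(concept_xml)
    cur = conn.execute(
        "INSERT INTO concept (object_id, name, xml_clob) VALUES (?, ?, ?)",
        (object_id, root.tag, concept_xml),
    )
    concept_id = cur.lastrowid
    rows = [
        (concept_id, el.tag, (el.text or "").strip())
        for el in root.iter()
        if len(el) == 0  # leaf elements only
    ]
    conn.executemany("INSERT INTO element VALUES (?, ?, ?)", rows)
    return concept_id


def find_objects(conn, element_name, value):
    """Search the shredded rows, then return the stored XML unchanged."""
    sql = (
        "SELECT c.object_id, c.xml_clob FROM concept c "
        "JOIN element e ON e.concept_id = c.id "
        "WHERE e.name = ? AND e.value = ?"
    )
    return conn.execute(sql, (element_name, value)).fetchall()


conn = sqlite3.connect(":memory:")
create_tables(conn)
shred_concept(conn, "obj-1", CONCEPT_XML)
matches = find_objects(conn, "title", "Example forecast run")
```

The search runs entirely against the relational rows, while the response is assembled from the stored XML, so no XML has to be rebuilt from rows at query time.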
Query Interface
• Point & click query construction adapts to the user’s community schema
• User builds query through point & click interface
– =can be added
• Strongly typed metadata allows for more precise search criteria
• Results are returned using the community XML schema
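To illustrate why strongly typed metadata allows more precise criteria, the sketch below builds a parameterized range query over numerically typed elements. The table, the element names (`grid_spacing_km`, `forecast_hours`), and the `RangeCriterion` class are hypothetical and chosen only for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RangeCriterion:
    """One search range on a numeric concept element (hypothetical type)."""
    element: str
    low: float
    high: float


def build_query(criteria):
    """Return (sql, params) selecting objects that satisfy every criterion."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    clauses, params = [], []
    for c in criteria:
        clauses.append(
            "object_id IN (SELECT object_id FROM numeric_element "
            "WHERE name = ? AND value BETWEEN ? AND ?)"
        )
        params += [c.element, c.low, c.high]
    sql = (
        "SELECT DISTINCT object_id FROM numeric_element WHERE "
        + " AND ".join(clauses)
        + " ORDER BY object_id"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
# Values are stored as REAL so that comparisons are numeric.
conn.execute("CREATE TABLE numeric_element (object_id TEXT, name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO numeric_element VALUES (?, ?, ?)",
    [
        ("exp-1", "grid_spacing_km", 4.0),
        ("exp-1", "forecast_hours", 48.0),
        ("exp-2", "grid_spacing_km", 12.0),
        ("exp-2", "forecast_hours", 24.0),
        ("exp-3", "grid_spacing_km", 9.0),
        ("exp-3", "forecast_hours", 6.0),
    ],
)
sql, params = build_query(
    [RangeCriterion("grid_spacing_km", 2, 10), RangeCriterion("forecast_hours", 12, 72)]
)
results = [row[0] for row in conn.execute(sql, params)]
```

Only `exp-1` satisfies both ranges here. Storing the values as numbers rather than text is what makes the range test meaningful; compared as strings, "12" would sort before "4".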
Let’s Take a Look ...
Browsing Detailed Metadata
Creating a New Query
Selecting “Experiments” as the Query Target
Selecting Concepts to Query
Specifying the Search Range for Concept Elements
Viewing Details of the Search Results
Point & Click XMC Cat Configuration
• Prompts for required schemas, determines dependencies and builds the necessary XML Bean jars.
• Concepts identified through a point & click interface.
– Default pushes concepts down as far as possible
– Automatically adjusts other concept definitions
– Human-readable descriptions can be added
• Wizard-based approach “remembers” your configuration through annotations.
• All configurations saved for future sessions.
• Configuration files are automatically generated and downloadable.
Let’s Take a Look ...
Logging into the XMC Cat Builder
Uploading / Selecting Your Schema
Schema Dependencies Determined
Selecting the Root Metadata Element
Point and Click to Identify Concepts
Selecting Catalog Configuration Options
Downloading the Configuration Files
XMC Cat: Incremental and Asynchronous Metadata Capture
[Figure components: Science Gateway, workflow nodes, Data Management Agent, Data Repository, XMC Cat Metadata Catalog, Database, data registration event queue, workers with plug-ins]
• Workflow nodes register data products
• Data products are archived to the data repository
• Minimal source metadata is recorded
• Registration events are added to the data registration event queue
• Workers with plug-ins post-process the data registration events
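The capture pattern on this slide can be sketched with a queue and worker threads: registration records minimal metadata and returns immediately, and workers enrich the record later through plug-ins. This is an illustrative sketch only; the in-memory `catalog` dictionary, the function names, and the `size_plugin` plug-in are assumptions and do not reflect XMC Cat's actual interfaces.

```python
import queue
import threading

catalog = {}                    # stands in for the metadata catalog database
catalog_lock = threading.Lock()
events = queue.Queue()          # data registration event queue


def register_data_product(logical_id, source):
    """Called by a workflow node: record minimal metadata and return quickly."""
    with catalog_lock:
        catalog[logical_id] = {"source": source}
    events.put(logical_id)


def size_plugin(logical_id, record):
    """Hypothetical plug-in that derives one extra metadata field."""
    return {"name_length": len(logical_id)}


PLUGINS = [size_plugin]


def worker():
    """Post-process registration events until a None sentinel arrives."""
    while True:
        logical_id = events.get()
        try:
            if logical_id is None:
                return
            with catalog_lock:
                record = dict(catalog[logical_id])
            extra = {}
            for plugin in PLUGINS:
                extra.update(plugin(logical_id, record))
            with catalog_lock:
                catalog[logical_id].update(extra)
        finally:
            events.task_done()


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for node in ("node-a", "node-b", "node-c"):
    register_data_product(f"{node}/out.nc", source=node)
events.join()                   # wait until every event has been post-processed
for _ in threads:
    events.put(None)            # one shutdown sentinel per worker
for t in threads:
    t.join()
```

The workflow node never waits on post-processing, which is the point of the asynchronous design: detailed metadata arrives incrementally after the data product is already registered and discoverable.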
Try it Out!
• XMC Cat is available through the D2I Website:
http://pti.iu.edu/d2i/xmccat
(Also check out our other D2I projects!)
• Additional schemata being added as pre-packaged configurations.
• Post-processing plug-ins available.
• If your project has a metadata management need, please contact us:
– Scott Jensen
– Beth Plale