Metadata. Generally speaking, metadata are data and information that describe and model data and...

Metadata

Generally speaking, metadata are data and information that describe and model data and information. For example, a database schema is the metadata for the data stored in the database.

Metadata also include data that represent properties and relationships among individual objects (instances) of any type (e.g., tectonic, sedimentary, geochemical).

This kind of metadata is typical of an ontology, which is a Semantic Web technology.

Vocabulary

Metadata are used to specify vocabularies for exchanging data among different people in research groups or between machines.

The vocabulary enriches the data so that software can interact with and manipulate them.

Metadata tell software (algorithms, processors) what to do with the data and how to use them, and are of many kinds:

Syntactic (how code statements are put together)

Structural (how data are structured; e.g., relational, XML, OO, graph)

Referent (set of allowable relations or properties connecting objects to instances, e.g., subclass, part-of, intersection, disjoint)

Domain specific

These metadata are in forms that include database schemas, XML documents, UML diagrams, and domain-specific entity hierarchies.

Subsumptive/Partitive Hierarchy

Metadata allow representation of the format and organization of data (e.g., taxonomy, partonomy), for example:

Foliation isA PlanarStructure describes the subsumption relationship between foliation and planar structure. Subsumption is the word for the is-a relation. If B is a kind of A, then we say A subsumes B, and B is subsumed by A.

Mineral partOf Rock describes the meronomic (part-whole) relation between minerals and rocks.

Metadata can also capture relationships between various types of data, e.g., AxialPlanarFoliation parallel FoldAxialPlane.
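
The isA and partOf statements above can be sketched as plain Python triples. This is only an illustrative sketch: the list-of-tuples layout and the helper names `ancestors` and `subsumes` are assumptions, and the statement AxialPlanarFoliation isA Foliation is added here purely to show that subsumption is transitive.

```python
# Class names come from the slide; the data structure is a hypothetical choice.
TRIPLES = [
    ("Foliation", "isA", "PlanarStructure"),
    ("AxialPlanarFoliation", "isA", "Foliation"),  # assumed, not stated in the slide
    ("Mineral", "partOf", "Rock"),
    ("AxialPlanarFoliation", "parallel", "FoldAxialPlane"),
]


def ancestors(term, relation, triples):
    """Return every term reachable from `term` by following `relation` transitively."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == relation and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def subsumes(a, b, triples):
    """True if A subsumes B, i.e. B is (directly or transitively) a kind of A."""
    return a in ancestors(b, "isA", triples)
```

With these definitions, `subsumes("PlanarStructure", "AxialPlanarFoliation", TRIPLES)` follows two isA links, while `ancestors("Mineral", "partOf", TRIPLES)` walks the partitive hierarchy instead.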

Entailment

Metadata also allow us to formally specify and represent our domain knowledge by describing the information domain (i.e., field, such as geochemistry), thereby helping us to infer implicit statements from explicit statements through inference rules and entailment, e.g.:

Suppose we assert in our ontology that PlanarStructure has strike, that LinearStructure disjointWith PlanarStructure, that Lineation isA LinearStructure, and that LinearStructure has trend. This knowledge can then be used to make inferences about the underlying data.

For example, if a structure, such as Foliation, has strike, we can infer that it is a planar structure; if it has trend, we infer that it is a linear structure. These inferences hold when strike and trend are declared as properties whose domains are PlanarStructure and LinearStructure, respectively.
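
The inference described above can be sketched in a few lines of Python. This is a toy illustration, not a real reasoner: the dictionaries and the function names `infer_classes` and `is_consistent` are assumptions, and the sketch assumes that strike and trend have PlanarStructure and LinearStructure as their domains.

```python
# Assumed schema: the class that each property's subject must belong to (its domain).
PROPERTY_DOMAIN = {"strike": "PlanarStructure", "trend": "LinearStructure"}
# Pairs of classes that can share no instances.
DISJOINT = {frozenset({"PlanarStructure", "LinearStructure"})}
# Direct isA links (child -> parent).
SUBCLASS_OF = {"Lineation": "LinearStructure"}


def infer_classes(asserted_class, properties):
    """Return the classes entailed by an asserted class and the properties observed."""
    classes = set()
    current = asserted_class
    while current is not None:  # walk up the isA chain
        classes.add(current)
        current = SUBCLASS_OF.get(current)
    for prop in properties:  # having a property implies membership in its domain
        if prop in PROPERTY_DOMAIN:
            classes.add(PROPERTY_DOMAIN[prop])
    return classes


def is_consistent(classes):
    """False if the inferred classes include two classes declared disjoint."""
    return not any(pair <= classes for pair in DISJOINT)
```

For instance, `infer_classes("Foliation", ["strike"])` adds PlanarStructure, whereas a Lineation recorded with a strike would be inferred to be both linear and planar, which `is_consistent` flags as a contradiction.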

Applications of metadata

Metadata are used as a tool to describe and model domain information and knowledge, and can support several useful functionalities such as navigation, browsing, and retrieval of maps, images, and information about a specific geologic feature or phenomenon such as a rock or mineral sample.

Metadata will enable knowledge-based decision support and management systems.

The decision support system, when implemented, can be used by decision makers in geoscience communities. The knowledge management system will be used by geologists in these communities who are trying to work out the relationships between cross-disciplinary geological facts and phenomena (e.g., mineral reserve and petrology; geochemistry and water quality).

Types of metadata

Metadata can describe content-independent information, such as rock sample number or the date the sample was taken.

The URI (Uniform Resource Identifier) associated with a geological resource is another example of this kind of metadata.

Content-based metadata, on the other hand, describe the structural information of documents or artifacts, and domain-specific terminology and vocabulary, which capture both intra- and inter-domain relationships among data (i.e., within one field, such as Geochemistry, or between different fields, such as Geochemistry and Petrology).

While content-independent metadata describe the format and organization of the underlying data, domain-specific metadata capture information about the domain (e.g., stratigraphy, geochemistry) and are the most relevant and useful as far as scientific semantics is concerned.

Metadata are commonly developed in isolation, and require intermediary software for interchange, interoperability, and integration.

The Semantic Web can help in developing systems that efficiently link and integrate distributed data across a community.

Decisions on where to explore for a specific mineral or drill wells for water or oil depend on the accuracy of the data, and on how these data (e.g., aquifer and rock type or contaminants) are related to each other.

Currently, these data are scattered in publications and unrelated databases and worksheets.

Ontology

Structured vocabularies define the metadata for specific fields (domains). The more domain-specific the metadata, the more useful they become to model the domain knowledge.

Therefore, the terms in the vocabularies should capture consensual domain terms and interrelationships among these terms.

Among the different types of vocabularies, ontologies are at the top of the hierarchy in providing the most useful and complete metadata, and hence semantics.

An ontology is a formal specification and model of a domain’s knowledge (e.g., knowledge of Geochemistry). It defines the shared vocabulary and the interrelationships that exist among the real individual objects within a specific field or domain of discourse, such as plate tectonics.

Metadata Frameworks

Metadata frameworks are specifications that allow creating, manipulating, and querying metadata descriptions, and include those that are XML-, RDF-, and OWL-based (among others).

Each of these frameworks consists of a data model, semantics (applying RDF, RDFS, OWL), a serialization format (e.g., XML, N3), and a query language (e.g., SPARQL).

The XML-based metadata framework is used to capture both content (separate from presentation) and metadata, but not semantics.

Schema in XML exists with the data as tag names.

This allows the self-describing content to include both data and metadata.
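
As a small illustration of self-describing XML, the sketch below parses a record with Python's standard `xml.etree.ElementTree` module. The sample document, its tag names, and its values are all invented for this example; the point is only that the tag names (the metadata) travel together with the values (the data).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; tags, attribute, and values are made up for illustration.
DOCUMENT = """
<sample id="S-001">
  <rockType>granite</rockType>
  <collected>2005-06-14</collected>
</sample>
""".strip()

root = ET.fromstring(DOCUMENT)

# Each child element contributes a (metadata, data) pair: tag name -> text content.
record = {child.tag: child.text for child in root}
```

Note that nothing in the document says what rockType means or how it relates to other terms, which is the sense in which XML captures content and metadata but not semantics.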

RDF

The RDF-based metadata representation, which is commonly serialized in XML, is designed to describe metadata for resources on the Web.

RDF uses a subject-predicate-object triple graph format.

The subject and object are resources, which on the Web can be anything: anyone can say anything about anything.

The RDF triple Sample analysis Chemistry means that a specific sample (a resource) has analysis (predicate) given by the chemistry resource (which can be a list of trace element data).
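
The Sample analysis Chemistry triple can be sketched with plain Python tuples. This is a minimal illustration rather than a real RDF library: the `http://example.org/geo#` namespace, the resource names, the `traceElement` predicate, and the helper `objects` are all hypothetical.

```python
from typing import NamedTuple

EX = "http://example.org/geo#"  # hypothetical namespace for this sketch


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tiny graph: one sample, its analysis, and two assumed trace elements.
graph = {
    Triple(EX + "Sample42", EX + "analysis", EX + "Chemistry42"),
    Triple(EX + "Chemistry42", EX + "traceElement", EX + "Rb"),
    Triple(EX + "Chemistry42", EX + "traceElement", EX + "Sr"),
}


def objects(graph, subject, predicate):
    """Return, in sorted order, every object linked to `subject` by `predicate`."""
    return sorted(t.obj for t in graph if t.subject == subject and t.predicate == predicate)
```

Because the object of one triple can be the subject of another, the set of triples forms a graph: following `analysis` from the sample leads to the chemistry resource, and following `traceElement` from there leads to the individual elements.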

OWL

The OWL-based metadata framework, which builds on RDF and RDF Schema (RDFS), allows construction of more complex semantic expressions at the schema and data levels.

OWL allows defining classes, class membership, and relations between classes (e.g., subclass-of, disjoint-from, equivalent).

Among many other constructs, OWL allows defining the domain and range of each property.

OWL-QL and SPARQL are two query languages that can be used with OWL ontologies; SPARQL is the W3C standard query language for RDF, on which OWL builds.
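
To give a feel for how such a query works, the sketch below matches a single triple pattern against a list of triples, in the spirit of a SPARQL pattern such as `?c rdfs:subClassOf ?parent`. It is not SPARQL and not an OWL reasoner; the `match` function, the "?" variable convention as implemented here, and the sample triples are assumptions made for illustration.

```python
# Hypothetical schema-level triples using the class names from the earlier slides.
TRIPLES = [
    ("Foliation", "subClassOf", "PlanarStructure"),
    ("Lineation", "subClassOf", "LinearStructure"),
    ("LinearStructure", "disjointWith", "PlanarStructure"),
]


def match(pattern, triples):
    """Yield one variable binding per triple that matches `pattern`.

    Terms beginning with "?" are variables; all other terms must match exactly.
    A variable that appears twice must bind to the same value both times.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable, conflicting values
            elif term != value:
                break  # constant term does not match
        else:
            yield binding


results = list(match(("?c", "subClassOf", "?parent"), TRIPLES))
```

Here `results` holds one dictionary per subclass statement, each mapping `?c` and `?parent` to the matched class names. A real query engine would additionally join several patterns and apply the ontology's entailments.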