Definition of Metadata
Data about Data
Data that describes, defines or manages data
“Pure” metadata has meaning only in relation to the primary data that is being described.
METADATA MAY BE:
auto-generated
automatically harvested from the resource
human-created
AUDIENCE MAY BE:
end user
metadata creator/manager
computer application/program
Data Model:
o Abstract characterization or “World View” of the data:
-- relationships between objects in the model
-- “living” data: events occur in the lifecycle of each object in the model
-- context independent, so that any context can be supported
ENTITIES
Metadata - Educational Objects - Metadata Creators - Users
ATTRIBUTES
Identify, Define Entities
MODEL
Relationships between Entities within a Domain
RELATIONSHIPS
One to one; One to many; Parent, child, sibling; Inheritance
ORGANIZATION’S INFORMATION MODEL
The Structure of Information (IFLA)
Work: Distinct intellectual or artistic creation
Expression: Intellectual or artistic realization of a work (“interpretation”)
Manifestation: Physical manifestation of an expression. May differ in physical format, but not in content or interpretation
Item: Unique physical instance of a manifestation.
Intellectual / artistic content
Physical recording of content
Single physical representation of a recording
ABSTRACTION
GONE WITH THE WIND
WORK: Gone with the Wind
EXPRESSION (Interpretation): Novel; Movie; Script
MANIFESTATION: Paper; HTML; 70 MM Film; 35 MM Film; DVD; MPEG2
ITEM: Copy in Blockbuster, Atlanta, GA; 24 Reels of film, MGM Archive
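The Work / Expression / Manifestation / Item hierarchy can be sketched as nested records. This is a minimal illustration, not an IFLA-defined schema: the class and field names are assumptions, and the pairing of each item with a particular manifestation is a guess made for the example, since the slide does not state it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    description: str  # one unique physical instance of a manifestation


@dataclass
class Manifestation:
    physical_format: str  # e.g. "DVD"; content and interpretation do not change
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # the "interpretation" of the work, e.g. "Movie"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self) -> int:
        """Count every physical item below this work."""
        return sum(
            len(m.items) for e in self.expressions for m in e.manifestations
        )


# Assumed pairing of items to manifestations, for illustration only.
dvd = Manifestation("DVD", [Item("Copy in Blockbuster, Atlanta, GA")])
film_35mm = Manifestation("35 MM Film", [Item("24 Reels of film, MGM Archive")])
movie = Expression("Movie", [film_35mm, dvd])
novel = Expression("Novel", [Manifestation("Paper"), Manifestation("HTML")])
work = Work("Gone with the Wind", [novel, movie])
```

Each level only holds references downward, so one work can carry any number of expressions, manifestations and items without repeating the work-level description.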
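[Figure: OAIS functional model. Producer -> SIP -> Ingest; Ingest -> AIP -> Archival Storage; Ingest -> DI -> Descriptive Info; Archival Storage -> AIP -> Access/Dissemination; Descriptive Info -> DI -> Access/Dissemination; Access/Dissemination -> DIP -> Consumer]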
OAIS - Reference Model for an Open Archival Information System
From: CCSDS 650.0-R-1: Reference Model for an Open Archival Information System (OAIS). Red Book. Issue 1. May 1999. PDF. Available at: http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
OAIS INFORMATION MODEL
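The flow of packages through an OAIS archive (SIP in, AIP and descriptive information stored, DIP out) can be sketched as follows. This is a toy model under stated assumptions: the `Archive` class, its method names, and the `aip-N` identifier scheme are hypothetical and are not defined by the OAIS reference model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SIP:
    """Submission Information Package handed over by the producer."""
    content: bytes
    descriptive_info: Dict[str, str]


@dataclass
class AIP:
    """Archival Information Package kept in archival storage."""
    package_id: str
    content: bytes


@dataclass
class DIP:
    """Dissemination Information Package delivered to the consumer."""
    package_id: str
    content: bytes
    descriptive_info: Dict[str, str]


class Archive:
    def __init__(self) -> None:
        self.archival_storage: Dict[str, AIP] = {}
        self.descriptive_info: Dict[str, Dict[str, str]] = {}

    def ingest(self, sip: SIP) -> str:
        # Ingest splits the SIP into an AIP and its descriptive information.
        package_id = f"aip-{len(self.archival_storage) + 1}"  # hypothetical ID scheme
        self.archival_storage[package_id] = AIP(package_id, sip.content)
        self.descriptive_info[package_id] = dict(sip.descriptive_info)
        return package_id

    def disseminate(self, package_id: str) -> DIP:
        # Access recombines the stored AIP and descriptive info into a DIP.
        aip = self.archival_storage[package_id]
        return DIP(package_id, aip.content, self.descriptive_info[package_id])


archive = Archive()
package_id = archive.ingest(SIP(b"video essence", {"title": "Gone with the Wind"}))
dip = archive.disseminate(package_id)
```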
End-to-End Metadata Implementation
Data Model
Record Structure
Repository Design
Data Element Registration
Database Population
Dissemination to Users
Data interchange (other repositories)
Inside the Digital Information Repository
Persistent Objects:
Manage objects through changes to hardware, software, players, search and retrieval systems, etc.
Persistent Metadata:
Manage metadata through schema and data element versioning changes, new metadata formats, I&R changes, and hardware and database migrations.
Key Issue for Preservation
• Authenticity
-- integrity: “digital document must be whole and undisturbed”
-- provenance: must be tightly associated with its creator and act of creation
Gladney and Bennett. What do we mean by authentic? http://www.dlib.org/dlib/july03/gladney/07gladney.html
Authenticity
o In the analog space
-- Object in hand is compared with a conceptual (“canonical”) historical version
o In the digital space
-- Fidelity to the source artifact
-- Identical (true/false) to the digital canonical master
-- Accompanied by a “true” provenance statement
-- Proof: digital signature verifying that canonical object is unchanged. Digital audit trail documenting provenance and any changes to artifact or chain of provenance
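The true/false comparison against a canonical master, together with an audit trail, can be sketched as below. Note the substitution: a plain SHA-256 digest stands in for the digital signature named on the slide (a real signature would also bind the digest to a signer's key). `AuditTrail` and the function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Fixity value for a byte stream."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class AuditTrail:
    """Append-only list of provenance events (hypothetical structure)."""
    events: List[Dict[str, str]] = field(default_factory=list)

    def record(self, agent: str, action: str, digest: str) -> None:
        self.events.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "digest": digest,
        })


def is_identical_to_master(candidate: bytes, master_digest: str) -> bool:
    """True only if the candidate is bit-for-bit the canonical master."""
    return sha256_hex(candidate) == master_digest


master = b"canonical master bytes"
master_digest = sha256_hex(master)

trail = AuditTrail()
trail.record("archivist", "ingest", master_digest)

unchanged = is_identical_to_master(master, master_digest)
altered = is_identical_to_master(master + b"!", master_digest)
```

A digest only answers the integrity question; the provenance question is answered by the recorded events, which is why the two are kept together.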
Metadata Managing the Resource
Administrative metadata: provenance, fixity, context, reference, and lifecycle management. Rights MD may be a subset.
Technical Metadata: physical characteristics of the resource. Used to manage digital preservation and display of resource. May be a subset of Administrative MD. Also called Preservation Metadata
Descriptive Metadata: information to discover, identify, select and obtain the resource
Structural metadata: information about the structured relationship between components of a complex object. May be a subset of Administrative MD.
Meta metadata: metadata that describes and manages the metadata record. Can add “intelligence” to metadata.
Repository design concatenates all types of metadata to support preservation and access to objects in the repository
METADATA SCHEMA COMPONENTS
Data Element - Atomic Unit of Meaning - Community Defined
Attribute - Refines, Extends, Interprets data element
Value - Information unique to each data element instance
Constraint - Order imposed on data element expression for consistency; semantic viability
Label - contextual instance of data element name. “How the data element displays on the web for the end user.”
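The five schema components listed above can be held together in one small record type. This is a sketch only: the `DataElement` class, its field names, and the sample date value are hypothetical, and the constraint shown (an ISO-style date pattern) is just one possible rule.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


def accept_any(value: str) -> bool:
    """Default constraint: every value is allowed."""
    return True


@dataclass
class DataElement:
    name: str    # data element: the atomic unit of meaning
    value: str   # information unique to this instance
    label: str   # how the element displays to the end user
    attributes: Dict[str, str] = field(default_factory=dict)  # refinements
    constraint: Callable[[str], bool] = accept_any             # consistency rule

    def is_valid(self) -> bool:
        return self.constraint(self.value)


ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(value: str) -> bool:
    return ISO_DATE.match(value) is not None


# Hypothetical instances: same element, one conforming and one not.
good = DataElement("date", "2004-06-01", "Date created",
                   {"type": "created"}, is_iso_date)
bad = DataElement("date", "June 2004", "Date created",
                  {"type": "created"}, is_iso_date)
```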
OAIS – Preservation and Access
File Encoding and Transport
METS: Metadata Encoding & Transmission Standard
• XML document format for encoding metadata for resource description and management.
• “wrapper” that concatenates digital object(s) in multiple formats, metadata, a structure map documenting the organization of the digital object(s), as well as behaviors that act upon digital object(s)
• standardized transmission of METS package between repositories and applications
METS Document has seven major sections:
METS Header: minimal descriptive metadata about the METS document itself
Descriptive Metadata: metadata describing the digital object, to enable discovery and evaluation.
Administrative Metadata: metadata about the creation, use and provenance of the digital object(s). Includes four subtypes: technical, source, rights and digital provenance metadata
File Section: Includes one or more <fileGrp> elements, to group together related files, such as the different digital manifestations of a file, e.g., the uncompressed digital master and the MPEG-4 and QuickTime access files for a video title.
Structural Map: Outlines hierarchical structure of a digital object and links the elements of that structure to relevant content files and metadata
Structural Links: Contains a single element, <smLink>. Used to record the existence of hyperlinks between items within the structural map.
Behavior: Used to associate executable behaviors with content within the METS document. For example, a behavior could automatically launch a video player application when a digital video file is selected for display.
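The seven sections can be laid out as an empty METS skeleton with Python's standard `xml.etree.ElementTree`. The element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`) and the namespace are those of the METS schema; the `USE="master"` value is an assumed example, and the result is a structural sketch, not a schema-valid document (required IDs and child elements are omitted).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven major sections, in document order.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]

# The four subtypes of administrative metadata.
ADMIN_SUBTYPES = ["techMD", "sourceMD", "rightsMD", "digiprovMD"]


def q(name: str) -> str:
    """Qualify a local element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton() -> ET.Element:
    root = ET.Element(q("mets"))
    sections = {name: ET.SubElement(root, q(name)) for name in SECTIONS}
    for subtype in ADMIN_SUBTYPES:
        ET.SubElement(sections["amdSec"], q(subtype))
    # One file group for related files; the USE value is an assumed example.
    ET.SubElement(sections["fileSec"], q("fileGrp"), {"USE": "master"})
    # structLink holds smLink elements recording hyperlinks in the structMap.
    ET.SubElement(sections["structLink"], q("smLink"))
    return root


root = build_mets_skeleton()
xml_text = ET.tostring(root, encoding="unicode")
```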
FEDORA
Background:
o “Flexible, Extensible Digital Object Repository Architecture”
o Developed by Cornell University and University of Virginia via a Mellon Foundation Grant.
o Utilizes METS (v 2.0: FOXML, interoperable with METS)
http://www.fedora.info/
PREMIS Data Dictionary
o Sponsored by OCLC and RLG
o Defines a “core” set of preservation metadata elements
o Provides a data dictionary supporting the preservation of digital information
PREMIS Data Model
Intellectual Entities
Objects
Events
Agents
Rights
http://www.oclc.org/research/projects/pmwg/
MPEG-21 Multimedia Framework
o Transparent management and use of digital multimedia resources, from creation through consumption.
o Key concept is the Digital Item Declaration, which includes structure, resources and metadata bundled in the item.
o Repository architecture: LANL’s aDORe, a modular digital object repository architecture modeled on MPEG-21.
http://public.lanl.gov/herbertv/papers/aDORe_20050128_submission.pdf
MXF: Material Exchange Format
• “Open file format targeted at the interchange of audiovisual material, with associated data and metadata.”
• Intended to support file interoperability between content creation devices, servers and workstations. Supports integration of file-based and streaming resource formats.
• Maintains the “documentation chain” for metadata about audiovisual essences throughout the resource lifecycle—creation, broadcast, storage, re-use
Example: Video footage of hurricane activity in the field has automatic GPS, date/time and duration capture as captions on the footage. MXF can maintain the essence and the metadata captured simultaneously by the camera for use in production, archiving and reuse, without the need to “recatalog” the information.
Example: Footage of jaguar hunting in Brazil is captioned in the field, transferred with captions to production facility, where it is packaged into a program, “The Vanishing Rainforest.” Footage is licensed to a travelogue production company. Footage of jaguar on the DVD, “This is Brazil,” has online attribution to “The Vanishing Rainforest,” from metadata added in production, as well as attribution to the field cinematographer, location, date and time of capture, from the original captions, with no recreation of metadata.
File Header: Header partition pack, Header metadata
File Body: Essence Container
File Footer: Footer partition pack
Every item in an MXF file is KLV (Key Length Value) encoded: identified by a unique 16-byte key and by its length. Anything that is not understood or needed (unrecognized keys) can be ignored and skipped over.
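Skipping over unrecognized keys works because every item declares its own length. A minimal reader is sketched below, assuming the length field uses BER encoding (short form for values under 128, long form otherwise) as in SMPTE KLV; the keys in the sample buffer are made-up byte patterns, not registered SMPTE labels, and truncated input is not handled.

```python
from typing import Iterator, List, Set, Tuple

KEY_SIZE = 16  # every MXF item starts with a 16-byte key


def read_ber_length(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a BER length at pos; return (length, position after it)."""
    first = buf[pos]
    pos += 1
    if first < 0x80:                 # short form: the byte is the length
        return first, pos
    n = first & 0x7F                 # long form: next n bytes hold the length
    return int.from_bytes(buf[pos:pos + n], "big"), pos + n


def iter_klv(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) for each KLV triplet in buf."""
    pos = 0
    while pos + KEY_SIZE < len(buf):
        key = buf[pos:pos + KEY_SIZE]
        length, pos = read_ber_length(buf, pos + KEY_SIZE)
        yield key, buf[pos:pos + length]
        pos += length                # jump straight to the next key


def known_values(buf: bytes, known_keys: Set[bytes]) -> List[Tuple[bytes, bytes]]:
    """Keep recognized items; unrecognized keys are skipped, not errors."""
    return [(k, v) for k, v in iter_klv(buf) if k in known_keys]


# Made-up keys for illustration only.
key_a = bytes(range(16))
key_b = bytes([0xFF] * 16)
sample = key_a + b"\x05hello" + key_b + b"\x81\x03abc"
```

With `sample`, `known_values(sample, {key_a})` keeps the first item and passes over the second without needing to understand its contents.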
Header Metadata:
• Metadata (DMS-1 or other schema)
• Timing and synchronization parameters
Synchronization and Description of the Essence through three packages:
• Material Package: Output timeline of the file (tracks and sequence)
• File Package: the essence itself
• Source Package: Derivation of the essence (“source film stock” descriptions, etc.)
Dublin Core
Content        Intellectual Property   Instantiation
Title          Creator                 Date
Subject        Publisher               Type
Description    Contributor             Format
Source         Rights                  Identifier
Language
Relation
Coverage
From “Description of Dublin Core Elements” http://purl.oclc.org/metadata/dublin_core_elements
Every element is optional, repeatable, with rules for format and values
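Because every element is optional and repeatable, a Dublin Core record maps naturally onto a dictionary of lists. The sketch below serializes such a record with the standard `xml.etree.ElementTree`; the `dc` namespace URI is the published Dublin Core elements namespace, while the unqualified `record` wrapper element and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dc_record_to_xml(record: Dict[str, List[str]]) -> ET.Element:
    """Serialize a record; absent elements are simply omitted."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:     # repeatable: one XML element per value
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


record = {"title": ["Gone with the Wind"], "format": ["DVD", "35 MM Film"]}
dc_xml = dc_record_to_xml(record)
dc_text = ET.tostring(dc_xml, encoding="unicode")
```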
DESCRIPTIVE METADATA SCHEMAS
DUBLIN CORE
+
• Provides a great deal of flexibility.
• Easy to learn.
• Ensures interoperability with other schemes.
• Good transport protocol when expressed as XML
-
• Lacks support for multiple formats
• Lacks support for seriality
• Technical description (formats, containers, extent, etc.) is weak and not standardized.
PBCore
Intended to address description, preservation and access needs of television, radio, and associated web activities.
Based on Dublin Core—qualifies and expands the 15 Dublin Core data elements.
58 Data Elements (30 mandatory)
V 1.0 available free of charge for use, via the Corporation for Public Broadcasting.
Maps readily to other schema (Dublin Core, MPEG-7, MODS, etc.)
• Data elements address descriptive and technical metadata for access and management
• Simple “linear” data model is easy to apply
• Like Dublin Core, does not address issue of “multiple manifestations” (Although both can be used within METS to address this issue).
PBCore – “Qualified” Dublin Core for DV
<FormatFileSize>296 MB</FormatFileSize>
<FormatImageFrameRate>30 fps</FormatImageFrameRate>
DC “Dumb Down”
<format>296 MB</format>
<format>30 fps</format>
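The “dumb down” step replaces each qualified element with the Dublin Core element it refines, keeping the value but losing the qualifier. A minimal sketch follows, using the two element names from this example; the `REFINES` table and the function name are hypothetical and cover only these two elements.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup: qualified element -> Dublin Core element it refines.
REFINES: Dict[str, str] = {
    "FormatFileSize": "format",
    "FormatImageFrameRate": "format",
}


def dumb_down(record: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map qualified elements to their base element; drop unmapped ones."""
    return [(REFINES[name], value) for name, value in record if name in REFINES]


pbcore_record = [
    ("FormatFileSize", "296 MB"),
    ("FormatImageFrameRate", "30 fps"),
]
dc_record = dumb_down(pbcore_record)
```

The result holds two `format` values that a Dublin Core consumer can display but can no longer tell apart as file size versus frame rate, which is the cost of the simpler scheme.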
MPEG-7: Multimedia Content Description Interface
Synchronization between content and description
Textual indexing: Creation information, subjects, concepts, media profiles.
Non-textual indexing: melody and speech recognition, color, shape, scene changes, etc.
Textual and binary formats are completely equivalent. Any functionality can be used in textual or non-textual form.
Does not support description of analog or textual resources
High-level textual description of component parts (“table of contents”) does not exist.
Some duplication of descriptive information across MPEG-7 descriptive schemes
Documentation, examples and widespread adoption as a descriptive metadata standard are weak.
[Figure: MPEG-7 textual and binary encoding of a content description. Labels: Content description; MPEG-7 Textual Encoder; MPEG-7 Textual Decoder; MPEG-7 Binary Encoder; Access Unit - Textual Format; Access Unit - Binary Format; Content]
MPEG-7 Content Description: Low level Audio Visual descriptors
Video segments: Color; Camera motion; Motion activity; Mosaic
Moving regions: Color; Motion trajectory; Parametric motion; Spatio-temporal shape
Still regions: Color; Shape; Position; Texture
Audio segments: Spoken content; Spectral characterization; Music: timbre, melody
MPEG-7 Description Tools
Description Schemes (structure) and Descriptors (features)
[Figure: Overview of the DSs. Labels: Datatype & Structures; Link & Media Localization; Models; Navigation & Access; Content management; Content description; Collection & Classification; Summaries; Variations; Content organization; Creation & Production; Media Usage; Semantic structure; Spatio-temporal structure; Aspects; User Interaction; User Preferences; Usage History; Roots and Top-level Elements; Packages; Schema Tools; Partitions and Decompositions; Basic elements; Audio and Visual features]
Dublin Core vs. MPEG7 – The Challenges
• MPEG7 is a structured, hierarchical schema.
• “Work” described in CreationInformation DS
• Manifestation/Item described in MediaInformation and UsageInformation DSs
• Dublin Core is a “flat” schema that mixes “work” or intellectual content with single manifestation/item description (“1:1 principle”)
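MANIFESTATION in DC and MPEG-7
[Figure: Dublin Core elements set against MPEG-7 description schemes. CREATOR, TITLE, SUBJECT, DATE correspond to CreationInformation. IDENTIFIER, FORMAT, RIGHTS (shown once per manifestation, twice in the figure) correspond to MediaProfile, Usage, Availability (also shown twice) and MediaInstance.]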
MODS: Metadata Object Description Schema
• XML representation of MARC21 data, to enable seamless transfer of MARC data to XML.
• Enables both original description of digital and analog resources and mapping of legacy metadata in MARC to MODS
• MODS is represented in application profiles for METS Descriptive MD and OAI-PMH for data sharing and transport
MXF DMS-1
Material Exchange Format – Descriptive Metadata Scheme-1 (SMPTE 380M-2004)
• Utilizes SMPTE RP 210 – Metadata Dictionary Registry of Metadata Element Descriptions
• Data model and core rules are taken from AAF, so that DMS-1 can be seen as an Application of AAF.
• Utilizes a collection of descriptive metadata frameworks.
• Supports migration of DM from one MXF file to another when essence is migrated or reused.
Frameworks: “grouping of related descriptive metadata properties and sets, which describe the contents of an MXF file body.”
• Production framework: “provide[s] identification and ownership details of the audio-visual content in the file body.” “Applies to the complete input or output of the MXF file as a whole.”
• Clip framework: “provide[s] capture and creation information about the individual ‘audio-visual’ clips in the file body.” “A ‘clip’ is a continuous essence element, or essence element interleave, in the essence container.”
• Scene framework: “describe[s] actions and events within individual scenes of the audio-visual content of the file body.” “Scene is an editorial concept and describes a continuous section of content in an MXF file.”
MXF DMS-1: Production framework
Award
Identification
Group Relationship
Branding
Titles
Participant
Metadata Server Locator
Event
Captions Description
Annotation
Setting/Period
Contract
Picture Format
Project
Publication
Annotation
Classification
Cue Words
Related Material Locator
Rights
MXF DMS-1: Clip framework
Project
Captions Description
Picture Format
Processing
Titles
Participant
Metadata Server Locator
Annotation
Scripting
Shot
Contract
Device Parameters
Scripting Locator
Cue Words
Related Material Locator
Classification
Cue Words
Key Point
Rights
Name-value
Name-value
MXF DMS-1: Scene framework
Setting period
Participant
Contacts List
Titles
Metadata Server Locator
Annotation
Shot
Cue Words
Related Material Locator
Classification
Cue Words
Key Point
Name-Value
Concatenate moving images for preservation and access through:
Union Catalog
Archive Directory
Education and Outreach Space
Cataloging Utility
Dynamic, contextual portals
MIC Organization Directory
• Contact information, home page URL, logo
• Collection descriptions
• Preservation activities
• Cataloging activities
• How to obtain materials
• Administrative information
• Shibboleth Authentication/Authorization
• Intersects with the Union Catalog for:
-- pre-selection for union catalog searches
-- providing information about the organization, particularly obtaining resources, audience served, location, etc.
[Figure: OrgID links the Org Directory and the Union Catalog]
MIC PORTALS
• Resource and organization descriptions specific to community
• PortalID in both Org Directory and Union Catalog retrieves portal-specific information
[Figure: PortalID links the Org Directory and the Union Catalog]
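How a shared OrgID and PortalID tie the Org Directory to the Union Catalog can be sketched with two in-memory tables. Everything here is hypothetical: the field names, identifiers, organization names and sample records are invented for illustration and do not describe the actual MIC data structures.

```python
from typing import Dict, List, Optional, Set

# Hypothetical Org Directory, keyed by OrgID.
org_directory: Dict[str, dict] = {
    "org-001": {"name": "Example Film Archive",
                "how_to_obtain": "Contact the reference desk",
                "portal_ids": ["science"]},
    "org-002": {"name": "Example Travel Library",
                "how_to_obtain": "Online request form",
                "portal_ids": []},
}

# Hypothetical Union Catalog records, each carrying its holder's OrgID.
union_catalog: List[dict] = [
    {"title": "The Vanishing Rainforest", "org_id": "org-001",
     "portal_ids": ["science"]},
    {"title": "This is Brazil", "org_id": "org-002", "portal_ids": []},
]


def search_catalog(term: str, org_ids: Optional[Set[str]] = None) -> List[dict]:
    """Search titles, optionally pre-selecting organizations by OrgID."""
    return [
        rec for rec in union_catalog
        if term.lower() in rec["title"].lower()
        and (org_ids is None or rec["org_id"] in org_ids)
    ]


def holding_organization(record: dict) -> dict:
    """Follow a catalog record's OrgID back to the directory entry."""
    return org_directory[record["org_id"]]


def portal_view(portal_id: str) -> dict:
    """Collect the portal-specific slice of both tables by PortalID."""
    return {
        "organizations": [o["name"] for o in org_directory.values()
                          if portal_id in o["portal_ids"]],
        "titles": [r["title"] for r in union_catalog
                   if portal_id in r["portal_ids"]],
    }
```

For example, `search_catalog("brazil", {"org-002"})` restricts the search to one organization, and `holding_organization` on a hit returns how to obtain the material.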