Metadata for the Web From Discovery to Description

Page 1: Metadata for the Web From Discovery to Description

Cornell CS 502

Metadata for the Web From Discovery to Description

CS 502 – 20020226

Carl Lagoze – Cornell University

Page 2: Metadata for the Web From Discovery to Description

Co-existing Cost/Functionality Levels

[Figure; axis label: "Greater Functionality & Cost"]

Page 3: Metadata for the Web From Discovery to Description

Dublin Core Qualifiers

• From fuzzy buckets to more specific description

• Model of "graceful degradation"
– Support both simplicity and specificity
– Intra-domain and inter-domain semantics

Page 4: Metadata for the Web From Discovery to Description

Resource has property X

– Resource: implied subject
– has: implied verb
– property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)
– X: property value (an appropriate literal)
– [optional qualifier], [optional qualifier]: qualifiers (adjectives)

Page 5: Metadata for the Web From Discovery to Description

Varieties of qualifiers: Element Refinements

• Make the meaning of an element narrower or more specific.

• Narrowing implies an "is a" relationship
– a "date created" is a "date"
– an "is part of" relation is a "relation"

• If your software does not understand the qualifier, you can safely ignore it.

Page 6: Metadata for the Web From Discovery to Description

Varieties of Qualifiers: Value Encoding Schemes

• Says that the value is
– a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
– a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)

• Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
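A minimal Python sketch of how software might use an encoding scheme when it recognizes one and fall back to the plain literal when it does not. The function name `interpret` and the set `DATE_SCHEMES` are hypothetical, and treating the three scheme labels used on these slides as interchangeable for full dates is an assumption of the sketch.

```python
from datetime import date

# Scheme labels as they appear on these slides; treating them as equivalent
# for full YYYY-MM-DD dates is an assumption of this sketch.
DATE_SCHEMES = {"ISO8601", "W3CDTF", "W3C-DTF"}

def interpret(value, scheme=None):
    """Return a date when the scheme is a known date scheme, else the raw string."""
    if scheme in DATE_SCHEMES:
        try:
            return date.fromisoformat(value)  # year first, then month, then day
        except ValueError:
            return value  # malformed value: fall back to the literal
    return value  # unknown or missing scheme: the literal is still usable

parsed = interpret("2001-05-02", scheme="ISO8601")
print(parsed.month, parsed.day)

# A scheme this sketch does not know leaves the value untouched.
print(interpret("Languages -- Grammar", scheme="LCSH"))
```

With the scheme known, the string is read unambiguously as year, month, day. Without it, the value is still a searchable string.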

Page 7: Metadata for the Web From Discovery to Description

Resource has Date "2000-06-13"
– element refinement: Revised
– encoding scheme: ISO8601

Resource has Subject "Languages -- Grammar"
– encoding scheme: LCSH

Page 8: Metadata for the Web From Discovery to Description

Dumb-Down Principle for Qualifiers

• The fifteen elements should be usable and understandable with or without the qualifiers

• Qualifiers refine meaning (but may be harder to understand)

• Nouns can stand on their own without adjectives

• If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!

• "has a" relations break the model
– E.g., a creator has a hair color
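The dumb-down principle can be sketched in a few lines of Python. The `Statement` class and the `dumb_down` function are hypothetical names introduced for illustration, not part of any Dublin Core specification. The sketch assumes a statement has one element, one literal value, and at most one refinement and one encoding scheme.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Statement:
    element: str                      # one of the fifteen elements, e.g. "Date"
    value: str                        # an appropriate literal
    refinement: Optional[str] = None  # element refinement, e.g. "Revised"
    scheme: Optional[str] = None      # encoding scheme, e.g. "ISO8601"

def dumb_down(statement: Statement) -> Statement:
    """Drop both qualifiers; the element and its value must still make sense."""
    return Statement(element=statement.element, value=statement.value)

qualified = Statement("Date", "2000-06-13", refinement="Revised", scheme="ISO8601")
simple = dumb_down(qualified)
print(simple)
```

If the qualifier were a "has a" relation (for example, a creator's hair color stored as the value), the dumbed-down statement would no longer be correct. That is why such qualifiers break the model.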

Page 9: Metadata for the Web From Discovery to Description

Resource has Date "2000-06-13"
– element refinement: Revised
– encoding scheme: ISO8601

Resource has Subject "Languages -- Grammar"
– encoding scheme: LCSH

Test for "good" qualifiers: cover the qualifier and ask:
– Does the statement still make sense?
– Is it still correct?

Page 10: Metadata for the Web From Discovery to Description

"Incorrect" Qualification

Resource has subject "pre-schoolers"
– qualifier: audience

Resource has creator "Cornell University"
– qualifier: affiliation

Page 11: Metadata for the Web From Discovery to Description

Open questions in this model

• Are uncontrolled and unconstrained values really useful for discovery?

• Is it possible for an organization (DCMI) to control the evolution of a language?

• How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation?

• Can DC serve as a lingua franca (mapping template) among more complex models?

Page 12: Metadata for the Web From Discovery to Description

Models for Deploying Metadata

• Embedded in the resource
– Low deployment threshold
– Limited flexibility, limited model

• Linked to from resource
– Using XLink
– Is there only one source of metadata?

• Independent resource referencing resource
– Model of accessing the object through its surrogate

Page 13: Metadata for the Web From Discovery to Description

Syntax Alternatives: HTML

• Advantages:
– Simple mechanism: META tags embedded in content
– Widely deployed tools and knowledge

• Disadvantages:
– Limited structural richness (won't support hierarchical, tree-structured data or entity distinctions)

Page 14: Metadata for the Web From Discovery to Description

Dublin Core in HTML

• http://www.dublincore.org/documents/2000/08/15/dcq-html/

• HTML constructs
– <link> to establish pseudo-namespace
– <meta> for metadata statements

• name attribute for DC element (DC.element.ER)

• content attribute for element value

• scheme attribute for encoding scheme or controlled vocabulary

• lang attribute for language of element value
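As a rough illustration of how the four attributes fit together, the following Python sketch renders one `<meta>` statement. The helper `dc_meta` and its parameter names are hypothetical. The `DC.Element.Refinement` naming simply follows the pattern shown on these slides.

```python
from html import escape

def dc_meta(element, content, refinement=None, scheme=None, lang=None):
    """Render one Dublin Core statement as an HTML <meta> tag."""
    # name attribute: DC.<element>, optionally followed by .<refinement>
    name = "DC." + element + ("." + refinement if refinement else "")
    attrs = [("name", name)]
    if scheme:
        attrs.append(("scheme", scheme))  # encoding scheme or controlled vocabulary
    if lang:
        attrs.append(("lang", lang))      # language of the element value
    attrs.append(("content", content))    # the element value itself
    rendered = " ".join(f'{key}="{escape(val, quote=True)}"' for key, val in attrs)
    return f"<meta {rendered}>"

print(dc_meta("Date", "2000-10-23", refinement="Created", scheme="W3CDTF"))
print(dc_meta("Title", "negocio inusual", lang="es"))
```

Escaping the attribute values keeps a title that contains quotation marks or ampersands from breaking the tag.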

Page 15: Metadata for the Web From Discovery to Description

Dublin Core in HTML example

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Title" lang="es" content="negocio inusual">
<meta name="DC.Creator" content="Carl Lagoze">
<meta name="DC.Subject" content="bibliographic control web cataloging">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html">
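Going the other way, a harvester could pull Dublin Core statements back out of a page. The sketch below uses Python's standard `html.parser`. The class name `DCMetaParser` and the dictionary layout are assumptions made for illustration, and the sample markup is a shortened form of the example above.

```python
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collect <meta name="DC..."> statements from an HTML document."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.startswith("DC."):
            return  # not a Dublin Core statement
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": attributes.get("scheme"),
            "lang": attributes.get("lang"),
            "value": attributes.get("content"),
        })

sample = """
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.Title" content="Business Unusual">
<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
"""

parser = DCMetaParser()
parser.feed(sample)
for statement in parser.statements:
    print(statement)
```

Software that does not understand the "Created" refinement can ignore the `refinement` key and still treat the value as a date.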

Page 16: Metadata for the Web From Discovery to Description

Unqualified Dublin Core in XML

http://www.dublincore.org/documents/2000/11/dcmes-xml/

<?xml version="1.0"?>

<!DOCTYPE rdf:RDF SYSTEM "http://dublincore.org/2000/12/01-dcmes-xml-dtd.dtd">

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">

<dc:title>Dave Beckett's Home Page</dc:title>

<dc:creator>Dave Beckett</dc:creator>

<dc:publisher>ILRT, University of Bristol</dc:publisher>

<dc:date>2000-06-06</dc:date>

</rdf:Description>

</rdf:RDF>
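A minimal sketch of reading such a record with Python's standard `xml.etree.ElementTree`. The function name `dc_statements` is hypothetical, and the DOCTYPE line is left out of the embedded copy for brevity. The namespace URIs are the ones declared in the record above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

record = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ilrt.bristol.ac.uk/people/cmdjb/">
    <dc:title>Dave Beckett's Home Page</dc:title>
    <dc:creator>Dave Beckett</dc:creator>
    <dc:publisher>ILRT, University of Bristol</dc:publisher>
    <dc:date>2000-06-06</dc:date>
  </rdf:Description>
</rdf:RDF>"""

def dc_statements(xml_text):
    """Return (resource, element, value) tuples for every dc:* child element."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    found = []
    for description in root.iter("{" + RDF_NS + "}Description"):
        resource = description.get("{" + RDF_NS + "}about")
        for child in description:
            if child.tag.startswith(prefix):
                found.append((resource, child.tag[len(prefix):], child.text))
    return found

for resource, element, value in dc_statements(record):
    print(element, "=", value)
```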

Page 17: Metadata for the Web From Discovery to Description

Example of Dublin Core Use

A map in the United States Library of Congress on-line American Memory Collection

Page 18: Metadata for the Web From Discovery to Description

Title

The name given to the resource

<meta name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">

Page 19: Metadata for the Web From Discovery to Description

Creator

An entity primarily responsible for making the content of the resource

<meta name="DC.Creator" content="Nicolaum Visscher">

Page 20: Metadata for the Web From Discovery to Description

Subject

The topic of the content of the resource

<meta name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
<meta name="DC.Subject" content="Maps" scheme="LCSH">
<meta name="DC.Subject" content="Early works to 1800" scheme="LCSH">

Page 21: Metadata for the Web From Discovery to Description

Description

An account of the content of the resource

<meta name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">

Page 22: Metadata for the Web From Discovery to Description

Publisher

An entity responsible for making the resource available

<meta name="DC.Publisher" content="Library of Congress, United States">

Page 23: Metadata for the Web From Discovery to Description

Contributor

An entity responsible for making contributions to the content of the resource.

<meta name="DC.Contributor" content="Historic Urban Plans">

Page 24: Metadata for the Web From Discovery to Description

Date

A date associated with an event in the lifecycle of the resource

<meta name="DC.Date.Created" content="1996-04-17" scheme="W3C-DTF">

Page 25: Metadata for the Web From Discovery to Description

Type

The nature or genre of the content of the resource

<meta name="DC.Type" content="image" scheme="DCMIType">

Page 26: Metadata for the Web From Discovery to Description

Format

The physical or digital manifestation of the resource

<meta name="DC.Format.Medium" content="image/gif" scheme="IMT">
<meta name="DC.Format.Extent" content="556K">

Page 27: Metadata for the Web From Discovery to Description

Identifier

An unambiguous reference to the resource in the current context

<meta name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">

Page 28: Metadata for the Web From Discovery to Description

Source

A reference to a resource from which the present resource is derived.

<meta name="DC.Source" content="G3715 1685 .V5 1969 (LOC catalog #)">

Page 29: Metadata for the Web From Discovery to Description

Language

Language of the intellectual content of the object

<meta name="DC.Language" content="nl" scheme="ISO 639-2">

Page 30: Metadata for the Web From Discovery to Description

Relation

A reference to a related resource

<meta name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">

Page 31: Metadata for the Web From Discovery to Description

Coverage

The extent or scope of the content of the resource

<meta name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">
<meta name="DC.Coverage.Temporal" content="1650" scheme="W3C-DTF">

Page 32: Metadata for the Web From Discovery to Description

Rights

Information about rights in and over the resource

<meta name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">

Page 33: Metadata for the Web From Discovery to Description

Distributed Content: The Metadata Challenge

• From fixed, contained physical artifacts to fluid, distributed digital objects

• Need for basis of trust and authenticity in network environment

• Decentralization and specialization of resource description and need for mapping formalisms

Page 34: Metadata for the Web From Discovery to Description

Multi-entity nature of object description

Photographer

Camera type

Software

Computer artist

Page 35: Metadata for the Web From Discovery to Description

Understanding Metadata based on Query Capabilities

• Simple boolean tags?
– Creator="Tom Baker" and "Title" contains "Dublin Core"

• Agent, time, place questions?
– Who was responsible for what and when and where
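The first kind of query needs nothing more than flat attribute/value records. The Python sketch below shows that case. The two sample records and the function name `matches` are invented for illustration only.

```python
# Invented sample records; a flat record maps each element name to one literal.
records = [
    {"Creator": "Tom Baker", "Title": "Notes on the Dublin Core"},
    {"Creator": "Carl Lagoze", "Title": "Metadata for the Web"},
]

def matches(record, creator, title_contains):
    """Creator equals a string AND Title contains a substring."""
    return (record.get("Creator") == creator
            and title_contains in record.get("Title", ""))

hits = [r for r in records if matches(r, "Tom Baker", "Dublin Core")]
print(hits)
```

A flat record like this has no place to say when or where the creator acted, so the second kind of question cannot be asked of it.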

Page 36: Metadata for the Web From Discovery to Description

Attribute/Value approaches to metadata…

The playwright of Hamlet was Shakespeare

Hamlet has a [playwright] creator Shakespeare
– Hamlet: subject
– has a: implied verb
– playwright: metadata adjective
– creator: metadata noun
– Shakespeare: literal

R1 --dc:title--> "Hamlet"
R1 --dc:creator.playwright--> "Shakespeare"

Page 37: Metadata for the Web From Discovery to Description

…run into problems for richer descriptions…

The playwright of Hamlet was Shakespeare, who was born in Stratford

Hamlet has a [birthplace] creator Stratford

R1 --dc:creator.playwright--> "Shakespeare"
R1 --dc:creator.birthplace--> "Stratford"

Page 38: Metadata for the Web From Discovery to Description

…because of their failure to model entity distinctions

R1 --title--> "Hamlet"
R1 --creator--> R2
R2 --name--> "Shakespeare"
R2 --birthplace--> "Stratford"
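Treating the creator as its own entity (R2) rather than as a string attached to the play (R1) can be sketched as a tiny triple store in Python. The helper names `objects` and `creator_birthplaces` are hypothetical. The node and arc labels come from the diagram on this slide.

```python
# Each triple is (subject, predicate, object); R1 is the play, R2 its creator.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]

def objects(subject, predicate):
    """All objects reachable from a subject over one predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def creator_birthplaces(resource):
    """Follow creator, then birthplace: a two-step walk over the graph."""
    return [place
            for creator in objects(resource, "creator")
            for place in objects(creator, "birthplace")]

print(objects("R1", "title"))
print(creator_birthplaces("R1"))
```

Because the birthplace hangs off R2, the statement "Hamlet has a creator Stratford" never arises.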

Page 39: Metadata for the Web From Discovery to Description

Applying a Model-Centric Approach

• Formally define common entities and relationships underlying multiple metadata vocabularies

• Describe them (and their inter-relationships) in a simple logical model

• Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

Page 40: Metadata for the Web From Discovery to Description

Events are key to understanding metadata relationships?

• Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles.

• Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.
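One way to picture events as first-class objects is the Python sketch below. The `Event` class, its field names, and the function `who_did_what_when` are hypothetical. The values are taken from the Anna Karenina example used later in this deck.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # what happened, e.g. "creation" or "translation"
    agent: str     # who was responsible
    role: str      # the role the agent played in this event
    year: int      # when it happened
    resource: str  # the resource the event concerns

events = [
    Event("creation", "Leo Tolstoy", "author", 1877, "Anna Karenina"),
    Event("translation", "Margaret Wettlin", "translator", 1978, "Anna Karenina"),
]

def who_did_what_when(resource):
    """Answer 'who was responsible for what when' for one resource."""
    return [(e.agent, e.role, e.kind, e.year)
            for e in events if e.resource == resource]

for row in who_did_what_when("Anna Karenina"):
    print(row)
```

Each agent, role, and date attaches to a specific event, so the translator's date is never confused with the author's.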

Page 41: Metadata for the Web From Discovery to Description

ABC/Harmony Event-aware metadata ontology

• Recognizing inherent lifecycle aspects of description (esp. of digital content)

• Modeling incorporates time (events and situations) as first-class objects
– Supplies clear attachment points for agents, roles, existential properties

• Resource description as a “story-telling” activity

Page 42: Metadata for the Web From Discovery to Description

Resource-centric Metadata

Title: Anna Karenina

Author: Leo Tolstoy

Illustrator: Orest Vereisky

Translator: Margaret Wettlin

Date Created: 1877

Date Translated: 1978

Description: Adultery & Depression

Birthplace: Moscow

Birthdate: 1828

?

Page 43: Metadata for the Web From Discovery to Description

[Figure: graph-structured description of "Anna Karenina"; node and arc labels follow]

"translator"
"Margaret Wettlin"
"Orest Vereisky"
"illustrator"
"Anna Karenina"
"Tragic adultery and the search for meaningful love"
"English"
"author"
"creation"
"1877"
"1978"
"translation"
"Russian"
"Leo Tolstoy"
"Moscow"
"1828"

Page 44: Metadata for the Web From Discovery to Description

Queries over complex descriptive graphs

• Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
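A question of this shape becomes a simple filter once translations are recorded as events with their own dates. The Python sketch below assumes such an event list. The `Event` tuple and `translations_between` are hypothetical names, and the two "War and Peace" entries are placeholder data invented for the example, not real bibliographic records.

```python
from typing import NamedTuple

class Event(NamedTuple):
    kind: str   # "creation", "translation", ...
    work: str   # title of the work the event concerns
    agent: str  # who carried out the event
    year: int   # when the event took place

events = [
    Event("translation", "War and Peace", "Translator A", 1985),  # placeholder
    Event("translation", "War and Peace", "Translator B", 1995),  # placeholder
    Event("translation", "Anna Karenina", "Margaret Wettlin", 1978),
    Event("creation", "Anna Karenina", "Leo Tolstoy", 1877),
]

def translations_between(all_events, work, start, end):
    """Translation events for one work whose year lies in [start, end]."""
    return [e for e in all_events
            if e.kind == "translation" and e.work == work
            and start <= e.year <= end]

for event in translations_between(events, "War and Peace", 1980, 1990):
    print(event.agent, event.year)
```

The same filter cannot be written over a flat record that holds a single undifferentiated Date field, because nothing ties that date to the translation rather than the creation.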