2009.09.29 chris poppe - metadata

ELIS – Multimedia Lab. Metadata: Points of Departure, Priorities, Future Perspectives and Notes from the Margin. Chris Poppe, Multimedia Lab, Department of Electronics and Information Systems, Faculty of Engineering, Ghent University

description

Chris Poppe presents current and future metadata trends at a cultural heritage workshop.

Transcript of 2009.09.29 chris poppe - metadata

Page 1: 2009.09.29   chris poppe - metadata

ELIS – Multimedia Lab

Metadata: Points of Departure, Priorities, Future Perspectives and Notes from the Margin

Chris Poppe, Multimedia Lab

Department of Electronics and Information Systems, Faculty of Engineering

Ghent University

Page 2

Multimedia Lab

• Multimedia Lab
  – Research group of Ghent University (Faculty of Engineering)
  – Multimedia

• Video!
  – Coding
  – Processing
  – Transmission
  – Analysis
  – Adaptation
  – Annotation
  – …

Metadata, Chris Poppe

Levend Geheugen, Brussels, Belgium – September 29 2009

Page 3

Outline

• What is metadata?
• Metadata vs. tags?
  – Benefits/disadvantages?
• What is a metadata standard?
  – Why is it needed?
  – What does it look like?
  – What are the problems?
• What is the semantic web?
  – Web 2.0?
  – Web 3.0?
  – Semantic Web technologies?
• Conclusions

Page 4

Metadata

• Data describing data
• Museum for the history of sciences

Page 5

Metadata

• Data describing data
• Digital content

Example metadata fields: Resolution, DPI, Date/Time created, Creator, Camera used, File format (JPG, BMP, GIF, PNG, …), Location shot (GPS), Copyright, Title, Genre, Rating, Comment, Keywords, Depicted event, …
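As a minimal sketch, the fields listed above can be held in one structured record. The class name `ImageMetadata`, the field names and types, and the split into technical versus descriptive fields are illustrative choices, not part of any metadata standard.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ImageMetadata:
    """Hypothetical record for the metadata fields listed on the slide."""
    # Technical metadata: usually written automatically by the camera or software.
    resolution: Tuple[int, int]
    dpi: int
    file_format: str                         # e.g. "JPG", "PNG"
    created: Optional[str] = None            # date/time created
    camera: Optional[str] = None
    gps: Optional[Tuple[float, float]] = None
    # Descriptive metadata: usually added by a person.
    creator: Optional[str] = None
    copyright: Optional[str] = None
    title: Optional[str] = None
    genre: Optional[str] = None
    rating: Optional[int] = None
    keywords: List[str] = field(default_factory=list)

    def filled_in(self):
        """Return only the fields that actually carry a value."""
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}


photo = ImageMetadata(resolution=(3072, 2048), dpi=300, file_format="JPG",
                      title="Atomium at night", keywords=["Brussels", "Atomium"])
print(photo.filled_in())
```

A typed record like this already hints at the later point of the talk: once the structure is fixed, software can rely on it.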

Page 6

Use of Metadata

• Understanding of multimedia content
• Sharing
• Management
• Retrieval
  – Search
  – Browse
• Processing

Page 7

Metadata: tags

• Tag
  – Free-text annotation
  – Keywords, terms, comments
  – Informal
  – Personal
  – Started as taxonomies or vocabularies used to describe content
  – Evolved into folksonomies
• User-driven

Page 8

Taxonomies

• Top-down
• Predefined structure
• Hierarchy
• Controlled vocabularies
• Defined by experts

Page 9

Taxonomies

• Example
  – Dewey Decimal Classification
  – Library classification
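To make the idea of a predefined, hierarchical, controlled vocabulary concrete, here is a small sketch of a Dewey-style lookup. The excerpt of classes is simplified for illustration, and the helper names `path` and `is_valid` are hypothetical.

```python
from typing import List, Optional

# Simplified excerpt of the Dewey Decimal Classification: code -> (label, parent code).
DEWEY = {
    "500": ("Natural sciences and mathematics", None),
    "510": ("Mathematics", "500"),
    "530": ("Physics", "500"),
    "600": ("Technology", None),
}


def path(code: str) -> List[str]:
    """Return the labels from the top-level class down to `code`."""
    labels = []
    current: Optional[str] = code
    while current is not None:
        label, current = DEWEY[current]   # step up to the parent class
        labels.append(label)
    return labels[::-1]


def is_valid(code: str) -> bool:
    """Controlled vocabulary: only predefined codes may be used."""
    return code in DEWEY


print(path("530"))      # ['Natural sciences and mathematics', 'Physics']
print(is_valid("999"))  # False
```

The hierarchy is fixed top-down by experts, which is exactly what a folksonomy lacks.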

Page 10

Folksonomy

• Folk + taxonomy
  – Free-form text annotation
  – No predefined structure
  – No hierarchy
  – Users add metadata
  – Flat namespace
  – Bottom-up
• Two types:
  – Broad
  – Narrow

Page 11

Broad Folksonomy

• Tagging shared content
• Anyone can participate
• Examples

Page 12

Narrow Folksonomy

• Tagging your own content
• Tagging friends’ content
  – No consolidation
  – No emerging vocabularies
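A minimal sketch of why broad and narrow folksonomies behave differently, using hypothetical tagging events: when many users tag the same shared item, tag counts reveal a consensus; when only the owner tags an item, each tag tends to occur once, so nothing consolidates.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag).
broad = [
    ("ann", "photo1", "brussels"),
    ("bob", "photo1", "brussels"),
    ("cid", "photo1", "atomium"),
    ("dee", "photo1", "brussels"),
]
narrow = [
    ("ann", "photo2", "holiday"),
    ("ann", "photo2", "brussel"),
]


def tag_counts(events, item):
    """Count how often each tag was applied to `item`, across all users."""
    return Counter(tag for _user, it, tag in events if it == item)


print(tag_counts(broad, "photo1").most_common(1))  # [('brussels', 3)]
print(max(tag_counts(narrow, "photo2").values()))  # 1
```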

Page 13

Tagging usage

• Navigation
  – Tag clouds
  – Organization
  – Hints
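One common way to build a tag cloud is to scale each tag's font size with its frequency. The sketch below assumes linear scaling between a minimum and maximum size; the point sizes and the function name `tag_cloud` are illustrative, and many sites use logarithmic scaling instead.

```python
from collections import Counter


def tag_cloud(tags, min_size=10, max_size=32):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {tag: round(min_size + (n - lo) * (max_size - min_size) / span)
            for tag, n in counts.items()}


sizes = tag_cloud(["brussels"] * 5 + ["atomium"] * 3 + ["beer"])
print(sizes)  # {'brussels': 32, 'atomium': 21, 'beer': 10}
```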

Page 14

How to tag?

• Totally free
• Semi-structured
• Hinted
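Hinted tagging can be as simple as offering existing tags while the user types, which nudges users towards a shared vocabulary. This sketch assumes case-insensitive prefix matching and alphabetical ordering over a hypothetical vocabulary; a real system might rank suggestions by popularity instead.

```python
def suggest(typed, vocabulary, limit=3):
    """Hint existing tags that start with what the user has typed so far."""
    prefix = typed.strip().lower()
    return sorted(tag for tag in vocabulary if tag.startswith(prefix))[:limit]


vocabulary = {"brussels", "brugge", "bruegel", "atomium", "beer"}
print(suggest("Bru", vocabulary))  # ['bruegel', 'brugge', 'brussels']
```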

Page 15

Tagging problems

• Cultural differences: Genghis Khan, for some a hero, for others a criminal
• Communities of users can give different meanings to tags: Movie vs. Film vs. Cinema
• Language issues
• Ambiguity
• Misspelled tags (40% on Flickr, 28% on del.icio.us)
• Semantics of tags
  – Factual tags: what it is about, what it is: ‘image’, ‘article’, ‘blog’, …
  – Subjective tags: the user’s opinion: ‘funny’, ‘hot’, ‘stupid’, …
  – Personal tags: self-reference: ‘toread’, ‘mycomments’, …
  – Tag: “nothing to do with Brussels”
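Misspelled tags are one of the few problems above that a machine can partly repair on its own. A sketch using the standard library's `difflib.get_close_matches`, assuming a hypothetical list of known tags and a similarity cutoff of 0.8 chosen for illustration. It does nothing for synonyms (movie vs. film vs. cinema) or for subjective and personal tags.

```python
import difflib

# Hypothetical list of accepted tags.
KNOWN_TAGS = ["brussels", "belgium", "atomium", "film", "movie", "cinema"]


def clean_tag(tag):
    """Lower-case a tag and replace it by a very similar known tag (a likely misspelling)."""
    tag = tag.strip().lower()
    close = difflib.get_close_matches(tag, KNOWN_TAGS, n=1, cutoff=0.8)
    return close[0] if close else tag


print(clean_tag("Brusels"))  # brussels
print(clean_tag("Movie"))    # movie
```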

Page 16

Metadata

• Data describing data
• Digital content

Example metadata fields: Resolution, DPI, Date/Time created, Creator, Camera used, File format (JPG, BMP, GIF, PNG, …), Location shot (GPS), Copyright, Title, Genre, Rating, Comment, Keywords, Depicted event, …

Page 17

Multimedia

• Compression and container formats
• Standards for multimedia
  – Standards for metadata?

Formats shown on the slide, grouped as video compression, audio compression, image compression, physical and container formats: MP2, JPEG, MPEG-2, MXF, JPEG2000, AVI, AAC, H.264/MPEG-4 AVC, PNG, Motion JPEG2000, TIFF, MP4, MPEG, WAV, FLAC, VC-1, Ogg Vorbis, DivX, AIFF, GIF, JPEG-LS, Matroska, OGM/OGG, Windows Media Audio, DIRAC, 3GP, DV, FLV, Betacam, Realmedia, MOV, AC-3/Dolby Digital, Theora, ASF, TTA

Page 18

Metadata Standard

• A standard that determines the structure of metadata

Example metadata fields: Resolution, DPI, Date/Time created, Creator, Camera used, File format (JPG, BMP, GIF, PNG, …), Location shot (GPS), Copyright, Title, Genre, Rating, Comment, Keywords, Depicted event, …

MODS (Metadata Object Description Schema):

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns="http://www.loc.gov/mods/…">
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">
    <namePart>Claus, Hugo</namePart>
    <namePart type="date">1929-</namePart>
    <role>
      <roleTerm type="text">creator</roleTerm>
    </role>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>

Page 19

XML

• XML (Extensible Markup Language)
  – Standardized by the W3C (World Wide Web Consortium)
  – Language to define the structure of a document

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>

(Dutch element names: boekenlijst = book list, boek = book, titel = title, auteur = author, categorie = category, woordenboek = dictionary.)

Labels on the slide: XML element, attribute, values.
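Because the structure is explicit, standard tooling can read the document. A sketch with Python's built-in `xml.etree.ElementTree` that picks out the three kinds of information labelled on the slide: elements, attributes and values. The function name `read_books` is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOK_LIST = """<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>"""


def read_books(xml_text):
    """Return (category attribute, title, author) for every <boek> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (boek.get("categorie"),      # attribute value
         boek.findtext("titel"),     # text value of a child element
         boek.findtext("auteur"))    # "" for the empty element <auteur />
        for boek in root.findall("boek")
    ]


for book in read_books(BOOK_LIST):
    print(book)
```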

Page 20

XML Schema

• XML Schema
  – Uses XML to denote the structure of a document

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a book list. -->
<boekenlijst>
  <boek categorie="thriller">
    <titel>Het Bernini Mysterie</titel>
    <auteur>Dan Brown</auteur>
  </boek>
  <boek categorie="woordenboek">
    <titel>Van Dale Frans-Nederlands</titel>
    <auteur />
  </boek>
</boekenlijst>

The XML schema determines:
• Elements: boekenlijst, boek, titel, auteur
• Order
• Types (of values)
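An XML schema is normally checked by a schema-aware validator (for example `lxml.etree.XMLSchema` in the third-party lxml package); Python's standard library has no XML Schema validator. To show what such a schema pins down (which elements, in which order, with which value types), the sketch below hand-codes those three checks. The allowed category values are a hypothetical enumeration, and this is not a substitute for real XSD validation.

```python
import xml.etree.ElementTree as ET

ALLOWED_CATEGORIES = {"thriller", "woordenboek"}  # hypothetical value type (enumeration)


def check(root):
    """Return violations of a tiny, hand-written content model for <boekenlijst>."""
    errors = []
    if root.tag != "boekenlijst":
        errors.append(f"unexpected root <{root.tag}>")
    for i, boek in enumerate(root):
        if boek.tag != "boek":
            errors.append(f"child {i}: expected <boek>, got <{boek.tag}>")
            continue
        children = [child.tag for child in boek]
        if children != ["titel", "auteur"]:  # elements and their order
            errors.append(f"boek {i}: expected titel, auteur in that order, got {children}")
        if boek.get("categorie") not in ALLOWED_CATEGORIES:  # value type
            errors.append(f"boek {i}: invalid categorie {boek.get('categorie')!r}")
    return errors


good = ET.fromstring("<boekenlijst><boek categorie='thriller'>"
                     "<titel>T</titel><auteur>A</auteur></boek></boekenlijst>")
bad = ET.fromstring("<boekenlijst><boek categorie='roman'>"
                    "<auteur>A</auteur><titel>T</titel></boek></boekenlijst>")
print(check(good))      # []
print(len(check(bad)))  # 2
```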

Page 21

Metadata Standard

• Describe the structure of metadata using an XML schema
• Textual specification, explaining the semantics of the elements
  – titleInfo: “A word, phrase, character, or group of characters, normally appearing in a resource, that names it or the work contained in it.”

The MODS XML schema determines the structure of a MODS document such as:

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns="http://www.loc.gov/mods/…">
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">…</name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>

Page 22

Use of Metadata Standards

• Shared information uses a common structure
• Standard software can be used to parse the information

(Figure: two databases (DB) exchange MODS documents such as the one below; both speak the same language.)

<?xml version="1.0" encoding="UTF-8" ?>
<mods xmlns="http://www.loc.gov/mods/…">
  <titleInfo>
    <title>De geruchten</title>
  </titleInfo>
  <name type="personal">…</name>
  <typeOfResource>text</typeOfResource>
  <originInfo>… </originInfo>
  ...
</mods>
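The point of a shared standard is that one generic parser works for every record. A sketch that reads a MODS record with ElementTree; it uses the `{*}` namespace wildcard (supported in `find`/`findtext` since Python 3.8), so it does not depend on the exact namespace URI, which the slides elide. The MODS v3 namespace in the sample record and the function name `summarize` are assumptions.

```python
import xml.etree.ElementTree as ET

# Sample record; the MODS v3 namespace URI is an assumption (the slides elide it).
RECORD = b"""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>De geruchten</title></titleInfo>
  <name type="personal">
    <namePart>Claus, Hugo</namePart>
    <namePart type="date">1929-</namePart>
  </name>
  <typeOfResource>text</typeOfResource>
</mods>"""


def summarize(xml_bytes):
    """Extract a few fields from a MODS record, whatever namespace it declares."""
    root = ET.fromstring(xml_bytes)
    return {
        "title": root.findtext("{*}titleInfo/{*}title"),
        # Plain name parts only; the typed part (type="date") is skipped.
        "names": [p.text for p in root.iterfind("{*}name/{*}namePart")
                  if p.get("type") is None],
        "type": root.findtext("{*}typeOfResource"),
    }


print(summarize(RECORD))
# {'title': 'De geruchten', 'names': ['Claus, Hugo'], 'type': 'text'}
```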

Page 23

Metadata Standards

• Different metadata standards exist!
  – Different domains
  – Different communities
  – Different formats
  – Different focus

Page 24

Problems with Metadata Standards

• Different metadata standards can describe the same thing
• But in a different way!

Metadata example 1: CVML (Computer Vision Markup Language)

<object id="0">
  <box xc="77" yc="73" w="21" h="16"/>
</object>

Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

Metadata example 2: VS7 (Video Surveillance Schema)

<LLID ="LLID1">
  <Mask>
    <BB mp7:dim="4">67 65 88 91</BB>
  </Mask>
</LLID>

BB: “Coordinates of a rectangular segment.”
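A mapping between the two standards has to convert one box encoding into the other. The sketch converts CVML's centre-plus-size box into corner coordinates and back. That VS7's `BB` lists `x_min y_min x_max y_max` in that order is an assumption not stated on the slide, and the sketch ignores pixel-rounding conventions, so its output should not be read as the exact VS7 encoding.

```python
def cvml_to_corners(xc, yc, w, h):
    """Convert a CVML-style box (centre + size) to (x_min, y_min, x_max, y_max)."""
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def corners_to_cvml(x_min, y_min, x_max, y_max):
    """Inverse mapping: corner coordinates back to (xc, yc, w, h)."""
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


box = cvml_to_corners(77, 73, 21, 16)
print(box)                    # (66.5, 65.0, 87.5, 81.0)
print(corners_to_cvml(*box))  # (77.0, 73.0, 21.0, 16.0)
```

Every pair of standards needs such a mapping, and it only covers the structure, not whether both tools mean the same thing by "object".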

Page 25

Problems with Metadata Standards

• Current metadata standards define the structure of metadata
• Mappings are needed to use different standards within one system
• A metadata standard does not solve everything!
  – For instance, the Dublin Core (DC) creator property:
    • Creator="Shakespeare, William"
    • Creator="William Shakespeare"
    • Creator="Shakespeare"
    • Creator="W. Shakespare"
  – The same problems as with tagging can occur
• Lack of ways to describe the semantics of metadata
  – Currently plain text
  – Not machine-readable
• Multimedia content shifts to online repositories
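A syntactic clean-up rule only gets you part of the way. This sketch (the function name `normalize_creator` is hypothetical) rewrites the `Last, First` form, which merges the first two values, but "Shakespeare" and the misspelled "W. Shakespare" remain distinct strings: recognizing that they denote the same person needs a shared identifier, not more string rules.

```python
def normalize_creator(value):
    """Rewrite 'Last, First' as 'First Last'; leave other forms untouched."""
    value = " ".join(value.split())  # collapse stray whitespace
    if "," in value:
        last, first = (part.strip() for part in value.split(",", 1))
        return f"{first} {last}"
    return value


values = ["Shakespeare, William", "William Shakespeare", "Shakespeare", "W. Shakespare"]
distinct = {normalize_creator(v) for v in values}
print(len(distinct))  # 3
```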

Page 26

Semantic Web ?.0


Page 27

The Syntactic Web

• Consider a typical web page:
• Markup consists of:
  – rendering information (e.g., font size and colour)
  – hyperlinks to related content
• Semantic content is accessible to humans but not (easily) to computers…

Page 28

Impossible (?) using the Syntactic Web…

• Complex queries involving background knowledge
  – Give me the telephone number of the person at Multimedia Lab who is responsible for the demo on metadata applications
• Locating information in data repositories
  – Travel enquiries
  – Prices of goods and services
  – Results of human genome experiments
• Finding and using “web services”
  – Visualize surface interactions between two proteins
• Delegating complex tasks to web “agents”
  – Book me a holiday next weekend somewhere warm, not too far away, and where they speak French or English

Page 29

Semantic Web Technologies

• Technologies developed by the World Wide Web Consortium (W3C)
• Vision: the Web as a universal medium for exchanging data, information and knowledge
• HTML, XML -> RDF, RDFS, OWL, …
• Technologies to interconnect data and exchange information
  – Also applicable to metadata!

Page 30

Why is XML not enough?

• http://www.w3.org/DesignIssues/RDF-XML.html (Tim Berners-Lee)
• Try to express “The author of the note is Tim” in XML:

<author>
  <uri>note</uri>
  <name>Tim</name>
</author>

<document href="note">
  <author>Tim</author>
</document>

<document uri="note" author="Tim" />

• For a person, the three representations mean the same thing, but NOT for a machine!
  – XML contains structure only, no semantics
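To see the machine's point of view, here is a sketch that extracts the statement from each of the three XML variants with Python's ElementTree. Each structure needs its own hand-written branch (the `extract` function is hypothetical); only after reducing them to the same (subject, predicate, object) triple do they compare as equal.

```python
import xml.etree.ElementTree as ET

forms = [
    '<author><uri>note</uri><name>Tim</name></author>',
    '<document href="note"><author>Tim</author></document>',
    '<document uri="note" author="Tim" />',
]


def extract(xml_text):
    """Hand-written mapping: every XML shape needs its own branch to find the same fact."""
    el = ET.fromstring(xml_text)
    if el.tag == "author":
        return (el.findtext("uri"), "author", el.findtext("name"))
    if el.tag == "document" and el.get("href") is not None:
        return (el.get("href"), "author", el.findtext("author"))
    if el.tag == "document":
        return (el.get("uri"), "author", el.get("author"))
    raise ValueError(f"unknown structure: <{el.tag}>")


print({extract(f) for f in forms})  # {('note', 'author', 'Tim')}
```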

Page 31

RDF

• RDF (Resource Description Framework)
• Triples: subject – predicate – object
• URIs to identify resources
• “The author of the note is Tim”
  (graph: Note --hasAuthor--> Tim)
• Serialization in XML:

<!-- The default namespace (xmlns) is an illustrative addition: RDF/XML requires Note and hasAuthor to be namespaced. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.example.org/#">
  <Note rdf:about="http://www.example.org/#note">
    <hasAuthor rdf:resource="http://www.example.org/#Tim"/>
  </Note>
</rdf:RDF>
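The same triple can also be written in N-Triples, a line-based RDF syntax in which each line has the form `<subject> <predicate> <object> .`. A minimal sketch that only handles URI objects (no literals); placing `hasAuthor` in the example.org namespace is an assumption.

```python
# Namespace taken from the slide's example URIs; "hasAuthor" living there is an assumption.
EX = "http://www.example.org/#"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


graph = {(EX + "note", EX + "hasAuthor", EX + "Tim")}
print(to_ntriples(sorted(graph)))
# <http://www.example.org/#note> <http://www.example.org/#hasAuthor> <http://www.example.org/#Tim> .
```

Unlike the three XML variants, there is only one way to state this fact as a triple, so two systems can compare statements directly.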

Page 32

RDFS

• RDF Schema
• Standardized vocabulary for describing concepts
• Introduces classes and instances
• Subclasses, subproperties
  – Possible to define hierarchies!

(Figure: Note1 --hasAuthor--> Tim; Note1 --type--> Class Note; Tim --type--> Class Person)
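A sketch of the kind of inference RDFS hierarchies enable, on plain Python tuples rather than a real RDF store. The schema (Note as a subclass of a hypothetical `Document` class, `hasAuthor` as a subproperty of a hypothetical `hasContributor`) is invented for illustration; the two rules mirror the RDFS entailment rules for `rdfs:subClassOf` and `rdfs:subPropertyOf`.

```python
# Hypothetical schema: Note is a subclass of Document, hasAuthor a subproperty of hasContributor.
SUBCLASS = {"Note": "Document"}
SUBPROPERTY = {"hasAuthor": "hasContributor"}

facts = {
    ("Note1", "type", "Note"),
    ("Tim", "type", "Person"),
    ("Note1", "hasAuthor", "Tim"),
}


def rdfs_closure(triples):
    """Apply the subclass and subproperty rules until no new triples appear."""
    result = set(triples)
    while True:
        new = set()
        for s, p, o in result:
            if p == "type" and o in SUBCLASS:
                new.add((s, "type", SUBCLASS[o]))   # instance of subclass -> of superclass
            if p in SUBPROPERTY:
                new.add((s, SUBPROPERTY[p], o))     # subproperty -> superproperty
        if new <= result:
            return result
        result |= new


closed = rdfs_closure(facts)
print(("Note1", "type", "Document") in closed)       # True
print(("Note1", "hasContributor", "Tim") in closed)  # True
```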

Page 33

OWL

• Web Ontology Language, W3C Recommendation (2004)
• Provides a richer vocabulary
• Define advanced relations
  – Data typing
  – Cardinalities
  – Rich typing of properties
  – …
• Example: isAuthorFrom as the inverse of hasAuthor

(Figure: Note1 --hasAuthor--> Tim; Tim --isAuthorFrom--> Note1; Note1 --type--> Class Note; Tim --type--> Class Person)

<owl:ObjectProperty rdf:ID="isAuthorFrom">
  <owl:inverseOf rdf:resource="#hasAuthor"/>
</owl:ObjectProperty>

• Allows for intelligent reasoning
• Complex ontologies can be created
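The `owl:inverseOf` declaration lets a reasoner add the reverse statement automatically. A minimal sketch of just that one rule on Python tuples; the helper name `add_inverses` is hypothetical, and a real OWL reasoner handles far more constructs.

```python
# The inverse declaration from the OWL snippet, as a lookup table.
INVERSE_OF = {"isAuthorFrom": "hasAuthor"}


def add_inverses(triples):
    """For every p owl:inverseOf q, derive (o, p, s) from (s, q, o) and vice versa."""
    pairs = dict(INVERSE_OF)
    pairs.update({q: p for p, q in INVERSE_OF.items()})  # inverseOf works both ways
    return set(triples) | {(o, pairs[p], s) for s, p, o in triples if p in pairs}


graph = add_inverses({("Note1", "hasAuthor", "Tim")})
print(sorted(graph))
# [('Note1', 'hasAuthor', 'Tim'), ('Tim', 'isAuthorFrom', 'Note1')]
```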

Page 34

Ontology

• Information in a domain is structured using an ontology: a data model that represents a set of concepts, and the relations among these concepts, within a specific domain
• Thesaurus
  – Dictionary
    • Synonyms
• Taxonomy
  – Hierarchy
    • Subclasses and siblings
• Ontology
  – Concepts
  – Relations
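For search, a thesaurus is what resolves the "Movie vs. Film vs. Cinema" problem from the tagging slides. A sketch with a hypothetical thesaurus that maps synonyms to one preferred term; whether these three words really are synonyms is itself a community-dependent choice, so the entries and item data are illustrative only.

```python
# Hypothetical thesaurus: preferred term -> synonyms.
THESAURUS = {"film": {"movie", "cinema"}}


def preferred(term):
    """Map a term to its preferred label if the thesaurus lists it as a synonym."""
    term = term.lower()
    for pref, synonyms in THESAURUS.items():
        if term == pref or term in synonyms:
            return pref
    return term


items = {"clip1": ["Movie"], "clip2": ["film"], "clip3": ["Cinema"], "clip4": ["music"]}


def search(query):
    """Return items with a tag whose preferred term equals the query's preferred term."""
    q = preferred(query)
    return sorted(i for i, tags in items.items() if any(preferred(t) == q for t in tags))


print(search("movie"))  # ['clip1', 'clip2', 'clip3']
```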

Page 35

Ontology (using OWL)

• Example: ontology for the domain of science

(Figure:
  Class: Person
  Class: Scientist, subClassOf Person
  DatatypeProperty: birth date
  Individual: “Joseph Plateau”, a Scientist, with birth date “14/10/1801”)

OWL constructs:
• Class
• DatatypeProperty
• subClassOf
• Individual
• …
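Declaring birth date as a DatatypeProperty means the value is a typed literal rather than free text. A sketch that interprets the slide's "14/10/1801" as a calendar date; the day/month/year reading, the dictionary keys and the function name are assumptions. The ISO form it prints (`YYYY-MM-DD`) is the lexical form `xsd:date` uses.

```python
from datetime import datetime

# Hand-entered individual from the slide; the dictionary keys are illustrative.
plateau = {"name": "Joseph Plateau", "type": "Scientist", "birth date": "14/10/1801"}


def typed_birth_date(individual):
    """Interpret the birth-date string as a calendar date (DD/MM/YYYY is an assumption)."""
    return datetime.strptime(individual["birth date"], "%d/%m/%Y").date()


born = typed_birth_date(plateau)
print(born.isoformat())  # 1801-10-14
```

Once typed, dates can be compared and sorted, which a query such as "born before 1930" relies on.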

Page 36

Semantic Web Technologies

• SPARQL Protocol And RDF Query Language (SPARQL)
  – SQL-like query language for RDF
  – Example: search for all the notes of Tim

    # The ex: prefix is illustrative, matching the example.org URIs used for RDF
    PREFIX ex: <http://www.example.org/#>
    SELECT ?x WHERE { ?x ex:hasAuthor ex:Tim . }

• Rule Interchange Format (RIF)
  – Example rule: if Tim is the author of the note, he is also a contributor
  – Goal: create an interchange format for different rule languages and inference engines
  – Closely related to ontologies
    • Rules combine information and derive new information on top of ontologies
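A toy version of both ideas on Python tuples: a single triple-pattern matcher standing in for the SPARQL query, and the contributor rule applied to its results. The data, the `match` helper and the `hasContributor` predicate are hypothetical; this is not a SPARQL or RIF implementation.

```python
# Toy triple store (hypothetical data).
triples = {
    ("note1", "hasAuthor", "Tim"),
    ("note2", "hasAuthor", "Tim"),
    ("note3", "hasAuthor", "Ann"),
}


def match(pattern, data):
    """Return variable bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break      # constant term does not match
        else:
            results.append(binding)
    return results


# SELECT ?x WHERE { ?x hasAuthor Tim }
print(sorted(b["?x"] for b in match(("?x", "hasAuthor", "Tim"), triples)))  # ['note1', 'note2']

# Rule: if ?p is the author of ?n, then ?p is also a contributor to ?n.
derived = {(b["?n"], "hasContributor", b["?p"])
           for b in match(("?n", "hasAuthor", "?p"), triples)}
print(("note1", "hasContributor", "Tim") in derived)  # True
```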

Page 37

Semantic Web Technologies

• Data on the web can be linked together
  – Example: ontology on Brussels
• DBpedia.org
  – Browsing: Brussels -> cityofbirth -> Raymond_Goethals -> managerclubs -> RSC Anderlecht …
  – Querying: find all people born in Brussels before 1930
  – Reasoning: if a person was born in Brussels, he was also born in Belgium
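The querying and reasoning examples can be sketched locally with hand-entered facts (not fetched from DBpedia; "Example_Person" is hypothetical). The `located_in` chain stands in for what an ontology would state, and the names of the dictionaries and the function are illustrative.

```python
# Illustrative, hand-entered facts: person -> (city of birth, year of birth).
born_in = {
    "Joseph_Plateau": ("Brussels", 1801),
    "Raymond_Goethals": ("Brussels", 1921),
    "Example_Person": ("Ghent", 1925),  # hypothetical
}
located_in = {"Brussels": "Belgium", "Ghent": "Belgium"}


def born_in_place(person, place):
    """True if the birthplace is `place` or lies (transitively) inside it."""
    city = born_in[person][0]
    while city is not None:
        if city == place:
            return True
        city = located_in.get(city)
    return False


# Query: people born in Brussels before 1930.
early = sorted(p for p, (city, year) in born_in.items() if city == "Brussels" and year < 1930)
print(early)                                          # ['Joseph_Plateau', 'Raymond_Goethals']
# Reasoning: born in Brussels -> born in Belgium.
print(born_in_place("Raymond_Goethals", "Belgium"))   # True
```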

Page 38

Semantic Web Technologies


Page 39

Semantic Web ?.0


Page 40

Conclusions

• Use metadata standards!
  – Allows interchange
  – Structures the metadata
• When no standard is sufficient
  – Apply a proprietary format
  – Structures the metadata
• If tagging is needed for search/browsing/retrieval
  – Provide a fixed structure
    • E.g., who, what, where, when, …
  – Provide a fixed vocabulary
    • Thesaurus
    • Hierarchy
    • Ontology for advanced reasoning
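A sketch of what "fixed structure plus fixed vocabulary" could look like in a tagging tool: each annotation uses the who/what/where/when fields, and values are checked against a per-field vocabulary. The vocabulary entries, the four-digit-year rule for "when" and the function name `validate` are all hypothetical.

```python
# Hypothetical fixed structure (who/what/where/when) with a fixed vocabulary per field.
VOCABULARY = {
    "who": {"Hugo Claus", "Joseph Plateau"},
    "what": {"interview", "lecture", "portrait"},
    "where": {"Brussels", "Ghent"},
}


def validate(annotation):
    """Return problems in a structured annotation: unknown fields or out-of-vocabulary values."""
    problems = []
    for field, value in annotation.items():
        if field == "when":
            if not (value.isdigit() and len(value) == 4):  # assume a year, e.g. "1969"
                problems.append(f"when: expected a four-digit year, got {value!r}")
        elif field not in VOCABULARY:
            problems.append(f"unknown field {field!r}")
        elif value not in VOCABULARY[field]:
            problems.append(f"{field}: {value!r} is not in the vocabulary")
    return problems


print(validate({"who": "Hugo Claus", "what": "interview", "where": "Brussels", "when": "1969"}))  # []
print(validate({"who": "Hugo Claus", "where": "Brussel"}))
# ["where: 'Brussel' is not in the vocabulary"]
```

Rejecting "Brussel" at input time avoids the spelling and language problems that free tagging leaves for later.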

Page 41

Questions?
