emtacl12 - NTNU


Page 1:

emtacl12

Page 2:

[Diagram: entities and relationships spanning three domains]

domains: Broadcasting domain, Text domain, Music domain

entities: Person, Band, Song, Concert, Book, Newspaper, Radio show, TV show

relationships: is interviewed in, has written, has participated in, has played, has created, is played in, is mentioned in, is reviewed in, is member of

Page 3:

[Diagram: the black metal repository and external data sources]

black metal repository

Last FM, DBpedia, MusicBrainz, BBC Music, Deichman

Page 4:

record level: cataloguer, cataloguing standard (AACR2 etc.)

schema level: structure, semantics, mapping

repository level (linked data level): cross-collection retrieval

Chan, L. M., & Zeng, M. L. (2006). Metadata interoperability and standardization: A study of methodology part I. D-Lib Magazine, 12(6).

Page 5:

how to make an infrastructure for searching and browsing Norwegian black metal, based on existing library data?

problem I: incomplete metadata collections

problem II: heterogeneous metadata and metadata schemas, locally and externally

problem III: insufficient and inconsistent use of identifiers

solution: linked data (problem I) + mapping to RDF (problem II) + graph matching (problem II/III)

Page 6:

seed collection: the national discography (Nordisko)

retrieve MARC(XML) records (Z39.50/OAI/SRU) matching a list of preselected black metal bands
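The retrieval step could look roughly like the sketch below. It only builds SRU searchRetrieve URLs, one per band, and does not contact any server. The base URL, the `dc.creator` index and the `marcxml` record schema name are assumptions made for illustration; the slides do not give the actual Nordisko endpoint or query syntax.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; the real Nordisko address is not given in the slides.
SRU_BASE = "https://sru.example.org/nordisko"

# Preselected bands, as listed later in the slides.
BANDS = ["Burzum", "Darkthrone", "Emperor", "Gorgoroth",
         "Immortal", "Satyricon", "Thorn"]


def sru_url(band, max_records=50):
    """Build an SRU 1.1 searchRetrieve URL asking for MARCXML records of one band."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'dc.creator="{band}"',  # CQL query; the index name is an assumption
        "recordSchema": "marcxml",         # schema identifiers are server-specific
        "maximumRecords": str(max_records),
    }
    return f"{SRU_BASE}?{urlencode(params)}"


urls = [sru_url(band) for band in BANDS]
```

Each URL can then be fetched with any HTTP client, and the returned records stored for conversion.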

RDF

make a simple "black metal" ontology based on existing vocabularies and convert the MARC records into RDF triples using XSLT

upload the triples to a Virtuoso triple store
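To make the MARC-to-RDF mapping concrete, here is a minimal sketch. The slides use XSLT for this step; Python's ElementTree is swapped in here purely for illustration. The mapping itself (710 $a as a band with a `foaf:name`, 740 $a as a song with `dc:title` and `dc:creator`), the `song/` URI pattern and the `slug` helper are assumptions, not the project's actual ontology. The base URI follows the example URI shown in the slides.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
BASE = "http://blackmetal.no/"  # base URI as in the slides' example URI
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="710"><marc:subfield code="a">Mayhem</marc:subfield></marc:datafield>
  <marc:datafield tag="740"><marc:subfield code="a">Deathcrush</marc:subfield></marc:datafield>
  <marc:datafield tag="740"><marc:subfield code="a">Freezing moon</marc:subfield></marc:datafield>
</marc:record>"""


def slug(text):
    """Hypothetical helper: turn a label into a URI path segment."""
    return "-".join(text.lower().split())


def subfields(record, tag, code="a"):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text.strip() for el in record.findall(path, MARC_NS) if el.text]


def record_to_ntriples(xml_text):
    """Map one MARCXML record to N-Triples lines (literals are not escaped here)."""
    record = ET.fromstring(xml_text)
    bands = subfields(record, "710")
    lines = [f'<{BASE}{slug(b)}> <{FOAF_NAME}> "{b}" .' for b in bands]
    for title in subfields(record, "740"):
        song = f"<{BASE}song/{slug(title)}>"
        lines.append(f'{song} <{DC_TITLE}> "{title}" .')
        for band in bands:
            lines.append(f"{song} <{DC_CREATOR}> <{BASE}{slug(band)}> .")
    return lines


triples = record_to_ntriples(SAMPLE)
```

The resulting lines can be written to a file and loaded into a triple store. A production mapping would also need literal escaping and stable identifiers rather than name-derived URIs.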

graph matching

use SPARQL (and PHP) to match clusters of nodes and edges in our seed data against similar clusters in rich target collections providing SPARQL endpoints

use matching data in target collection for cleaning up and enriching metadata in seed collection
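A query for the matching step might be built along these lines. This is a sketch only: it assumes the target collection describes artists with `foaf:name` and `foaf:made` and titles with `dc:title`, and the function name `build_match_query` is made up for the example. It produces the query text and leaves sending it to an endpoint to the caller.

```python
def build_match_query(artist_name, titles):
    """Build a SPARQL query for artists with a given name and their works
    whose title equals one of `titles` (compared case-insensitively)."""

    def lit(text):
        # Quote a Python string as a SPARQL string literal.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    title_list = ", ".join(lit(t.lower()) for t in titles)
    return f"""PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?artist ?work ?title WHERE {{
  ?artist foaf:name ?name ;
          foaf:made ?work .
  ?work dc:title ?title .
  FILTER (lcase(str(?name)) = {lit(artist_name.lower())})
  FILTER (lcase(str(?title)) IN ({title_list}))
}}"""


query = build_match_query("Mayhem", ["Deathcrush", "De Mysteriis Dom Sathanas"])
```

Exact string comparison like this misses spelling variants in the target data, so in practice candidates would be fetched more loosely and compared with a similarity measure afterwards.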

Page 7:

Burzum

Darkthrone

Emperor

Gorgoroth

Immortal

Satyricon

Thorn

99 MARCXML records were retrieved from Nordisko in response to queries based on pre-selected black metal bands

why black metal? complex interlinking!

Page 8:

<marc:datafield tag="700">
  <marc:subfield code="a">Maniac</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Blasphemer</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Hellhammer</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Necrobutcher</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Mayhem</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Carnage</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Necrolust</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Deathcrush</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Ancient skin</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Freezing moon</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Fall of seraphs</marc:subfield>
</marc:datafield>
<marc:datafield tag="740">
  <marc:subfield code="a">Chainsaw gutsfuck</marc:subfield>
</marc:datafield>
<marc:datafield tag="900">
  <marc:subfield code="a">Eriksen, Rune</marc:subfield>
  <marc:subfield code="z">Blasphemer</marc:subfield>
</marc:datafield>
<marc:datafield tag="900">
  <marc:subfield code="a">Stubberud, Jørn</marc:subfield>
  <marc:subfield code="z">Necrobutcher</marc:subfield>
</marc:datafield>
<marc:datafield tag="900">
  <marc:subfield code="a">Kristiansen, Sven-Erik</marc:subfield>
  <marc:subfield code="z">Maniac</marc:subfield>
</marc:datafield>
<marc:datafield tag="900">
  <marc:subfield code="a">Blomberg, Jan Axel</marc:subfield>
  <marc:subfield code="z">Hellhammer</marc:subfield>
</marc:datafield>
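The record above carries the link between stage names and personal names only in the 900 fields. A small sketch of how that link could be read out follows. It assumes, as the sample suggests, that 900 $a holds the personal name and $z the stage name it refers to. It also assumes the standard MARCXML namespace, which the fragments on the slide do not declare. The sample is a shortened copy of the record.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="700"><marc:subfield code="a">Blasphemer</marc:subfield></marc:datafield>
  <marc:datafield tag="700"><marc:subfield code="a">Necrobutcher</marc:subfield></marc:datafield>
  <marc:datafield tag="900">
    <marc:subfield code="a">Eriksen, Rune</marc:subfield>
    <marc:subfield code="z">Blasphemer</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="900">
    <marc:subfield code="a">Stubberud, Jørn</marc:subfield>
    <marc:subfield code="z">Necrobutcher</marc:subfield>
  </marc:datafield>
</marc:record>"""


def alias_map(record):
    """Map each stage name (900 $z) to the personal name (900 $a) in one record."""
    aliases = {}
    for field in record.findall("marc:datafield[@tag='900']", NS):
        name = field.findtext("marc:subfield[@code='a']", namespaces=NS)
        alias = field.findtext("marc:subfield[@code='z']", namespaces=NS)
        if name and alias:
            aliases[alias] = name
    return aliases


record = ET.fromstring(SAMPLE)
aliases = alias_map(record)

# Resolve the contributors named in the 700 fields against the alias map.
contributors = [el.text for el in
                record.findall("marc:datafield[@tag='700']/marc:subfield[@code='a']", NS)]
resolved = {name: aliases.get(name) for name in contributors}
```

Contributors without a matching 900 field resolve to `None`, which is one way to flag entries that need manual attention.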

Page 9:

<marc:datafield tag="110">
  <marc:subfield code="a">Kvikksølvguttene</marc:subfield>
</marc:datafield>
<marc:datafield tag="245">
  <marc:subfield code="a">Krieg</marc:subfield>
  <marc:subfield code="h">lydopptak</marc:subfield>
</marc:datafield>
<marc:datafield tag="505">
  <marc:subfield code="a">Innhold: In den Arsch gefickt / Kvikksølvguttene. Torture / Kvikksølvguttene, Vomit. Krieg / Kvikksølvguttene, Vomit (Ztalin, elgitar). More murder / Kvikksølvguttene (Ztalin, elgitar). Anger / Kvikksølvguttene. Ghoul / Kvikksølvguttene, Mayhem. Sluts / Kvikksølvguttene (Ztalin, elgitar). Violent death / Kvikksølvguttene. Fisted sisters / Kvikksølvguttene. Naglekamp / Kvikksølvguttene (Ztalin, elgitar)</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Necro</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Zathan</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Ztalin</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">H.M.P.D.K.</marc:subfield>
</marc:datafield>
<marc:datafield tag="700">
  <marc:subfield code="a">Andreassen, Ole Petter</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Vomit</marc:subfield>
  <marc:subfield code="t">Krieg</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Mayhem</marc:subfield>
  <marc:subfield code="t">Ghoul</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Vomit</marc:subfield>
  <marc:subfield code="t">Torture</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Kvikksølvguttene</marc:subfield>
  <marc:subfield code="t">Krieg</marc:subfield>
</marc:datafield>
<marc:datafield tag="710">
  <marc:subfield code="a">Kvikksølvguttene</marc:subfield>
  <marc:subfield code="t">Ghoul</marc:subfield>
</marc:datafield>

Page 10:

Page 11:

challenges

«Dauði Baldrs» «Hermoðr á Helferð» «Bálferð Baldrs» «Í Heimr Heljar» «Illa Tiðandi» «Móti Ragnarokum»

Page 12:

Erickson, Rune; Eriksen, Rune; Espedal, Kristian Eivind; Euronymous; Fachtal, Arataus; Faust; Fenris; Fenriz; Finstad, Børge; Frost; Gaahl; Garbarek, Anja; Goat; Goatpervertor; Greifi Grishnack; Greishnackh, Greifi; Grim; Grishnackh, Greifi; Grutle, Kjetil; H.M. Daiomonion; H.M.P.D.K.; Haraldstad, Kjell Vidar; Haraldstad, Kjetil Vidar

does the ambiguity stem from the resource, the cataloguer, the cataloguing standard, the metadata structure, the ontology or the transformation?
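Name variants like the ones in this list can be flagged automatically for review. The sketch below compares every pair of names with Python's `difflib` and reports pairs above a cutoff. The cutoff of 0.8 and the function name are arbitrary choices for the example, and the result is a list of candidates for a human to check, not a decision that two names denote the same person.

```python
from difflib import SequenceMatcher
from itertools import combinations

# A subset of the name forms shown on the slide.
NAMES = [
    "Erickson, Rune", "Eriksen, Rune", "Euronymous", "Fenris", "Fenriz",
    "Greifi Grishnack", "Greishnackh, Greifi", "Grishnackh, Greifi",
    "Haraldstad, Kjell Vidar", "Haraldstad, Kjetil Vidar",
]


def candidate_duplicates(names, cutoff=0.8):
    """Return (name_a, name_b, score) for every pair scoring at least `cutoff`."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= cutoff:
            pairs.append((a, b, round(score, 2)))
    return pairs


suspects = candidate_duplicates(NAMES)
flagged = {(a, b) for a, b, _ in suspects}
```

Character-level comparison catches spelling variants such as "Greishnackh" and "Grishnackh" but not inverted forms such as "Greifi Grishnack", which would need normalisation of name order first.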

Page 13:

browsable data: bibin.hioa.no/blackmetal

Page 14:

comparing graph structures/ontologies

pattern recognition

semantic correspondences

Raimond, Y., Sutton, C., & Sandler, M. (2008). Automatic interlinking of music datasets on the semantic web. Linked Data on the Web (LDOW2008).

Page 15:

Problem

[Diagram: collection A (black metal repository) contains resources r, collection B (MusicBrainz) contains resources s]

rx = http://blackmetal.no/mayhem

sy = http://musicbrainz.org/artist/c5f9e699-7b0d-4030-86dd-7acc8250d147

rx owl:sameAs sy ?
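Once a decision has been made that rx and sy denote the same thing, the link itself is a single triple. The sketch below writes it as one N-Triples line and only does so when a similarity score reaches a threshold. The threshold of 0.8, the score passed in and both function names are illustrative assumptions, not values from the slides.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(local_uri, remote_uri):
    """Serialise an owl:sameAs link between two resources as one N-Triples line."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{remote_uri}> ."


def link_if_similar(local_uri, remote_uri, score, threshold=0.8):
    """Return the owl:sameAs line only when `score` reaches `threshold`, else None."""
    if score >= threshold:
        return same_as_triple(local_uri, remote_uri)
    return None


rx = "http://blackmetal.no/mayhem"
sy = "http://musicbrainz.org/artist/c5f9e699-7b0d-4030-86dd-7acc8250d147"

link = link_if_similar(rx, sy, score=0.9)      # hypothetical score above the threshold
no_link = link_if_similar(rx, sy, score=0.4)   # hypothetical score below the threshold
```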

Page 16:

Graph matching

matching literals

G1: node 1 (Artist), foaf:name "Mayhem"

G2: node 4 (Artist), foaf:name "Mayhem"

G3: node 7 (Artist), foaf:name "Mayhem"

"Mayhem" (G1) = "Mayhem" (G2) = "Mayhem" (G3)

Page 17:

comparing literals directly

G1: "Mayhem" (node 1), G2: "Mayhem" (node 4)
G1: "Deathcrush" (node 2), G2: "Deathcrush" (node 5)
G1: "De Mysteriis Dom Sathanas" (node 3), G2: "De Mysteris Dom Sathanas" (node 6)
G1: "Mayhem" (node 1), G3: "Mayhem" (node 7)
G1: "Deathcrush" (node 2), G3: "Gentle murder" (node 8)
G1: "De Mysteriis Dom Sathanas" (node 3), G3: "Pulling Puppet Strings" (node 9)

node (literal 1)    node (literal 2)    similarity
1                   4                   1
2                   5                   1
3                   6                   0.9
1                   7                   1
2                   8                   0.2
n
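A literal comparison of this kind can be sketched with Python's `difflib`. The slides do not say which string similarity measure was used, so `SequenceMatcher` is only a stand-in, and its scores will not reproduce the slide's figures exactly. What carries over is the pattern: identical literals score 1, near-identical spellings score close to 1, and unrelated titles score low.

```python
from difflib import SequenceMatcher


def literal_similarity(a, b):
    """Similarity in [0, 1] between two literals, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# (node in G1, node in G2 or G3, literal 1, literal 2), as on the slide.
PAIRS = [
    (1, 4, "Mayhem", "Mayhem"),
    (2, 5, "Deathcrush", "Deathcrush"),
    (3, 6, "De Mysteriis Dom Sathanas", "De Mysteris Dom Sathanas"),
    (1, 7, "Mayhem", "Mayhem"),
    (2, 8, "Deathcrush", "Gentle murder"),
]

scores = {(n1, n2): literal_similarity(a, b) for n1, n2, a, b in PAIRS}
```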

Page 18:

[Diagram: three graphs, each with one Artist node and two Album nodes]

G1: black metal repository (collection A)
node 1 (Artist): "Mayhem"
node 2 (Album): "Deathcrush"
node 3 (Album): "De Mysteriis Dom Sathanas"
edge labels: dc:creator, foaf:name

G2: MusicBrainz (collection B)
node 4 (Artist): "Mayhem"
node 5 (Album): "Deathcrush"
node 6 (Album): "De Mysteris Dom Sathanas"
edge labels: foaf:made, foaf:name

G3: MusicBrainz (collection B)
node 7 (Artist): "Mayhem"
node 8 (Album): "Gentle murder"
node 9 (Album): "Pulling Puppet Strings"
edge labels: foaf:made, foaf:name

Page 19:

basic similarity measure for graphs:

graphs    matching                                similarity
G1, G2    M_G1:G2a = (1, 4), (2, 5), (3, 6)       (1 + 1 + 0.9) / 3 ≈ 0.97
G1, G2    M_G1:G2b = (1, 4), (2, 6), (3, 5)       (1 + 0.2 + 0.2) / 3 ≈ 0.47
G1, G3    M_G1:G3a = (1, 7), (2, 8), (3, 9)       (1 + 0.2 + 0.1) / 3 ≈ 0.43
n
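The measure in the table is the mean of the pairwise literal similarities over one node-to-node matching. A short sketch makes the arithmetic explicit and adds a brute-force search for the best matching between two small graphs. The pairwise values are the ones used on the slides. Treating unlisted pairs as 0 and trying every permutation are simplifications chosen for this example, not the method described in the slides.

```python
from itertools import permutations

# Pairwise literal similarities as used on the slides
# (G1 nodes 1-3 against G2 nodes 4-6 and G3 nodes 7-9).
SIM = {
    (1, 4): 1.0, (2, 5): 1.0, (3, 6): 0.9,
    (2, 6): 0.2, (3, 5): 0.2,
    (1, 7): 1.0, (2, 8): 0.2, (3, 9): 0.1,
}


def matching_similarity(matching, sim=SIM):
    """Mean literal similarity over the node pairs of one matching.
    Pairs without a known similarity count as 0 (an assumption)."""
    return sum(sim.get(pair, 0.0) for pair in matching) / len(matching)


def best_matching(source_nodes, target_nodes, sim=SIM):
    """Try every one-to-one assignment and keep the highest-scoring one."""
    candidates = (list(zip(source_nodes, perm)) for perm in permutations(target_nodes))
    return max(candidates, key=lambda m: matching_similarity(m, sim))


g1_g2a = matching_similarity([(1, 4), (2, 5), (3, 6)])
g1_g2b = matching_similarity([(1, 4), (2, 6), (3, 5)])
g1_g3a = matching_similarity([(1, 7), (2, 8), (3, 9)])
best = best_matching([1, 2, 3], [4, 5, 6])
```

Exhaustive search is only practical for very small graphs like these; larger clusters would need a proper assignment algorithm.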

Page 21:

existing interoperability problems at different levels

linked data + graph matching provides:

disambiguation both locally and externally

tool for cleaning up local metadata

automatic interlinking

extended local data collection

Page 22:

thank you!

on behalf of Kim Tallerås, Nils Pharo, Jørn-Helge Dahl and David Massey