Processing Linked Data with Catmandu

Processing Linked Data with Catmandu Patrick Hochstenbach | UGent http://librecat.org

description

Introduction to library data processing with the help of Catmandu (http://librecat.org)

Transcript of Processing Linked Data with Catmandu

Page 1: Processing Linked Data with Catmandu

Processing Linked Data with Catmandu

Patrick Hochstenbach | UGent

http://librecat.org

Page 2: Processing Linked Data with Catmandu

Processing Linked Data with Catmandu | http://librecat.org

LUND

GHENT

BIELEFELD

Page 3: Processing Linked Data with Catmandu

RATIONALE

Page 4: Processing Linked Data with Catmandu

KAHN-WILENSKI WEBHANDLE

SERVICE PROVIDER

REPOSITORY

REPOSITORY

I search a paper about...

Page 5: Processing Linked Data with Catmandu

Hypothesis 1: one network with a common schema

Hypothesis 2: object-oriented design

Hypothesis 3: the resource is the message

Page 6: Processing Linked Data with Catmandu

Hypothesis 1: one network with a common schema

GOOGLE EUROPEANA

OPENAIRE CRIS

Videos

Images

Books

Data sets

Page 7: Processing Linked Data with Catmandu

Hypothesis 2: object-oriented design

Drive

Race

Park

Economy Compact

Minivan Convertible

Wheel

Half car

Bicycle

Zeppelin

Page 8: Processing Linked Data with Catmandu

Hypothesis 3: the resource researcher is the message

DNS

GOOGLE

REPOSITORY

CLOUD

Dr. Müller

Page 9: Processing Linked Data with Catmandu

LIBRECAT/CATMANDU

Page 10: Processing Linked Data with Catmandu

CATMANDU

PubMed

MARC MODS

EXCEL

DSPACE

Fedora

SRU OAI-PMH

DBI

ISI Twitter

DBI Atom

EXCEL RDF

JSON XML

Solr ElasticSearch

MongoDB

Fedora Aleph

Fix

Page 11: Processing Linked Data with Catmandu

FUNCTIONAL DESIGN

JSON

each

slice

take

group

select

map

reduce

add_field

join_field

lookup

remove_field

marc_map

count
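
The iterator-style operations named on this slide (select, map, take, reduce) can be illustrated with a small sketch. This is plain Python, not Catmandu's implementation; the `records` source and its sample titles and years are made up for illustration only.

```python
from functools import reduce
from itertools import islice


def records():
    """Hypothetical record source; a real importer would read MARC, OAI-PMH, etc."""
    yield {"title": "Record A", "year": 1952}
    yield {"title": "Record B", "year": 1954}
    yield {"title": "Record C", "year": 1899}


def select(predicate, items):
    """Keep only the records for which predicate is true (lazy)."""
    return (item for item in items if predicate(item))


def take(n, items):
    """Yield at most the first n records (lazy)."""
    return islice(items, n)


# select -> map -> take, evaluated one record at a time
recent = select(lambda r: r["year"] > 1900, records())
titles = map(lambda r: r["title"].upper(), recent)
first_two = list(take(2, titles))

# reduce folds a stream of records into a single value, here a count
count = reduce(lambda total, _record: total + 1, records(), 0)
```

Because every step is lazy, no record is read until `list(...)` or `reduce(...)` asks for it, which is what lets this style of pipeline handle large record sets.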

Page 12: Processing Linked Data with Catmandu

LIBRECAT

Institutional Repositories

Search Engines

Image Databases

Archival Systems

Data cleaning workbench

Citation Style Processor

Page 13: Processing Linked Data with Catmandu

CATMANDU

catmandu convert MARC to JSON < records.mrc

catmandu convert OAI --url http://server/OAI to JSON

catmandu convert SRU --url http://server/SRU --query dna to JSON

catmandu convert DBI --query 'SELECT * FROM table' to JSON

catmandu convert MARC to JSON < records.mrc

catmandu convert OAI --url http://server/OAI to XML

catmandu convert SRU --url http://server/SRU --query dna to YAML

catmandu convert ArXiv --query 'all:electron' to CSV

CONVERT

Page 14: Processing Linked Data with Catmandu

CATMANDU

catmandu convert X to Y --fix 'marc_map("245","title")'

catmandu convert X to Y --fix 'prepend("title","abcd-")'

catmandu convert X to Y --fix fixes.txt

fixes.txt:

remove_field("_id");
marc_map("001", "merge.id");
prepend("merge.id", "author:");
add_field("merge.source","author");
copy_field("merge.id","_id");

FIX

set_field add_field move_field copy_field remove_field upcase downcase capitalize trim substring prepend append
lookup lookup_in_store count cmd
split_field join_field retain_field replace_all collapse expand clone
if_all_match if_any_match if_exists
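
To make the effect of the `fixes.txt` example concrete, here is a rough Python sketch of what those five fixes do to one record. It is an illustration under assumptions, not Catmandu's code: the helper functions are hypothetical re-implementations, and the MARC data is reduced to a simple `{"marc": {tag: value}}` dictionary rather than Catmandu's real internal MARC structure.

```python
def get_path(record, path):
    """Return the value at a dotted path such as 'merge.id', or None if missing."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def set_path(record, path, value):
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


def remove_field(record, path):
    *parents, last = path.split(".")
    node = get_path(record, ".".join(parents)) if parents else record
    if isinstance(node, dict):
        node.pop(last, None)


def marc_map(record, tag, path):
    # Assumption: MARC fields live in a flat dict under "marc" (simplified stand-in).
    value = record.get("marc", {}).get(tag)
    if value is not None:
        set_path(record, path, value)


def prepend(record, path, prefix):
    value = get_path(record, path)
    if value is not None:
        set_path(record, path, prefix + str(value))


def add_field(record, path, value):
    set_path(record, path, value)


def copy_field(record, source, target):
    value = get_path(record, source)
    if value is not None:
        set_path(record, target, value)


# The same sequence of steps as fixes.txt, applied to one sample record
record = {"_id": "tmp-1", "marc": {"001": "000000008"}}
remove_field(record, "_id")
marc_map(record, "001", "merge.id")
prepend(record, "merge.id", "author:")
add_field(record, "merge.source", "author")
copy_field(record, "merge.id", "_id")
```

After these steps the record carries a `merge` section with `id` and `source`, and `_id` has been replaced by the prefixed MARC 001 value.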

Page 15: Processing Linked Data with Catmandu

CATMANDU

catmandu import JSON to MongoDB --opt ... --opt ...
catmandu import MARC to ElasticSearch
catmandu import DC to FedoraCommons
catmandu import CSV to DBI

catmandu export MongoDB to JSON
catmandu export Solr to YAML
catmandu export DBI to CSV
catmandu export FedoraCommons to Template --template test.tt

test.tt (Template Toolkit):

[%- FOREACH f IN record %]
[% _id %] [% f.shift %][% f.shift %][% f.shift %][% f.join(":") %]
[%- END %]

IMPORT / EXPORT

Page 16: Processing Linked Data with Catmandu

CATMANDU

https://metacpan.org/pod/Catmandu

https://github.com/LibreCat/Catmandu

http://librecat.org/tutorial/

http://librecat.org/catmandu/2013/06/21/catmandu-cheat-sheet.html

Page 17: Processing Linked Data with Catmandu

LIBRECAT http://biblio.ugent.be

Page 18: Processing Linked Data with Catmandu

LIBRECAT http://pub.uni-bielefeld.de/en

Page 19: Processing Linked Data with Catmandu

LIBRECAT http://adore.ugent.be

Page 20: Processing Linked Data with Catmandu

LIBRECAT http://libnew.ugent.be

Page 21: Processing Linked Data with Catmandu

Architecture

FEDORA

MEDIAMOSA

VLE BIBLIO SCANNING

RECEIVE

ALEPH ABS

CLOUD

DEDUP/MERGE/AUGMENT

BLACKLIGHT

Page 22: Processing Linked Data with Catmandu

LINKED DATA

Page 23: Processing Linked Data with Catmandu

PRODUCTION

CATALOG

MARC

245 $$a ... $$b 260 $$a ... 700 $$a ...

JSON/YAML

LINKED DATA

Page 24: Processing Linked Data with Catmandu

STAGE 1: CATALOG to MARC

CATALOG

MARC

245 $$a ... $$b 260 $$a ... 700 $$a ...

$ catmandu export ALEPH to MARC

Page 25: Processing Linked Data with Catmandu

STAGE 2: MARC to JSON

MARC

245 $$a ... $$b 260 $$a ... 700 $$a ...

JSON/YAML

Page 26: Processing Linked Data with Catmandu

MARC

245 $$a ... $$b 260 $$a ... 700 $$a ...

JSON/YAML

STAGE 2: MARC to JSON

Author: Tolstoj, Lev Nikolaevič,

Title: War and peace /

Publication Year: 1952.

Subject: Napoleonic Wars,

Page 27: Processing Linked Data with Catmandu

STAGE 2: MARC to JSON

MARC

245 $$a ... $$b 260 $$a ... 700 $$a ...

JSON/YAML

Author: Tolstoj, Lev Nikolaevič,
Title: War and peace /
Year: 1952.
Subject: Napoleonic Wars,

FIX

Author: Tolstoj, Lev Nikolaevič
Title: War and peace
Year: 1952
Subject: Napoleonic Wars
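
The clean-up on this slide (dropping the trailing cataloguing punctuation from each value) can be sketched in a few lines of Python. The lowercase field names and the `strip_trailing_punctuation` helper are assumptions made for this example; in Catmandu the same effect would be expressed with fixes rather than Python.

```python
import re

# Values as they come out of the MARC record, with trailing punctuation
raw = {
    "author": "Tolstoj, Lev Nikolaevič,",
    "title": "War and peace /",
    "year": "1952.",
    "subject": "Napoleonic Wars,",
}


def strip_trailing_punctuation(value):
    """Remove trailing whitespace and , . / : ; characters from a string."""
    return re.sub(r"[\s,./:;]+$", "", value)


clean = {field: strip_trailing_punctuation(value) for field, value in raw.items()}
```

A rule this blunt also strips a meaningful final period (for example in an abbreviation), so a production mapping would usually be more selective per field.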

Page 28: Processing Linked Data with Catmandu

STAGE 3a: JSON to RDF

JSON/YAML

LINKED DATA

Author: Tolstoj, Lev Nikolaevič
Title: War and peace
Year: 1952
Subject: Napoleonic Wars

?

FIX

dc:creator: Tolstoj, Lev Nikolaevič
dc:title: War and peace
dc:date: 1952
dc:subject: Napoleonic Wars

Page 29: Processing Linked Data with Catmandu

JSON/YAML

LINKED DATA

STAGE 3a: JSON to RDF

<http://example.org/000000008>
    <http://purl.org/dc/elements/1.1/creator> "Tolstoj, Lev Nikolaevič" ;
    <http://purl.org/dc/elements/1.1/title> "War and peace" ;
    <http://purl.org/dc/elements/1.1/date> "1952" ;
    <http://purl.org/dc/elements/1.1/subject> "Napoleonic Wars" ;
    a <http://www.europeana.eu/schemas/edm/Book> .

FIX

RDF/Turtle

http://demo.librecat.org/
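
As a rough illustration of this stage, the following Python sketch turns a cleaned record into Turtle of the shape shown on the slide. It is not how Catmandu produces RDF; the `FIELD_TO_PREDICATE` mapping, the `to_turtle` function and the lowercase field names are assumptions for this example, and literal escaping is kept minimal (backslashes and double quotes only).

```python
DC = "http://purl.org/dc/elements/1.1/"
EDM_BOOK = "http://www.europeana.eu/schemas/edm/Book"

# Assumed mapping from JSON field names to Dublin Core predicates
FIELD_TO_PREDICATE = {
    "author": DC + "creator",
    "title": DC + "title",
    "year": DC + "date",
    "subject": DC + "subject",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(record, base="http://example.org/"):
    """Serialise one record as a single Turtle subject block."""
    subject = f"<{base}{record['_id']}>"
    parts = [
        f'<{predicate}> "{escape_literal(record[field])}"'
        for field, predicate in FIELD_TO_PREDICATE.items()
        if field in record
    ]
    parts.append(f"a <{EDM_BOOK}>")
    return subject + " " + " ;\n    ".join(parts) + " ."


record = {
    "_id": "000000008",
    "author": "Tolstoj, Lev Nikolaevič",
    "title": "War and peace",
    "year": "1952",
    "subject": "Napoleonic Wars",
}
turtle = to_turtle(record)
```

At this point every object is still a plain string literal; nothing in the output links to other data sets yet.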

Page 30: Processing Linked Data with Catmandu

STAGE 3b: RDF to Linked Data

JSON/YAML

LINKED DATA

<http://example.org/000000008>
    <http://purl.org/dc/elements/1.1/creator> "Tolstoj, Lev Nikolaevič" ;
    <http://purl.org/dc/elements/1.1/title> "War and peace" ;
    <http://purl.org/dc/elements/1.1/date> "1952" ;
    <http://purl.org/dc/elements/1.1/subject> "Napoleonic Wars" ;
    a <http://www.europeana.eu/schemas/edm/Book> .

FIX

<http://example.org/000000008>
    <http://purl.org/dc/elements/1.1/creator> <http://viaf.org/viaf/96987389> ;
    <http://purl.org/dc/elements/1.1/title> "War and peace" ;
    <http://purl.org/dc/elements/1.1/date> "1952" ;
    <http://purl.org/dc/elements/1.1/subject> <http://dbpedia.org/page/Napoleonic_Wars> ;
    a <http://www.europeana.eu/schemas/edm/Book> .
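
The step from RDF with literals to linked data amounts to replacing known strings with URIs. A minimal Python sketch of that idea follows, using only the two pairings visible on this slide as the lookup table. The `AUTHORITY` table and `to_term` function are hypothetical names for this example; a real workflow would query an authority source instead of a hard-coded dictionary.

```python
# Lookup table built from the two literal-to-URI pairs shown on the slide
AUTHORITY = {
    "Tolstoj, Lev Nikolaevič": "http://viaf.org/viaf/96987389",
    "Napoleonic Wars": "http://dbpedia.org/page/Napoleonic_Wars",
}


def to_term(value, table=AUTHORITY):
    """Return a Turtle URI term when the value is known, else a string literal."""
    uri = table.get(value)
    if uri is not None:
        return f"<{uri}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


record = {
    "author": "Tolstoj, Lev Nikolaevič",
    "title": "War and peace",
    "subject": "Napoleonic Wars",
}
terms = {field: to_term(value) for field, value in record.items()}
```

Values without a match, such as the title, stay literals, which mirrors the slide: only the creator and the subject become links.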

Page 31: Processing Linked Data with Catmandu

THANK YOU

Nicolas Steenlant, Nicolas Franck, Snorri Briem

Jörgen Eriksson, Maria Hedberg, Dave Sheroman

Friedrich Summann, Najko Jahn, Vitali Peil, Petra Kohorst, Christian Pietsch, Mathias Lösch

Johan Rolschewski, Jakob Voß

UGENT

LUND

BIELEFELD

GBV

STAATSBIBLIOTHEK ZU BERLIN

Wouter Willaert

INUITS