Linking Library Data with Fusepool

Johannes Hercher (Free University Berlin) June 25, 2014

@jhercher

Fusepool final public workshop, Brussels, June 25th – Johannes Hercher, Free University Berlin, University Library

Context

I care for metadata

Ugh!

Your OPAC sucks

We cooperate…

How to link library data with the "oceans" of the WWW?

German National Library

published authority data

Example

a search in subject index (with GND Identifiers)

a search in full text http://primo.fu-berlin.de

• GND = Thesaurus for subject indexing in Germany

• Search with GND limited to local resources

• search beyond the local holdings => easier, more reliable

• suggest content using semantic relations (GND is a thesaurus!)

You* should use identifiers

*publishers, authors, aggregators

Assigning IDs is time consuming

- Reality -

Assigning IDs is fun

- Vision -

Questions & Tasks

• Could machines do the subject indexing? -> Use SMA to enrich DBpedia pages with GND IDs

• Can we support Librarians in subject indexing? -> Build Annotator Prototype https://github.com/jhercher/LEE/

Demonstrator

AnnotatorApp: filters stopwords and displays library entities for your text

Review concepts and start a search using concept IDs

https://github.com/jhercher/LEE

How to Fusepool

Workflow

1. Select a subset of GND Subject Headings using SPARQL

2. Import Subject Headings

3. Configure SMA dictionary component

4. Import documents (Graph)

5. Batch matching of documents with dictionaries using Fusepool's DLC

6. Review results and build services on top

http://zbw.eu/beta/sparql/gnd

http://d-nb.info/standards/elementset/gnd

NomenclatureInBiologyOrChemistry

SubjectHeadingSensoStricto

ProductNameOrBrandName

HistoricSingleEventOrEra

EthnographicName

GroupOfPersons

SubjectHeading

Language
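Step 1 of the workflow (selecting a subset of GND subject headings with SPARQL) could look roughly like the sketch below. It only assembles the query text in Python and does not contact any endpoint. The namespace `http://d-nb.info/standards/elementset/gnd#`, the property `gndo:preferredNameForTheSubjectHeading`, and the helper name `build_gnd_query` are assumptions and do not appear on the slides; the class names are the ones listed above.

```python
# Assumed namespace of the GND ontology (not shown on the slides).
GNDO = "http://d-nb.info/standards/elementset/gnd#"

# Class names as listed on the slide.
GND_CLASSES = [
    "NomenclatureInBiologyOrChemistry",
    "SubjectHeadingSensoStricto",
    "ProductNameOrBrandName",
    "HistoricSingleEventOrEra",
    "EthnographicName",
    "GroupOfPersons",
    "SubjectHeading",
    "Language",
]


def build_gnd_query(classes=GND_CLASSES, limit=None):
    """Return a SPARQL SELECT query for concepts typed with the given GND classes."""
    values = " ".join(f"gndo:{name}" for name in classes)
    lines = [
        f"PREFIX gndo: <{GNDO}>",
        "SELECT ?concept ?label WHERE {",
        f"  VALUES ?type {{ {values} }}",
        "  ?concept a ?type ;",
        # Assumed label property; adjust to whatever the dictionary should contain.
        "           gndo:preferredNameForTheSubjectHeading ?label .",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_gnd_query(limit=10))
```

The printed query can be pasted into a SPARQL endpoint such as the one named above; restricting `classes` to fewer entries yields a smaller dictionary for the import in step 2.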

http://localhost:8080/admin/graphs/

Results

<http://de.dbpedia.org/resource/Wilder_Streik_bei_Ford_(1973)> <http://purl.org/dc/elements/1.1/subject>
    <http://d-nb.info/gnd/7708211-4> , # Drug-eluting Stent (syn: DES)
    <http://d-nb.info/gnd/4302110-4> , # Ford
    <http://d-nb.info/gnd/4578282-9> , # sich [„self“@en]
    <http://d-nb.info/gnd/4248646-4> , # Spitzel [„spy“@en] (syn: IM)
    <http://d-nb.info/gnd/4389837-3> , # August (month)
    <http://d-nb.info/gnd/4291333-0> , # Niederlage [„defeat“@en]
    <http://d-nb.info/gnd/4002623-1> . # Arbeitnehmer [„employee“@en]

• GND dictionary includes: articles, prepositions, adjectives…

• Acronyms („IM“, „DES“) -> activate „Case Sensitivity“

• Not every match is useful in the context („August“, „Defeat“)
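Why case sensitivity and stopword filtering matter can be illustrated with a small token-based dictionary matcher. This is a sketch under stated assumptions, not Fusepool's SMA: the function `match_entries`, the stopword sample, and the German test sentence are hypothetical, and multi-word labels are not handled. The GND URIs are taken from the example output above.

```python
import re

# Assumed sample of German stopwords; a real list would be much longer.
STOPWORDS = {"sich", "der", "die", "das", "bei"}

# (surface form, GND URI) pairs taken from the example output above.
ENTRIES = [
    ("Ford", "http://d-nb.info/gnd/4302110-4"),
    ("Arbeitnehmer", "http://d-nb.info/gnd/4002623-1"),
    ("sich", "http://d-nb.info/gnd/4578282-9"),
    ("IM", "http://d-nb.info/gnd/4248646-4"),   # acronym, synonym of "Spitzel"
    ("DES", "http://d-nb.info/gnd/7708211-4"),  # acronym, synonym of "Drug-eluting Stent"
]


def match_entries(text, entries=ENTRIES, case_sensitive=True, stopwords=STOPWORDS):
    """Return the (form, uri) pairs whose form occurs as a whole token in text."""
    tokens = set(re.findall(r"\w+", text))
    if not case_sensitive:
        tokens = {token.lower() for token in tokens}
    hits = []
    for form, uri in entries:
        if form in stopwords:
            continue  # drop dictionary entries that are plain stopwords
        key = form if case_sensitive else form.lower()
        if key in tokens:
            hits.append((form, uri))
    return hits


# Hypothetical input sentence.
TEXT = "Im August erwehrten sich die Arbeitnehmer bei Ford des Drucks."

if __name__ == "__main__":
    print([form for form, _ in match_entries(TEXT, case_sensitive=False)])
    print([form for form, _ in match_entries(TEXT, case_sensitive=True)])
```

With case-insensitive matching the ordinary words "Im" and "des" are mistaken for the acronyms "IM" and "DES"; with case-sensitive matching only "Ford" and "Arbeitnehmer" remain.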

http://localhost:8080/graph?name=urn:x-localinstance:/dlc/{yourDataset}/enhance.graph
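The graph URL pattern above can be filled in programmatically. The sketch below only builds the request object and sends nothing; the helper names and the `Accept: text/turtle` header are assumptions, and whether a given Fusepool instance honours that header is not stated on the slides.

```python
from urllib.parse import quote
from urllib.request import Request

# Host, port and path as shown on the slide.
BASE_URL = "http://localhost:8080/graph"


def enhance_graph_name(dataset):
    """Return the graph name for a dataset, following the pattern on the slide."""
    return f"urn:x-localinstance:/dlc/{dataset}/enhance.graph"


def enhance_graph_request(dataset, accept="text/turtle"):
    """Build (but do not send) a GET request for a dataset's enhance graph."""
    # Keep ':' and '/' unescaped so the URL matches the form shown on the slide.
    name = quote(enhance_graph_name(dataset), safe=":/")
    return Request(f"{BASE_URL}?name={name}", headers={"Accept": accept})


if __name__ == "__main__":
    request = enhance_graph_request("yourDataset")
    print(request.full_url)
```

Against a running local instance, the request could then be passed to `urllib.request.urlopen` to download the annotations for review.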

Prototype: GND Annotator

Persons, Locations, Topics, Time

Manual evaluation only for Topics

Evaluation labels shown: ok, ok, not relevant, false, not relevant, ok, not relevant

human (found in GND) = 1

SMA GND suggestions = 7

SMA correct = 3

SMA false = 1

precision = 33%

recall = 100%

Results (1)

Recall: 78%

Precision: 73%

Results (2)

Recall: 90%

Precision: 72%
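The slides do not state how these figures were calculated. As a reminder of the standard definitions, the following sketch compares a set of suggested concept IDs with a set of IDs judged relevant by a human; the function name and the placeholder IDs are hypothetical and are not data from the evaluation.

```python
def precision_recall(suggested, relevant):
    """Return (precision, recall) for suggested IDs against human-judged relevant IDs."""
    suggested, relevant = set(suggested), set(relevant)
    true_positives = suggested & relevant
    # Guard against empty sets to avoid division by zero.
    precision = len(true_positives) / len(suggested) if suggested else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Placeholder IDs, not real GND identifiers.
    machine = ["id-1", "id-2", "id-3", "id-4"]
    human = ["id-1", "id-2", "id-5"]
    print(precision_recall(machine, human))
```

Precision is the share of machine suggestions that a human accepts; recall is the share of human-assigned concepts that the machine also found.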

http://primo.kobv.de/docId=TN_thieme_articles10.1055/s-0029-1237743

Fusepool in the wild (1)

no exact string match

chemical term

geographic

financial

education

too broad

Fusepool in the wild (2)

Abstract, Reviews, TOC

ISBN: 9783642371103

Drawback: the quality of the annotations depends on the text input

Feedback

Why Fusepool?

1. Ready for the Semantic Web

• can handle graphs (Clerezza, TDB, …)

• Data I/O using REST

2. String Matching SMA

• Import & configuration of dictionaries (e.g. a thesaurus)

• batch matching & annotation using the Data Life Cycle (DLC)

3. Easy to install

• Builds at http://jenkins.fusepool.info

Conclusion

• Fusepool: Infrastructure to build new services

• … better linking beyond the aquarium(s)

• TODO:

• build tailored interfaces for annotation, search, recommender

• improve the dictionaries

Thank You!

twitter: @jhercher

github: https://github.com/jhercher/

mail: [email protected]