Integration of research literature and data (InFoLiS)

Katarina Boland (1), Philipp Zumstein (2)

(1) GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany

(2) Mannheim University Library, Mannheim, Germany

CNI 2015 Spring Membership Meeting

April 14th, 2015

Introduction

The InFoLiS project: Integration of research data and publications

InFoLiS I: 05/2011 - 05/2013

InFoLiS II: 08/2014 - 08/2016

InFoLiS is funded by the DFG (SU 647/2-1)

InFoLiS Project Goals

Catalogue: Publications

SSOAR (GESIS), Primo (UB MA), ...

Data Catalogue: Research Data

da|ra (GESIS), ...

[Diagram: each catalogue receives a Query and returns a Response; Links connect the two catalogues]

Outline

1 Part I: Generation of Links

2 Part II: How can you reuse it?

Part I: Generation of Links

Citation of Research Data

Recommendation [1]:

Creator (Publication Date): Title. Publication Agent. Identifier

Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.

→ Extraction based on these patterns?

[1] see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
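To make the question on this slide concrete, here is a minimal sketch of what extraction from the recommended citation form could look like. The regular expression, the function name and the sample citation are assumptions made for illustration; the sketch only covers the shorter of the two recommended forms and treats the publication date as a four-digit year.

```python
import re

# Assumed regex for the shorter recommended form:
#   Creator (Publication Date): Title. Publication Agent. Identifier
# Simplification: the publication date is a four-digit year.
CITATION = re.compile(
    r"^(?P<creator>.+?) \((?P<date>\d{4})\): "
    r"(?P<title>.+?)\. "
    r"(?P<agent>.+?)\. "
    r"(?P<identifier>\S+)$"
)


def parse_data_citation(text):
    """Return the citation fields as a dict, or None if the text does not match."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


# Made-up citation, for illustration only.
example = "Doe, Jane (2010): Example Survey 2010. Example Data Archive. doi:10.1234/example"
fields = parse_data_citation(example)
print(fields["title"], "|", fields["identifier"])

# A typical in-text mention does not follow the recommended form.
print(parse_data_citation("Source: Estimations based on SOEP, wave 2002."))
```

Such a parser only helps when authors actually follow the recommendation, which is exactly what the slide puts in question.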

References to Datasets

presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models.

data from the <title> of the years <year> are used
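The generalized pattern for the SOEP sentence can be written as a regular expression. This is a sketch under assumptions: the regex and the group names are chosen here for illustration and are not taken from the InFoLiS code.

```python
import re

# Assumed regex rendering of the generalized pattern
#   "data from the <title> of the years <year> are used"
PATTERN = re.compile(
    r"data\s+from\s+the\s+(?P<title>.+?)\s+of\s+the\s+years\s+"
    r"(?P<years>\d{4}(?:\s+and\s+\d{4})*)\s+are\s+used"
)

# Sentence from the slide.
sentence = (
    "For this purpose, data from the Socio-Economic Panel (SOEP) of the years "
    "1990 and 2003 are used and for both periods, the impact factors are "
    "estimated using linear regression models."
)

match = PATTERN.search(sentence)
title = match.group("title")
years = re.findall(r"\d{4}", match.group("years"))
print(title, years)
```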

References to Datasets

Table 1: Population forecast for Germany depending on age cohorts - proportion in percent.

Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.

(Data base: <number> <title> of the <publication agent>, variant <variant>)

References to Datasets

Consulted were furthermore ...

Consulted were furthermore <title1>, <title2>, <title3>, ..., <titleN>.

References to Datasets

Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys

(Source: Ditton et al. 2005a)

(Source: <citation of descriptive publication>)

References to Datasets

...are hard to detect!

see also...

Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430

Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html

Automatic Identification of References

Why not simply search for study titles in publications?

“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”

“ALLBUS 96”

“Youth 2010”

General idea

How do humans recognize study references?

Source: Estimations based on SOEP, wave 2002.

Source: Estimations based on xyz, wave 2002.
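The idea that the surrounding words reveal a study name, whatever the name is, can be sketched as a small bootstrapping loop: start from a known name, turn its context into a pattern, and use the pattern to find further names. This is a simplified illustration, not the algorithm from the paper; the function names, the assumed shape of a study name and every sentence except the first are invented.

```python
import re

NAME = r"(?P<name>[A-Z][\w/-]*)"  # assumed shape of a study name


def derive_pattern(sentence, seed, window=2):
    """Turn the words around a known study name into a regular expression."""
    left, found, right = sentence.partition(seed)
    if not found:
        return None
    left_ctx = " ".join(left.split()[-window:])
    right_ctx = " ".join(right.split()[:window])
    return re.compile(re.escape(left_ctx) + r"\s+" + NAME + r"\s*" + re.escape(right_ctx))


def bootstrap(sentences, seed, rounds=2):
    """Alternate between deriving patterns from known names and finding new names."""
    known = {seed}
    for _ in range(rounds):
        patterns = []
        for sentence in sentences:
            for name in known:
                pattern = derive_pattern(sentence, name)
                if pattern is not None:
                    patterns.append(pattern)
        for pattern in patterns:
            for sentence in sentences:
                known.update(m.group("name") for m in pattern.finditer(sentence))
    return known


sentences = [
    "Source: Estimations based on SOEP, wave 2002.",     # from the slide
    "Source: Calculations based on ALLBUS, wave 1996.",  # made up
    "The results were based on earlier work.",           # made up
]
print(sorted(bootstrap(sentences, "SOEP")))
```

The seed SOEP yields the context "based on ..., wave", which in turn matches the ALLBUS sentence but not the third one.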

Algorithm

Reference Extraction

for details see...

Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17

Mapping to Datasets in da|ra

Mapping to Datasets in da|ra: granularity of registration vs. citation

Strategies: 1) greedy; 2) exact; 3) best
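The slide names three mapping strategies without defining them. The sketch below shows one possible reading, purely as an assumption: "exact" keeps only identical titles, "greedy" keeps every registered title that contains the reference, and "best" keeps the single most similar title. The dataset titles are the ALLBUS titles shown in this talk; the function bodies and the similarity measure are illustrative choices, not the InFoLiS implementation.

```python
from difflib import SequenceMatcher

# Dataset titles as they appear on the ALLBUS slide of this talk.
REGISTERED = [
    "ALLBUS", "ALLBUS 1996", "ALLBUS 1998", "ALLBUS 2000",
    "ALLBUS 2000 CAPI/PAPI", "ALLBUScompact 2000 CAPI/PAPI",
    "ALLBUScompact 2000 CAPI", "ALLBUS - Cumulation 1980-2006",
]


def exact(reference, titles=REGISTERED):
    """Assumed reading: only titles identical to the reference (ignoring case)."""
    return [t for t in titles if t.casefold() == reference.casefold()]


def greedy(reference, titles=REGISTERED):
    """Assumed reading: every title that contains the reference."""
    return [t for t in titles if reference.casefold() in t.casefold()]


def best(reference, titles=REGISTERED):
    """Assumed reading: the single most similar title by character overlap."""
    def similarity(title):
        return SequenceMatcher(None, reference.casefold(), title.casefold()).ratio()
    return max(titles, key=similarity)


print(exact("ALLBUS 2000"))
print(greedy("ALLBUS 2000"))
print(best("ALLBUS 96"))
```

Under this reading, an abbreviated reference such as "ALLBUS 96" has no exact match at all, which illustrates why the granularity of registration and citation matters.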

Mapping to Datasets in da|ra

[Diagram: ALLBUS datasets at different levels of granularity: ALLBUS, ALLBUS 1996, ALLBUS 1998, ALLBUS 2000, ALLBUS 2000 CAPI/PAPI, ALLBUS - Cumulation 1980-2006, ALLBUS - Cumulation 1980-2008, ALLBUScompact, ALLBUScompact 2000, ALLBUScompact 2000 CAPI/PAPI, ALLBUScompact 2000 CAPI, ALLBUScompact - Cumulation 1980-2010, ...]

→ use ontology

Ontology: Approach

Vocabulary: e.g. DDI-RDF Discovery Vocabulary [2]

[2] Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55

Links

Integration of Links into Information Systems

Example: da|ra

Example: SSOAR

Thank you for your attention

Next part: How can you reuse it?

Part II

How can you reuse it?

! Work in Progress

Internals, Data Structure, Technology

(Internal) Data structure

Document

Pattern

Execution of Algorithm

Study Title

Study URI

Which studies are found in a document?

How was a pattern derived?

Which other study titles are found with the new configuration of the algorithm?
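One way to picture such a data structure is as a set of small records that keep track of provenance, so that the three questions on the slide become simple queries. All class names, field names and sample values below are hypothetical; they are not the project's schema.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """One run of an algorithm with a given configuration."""
    algorithm: str
    configuration: dict


@dataclass
class Pattern:
    """A textual pattern together with the execution that derived it."""
    regex: str
    derived_by: Execution


@dataclass
class Link:
    """A study title found in a document, with its provenance."""
    document: str
    study_title: str
    study_uri: str
    pattern: Pattern
    found_by: Execution


def studies_in(links, document):
    """Which studies are found in a document?"""
    return sorted({link.study_title for link in links if link.document == document})


def new_titles(links, old, new):
    """Which other study titles are found with the new configuration?"""
    def titles(execution):
        return {link.study_title for link in links if link.found_by is execution}
    return titles(new) - titles(old)


# Made-up records, for illustration only ("threshold" is an invented setting).
learn = Execution("pattern learning", {"seed": "SOEP"})
pattern = Pattern(r"based on (\S+), wave", derived_by=learn)
run_a = Execution("pattern application", {"threshold": 0.5})
run_b = Execution("pattern application", {"threshold": 0.3})
links = [
    Link("paper1.pdf", "SOEP", "http://example.org/study/soep", pattern, run_a),
    Link("paper1.pdf", "SOEP", "http://example.org/study/soep", pattern, run_b),
    Link("paper1.pdf", "ALLBUS", "http://example.org/study/allbus", pattern, run_b),
]

print(studies_in(links, "paper1.pdf"))
print(pattern.derived_by.configuration)  # How was a pattern derived?
print(new_titles(links, run_a, run_b))
```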

Technology stack

Web Services

RESTful API (web services)

GET, POST, PUT, DELETE, PATCH resources

Search, perform algorithms, upload files

open for integration into other workflows, e.g. in resource discovery systems, research data catalogues, digital repositories

possible to orchestrate over a web interface for individual use
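How a client might address such an API can be sketched with Python's standard library. The base address, resource names and payloads are invented for illustration, since the slides do not list the actual endpoints; the requests are only built, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://example.org/infolis/api"  # hypothetical address


def build_request(method, resource, payload=None):
    """Create (but do not send) an HTTP request for one API resource."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}/{resource}", data=data, headers=headers, method=method)


# Hypothetical resources for searching, uploading, running an algorithm, deleting.
search = build_request("GET", "links?publication=paper1")
upload = build_request("POST", "files", {"name": "paper1.pdf"})
run = build_request("POST", "executions", {"algorithm": "extract-study-titles"})
remove = build_request("DELETE", "files/1")

for request in (search, upload, run, remove):
    print(request.get_method(), request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it; that step is left out because the address is invented.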

Lookup services

DB (links)

lookup service: publication URI → study URI

reverse lookup service: study URI → publication URI
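The two services can be thought of as two views on the same set of links. The following sketch keeps the links in memory instead of a database; the class name and the example URIs are assumptions.

```python
from collections import defaultdict


class LinkIndex:
    """In-memory stand-in for the link database, indexed in both directions."""

    def __init__(self, links):
        self._by_publication = defaultdict(set)
        self._by_study = defaultdict(set)
        for publication_uri, study_uri in links:
            self._by_publication[publication_uri].add(study_uri)
            self._by_study[study_uri].add(publication_uri)

    def lookup(self, publication_uri):
        """publication URI → study URIs"""
        return sorted(self._by_publication.get(publication_uri, ()))

    def reverse_lookup(self, study_uri):
        """study URI → publication URIs"""
        return sorted(self._by_study.get(study_uri, ()))


# Made-up URIs, for illustration only.
index = LinkIndex([
    ("http://example.org/pub/1", "http://example.org/study/soep"),
    ("http://example.org/pub/1", "http://example.org/study/allbus"),
    ("http://example.org/pub/2", "http://example.org/study/soep"),
])

print(index.lookup("http://example.org/pub/1"))
print(index.reverse_lookup("http://example.org/study/soep"))
```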

Extraction of study URIs from a PDF

pdf (fulltext) → pdf2txt → txt (fulltext) → extract study titles, using DB (patterns) → study titles → linking → study URI
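The steps after text conversion can be sketched as two small functions, one that applies stored patterns and one that maps titles to URIs. The pattern, the registry and its URI are invented stand-ins for the databases in the diagram, and the PDF conversion itself is left out.

```python
import re

# Hypothetical pattern store and study registry; the real ones live in databases.
PATTERNS = [re.compile(r"based on (?P<title>[A-Z][\w/-]*), wave \d{4}")]
REGISTRY = {"SOEP": "http://example.org/study/soep"}


def extract_study_titles(text, patterns=PATTERNS):
    """Apply every stored pattern to the full text and collect the matched titles."""
    return sorted({m.group("title") for p in patterns for m in p.finditer(text)})


def link_titles(titles, registry=REGISTRY):
    """Map each extracted title to a study URI; unknown titles are skipped."""
    return {title: registry[title] for title in titles if title in registry}


def study_uris_from_text(text):
    """txt (fulltext) → study titles → study URIs; PDF conversion happens before."""
    return link_titles(extract_study_titles(text))


text = "Source: Estimations based on SOEP, wave 2002."
print(study_uris_from_text(text))
```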

Recognizing patterns

pdfs (fulltext) + seed → pattern recognizer → DB (pattern)

Integration of publications and research data

Current situation: Several steps needed

Common situation today:

1. Search online catalogue

2. Evaluate search results

3. Find the full text of the relevant source

4. Read the publication

5. Spot the research data

Moreover, the reverse information is often missing completely:

Which publications are built on some specific research data?

Two Approaches

Client-side

• load additional data in catalogue view (e.g. over Ajax)

• enrich view, links

• up-to-date data

• Embed data in the web presentation

Server-side

• add additional data in your catalogue database (e.g. Primo enrichment process)

• enrich view, links, search, sort, filter

• time-lagged because of the update mechanism

• Do the data fit into existing infrastructure? (fields, tables, database)

Integration as links

Link from catalogue entry ...

… to the corresponding research data

Integration as popup

Cited research data: 2

• ALLBUS 2010 (used in 512 publications)

• part of ALLBUS (used in 13,456 publications)

• own research data (used in 1 publication)

Integration in search/sort

Cited data sets 4

Cited data sets 1

Sort by data citation

Integration in search/filter

Research data available
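Once the links are stored with the catalogue records, sorting and filtering reduce to simple operations on the number of cited data sets. A minimal sketch, assuming each record carries a list of cited data sets; the records and the field name `cited_datasets` are made up.

```python
# Made-up catalogue records; "cited_datasets" is an assumed field name.
records = [
    {"title": "Publication A", "cited_datasets": ["ALLBUS 2010"]},
    {"title": "Publication B", "cited_datasets": []},
    {"title": "Publication C",
     "cited_datasets": ["ALLBUS 1996", "ALLBUS 1998", "ALLBUS 2000", "SOEP"]},
]


def sort_by_data_citation(items):
    """Order records by the number of cited data sets, highest first."""
    return sorted(items, key=lambda record: len(record["cited_datasets"]), reverse=True)


def with_research_data(items):
    """Keep only the records for which research data is available."""
    return [record for record in items if record["cited_datasets"]]


for record in sort_by_data_citation(records):
    print(record["title"], "- Cited data sets", len(record["cited_datasets"]))

print([record["title"] for record in with_research_data(records)])
```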

Enrich your research data catalogue

Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (pp. 392–407).

Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves

Current Goals of the Project

1. Expansion to other disciplines and languages

2. Linked data based infrastructure

3. Improve the reusability of generated links

Dissemination

our web services will be open for everyone

project webpage

http://infolis.github.io/

background information, slides, publications, news

Additionally, our code is open source

https://github.com/infolis

you can install/try out everything locally

development of code