Integration of research literature and data (InFoLiS)
Katarina Boland (1), Philipp Zumstein (2)
(1) GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
(2) Mannheim University Library, Mannheim, Germany
CNI 2015 Spring Membership Meeting
April 14th, 2015
The InFoLiS project: Integration of research data and publications
InFoLiS I: 05/2011 - 05/2013
InFoLiS II: 08/2014 - 08/2016
InFoLiS is funded by the DFG (SU 647/2-1)
Integration of research literature and data (InFoLiS) 2/22
Introduction
Catalogue: Publications (SSOAR (GESIS), Primo (UB MA), ...)
Data Catalogue: Research Data (da|ra (GESIS), ...)
[Diagram: each catalogue answers a query with a response; links connect the two catalogues]
InFoLiS Project Goals
1. Part I: Generation of Links
2. Part II: How can you reuse it?
Outline
Recommendation (1):
Creator (Publication Date): Title. Publication Agent. Identifier
Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.
→ Extraction based on these patterns?
(1) see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
Citation of Research Data
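If citations followed the first recommended format strictly, a regular expression could pick out the fields. The following is a minimal sketch under that assumption; the group names, the four-digit year, and the sample citation are invented for illustration and are not part of InFoLiS.

```python
import re
from typing import Optional

# Creator (Publication Date): Title. Publication Agent. Identifier
# Assumption: the publication date is a four-digit year.
CITATION = re.compile(
    r"^(?P<creator>.+?) \((?P<date>\d{4})\): "
    r"(?P<title>.+?)\. (?P<agent>.+?)\. (?P<identifier>\S+)$"
)


def parse_citation(line: str) -> Optional[dict]:
    """Return the citation fields, or None if the line does not fit the format."""
    match = CITATION.match(line.strip())
    return match.groupdict() if match else None


# Made-up citation that follows the recommended format.
formal = parse_citation(
    "Doe, Jane (2012): Example Survey. Example Archive. doi:10.1234/example"
)

# Free-text mention of a dataset: the pattern does not apply.
informal = parse_citation(
    "data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used"
)
```

A free-text mention of a dataset does not fit this format, so `informal` is `None` while `formal` holds the five fields.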
presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models.
→ data from the <title> of the years <year> are used
References to Datasets
Table 1: Population forecast for Germany depending on age cohorts - proportion in percent.
Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.
(Data base: <number> <title> of the <publication agent>, variant <variant>)
References to Datasets
Consulted were furthermore ...
Consulted were furthermore title1, title2, title3, ..., titleN.
References to Datasets
Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys
(Source: Ditton et al. 2005a)
(Source: <citation of descriptive publication>)
References to Datasets
...are hard to detect!
see also:
Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430
Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html
References to Datasets
Automatic Identification of References
Why not simply search for study titles in publications?
“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
“ALLBUS 96”
“Youth 2010”
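The titles above suggest two ways in which a plain title search can go wrong. The sketch below illustrates both; the function name and the two sample sentences are assumptions made for this example.

```python
def find_titles(text, titles):
    """Naive approach: return every known title that occurs verbatim in the text."""
    lowered = text.lower()
    return [title for title in titles if title.lower() in lowered]


FULL_TITLE = (
    "ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der "
    "Sozialwissenschaften/German General Social Survey 1996)"
)
titles = [FULL_TITLE, "Youth 2010"]

# Problem 1: authors abbreviate, so the registered title never appears verbatim.
missed = find_titles("Our analysis is based on ALLBUS 96.", titles)

# Problem 2: a generic title also matches text that is not a dataset reference.
false_hit = find_titles("Policies for youth 2010 and beyond are discussed.", titles)
```

Here `missed` is empty although the sentence refers to ALLBUS, and `false_hit` contains "Youth 2010" although the sentence does not refer to a study.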
How do humans recognize study references?
Source: Estimations based on SOEP, wave 2002.
Source: Estimations based on xyz, wave 2002.
General idea
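The context around a known study name can be turned into a pattern that also recognizes unknown names. The sketch below is deliberately simplified and is not the project's implementation: it uses the whole sentence as context and generalizes only numbers, and the function names are hypothetical.

```python
import re


def derive_pattern(sentence, seed):
    """Replace a known study name (the seed) by a capture group.

    Raises ValueError if the seed does not occur in the sentence.
    """
    before, after = sentence.split(seed, 1)

    def generalise(context):
        # Escape regex metacharacters, then let any number stand in for the year.
        return re.sub(r"\d+", lambda _match: r"\d+", re.escape(context))

    return re.compile(generalise(before) + r"(.+?)" + generalise(after))


def extract_names(pattern, text):
    """Return every string that fills the study-name slot of the pattern."""
    return pattern.findall(text)


pattern = derive_pattern("Source: Estimations based on SOEP, wave 2002.", "SOEP")
found = extract_names(pattern, "Source: Estimations based on ALLBUS, wave 1996.")
```

With the seed "SOEP", the derived pattern extracts "ALLBUS" from a sentence with the same surrounding words and a different year.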
for details see:
Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
Reference Extraction
Strategies: 1) greedy; 2) exact; 3) best
Mapping to Datasets in da|ra:
granularity of registration vs. citation
[Figure: tree of ALLBUS datasets at different levels of granularity; node labels include:]
ALLBUS
ALLBUS 1996, ALLBUS 1998, ALLBUS 2000, ...
ALLBUS 2000 CAPI/PAPI
ALLBUS - Cumulation 1980-2006, ALLBUS - Cumulation 1980-2008
ALLBUScompact
ALLBUScompact 2000, ...
ALLBUScompact 2000 CAPI/PAPI, ALLBUScompact 2000 CAPI
ALLBUScompact - Cumulation 1980-2010
→ use ontology
Mapping to Datasets in da|ra
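A citation often names a study at a coarser level than the registered datasets. One way to bridge the gap is a hierarchy in which a cited name expands to everything registered beneath it. The sketch below assumes a small child-to-parent mapping built from names on the slide; the exact relations are an illustrative guess, not the ontology used in the project.

```python
from collections import defaultdict

# Assumed, simplified hierarchy (child -> parent).
PARENT = {
    "ALLBUS 1996": "ALLBUS",
    "ALLBUS 1998": "ALLBUS",
    "ALLBUS 2000": "ALLBUS",
    "ALLBUS 2000 CAPI/PAPI": "ALLBUS 2000",
}

CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)


def expand(name):
    """Return the named node plus every more specific dataset beneath it."""
    result = [name]
    for child in CHILDREN.get(name, []):
        result.extend(expand(child))
    return result


candidates = expand("ALLBUS 2000")
```

A reference to "ALLBUS 2000" then yields the node itself and "ALLBUS 2000 CAPI/PAPI" as candidate datasets, while a reference to "ALLBUS" yields all five nodes.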
Vocabulary: e.g. DDI-RDF Discovery Vocabulary (2)
(2) Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
Ontology: Approach
Example: da|ra
Example: SSOAR
Integration of Links into Information Systems
Thank you for your attention!
[email protected]
Next part: How can you reuse it?
(Internal) Data structure
Document
Pattern
Execution of Algorithm
Study Title
Study URI
Which studies are found in a document?
How was a pattern derived?
Which other study titles are found with the new configuration of the algorithm?
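The entities and questions above can be modelled as records that keep their provenance. The following is a minimal sketch; all class names, field names, identifiers, and example.org URIs are assumptions for illustration and do not describe the actual InFoLiS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    """One run of the extraction algorithm with a given configuration."""
    id: str
    config: str


@dataclass(frozen=True)
class Pattern:
    """A textual pattern plus the run that produced it."""
    regex: str
    derived_by: Execution  # answers "How was a pattern derived?"


@dataclass(frozen=True)
class Finding:
    """A study title found in a document, linked to a study URI."""
    document: str
    study_title: str
    study_uri: str
    pattern: Pattern
    found_by: Execution


def studies_in(findings, document):
    """Which studies are found in a document?"""
    return {f.study_uri for f in findings if f.document == document}


def titles_found_by(findings, execution):
    """Which study titles are found with a given configuration of the algorithm?"""
    return {f.study_title for f in findings if f.found_by == execution}


run1 = Execution("run-1", "seed=SOEP")
run2 = Execution("run-2", "seed=SOEP, more iterations")
pattern = Pattern(r"based on (.+?), wave \d+", derived_by=run1)
findings = [
    Finding("doc-1", "SOEP", "http://example.org/study/soep", pattern, run1),
    Finding("doc-1", "ALLBUS", "http://example.org/study/allbus", pattern, run2),
]
```

Each of the three questions then becomes a simple filter over the findings or a lookup of `derived_by`.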
RESTful API (web services)
GET, POST, PUT, DELETE, PATCH resources
Search, perform algorithms, upload files
open for integration into other workflows, e.g. in
resource discovery systems
research data catalogues
digital repositories
possible to orchestrate over a web interface for individual use
Lookup services
[Diagram: both services query a DB (links)]
lookup service: publication URI → study URI
reverse lookup service: study URI → publication URI
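Both lookup directions can be served from the same set of links. The sketch below uses an in-memory index in place of the links database; the class name, method names, and example.org URIs are hypothetical.

```python
from collections import defaultdict


class LinkIndex:
    """In-memory stand-in for the links database, indexed in both directions."""

    def __init__(self, links):
        self._studies_by_publication = defaultdict(set)
        self._publications_by_study = defaultdict(set)
        for publication_uri, study_uri in links:
            self._studies_by_publication[publication_uri].add(study_uri)
            self._publications_by_study[study_uri].add(publication_uri)

    def lookup(self, publication_uri):
        """Publication URI -> URIs of the studies it references."""
        return sorted(self._studies_by_publication.get(publication_uri, ()))

    def reverse_lookup(self, study_uri):
        """Study URI -> URIs of the publications that reference it."""
        return sorted(self._publications_by_study.get(study_uri, ()))


index = LinkIndex([
    ("http://example.org/pub/1", "http://example.org/study/allbus"),
    ("http://example.org/pub/1", "http://example.org/study/soep"),
    ("http://example.org/pub/2", "http://example.org/study/soep"),
])
studies = index.lookup("http://example.org/pub/1")
publications = index.reverse_lookup("http://example.org/study/soep")
```

Keeping two indexes makes the reverse question, which publications use a given study, as cheap to answer as the forward one.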
Extraction of study URIs from a PDF
[Diagram: pdf (fulltext) → pdf2txt → txt (fulltext) → extract study titles → study titles → linking → study URI, with a DB (patterns)]
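The steps of this pipeline can be sketched as three small functions. This is an illustration under stated assumptions: the PDF conversion is passed in as a callable so that no particular tool is implied, the patterns and the title-to-URI table stand in for the databases, and all names and the example.org URI are hypothetical.

```python
import re


def extract_study_titles(fulltext, patterns):
    """Apply every pattern and collect the captured study titles, without duplicates."""
    titles = []
    for pattern in patterns:
        for match in re.finditer(pattern, fulltext):
            title = match.group(1).strip()
            if title not in titles:
                titles.append(title)
    return titles


def link_titles(titles, title_to_uri):
    """Map study titles to study URIs; titles without a known URI are skipped."""
    return [title_to_uri[title] for title in titles if title in title_to_uri]


def extract_study_uris(pdf_path, pdf2txt, patterns, title_to_uri):
    """pdf -> txt -> study titles -> study URIs."""
    fulltext = pdf2txt(pdf_path)
    return link_titles(extract_study_titles(fulltext, patterns), title_to_uri)


# Stand-in for a real PDF-to-text converter.
def fake_pdf2txt(_path):
    return ("For this purpose, data from the Socio-Economic Panel (SOEP) "
            "of the years 1990 and 2003 are used.")


uris = extract_study_uris(
    "paper.pdf",
    fake_pdf2txt,
    patterns=[r"data from the (.+?) of the years"],
    title_to_uri={"Socio-Economic Panel (SOEP)": "http://example.org/study/soep"},
)
```

Passing the converter as a parameter keeps the sketch runnable without a PDF library and makes the conversion step easy to swap.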
Quoting the Horizon Report 2014
“Visionary leadership for research data management models is also required to determine how to best incorporate data connections into library catalogs” (NMC Horizon Report 2014 - Library Edition, p. 7)
Current situation: Several steps needed
Common situation today:
Search online catalogue
Evaluate search results
Find fulltext to relevant source
Read the publication
Spot the research data
Moreover, often the reverse information is missing completely
Which publications are built on some specific research data?
Two Approaches
Client-side
• load additional data in catalogue view (e.g. over Ajax)
• enrich view, links
• up-to-date data
• embed data in the web presentation
Server-side
• add additional data in your catalogue database (e.g. Primo enrichment process)
• enrich view, links, search, sort, filter
• time-lagged because of the update mechanism
• Do the data fit into existing infrastructure? (fields, tables, database)
Integration as popup
Cited research data: 2
• ALLBUS 2010 (used in 512 publications)
  • part of ALLBUS (used in 13,456 publications)
• own research data (used in 1 publication)
Enrich your research data catalogue
Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (pp. 392–407).
Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves
Current Goals of the Project
1. Expansion to other disciplines and languages
2. Linked data based infrastructure
3. Improve the reusability of generated links
Dissemination
Our web services will be open for everyone
Project webpage: http://infolis.github.io/
background information, slides, publications, news
Additionally, our code is open source: https://github.com/infolis
you can install/try out everything locally
development of code
Questions, Discussions, Feedback
Questions?
Discussions
Give us feedback
Small online survey: http://t1p.de/infolis
http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594