Maureen C Kelly Managing Access in New World of Scholarly Research

Making It Easy for Scholars & Researchers to Utilize Content

Managing Access in This New World of Scholarly Research Results: Data, Software, and Ongoing Change

Maureen C. Kelly

NISO - January 11, 2017

Key Premise: Access to existing knowledge is critical for the discovery of new knowledge

Age of Manuscripts

§ Very slow production

§ Limited circulation

§ Stored in Libraries

§ Users traveled to the content

The Printing Press

§ Gutenberg’s printing press (1452)

§ Copyright: England (1709) & US (1783)

§ Much broader distribution possible

§ Libraries remain key

Digital Creation & Distribution

§ Print & CD-ROMs

§ PDFs, XML, HTML, Web

§ Distributed databases & Search Engines

§ Remote access & mobile access

§ Libraries pay for content

2000 Years of Publishing Technology

New Pressures for Change

§ Need for faster turnaround

§ Need for deeper access to content (Text & Data Mining)

§ Need to support new types of content:

◦ Videos, Vector Graphics, etc.

◦ “Real” Data (not just tables & graphs)

◦ Supporting Software

§ New content types call for new infrastructures & new business models

Publishing Business Models

§ Traditional: Pay to Access / Use

◦ Key producers: scholarly societies & commercial publishers

◦ Subscription revenue – print, print & electronic, electronic only

§ Emerging: Pay to Publish / Open Access

◦ Institutions, funders, authors pay

◦ Still struggling to make this work on a large scale

§ New: Open Publishing Initiatives

◦ e.g., Wellcome Open Research

§ Recognize need for change – but still searching for a robust solution

Sources of Friction

§ Scholarly society publishers rely heavily on revenue from publications

§ Traditional role of libraries as the gateway to content is diminishing

§ Libraries have been traditional sources of subscription revenue, but under pressure

§ No reliable/sustainable business models for repositories of non-journal content (e.g., videos, data, software)

So… What Is Needed to Make Content Usable Today?

§ Systems for vetting content

§ Publication of & access to content in “functional” format

§ Reliable content IDs (e.g., DOIs) & citation practices

§ Open availability of robust supporting metadata

§ Functional links to & across content

§ Discovery tools, incl. ‘deep’ searching & AI

§ Access to supporting data behind the research

§ Access to software used to analyze the data

AND workable business models to support it all!

Vetting of Content

§ Supporting software systems

§ Still struggle with delays in peer review step

§ Continuing initiatives for open review and post-publication review

Publication of & Access to Content in a Functional Format

§ Finally making progress beyond the PDF

§ Licensing agreements now provide for text & data mining, though terms are still limiting

§ Large commercial publishers are beginning to make provisions for data & software publication, but it is a costly undertaking

§ Relying on code-hosting platforms like GitHub for software

Reliable Content IDs & Citation Practices

§ DOIs rule for journal content!

◦ Wide use. Reliable supporting infrastructure.

◦ Sustainable business model

§ Much progress in establishing standards for data citation:

◦ Force11: worked on Joint Declaration of Data Citation Principles

§ Software is trickier

◦ Publishers encourage use of Code Repositories like GitHub for software

§ Challenges with versioning for data & software

Open Availability of Robust Metadata

§ We’ve come a long way in making metadata available

§ Need wider use of standards like ORCID that offer supplementary metadata

§ Metadata underpins useful search “Landing Pages”

§ Metadata is important for usage & citation analytics

Functional Links

§ Reliable hyperlinks have changed the way scholars work

§ They make citations functional & more efficient

§ Tools have emerged to let scholars capture the linked documents, store the associated PDFs and share them with collaborators

◦ e.g., Overleaf, PaperPile, Covidence

Discovery Tools

§ We continue to see advances in the area of discovery

§ Summon & EBSCO Discovery Service enable searching across databases

§ Google & Google Scholar

◦ Mining of vast collections of content to make search smarter

◦ Continue to raise the bar for other providers

◦ Significant investments in AI

§ AI & Semantic Tagging

◦ New players like Semantic Scholar

Access to Supporting Data

§ Commercial publishers

◦ Scientific Data (Nature) – Database of data descriptions

◦ Data Citation Index (Thomson)

◦ Elsevier’s Data Archiving & Networked Services

§ Issues re Open Data Policies

§ Funders

◦ Wellcome Open Research

Access to Software Used to Analyze Data

§ Custom software has become important for analyzing data

◦ Elsevier estimates that 38% of researchers now spend at least 1 day per week on software development

§ Commercial publishers

◦ Elsevier - Original Software Publications

◦ Nature Methods

§ GitHub most widely used repository

§ Reproducible Reports

◦ Galaxy, Jupyter Notebook, knitr (dynamic report generator for R)

Critical Challenges Ahead

§ Open Access is an important but limited strategy

◦ Like canary in the coal mine

◦ It calls attention to a problem but does not really address the underlying issues

§ Key issue remains: how to fund creation, distribution, archiving & access to content

◦ What role do libraries (& their universities) play in funding & mediating access to scholarly content?

◦ What new infrastructures are needed to support access to & archiving of new kinds of content?

◦ And who pays for it?
