@openaire_eu
Everything Counts in Large Amounts: Measuring the Impact of Usage Activity in Open Access Scholarly Environments
DI4R December 2017, Brussels
Dimitris Pierrakos, Athena Research Center
Jochen Schirrwagen, Bielefeld University
● OpenAIRE infrastructure and Usage Statistics Service.
● Usage Data Collection strategies.
● Using Piwik for tracking and analytics.
● Applying COUNTER rules.
● Metrics in the Repository Manager Dashboard.
● Relation to Open Metrics and Next Generation Metrics.
Overview
• A pan-European Research Information platform to
monitor OA research outcomes from EC and other
national funders.
• Research analytics tools to promote new scientific
metrics & support evidence-based decision-making.
• Implementation of an OpenAIRE usage statistics
service for usage data collected from data providers.
OpenAIRE 2020
● Task in OpenAIRE2020 covers:
○ aligning policies and standards for gathering and sharing of usage data -> guidelines
○ considering legal aspects (data protection / data privacy)
○ relating usage statistics to other kinds of metrics
○ collecting and processing usage data and producing consolidated, standards-based usage statistics
● Task team: Athena Research Center, University of Bielefeld, University of Minho, Jisc IRUS-UK, Couperin + NOADs
Usage Statistics in OpenAIRE
● OpenAIRE collects ~21 million documents from 980 compatible data providers
● currently 32 active data providers participate in Usage Statistics, plus IRUS-UK
● Usage statistics are deployed under CC0
○ in the OpenAIRE dashboard, portal and API.
Usage Statistics in the OpenAIRE Infrastructure
● Tracking of views and downloads / collecting COUNTER reports
○ Push or Pull collection workflows.
● Anonymisation of IP addresses.
● Metadata de-duplication enables accumulation of views and downloads for the same documents.
● COUNTER Code of Practice compatibility.
○ standards-based usage statistics.
○ enables comparability with statistics from other data sources.
Usage Statistics Service Features
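The accumulation step enabled by metadata de-duplication can be illustrated with a minimal sketch. It assumes a mapping from each repository record to its de-duplicated representative record; the identifiers and the function name are hypothetical and do not reflect OpenAIRE's actual schema.

```python
from collections import Counter


def accumulate(counts, representative_of):
    """Sum per-record counts onto the de-duplicated (representative) record.

    Records without an entry in `representative_of` are kept under their own ID.
    """
    totals = Counter()
    for record_id, n in counts.items():
        totals[representative_of.get(record_id, record_id)] += n
    return dict(totals)


# Hypothetical identifiers: two repository copies of one paper map to "dedup::1".
views = {"repoA::123": 40, "repoB::987": 15, "repoC::555": 7}
representative_of = {"repoA::123": "dedup::1", "repoB::987": "dedup::1"}

merged = accumulate(views, representative_of)
print(merged)
```

The same function can be applied separately to views and downloads, so both metrics are reported once per document rather than once per copy.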
• World's leading open-source analytics platform.
• Valuable insights into website traffic and visitor activity.
• Piwik collects and stores PII (personally identifiable information).
• The operator keeps full data ownership and controls who has access.
• Robot filtering plugin.
• Compliant with EU regulations.
• Recommended by privacy organizations such as ULD (Germany) and CNIL (France).
Piwik Analytics platform
Feature | Piwik | Google Analytics
Number of hits per month | Unlimited | 10 million
Number of user accounts per login | Unlimited | 10
Data storage time | Unlimited | 25 months
Number of properties (websites, apps etc.) tracked per account | Unlimited | 50
Custom variables | 5 | 5
Data export | Unlimited | 5000 rows
Real-time analytics | Piwik offers real-time web analytics in all of its reports. | GA monitors user activity right after it happens, although the period of delay is not explicitly stated.
Piwik Facts
[Diagram: data providers (Repository, CRIS, eJournal, National Statistics Node, Publisher) feed the UsageStatistics-DB, which sits next to the Metadata-Index. PUSH: tracked event -> IP anonymisation -> processing script. PULL: COUNTER report -> processing script.]
2-Tiers Collection Workflows for Usage Statistics
• An institutional repository is registered in Piwik.
• Server-side tracking: plugins (DSpace) or patches (EPrints) using Piwik’s HTTP API.
• Usage activity is tracked and logged on the Piwik platform in real time.
• Information is transferred offline, using Piwik’s API, to OpenAIRE’s DBs for statistical analysis.
• Statistics are deployed via OpenAIRE’s Portal or the SUSHI-Lite API.
Tier-1: Push Usage Statistics Tracking Workflow
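As a rough illustration of the server-side push step, the sketch below builds a request URL for Piwik's HTTP Tracking API without sending it. The query parameters (`idsite`, `rec`, `url`, `action_name`, `urlref`, `ua`, `cip`, `token_auth`, `download`, `cvar`) are documented tracking parameters; the endpoint, the token, and the choice to carry the OAI identifier in a custom variable named `oaipmhID` are assumptions made for this example, not a description of the OpenAIRE plugins.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs


def build_tracking_url(base_url, site_id, item_url, title, oai_id,
                       visitor_ip, user_agent, referrer, token_auth,
                       is_download=False):
    """Build (but do not send) a Piwik tracking request for one usage event."""
    params = {
        "idsite": site_id,          # the repository's site ID in Piwik
        "rec": 1,                   # required for the request to be recorded
        "url": item_url,
        "action_name": title,
        "urlref": referrer,
        "ua": user_agent,
        "cip": visitor_ip,          # overriding the IP requires token_auth
        "token_auth": token_auth,
        # Assumption: the OAI identifier travels as a page-scope custom variable.
        "cvar": json.dumps({"1": ["oaipmhID", oai_id]}),
    }
    if is_download:
        params["download"] = item_url   # marks the event as a download
    return f"{base_url}?{urlencode(params)}"


# All values below are placeholders.
url = build_tracking_url(
    "https://analytics.example.org/piwik.php", 42,
    "https://repo.example.org/bitstream/1/paper.pdf", "Example paper",
    "oai:repo.example.org:1", "192.0.0.0", "Mozilla/5.0",
    "https://www.example.com/", "PLACEHOLDER_TOKEN", is_download=True,
)
query = parse_qs(urlsplit(url).query)
print(query["idsite"], query["download"])
```

Building the URL on the server keeps the tracking independent of JavaScript in the visitor's browser, which is why file downloads can be counted as well as page views.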
Parameter | Description
idSite | the ID of the repository
idVisit | a visitor/session ID (an 8-byte binary string)
visitIP | the IP address of the visitor (optionally anonymized)
action | the action performed (view, download, outlink, etc.)
url | the URL of the requested item
timestamp | the date and time of the request
OAI-PMH Identifier | the Open Archives Initiative identifier of the item being viewed/downloaded
agent | the web browser and operating system of the visitor
referrer | the URL that linked to the requested item
Tier-1: Piwik Tracking Parameters
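One way to picture a tracked event is as a typed record holding the parameters listed above. This is a sketch only: the field names are Python-style renderings of the slide's parameters, and the closed set of allowed actions is an assumption (the slide lists "view, download, outlink, etc.").

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: only the three actions named on the slide are accepted here.
ALLOWED_ACTIONS = {"view", "download", "outlink"}


@dataclass(frozen=True)
class UsageEvent:
    id_site: int            # idSite: ID of the repository
    id_visit: bytes         # idVisit: 8-byte visitor/session ID
    visit_ip: str           # visitIP: optionally anonymized
    action: str             # view, download, outlink
    url: str                # URL of the requested item
    timestamp: datetime     # date and time of the request
    oai_identifier: str     # OAI-PMH identifier of the item
    agent: str              # browser and operating system
    referrer: str = ""      # URL that linked to the item

    def __post_init__(self):
        if len(self.id_visit) != 8:
            raise ValueError("idVisit must be exactly 8 bytes")
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


# Placeholder values for illustration.
event = UsageEvent(
    id_site=42,
    id_visit=bytes.fromhex("0123456789abcdef"),
    visit_ip="192.0.0.0",
    action="download",
    url="https://repo.example.org/bitstream/1/paper.pdf",
    timestamp=datetime(2017, 12, 1, 10, 0, tzinfo=timezone.utc),
    oai_identifier="oai:repo.example.org:1",
    agent="Mozilla/5.0",
)
print(event.action, event.oai_identifier)
```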
● Usage events can be considered privacy-sensitive information (user agent, IP address, ...)
● Usage statistics services must comply with data protection laws and regulations for both usage data providers and service providers
○ but the legal situation differs between countries
○ OpenAIRE must comply with the EU General Data Protection Regulation
● Tracking plugins issued by OpenAIRE anonymize usage data already on the client side
Data Protection Aspects
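IP anonymisation before an event leaves the repository can be sketched by masking the trailing bits of the address. The number of bits kept (16 for IPv4, 48 for IPv6) is an assumption chosen for this example; the slides do not state how many bytes the OpenAIRE plugins mask.

```python
import ipaddress


def anonymize_ip(address, ipv4_keep_bits=16, ipv6_keep_bits=48):
    """Zero out all but the leading bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(address)
    keep = ipv4_keep_bits if ip.version == 4 else ipv6_keep_bits
    # strict=False lets ip_network() accept an address with host bits set
    # and returns the enclosing network, whose base address is the masked IP.
    network = ipaddress.ip_network(f"{ip}/{keep}", strict=False)
    return str(network.network_address)


# Addresses from the documentation ranges (RFC 5737 / RFC 3849).
masked_v4 = anonymize_ip("192.0.2.123")
masked_v6 = anonymize_ip("2001:db8:abcd:12:34::1")
print(masked_v4, masked_v6)
```

Masking at this point means the full address is never stored by the analytics platform, at the cost of coarser session and geolocation data.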
Usage Activity in real time
Real time Visitor Map
• Applying data processing rules according to the COUNTER Code of Practice:
○ i.e. counting requests depending on session duration, tracing double-clicks
• Bot filtering
○ Piwik Bot Plugin
○ COUNTER Robots Working Group
• Linking each usage event with its metadata record in OpenAIRE
• Accumulating views and downloads of de-duplicated records
Cleaning and Consolidation
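The double-click rule can be sketched as follows: repeated requests for the same item, by the same session and with the same action, are counted once when they fall within a short time window of the previous request. The window length is a parameter here; COUNTER Release 4 describes 10 seconds for HTML and 30 seconds for PDF requests, and the 30-second default below is only an illustrative choice. The tuple layout and function name are hypothetical.

```python
from datetime import datetime, timedelta


def filter_double_clicks(events, window_seconds=30):
    """Drop requests that repeat the previous one within the window.

    `events` is an iterable of (session_id, item_id, action, timestamp).
    The first request of each burst is kept; later ones are discarded.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}
    kept = []
    for session_id, item_id, action, ts in sorted(events, key=lambda e: e[3]):
        key = (session_id, item_id, action)
        previous = last_seen.get(key)
        last_seen[key] = ts          # always compare against the latest click
        if previous is not None and ts - previous <= window:
            continue                 # treated as a double-click
        kept.append((session_id, item_id, action, ts))
    return kept


t0 = datetime(2017, 12, 1, 10, 0, 0)
events = [
    ("s1", "item-1", "download", t0),
    ("s1", "item-1", "download", t0 + timedelta(seconds=5)),   # double-click
    ("s1", "item-1", "download", t0 + timedelta(seconds=60)),  # new request
    ("s2", "item-1", "download", t0 + timedelta(seconds=2)),   # other session
]
kept = filter_double_clicks(events)
print(len(kept))
```

Bot filtering would run before this step, so that robot traffic does not enter the session-based counting at all.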
Repository Pilot Statistics
• Gathering of consolidated statistics reports from aggregation services, such as IRUS-UK, using protocols such as SUSHI-Lite.
• Statistics are stored in OpenAIRE’s DB for statistical analysis.
• Statistics are deployed via OpenAIRE’s Portal or the SUSHI-Lite API.
Tier-2: Collecting (Pull) Consolidated Usage Statistics Reports
[Diagram labels: entityId/orid, source]
OpenAIRE Usage Statistics DB
● Four steps to join OpenAIRE Usage Statistics:
1. Download.
2. Configure.
3. Deploy.
4. Validate (by OpenAIRE).
● Or enter a SUSHI endpoint to let OpenAIRE collect COUNTER reports
OpenAIRE Repository Manager Dashboard
Content Provider Dashboard - Start Page
Content Manager’s Datasource selection for Metrics
Enable Metrics for selected Datasource
Configure Metrics for selected Datasource
Summarized Usage Statistics on the content provider level
Usage Statistics on the Article Level
● Available as beta with the help of IRUS-UK
○ http://beta.services.openaire.eu/usagestats/sushilite/
● Supports COUNTER R4 compatible reports:
○ Article Reports (AR) and Book Reports (BR) using identifiers like openaire, doi, oai-record-id
○ Item Reports (IR)
○ Repository Reports (RR) using identifiers issued by OpenAIRE or OpenDOAR
○ Journal Reports (JR) using identifiers like ISSN
SUSHI-Lite Interface
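A request for a Repository Report could be assembled roughly as below. The base URL is the beta endpoint given on the slide; the `GetReport/` path, the parameter names (`Report`, `Release`, `RequestorID`, `BeginDate`, `EndDate`, `RepositoryIdentifier`, `Granularity`), the report code `RR1` and the identifier format are assumptions modelled on SUSHI-Lite style interfaces and should be checked against the service documentation. The sketch only builds the URL and does not contact the service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://beta.services.openaire.eu/usagestats/sushilite/"


def build_report_url(report, begin, end, repository_id,
                     release=4, requestor="anonymous", granularity="Monthly"):
    """Assemble a report request URL (parameter names are assumptions)."""
    params = {
        "Report": report,                    # e.g. a Repository Report code
        "Release": release,                  # COUNTER release (R4 on the slide)
        "RequestorID": requestor,
        "BeginDate": begin,                  # YYYY-MM
        "EndDate": end,                      # YYYY-MM
        "RepositoryIdentifier": repository_id,
        "Granularity": granularity,
    }
    return f"{BASE_URL}GetReport/?{urlencode(params)}"


# "opendoar:EXAMPLE" is a placeholder, not a real repository identifier.
report_url = build_report_url("RR1", "2017-01", "2017-03", "opendoar:EXAMPLE")
report_query = parse_qs(urlsplit(report_url).query)
print(report_url)
```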
Repository Report Item Report
SUSHI response example (JSON)
• Quantitative indicators for research
○ Governance
○ Management
○ Assessment
• Dimensions
○ Robust metrics in terms of accuracy and scope;
○ Humble metrics, recognizing that quantitative evaluation should support qualitative, expert assessment;
○ Open and Transparent metrics;
○ Diverse metrics by field, in order to support the plurality of research and researcher career paths across the system;
○ Reflexive metrics for recognising, anticipating and updating the systemic and potential effects of indicators.
OpenAIRE: A Usage Statistics Hub for Responsible Metrics
• Standardization: following the COUNTER Code of Practice
○ by updating to COUNTER R5
○ by contributing to the COUNTER Robots Working Group
• Put usage statistics into context with conventional and alternative metrics and (open) peer review
Considering the HLEG Altmetrics Recommendations
● Develop Piwik plugins for other repository platforms (e.g. Fedora, Samvera)
● Promote the service to content provider managers
● Support national usage statistics initiatives to become a node in OpenAIRE Usage Statistics
● Contribute to the Open Metrics concept and vision
● Activities in OpenAIRE-Advance starting in 2018:
○ support LA Referencia to set up a regional usage statistics network and interlink
○ working towards Open Metrics
Next Steps
● Standardize usage statistics to enable assessment of research impact
○ Standardize usage statistic metrics across OpenAIRE and EOSC-hub
○ Collaborate with RDA (e.g. Make Data Count BoF working group)
○ Promote common guidelines to and across communities
○ Take EC rules and GDPR regulations into account
● Enable the collection/aggregation of usage stats from content providers
○ Adopt OpenAIRE and EOSC-hub services for collecting user statistics, services in scope:
■ EGI: Accounting System, AppDB
■ EUDAT: DPMT, B2SHARE, B2FIND, B2SAFE
○ Adopt OpenAIRE Usage Statistics Services to collect user stats for all products of science
■ e.g. literature, datasets, software, research objects
■ Integrating with EOSC-hub services for usage statistics/metrics
Collaboration with EOSC-hub
● OpenAIRE Usage Statistics Deliverable Report
○ https://doi.org/10.5281/zenodo.1034163
● Repository Tracking Plugins (github)
○ https://github.com/openaire/OpenAIRE-Piwik-DSpace
○ https://github.com/openaire/EPrints-OAPiwik
● SUSHI-Lite API (beta)
○ http://beta.services.openaire.eu/usagestats/sushilite/
References