EzPAARSE and ezMESURE - OCLC · 2020-06-02 · EzPAARSE and ezMESURE : Assembling national...
EzPAARSE and ezMESURE : Assembling national dashboards from locally generated and
fine-grained access events to electronic resources
DOMINIQUE LECHAUDEL - INIST-CNRS
THOMAS JOUNEAU - UNIVERSITE DE LORRAINE
Dominique Lechaudel
Product owner of the ezMESURE project at INIST-CNRS (the CNRS is the French National Centre for Scientific Research)
Thomas Jouneau
E-librarian @ Université de Lorraine
ezPAARSE and ezMESURE user
Co-leader of the Couperin.org « Indicators » WG, member of the Project COUNTER Executive Committee
ezPAARSE and ezMESURE
ezPAARSE
The free and open source software that
produces uniform electronic resources usage data
ezMESURE
Our national repository and
dashboard tool to visualize
ezPAARSE-collected data
http://ezpaarse.org
What is ezPAARSE ?
• Free open source software
• Specialized niche software
• Locally installed by institutions
• Produces uniform electronic resources usage data
BiblioMap shows
ezPAARSE running live
http://bibliomap.inist.fr
What is ezPAARSE ?
Log files → ezPAARSE → access event files
- Filter
- Identify
- Enrich
- Encrypt
- …
How does ezPAARSE work ?
107.206.236.51 - - - [31/Dec/2017:23:06:56] "GET http://insb.bib.cnrs.fr:80/login?url=http://www.sciencedirect.com/science/journal/13698486 HTTP/1.1" 302 0 -
107.206.236.51 OvIjRz - [email protected]_O_CNRS_I_DS53_OU_UMR8197 [31/Dec/2017:23:07:28] " HTTP/1.1 GET https://www.ncbi.nlm.nih.gov:443/pmc/articles/PMC5511345/pdf/ncomms16088.pdf HTTP/1.1 " 200 108723 insb
92.91.207.211 jjGjY9Q - 16SBIUMR7255_O_CNRS_I_DS53_OU_UMR7255 [01/Jan/2018:02:44:07] " GET https://cdn.els-cdn.com:443/sd/css/css_gen_v01_1712R2.css" 200 3410106 insb
83.221.104.173 3aVq1a - [email protected]_O_CNRS_I_DS53_OU_UMR7275 [01/Jan/2018:05:02:38] "GET http://www.physiology.org:80/doi/pdf/10.1152/ajplung.00348.2002 HTTP/1.1" 200 612360 insb
91.140.193.126 NZBUs1 - [email protected]_O_OTHER_I_DS53_OU_UMR7213 [01/Jan/2018:06:53:18] "GET http://www.sciencedirect.com:80/science/article/pii/S0009308416300159/pdfft?pid=1-s2.0-S0009308416300159-main.pdf HTTP/1.1" 200 7890 insb
Log lines → 174 parsers → access events
Access event fields: date, host, login, mime, print_identifier, platform, publication_title, doi, geoip, unitid, session
Parsers are dedicated to the various recognized platforms:
- ACM
- AIP
- Nature
- Springer
- Ovid
- ScienceDirect
- …
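To make the "log line → parser → access event" step concrete, here is a minimal Python sketch. It is not the ezPAARSE implementation: the regular expression, the `PARSERS` table, the `parse_sciencedirect` helper, and the returned field values are all hypothetical simplifications chosen to fit the sample log lines shown earlier.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, simplified pattern for the sample proxy log lines:
# host, session, (unused), login, [date], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<session>\S+) \S+ (?P<login>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"[^"]*?(?P<url>https?://[^\s"]+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def parse_sciencedirect(path):
    """Illustrative platform parser: extract the PII and guess the mime type."""
    m = re.search(r"/science/article/pii/(?P<pii>[^/]+)(?P<pdf>/pdfft)?", path)
    if not m:
        return {}
    return {
        "platform": "sciencedirect",
        "unitid": m.group("pii"),
        "mime": "PDF" if m.group("pdf") else "HTML",
    }


# One parser per recognized platform, selected by the domain in the URL.
PARSERS = {"www.sciencedirect.com": parse_sciencedirect}


def to_access_event(line):
    """Return an access event dict, or None if the line is not recognized."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    parts = urlsplit(m.group("url"))
    parser = PARSERS.get(parts.hostname)
    if parser is None:
        return None  # e.g. CSS or image requests from unrecognized domains
    details = parser(parts.path)
    if not details:
        return None
    return {"host": m.group("host"), "login": m.group("login"),
            "date": m.group("date"), **details}
```

Dispatching on the domain keeps each platform's URL logic isolated, which is presumably why one parser per platform scales to many platforms.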
How does ezPAARSE work ?
URL in the log line:
http://www.sciencedirect.com:80/science/article/pii/S0009308416300159/pdfft?pid=1-s2.0-S0009308416300159-main.pdf HTTP/1.1
URL in the log line → parser (Springer, Ovid, ScienceDirect) → 16 middlewares → access events
Access event fields: date, host, login, mime, print_identifier, platform, publication_title, doi, geoip, unitid, session
Middlewares perform particular tasks, such as Crossref enrichment:
- filter
- deduplicator
- enhancer
- crossref
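The middleware idea can be sketched as a chain of small functions, each taking a stream of access events and returning a transformed stream. The Python below is only an illustration under assumptions: the function names, the event dict layout, and the time windows in `WINDOWS` are hypothetical, and the deduplicator keeps the first event of a burst, which is a simplification rather than a statement of what ezPAARSE or COUNTER prescribe.

```python
from datetime import datetime, timedelta

# Assumed double-click windows in seconds per mime type (illustrative values).
WINDOWS = {"HTML": 10, "PDF": 30}


def filter_events(events):
    """Drop events that no parser could identify (no unitid)."""
    for ev in events:
        if ev.get("unitid"):
            yield ev


def deduplicate(events):
    """Drop repeats by the same login on the same resource within the window.

    Assumes events arrive sorted by their 'datetime' value.
    """
    last_seen = {}
    for ev in events:
        key = (ev["login"], ev["unitid"], ev["mime"])
        window = timedelta(seconds=WINDOWS.get(ev["mime"], 30))
        previous = last_seen.get(key)
        last_seen[key] = ev["datetime"]
        if previous is not None and ev["datetime"] - previous <= window:
            continue  # considered a double click, not a new consultation
        yield ev


def run_pipeline(events, middlewares):
    """Apply each middleware in order and collect the surviving events."""
    for middleware in middlewares:
        events = middleware(events)
    return list(events)
```

For example, `run_pipeline(events, [filter_events, deduplicate])` filters first, so the deduplicator only ever sees identified events.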
Example of an ezPAARSE output
- KBART fields
- Deduplicated access events (COUNTER recommendation)
- Text file (CSV or JSON format)
- Geoip fields
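Because the output is a plain text file, it can be loaded with standard tools. The sketch below reads a CSV-style output and counts events per field value. The sample rows are made up for illustration, the column names follow the fields listed on the slides, and the semicolon delimiter is an assumption to adjust to the actual file.

```python
import csv
import io
from collections import Counter

# Made-up sample rows, not real usage data.
SAMPLE_OUTPUT = """date;login;platform;mime;unitid
2018-01-01;a1b2c3;sciencedirect;PDF;S0009308416300159
2018-01-01;d4e5f6;physiology;PDF;10.1152/ajplung.00348.2002
2018-01-02;a1b2c3;sciencedirect;HTML;S0009308416300159
"""


def read_events(text, delimiter=";"):
    """Parse delimited text into a list of dicts, one per access event."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def count_by(events, field):
    """Count access events for each distinct value of the given field."""
    return Counter(ev[field] for ev in events)


events = read_events(SAMPLE_OUTPUT)
per_platform = count_by(events, "platform")
```

Since one line is one access event, counting lines per platform or per mime type is already a basic usage indicator.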
Librarians and computer scientists collaborate to produce parsers
- 174 parsers (Springer, ScienceDirect)
- http://analyses.ezpaarse.org
How does ezPAARSE work ?
Maintaining and expanding ezPAARSE's recognition capacity to a new platform is collaborative work:
- Librarians
- Computer scientists
- http://analyses.ezpaarse.org
- Trello account
- ARC
ezPAARSE installation - usage
Very easy to use:
- 5 minutes for installation from GitHub
- 5 minutes for its configuration
- Fully automatable processing
- Fully automatable updates
- Web interface
- Command line
- Docker container available
ezPAARSE worldwide
• We target ~130 French institutions (mostly universities) that report using a reverse proxy
• 80 of them have explicitly declared an interest
• 40 have defined a proper logformat parameter and tested at least one log sample
• 50 have installed ezPAARSE and use it on a regular basis
• ~120 instances installed outside France
• 60 in the USA
ezPAARSE / ezMESURE Ecosystem
What is ezMESURE ?
Access event files → ezMESURE → dashboards
- Store
- Aggregate
- Compare
- Visualize
- Highlight
- …
Université de Lorraine - Pioneer times : Designing,
installing, testing, strengthening a local ezPAARSE
installation
Using ezPAARSE
• The key question: how to merge ezPAARSE output with valid local data such as the course of study (cursus), the research lab, and the user’s status?
• I’ll explain how we did it at the Université de Lorraine, in 2 different ways (one currently active, the other we wish to implement more completely)
The context, the goals
Context: how we became a pilot institution
● University born from a merger in 2012, but we have used ezproxy since 2009
● All accesses (remote AND local) go through the reverse proxy
● Geographical proximity to Inist, making collaboration easier
Goals
● Supplement the publisher statistics with data on the non-COUNTER platforms
● Deepen publisher statistics with user profile data
● Use the data and indicators produced as a steering tool (collection policy, service delivery)
Resources
• 1 librarian, 1 technician (both part-time)
Brief chronology (UL – Inist – Couperin)
2012
2013
2014
Product vision
Version 0.1
First experiments at U. de Lorraine
First dashboards!
Consolidation of the UL installation
Results files accessed through a web interface
2015-2017: AGIMUS project
New way to characterize the events.
The main installation framework since 2014
- University authentication system (LDAP + CAS), which the user logs into
- ezproxy and CGI script (local user database)
- Raw ezproxy logs: ezproxy logs enriched by the CGI script and anonymised, secured for later use
- ezpaarse-results: output (anonymised, enriched)
- Post-processing and dashboards: standard ezPAARSE output + local fields
Select local relevant fields
- Possible affiliations list
- BC (business category) list
- ETAPE repository
Early contacts with IT teams gave us regular access to the BC and affiliation lists (and their updates).
Available data...
- User affiliation
- User status (« Business Category »)
- « ETAPE » code (national cursus reference code)
... are translated with the help of static lists during the post-processing phase (Visokio).
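The translation step amounts to a lookup of each local code in a static list. The Python sketch below illustrates that idea only. The actual post-processing was done in Visokio, and every code, label, and field name here (`BUSINESS_CATEGORIES`, `ETAPE_LEVELS`, `business_category`, `etape`) is hypothetical.

```python
# Hypothetical static lists; the real codes and labels come from the IT teams.
BUSINESS_CATEGORIES = {"E": "Student", "R": "Researcher"}
ETAPE_LEVELS = {"ETP001": "Year 1", "ETP002": "Year 2"}

# Which static list translates which local field of an access event.
LOOKUPS = {"business_category": BUSINESS_CATEGORIES, "etape": ETAPE_LEVELS}


def translate(event, lookups=LOOKUPS):
    """Return a copy of the event with a readable label added per local field."""
    enriched = dict(event)
    for field, table in lookups.items():
        code = event.get(field)
        # Codes missing from the static list are flagged instead of dropped,
        # so that outdated lists become visible in the dashboards.
        enriched[field + "_label"] = table.get(code, "unknown")
    return enriched
```

Keeping the raw code next to its label means the data can be translated again whenever the static lists are updated.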
Standard vs. local fields
- Standard ezPAARSE fields
- User-related fields
Consultation events (ECs), with TOC (tables of contents) and ABS (abstracts) excluded:
- 2014: more than 1,700,000 ECs
- 2015: more than 2,000,000 ECs
- 2016: more than 3,000,000 ECs
- 2017: more than 4,500,000 ECs...
Size of the output for an active month, once compressed: 12 MB in 2014, 50 MB in 2017.
1 line is 1 consultation event
Access to the output files
An internally accessible server with a web-based interface lets us retrieve daily ezPAARSE outputs and monthly concatenations.
Post-processing (1): Research and academic units
Research: 62 possible research units in 10 « Poles »
- A2F : Agronomie, agroalimentaire, forêt
- BMS : Biologie, médecine, santé
- CPM : Chimie et physique moléculaires
- M4 : Matière, matériaux, métallurgie, mécanique
- TELL : Temps, espaces, lettres, langues
- AM2I : Automatique, mathématiques, informatique et leurs interactions
- CLCS : Connaissance, langage, communication, sociétés
- EMPP : Énergie, mécanique, procédés, produits
- OTELo : Observatoire Terre et environnement de Lorraine
- SJPEG : Sciences Juridiques, Politiques, Économiques et de Gestion
Academic departments and « Collegiums »
- Arts, lettres et langues (ALL)
- Droit, économie, gestion (DEG)
- Lorraine – INP (écoles d'ingénieurs)
- Lorraine Management Innovation (LMI)
- Interface
- Santé
- Sciences et technologies
- Sciences humaines et sociales (SHS)
- Technologie
Each patron can have up to 4 affiliations, and very often has 2.
For every CE (consultation event), we take the first affiliation of each kind: Teaching, Research, Administration.
We add extra information about each of those affiliations (the « real » name, the group in the organisation chart...).
The « teaching » affiliations are attached to their respective collegiums, and the « research » ones to their poles.
There are more than 1000 possible affiliations.
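The rule "first affiliation of each kind" can be sketched as follows. This is a hedged Python illustration, not the Visokio workflow actually used. The affiliation codes, unit names, and directory layout are hypothetical, while the group labels reuse collegium and pole names from the lists above.

```python
# Hypothetical extract of the affiliation list (more than 1000 entries in reality).
AFFILIATION_DIRECTORY = {
    "T-01": {"kind": "teaching", "name": "Hypothetical law faculty",
             "group": "Droit, économie, gestion (DEG)"},
    "T-02": {"kind": "teaching", "name": "Hypothetical science faculty",
             "group": "Sciences et technologies"},
    "R-01": {"kind": "research", "name": "Hypothetical materials lab",
             "group": "M4"},
}


def first_affiliation_per_kind(codes, directory=AFFILIATION_DIRECTORY):
    """Keep the first affiliation found for each kind, in the patron's order.

    Each kept entry carries its readable name and its collegium or pole.
    Codes absent from the directory are ignored.
    """
    selected = {}
    for code in codes:
        info = directory.get(code)
        if info is None:
            continue
        # setdefault keeps the first entry seen for this kind.
        selected.setdefault(info["kind"], {"code": code, **info})
    return selected
```

A patron with codes `["T-02", "R-01", "T-01"]` would thus be counted under "Sciences et technologies" for teaching and under "M4" for research.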
Post-processing (2): Business categories
Local data (3): « ETAPE » codes
The « ETAPE » codes are a national repository covering every cursus and year in France.
Once obtained, this information tells us the level (year) of the students.
Ex. 1: Academic departments in Law and Economics
Ex. 2: Research lab in materials science
Participation in AnalogIST
Université de Lorraine - Interactive dashboards :
ezMESURE, AGIMUS (2016 and beyond)
xkcd.com
ezPAARSE and AGIMUS
Agimus is a national project used by IT teams in some universities.
It works very similarly to ezPAARSE, but with every other electronic service (intranet, Wi-Fi access, Moodle…)
Past (2014-2017): The main installation framework (quick reminder)
- LDAP
- ezproxy and CGI script (local user database)
- Raw ezproxy logs: ezproxy logs enriched by the CGI script and anonymised, secured for later use
- ezpaarse-results: output (anonymised, enriched)
- Post-processing and dashboards
ezPAARSE output
Standard ezPAARSE fields:
- every ezPAARSE field
User-related fields:
- Business category (user category)
- Affiliation
- ETAPE (national student cursus reference number)
- encrypted logins
Ezproxy logs are enriched directly on the ezproxy server using a custom script also used for managing the different authentication levels (e.g. local access vs. local + distant).
ezPAARSE output files are analysed and post-processed by T. Jouneau with Visokio Omniscope, and dashboards are produced.
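One common way to obtain logins that are unreadable yet still usable for counting distinct users is a keyed hash. The Python sketch below illustrates that general technique. It is an assumption made for illustration, not a description of how the CGI script or ezPAARSE actually encrypts logins, and the function names and the `login` field name are hypothetical.

```python
import hashlib
import hmac


def pseudonymise_login(login, secret_key):
    """Replace a login with a stable HMAC-SHA256 token.

    The same login always maps to the same token, so distinct users can
    still be counted, but the login cannot be read back without the key.
    """
    return hmac.new(secret_key, login.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise_event(event, secret_key):
    """Return a copy of the event whose login is pseudonymised."""
    cleaned = dict(event)
    cleaned["login"] = pseudonymise_login(event["login"], secret_key)
    return cleaned
```

The key has to stay on the institution's side: anyone holding it could recompute the tokens for a list of known logins.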
Present (2017): Workflow ezProxy → ezPAARSE → Agimus → ezMESURE
- LDAP
- ezproxy and CGI script (local user database)
- Raw ezproxy logs: ezproxy logs enriched by the CGI script and anonymised, secured for later use
- ezpaarse-results: 2 distinct outputs
- Post-processing and dashboards
Traditional ezPAARSE output
- every ezPAARSE field
- Business category (user category)
- Affiliation
- ETAPE (national student cursus reference number)
- encrypted logins
Ezproxy logs are enriched directly on the ezproxy server using a custom script also used for managing the different authentication levels (e.g. local access vs. local + distant).
ezPAARSE output files are analysed and post-processed by T. Jouneau with Visokio Omniscope, and dashboards are produced.
Simultaneously, output files with only the unencrypted login are sent to Agimus, which enriches them with LDAP data based on the login before sending them to ezMESURE.
Unencrypted ezPAARSE output
- every ezPAARSE field
- no user data except logins (not encrypted)
Future (2018?): Redesigning the workflow
- LDAP
- ezproxy
- ezproxy logs enriched by Agimus and anonymised, secured for later use
- ezpaarse-results
- Frequent reparsings
No more CGI script. AGIMUS brings the local fields directly to the ezproxy logs, which are then secured for later use and reparsing.
Agimus no longer sends data to ezMESURE. We have a « natural » ezPAARSE > ezMESURE workflow.
While Visokio may still be used for some time, it will realistically be phased out eventually.
Ex. 3 (ezMESURE): platform profiles
Ex. 4 (ezMESURE): JSTOR profile
Ex. 5 (ezMESURE): Academic departments
Ex. 6 (ezMESURE): Academic departments in Sciences
Ex. 7 (ezMESURE): Research labs and poles
More information
http://ezpaarse.org
http://analyses.ezpaarse.org
https://ezmesure.couperin.org
To contact us:
https://twitter.com/[email protected]
To Collaborate
https://github.com/ezpaarse-project/*
Live demo
Thank you
Dominique Lechaudel - INIST-CNRS
Thomas Jouneau - UNIVERSITÉ DE LORRAINE