| 0
James Toon | Pure Customer Consultant | Elsevier | Email: [email protected]
Dr. Stephen Peuchen | Head of the UMCG Research Office, University Medical Center Groningen, University of Groningen | Email: [email protected]
Funding Discovery in PURE - A Proof of Concept - PoC RIGHT ON
October 11, 2017
| 1
Overview
• Introduction + purpose
• Context for the trial (i.e. Pure components)
• PoC Design, results and conclusions
• Future recommendations
• Acknowledgements
| 2
Purpose of trial
“The goal of the proof of concept is to establish a sufficiently
accurate match between the profile of a principal investigator
and upcoming Funding Opportunities.”
| 3
Background context
Section 1
| 4
1 - The Pure Funding Discovery Module
Goal - To create a module that supports researchers and administrators in their
quest to find funding opportunities.
| 5
1 - The Pure Funding Discovery Module
• Released in version 4.19
• Companion product to the Award Management Module
• Allows setup of multiple funding profiles for each researcher
• Researchers can browse results and bookmark, share or reject them
• Researchers can start an application via a ‘one-click process’
• Administrators can assist researchers with setting up and tuning profiles as required
• Administrators can view profile activity to track module usage
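The module behaviour listed above (several funding profiles per researcher; bookmark, share or reject; an administrator view of activity) can be pictured as a small data model. This is a hypothetical sketch, not Pure's actual data model or API: the names `Researcher`, `FundingProfile` and `Action`, and the assumption that rejected matches are hidden from later result lists, are ours.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Researcher actions on a matched opportunity (hypothetical model)."""
    BOOKMARK = "bookmark"
    SHARE = "share"
    REJECT = "reject"


@dataclass
class FundingProfile:
    """One of several funding profiles a researcher may own."""
    name: str
    concepts: dict[str, float]  # semantic concept -> weight
    actions: dict[str, Action] = field(default_factory=dict)  # opportunity id -> latest action

    def act(self, opportunity_id: str, action: Action) -> None:
        self.actions[opportunity_id] = action

    def visible(self, matches: list[str]) -> list[str]:
        """Assumption: rejected opportunities are hidden from later result lists."""
        return [m for m in matches if self.actions.get(m) is not Action.REJECT]

    def activity(self) -> dict[str, int]:
        """Count actions per type, e.g. for an administrator's usage view."""
        counts: dict[str, int] = {}
        for action in self.actions.values():
            counts[action.value] = counts.get(action.value, 0) + 1
        return counts


@dataclass
class Researcher:
    name: str
    profiles: list[FundingProfile] = field(default_factory=list)


pi = Researcher("PI-X")
profile = FundingProfile("ion channels", {"ion channel": 0.9, "liposome": 0.6})
pi.profiles.append(profile)
profile.act("opp-1", Action.BOOKMARK)
profile.act("opp-2", Action.REJECT)
print(profile.visible(["opp-1", "opp-2", "opp-3"]))  # ['opp-1', 'opp-3']
print(profile.activity())  # {'bookmark': 1, 'reject': 1}
```

Keeping actions per profile rather than per researcher reflects the point that one researcher may run several profiles side by side.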
| 6
2 - Fingerprint Engine - ‘Structured representation of unstructured data’
“mines the text of scientific documents – publication
abstracts, funding announcements and awards, project
summaries, patents, proposals/applications, and other
sources – to create an index of weighted terms which defines
the text, known as a Fingerprint™ visualization.”
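To illustrate what an ‘index of weighted terms’ means, the sketch below builds a toy fingerprint for each document. It swaps in plain TF-IDF weighting over single words for the Fingerprint Engine's own processing, which the quote describes only at a high level; the function name `fingerprints`, the stop-word list and the weighting formula are all assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list for the example.
STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case single words, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fingerprints(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """TF-IDF weight per term and document, scaled so each document's top term is 1.0."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    result: dict[str, dict[str, float]] = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF: terms found in every document still get a small positive weight.
        weights = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, c in tf.items()}
        top = max(weights.values(), default=1.0)
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        result[doc_id] = {t: round(w / top, 3) for t, w in ranked}
    return result


docs = {
    "pub1": "Ion channel gating in liposome membranes",
    "pub2": "Liposome delivery of ion channel blockers",
    "opp1": "Grant for research on cardiac ion channel disease",
}
fp = fingerprints(docs)
print(fp["pub1"])
```

Terms shared by every document (here ‘ion’ and ‘channel’) end up with lower weights than terms specific to one document, which is the intuition behind a weighted fingerprint.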
| 7
How does the Elsevier Fingerprint Engine work?
| 8
Fingerprinting in Pure
• Fingerprints are produced for research publications, awards and equipment
- Only publications were fingerprinted in the context of the PoC.
- English language; title and abstract required (title-only records not supported)
- A Web of Science (WoS) UT reference *cannot* be the primary source ID
• Fingerprint profiles are produced via cron job(s) interacting with the Fingerprint Engine (FPE):
- Content fingerprinting
- Person and organisation aggregation
- Project aggregation
• Fingerprint settings allow simple configuration of:
- Period (i.e. years covered)
- Thresholds (i.e. minimum ranking, maximum number of concepts)
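The person aggregation and the configurable settings above (period, minimum ranking, maximum number of concepts) can be sketched as follows. This is a hypothetical illustration, not Pure's implementation: `aggregate_person`, its parameter names and the choice to average concept weights over the publications inside the period window are assumptions.

```python
from collections import defaultdict


def aggregate_person(
    pub_fingerprints: list[tuple[int, dict[str, float]]],
    current_year: int,
    period_years: int = 5,
    min_weight: float = 0.2,
    max_concepts: int = 10,
) -> dict[str, float]:
    """Combine publication fingerprints into one person fingerprint.

    Each input is (publication year, {concept: weight}). Publications outside
    the period window are ignored; concept weights are averaged over the
    publications inside it, concepts below min_weight are dropped, and at most
    max_concepts of the strongest concepts are kept.
    """
    totals: dict[str, float] = defaultdict(float)
    in_window = [fp for year, fp in pub_fingerprints if current_year - year < period_years]
    for fp in in_window:
        for concept, weight in fp.items():
            totals[concept] += weight
    if not in_window:
        return {}
    averaged = {c: w / len(in_window) for c, w in totals.items()}
    kept = sorted(
        ((c, w) for c, w in averaged.items() if w >= min_weight),
        key=lambda cw: cw[1],
        reverse=True,
    )
    return dict(kept[:max_concepts])


pubs = [
    (2016, {"ion channel": 1.0, "liposome": 0.8}),
    (2015, {"ion channel": 0.9, "membrane": 0.4}),
    (2008, {"plant": 1.0}),  # outside a 5-year window ending in 2017
]
person = aggregate_person(pubs, current_year=2017, period_years=5, min_weight=0.3, max_concepts=2)
print(person)
```

Tightening `min_weight` or `max_concepts` shrinks the profile; widening `period_years` lets older work back in.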
| 9
3 - SciVal Funding (soon to be Funding Institutional)
• Information about grant opportunities and award recipients
• Coverage includes Australia, Canada, European Commission, India, Ireland, New Zealand, Singapore, South Africa, United Kingdom and United States.
• The service indexes funding content from over 3,500 international sponsors, providing c. 24,000 active opportunities (as of 4 October)
• Comprehensive, accurate and current grant data is captured directly from the sponsor websites.
• Content improvement programme underway
| 10
Funding Content
Content about funders, the funding opportunities they offer and the grants they have awarded
The data is curated and further enriched with:
• research discipline classification
• disambiguated researcher names for awarded grants
• disambiguated affiliation names for awarded grants
Statistics on funding opportunities, funder profiles and awarded grants (as of September 2017)
Category | Active opportunities (currently active) | Awarded grants (2009 – now)
Top funder types:
Government | 6,000 | 3,000,000
Foundations, societies & charities | 11,000 | 1,000,000
Academic | 5,000 | 83,000
Top funding categories:
Research grants | 7,500 | 1,707,000
Academic and training grants | 7,000 | 415,000
Prizes | 5,000 | 32,000
Top disciplines:
Social sciences | 9,500 | 2,900,000
Medicine | 7,500 | 830,000
Arts & humanities | 4,000 | 450,000
Engineering | 3,000 | 300,000
Biochemistry | 2,500 | 135,000
Top funding countries:
USA | 16,000 | 3,400,000
UK | 2,500 | 300,000
Australia | 1,300 | 190,000
Canada | 1,100 | 410,000
| 11
All UK Funders: 957
- Academic Institutions: 29%
- Charities and Non-Profits: 24%
- Professional Societies: 19%
- Foundations: 13%
- Corporations: 5%
- State Government: 4%
- Local Government: 4%
- International Organizations: 2%
UK Funders with active opportunities: 273
[Bar chart: UK funders with active opportunities compared with all funders, by funder type]
Improvement programme - Content ‘white space analysis’ by region and assessment of data sources
| 12
Approach to setting up PoC (Pure)
Setup of trial data
• Set up a sandbox for Groningen
• Extracted publication data from Scopus for the named PIs
• Created persons/organisations within Pure
• Created publications XML from the Scopus data for bulk upload
• Set up fingerprinting on research output
• Aggregated fingerprints to persons/organisations
Assumptions
• Only data contained within Scopus for each PI was used
• Default fingerprint settings/thresholds left unchanged
• Additional signals from award/project data omitted (considered out of scope)
| 13
PoC Design and Results
Section 2
Dr. Stephen Peuchen | Head of the UMCG Research Office | Disclosure: no personal or financial relationship with Elsevier B.V.
CURRICULUM VITAE
• B.S. Chemistry, USA
• Ph.D. Clinical Chemistry, MCV-VCU, VA, USA
• Postdoctoral Researcher, UCL, UK
• Head of UMCG Research Office, UMCG-NL
• Lecturer/tutor Master Programme Transfusion Medicine UMCG / UoG / Sanquin, The Netherlands
Profiles
Google: Peuchen RUG
LinkedIn: Stephen Peuchen
“Coincidence is logical.” (Johan Cruyff)
| 14
Institutional Policy Towards Grants
- ‘Pick and match’ work, i.e. communicating relevant funding opportunities, is inefficient and labour-intensive.
- A bespoke solution was sought in Q1 2016 in collaboration with Idox plc. The competition offers a funding database combined with web crawlers (since 2011).
GOAL: Increase Funding Footprint (magnitude and circumference)
| 15
Grants Life Cycle & PURE
Grants Life Cycle:
S0: Establishing Topics (lobby; out of scope)
S1: Generate Idea
S2: Find Funding
S3: Develop Proposal
S4: Submit Proposal
S5: Manage Award
S6: Share Research
Pure components mapped to the cycle:
- Funding Discovery Module: Targeted Funding (S2)
- Applications (S3, S4)
- Awards (S5) - MTR
- Projects (S1 & S6)
- Award Management (AMM): S1 through S5 – approvals – sign-offs
| 16
PoC Flowchart
Flowchart components:
- AI / Fingerprint Engine
- SciVal Funding
- Matched output in PURE: list of FundingOpps per PI
- PURE Pubs
- (Semantic) Profile
- Fine Tuning
- REJECT
- End
- ACCEPT
- Superimpose Filters
Sample Selection:
- Purposive
- N=4; 2 clinicians, 2 biologists/biochemists
- Selection bias?
Domains:
- Neurobiology
- Psychiatry
- Paediatrics
- Cardiology
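The core matching step in the flowchart compares a PI's semantic profile with the fingerprints of funding opportunities. The sketch below assumes cosine similarity between concept-weight vectors and a hypothetical cut-off of 0.1; the slides do not say which similarity measure the Fingerprint Engine uses, and the profile and opportunity data are invented.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(
    profile: dict[str, float],
    opportunities: dict[str, dict[str, float]],
    threshold: float = 0.1,
) -> list[tuple[str, float]]:
    """Rank opportunities by similarity to the profile, keeping those at or above threshold."""
    scored = [(opp_id, cosine(profile, fp)) for opp_id, fp in opportunities.items()]
    kept = [(opp_id, s) for opp_id, s in scored if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


profile = {"ion channel": 0.95, "liposome": 0.4}
opportunities = {
    "opp-A": {"ion channel": 0.8, "cardiac": 0.6},
    "opp-B": {"liposome": 1.0, "plant": 0.9},
    "opp-C": {"military": 1.0},
}
for opp_id, score in match(profile, opportunities):
    print(f"{opp_id}: {score:.2f}")
```

The ranked list corresponds to ‘List FundingOpps per PI’; filters and the ACCEPT/REJECT decision would then operate on it.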
| 17
Semantic Profile Tuning (1/2)
| 18
Semantic Profile Tuning (2/2)
Values: No. of semantic concepts (S) / No. of hits
PI | PI1 | PI2 | PI3 | PI4
Default profile | 87/757 | 74/474 | 191/482 | 66/2385
Adjusted profile | 51/235 | 26/635 | 126/685 | 54/1138
Input (P) | 24 | 193 | 211 | 371
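The paradox listed under Profile Issues can be read straight off the tuning table. This short check uses only the table's figures and flags the PIs whose adjusted profile has fewer concepts but more hits than the default profile; the variable names are illustrative.

```python
# Figures from the Semantic Profile Tuning (2/2) table:
# (concepts, hits) for the default and the adjusted profile.
table = {
    "PI1": ((87, 757), (51, 235)),
    "PI2": ((74, 474), (26, 635)),
    "PI3": ((191, 482), (126, 685)),
    "PI4": ((66, 2385), (54, 1138)),
}

paradoxical = []
for pi, ((concepts_def, hits_def), (concepts_adj, hits_adj)) in table.items():
    # The slide calls it paradoxical when fewer concepts give more hits.
    if concepts_adj < concepts_def and hits_adj > hits_def:
        paradoxical.append(pi)
    print(f"{pi}: concepts {concepts_def} -> {concepts_adj}, hits {hits_def} -> {hits_adj}")

print("Paradoxical:", paradoxical)  # Paradoxical: ['PI2', 'PI3']
```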
Profile Issues:
- Paradoxical increase in the number of hits with a decrease in the number of concepts.
- Operator adjustment time approx. 30 min max. per semantic profile.
- Adjustment is categorical and limited to what has been matched; no Booleans as yet, nor choice of terms from e.g. MeSH.
- Significant number of false-positive concepts (unexplained; FPE/PURE mapping?).

- N=23,724 hits without a profile selected.
- Results shown with no further selection (filters etc.).
- Further refinements of semantic profiles were made in a 2nd iteration prior to testing.
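The Boolean adjustment that is not yet available could look like the hypothetical sketch below: AND/OR/NOT rules over concept labels, applied to each matched opportunity. `ConceptRule` and its fields are invented for illustration; the exclusion example reuses the off-topic categories (plant, military) reported for PI-X.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptRule:
    """Hypothetical Boolean rule over the concepts of a matched opportunity."""
    require_all: set[str] = field(default_factory=set)  # AND: every concept must be present
    require_any: set[str] = field(default_factory=set)  # OR: at least one must be present
    exclude: set[str] = field(default_factory=set)      # NOT: none may be present

    def accepts(self, concepts: set[str]) -> bool:
        if self.exclude & concepts:
            return False
        if not self.require_all <= concepts:
            return False
        # An empty OR-set places no constraint.
        return not self.require_any or bool(self.require_any & concepts)


rule = ConceptRule(require_any={"ion channel", "liposome"}, exclude={"plant", "military"})
print(rule.accepts({"ion channel", "cardiac"}))  # True
print(rule.accepts({"liposome", "plant"}))       # False: excluded concept
print(rule.accepts({"cardiac"}))                 # False: no required concept
```

Unlike categorical adjustment of already-matched concepts, rules like these could also name concepts that have not been matched yet.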
| 19
Unified Test Conditions & Scoring Sheet
• Refined Semantic Profile
• Funding Opportunity Types = Research Grants and Fellowships
• FILTER: Award Ceiling from 50,000 (USD, EUR, GBP etc.)
• FILTER: Deadline = + 6 months
• Filters applied in sequence result in a substantial decrease in the number of matches.
Test Score: ACCEPT, or REJECT with REASON CODE.
Sundries: e.g. observations about skew or false negatives
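The unified test conditions amount to a sequence of filters whose surviving counts shrink at each step. The sketch below is hypothetical: the type labels are assumed strings, award ceilings are compared against 50,000 without currency conversion, and ‘Deadline = + 6 months’ is read as a deadline within the next 183 days, which is only one possible interpretation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Opportunity:
    opp_id: str
    opp_type: str
    award_ceiling: float  # compared as-is; no currency conversion (assumption)
    deadline: date


def apply_test_filters(
    opps: list[Opportunity],
    today: date,
    min_ceiling: float = 50_000,
    horizon_days: int = 183,  # assumption: "+ 6 months" read as within the next ~6 months
) -> tuple[list[Opportunity], list[tuple[str, int]]]:
    """Apply the test filters in sequence and record how many matches survive each one."""
    steps = [
        ("type", lambda o: o.opp_type in {"Research Grants", "Fellowships"}),
        ("award ceiling", lambda o: o.award_ceiling >= min_ceiling),
        ("deadline", lambda o: today <= o.deadline <= today + timedelta(days=horizon_days)),
    ]
    counts = [("all", len(opps))]
    for name, keep in steps:
        opps = [o for o in opps if keep(o)]
        counts.append((name, len(opps)))
    return opps, counts


today = date(2017, 10, 11)
opps = [
    Opportunity("o1", "Research Grants", 250_000, date(2018, 2, 1)),
    Opportunity("o2", "Prizes", 10_000, date(2018, 1, 1)),
    Opportunity("o3", "Fellowships", 40_000, date(2017, 12, 1)),
    Opportunity("o4", "Research Grants", 1_000_000, date(2019, 6, 1)),
]
kept, counts = apply_test_filters(opps, today)
print(counts)  # [('all', 4), ('type', 3), ('award ceiling', 2), ('deadline', 1)]
```

Recording the count after each step makes the ‘substantial sequential decrease’ visible per filter.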
| 20
Test Results (1/2)
PI-X: 36 matches, of which 18 (50%) accepted and 18 (50%) rejected.
Reject categories / reason codes:
- 86% off-topic, i.e. plant or military applications.
- 14% on target but concerning the direct application side of ion channels and liposomes, i.e. the wrong angle.
Other issues:
- Skew towards US grants (94%)
- False negatives: H2020 and national (NL) grants, both career development grants and general consortium grants.
| 21
Test Results (2/2)
• PI-Y: 246 matches, of which 200 rejected (81%), 12 accepted (5%) and 34 marked for follow-up (14%).
• PI-Z: 198 matches, of which 8 accepted (4%). No reason codes given for rejections owing to time constraints.
• PI-T: 103 matches assessed, of which 14 accepted (14%), 76 rejected (74%), 11 ineligible on follow-up (11%) and 2 uncertain.
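The percentages can be recomputed from the reported counts, which doubles as a quick consistency check (for example, 12 of 246 is about 4.9%). The dictionary simply restates the slides' counts, with PI-X's 50% rejections entered as 18; the set `complete` lists the PIs whose reported outcome categories add up to the number of matches.

```python
# Counts restated from the Test Results slides (PI-X's 50% rejected entered as 18).
results = {
    "PI-X": {"matches": 36, "accepted": 18, "rejected": 18},
    "PI-Y": {"matches": 246, "accepted": 12, "rejected": 200, "follow-up": 34},
    "PI-Z": {"matches": 198, "accepted": 8},
    "PI-T": {"matches": 103, "accepted": 14, "rejected": 76, "ineligible": 11, "uncertain": 2},
}

# Acceptance rate as a whole percentage.
acceptance = {pi: round(100 * r["accepted"] / r["matches"]) for pi, r in results.items()}

# PIs whose reported outcome categories account for every match.
complete = {
    pi for pi, r in results.items()
    if sum(v for k, v in r.items() if k != "matches") == r["matches"]
}

print(acceptance)  # {'PI-X': 50, 'PI-Y': 5, 'PI-Z': 4, 'PI-T': 14}
print(sorted(complete))  # ['PI-T', 'PI-X', 'PI-Y']
```

PI-Z drops out of `complete` because only the accepted count was reported.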
| 22
Limitations
• Small sample size (n=4)
• Selection bias 1: mid-career scientists (cold start or drift not assessed).
• Influence of ‘projects’ on the semantic profile not assessed (only Scopus publications used).
• Selection bias 2: limited to the Life Science and Health domain, thus leaning mostly on the MeSH thesaurus.
• Measurement bias: eligibility insufficiently examined owing to incomplete source data and time constraints, leading to an over-estimate of accepted matches.
• Cross-sectional analysis at discrete time points, concurrent with system changes, upgrades and downtime; no steady state for testing.
• Funding content heavily skewed towards US grants, limiting its current use in the EU and on other continents.
| 23
Conclusions and Next Steps
• The eco-system is viable and sufficiently specific.
• The number of accepted hits under test conditions is manageable.
• Selection of grants by continent (e.g. EU vs North America) is essential.
• Development/incorporation of a Dutch corpus, i.e. support for languages other than English, is essential.
• In its current state the FD module could serve as an add-on to the funding opportunities currently surveyed from multiple sources (public and private), but not as a replacement.
• Upscaling to a larger test (n=50 PIs?) after system improvements seems ‘logical’.
| 24
Acknowledgements
• Prof. Folkert Kuipers (Dept. Paediatrics), UMCG
• Prof. Armagan Kocer (Dept. Neurosci.), UMCG
• Prof. Pim van der Harst (Cardiology Dept.), UMCG
• Prof. Robert Schoevers (Psychiatry), UMCG
• Dr. Marijke Schreurs (Dept. Paediatrics), UMCG
• Dr. Irene Mateo (Cardiology Dept.), UMCG
• Dr. Anja Smykowski (Research BV), UMCG
• University of Groningen: Dr. Jules van Rooij, R&V, Liaison RUG, Medical Library and University Library Staff.
| 25
Future Developments
Section 3
| 26
Future Developments
1. Follow up on, or bypass, current coverage limitations by manually uploading funding opportunities covering national and some EU grants to every default profile?
2. Custom filters and arithmetic on semantic concepts.
3. Easy clustering of PIs according to the PURE Research Organisation Tree and career stage, with adjustment of semantic profiles within each group.
4. Improvements to the Pure UI: reporting/analytics for funding opportunities (number of views/shares, rejection reasons etc.) and improved semantic concept ‘tuning’ functionality for PIs and administrative staff.
5. Moving beyond content filtering (e.g. similar opps) to collaborative filtering driven by user input. So how do we rate the product (relevance? success rate? etc.)? Who is rating the product? And what do they (the peers) recommend?
6. Trending grant topics, statistical views of their geographical use, their users and matches with PIs and clusters of PIs (perhaps related to the Topic Prominence work now in SciVal).
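Item 5 proposes moving from content filtering to collaborative filtering driven by user input. As a minimal sketch of that idea, assuming ACCEPT and REJECT decisions are recorded as +1 and -1, user-based collaborative filtering recommends to a PI the opportunities that similar peers accepted. The ratings and function names are hypothetical; cosine similarity over co-rated opportunities is one common choice, not a documented Pure feature.

```python
import math

# Hypothetical feedback: +1 = ACCEPT, -1 = REJECT, per PI and opportunity.
ratings: dict[str, dict[str, int]] = {
    "PI-A": {"opp1": 1, "opp2": 1, "opp3": -1},
    "PI-B": {"opp1": 1, "opp2": 1, "opp4": 1},
    "PI-C": {"opp1": -1, "opp3": 1, "opp5": 1},
}


def similarity(u: dict[str, int], v: dict[str, int]) -> float:
    """Cosine similarity over the opportunities both PIs have rated."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[o] * v[o] for o in common)
    norm_u = math.sqrt(sum(u[o] ** 2 for o in common))
    norm_v = math.sqrt(sum(v[o] ** 2 for o in common))
    return dot / (norm_u * norm_v)


def recommend(target: str, ratings: dict[str, dict[str, int]], top_n: int = 3) -> list[str]:
    """Score opportunities the target has not rated by similarity-weighted peer ratings."""
    scores: dict[str, float] = {}
    for peer, peer_ratings in ratings.items():
        if peer == target:
            continue
        sim = similarity(ratings[target], peer_ratings)
        if sim <= 0:  # ignore peers with dissimilar or no shared judgements
            continue
        for opp, r in peer_ratings.items():
            if opp not in ratings[target]:
                scores[opp] = scores.get(opp, 0.0) + sim * r
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [opp for opp, score in ranked if score > 0][:top_n]


print(recommend("PI-A", ratings))  # ['opp4']
```

Unlike the content-based match, this needs no fingerprint of the opportunity at all, only the peers' accept/reject history, which is why it could complement the current approach.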
| 27
www.elsevier.com/research-intelligence
Thank you! James and Stephen