REC4LRW – SCIENTIFIC PAPER RECOMMENDER SYSTEM FOR
LITERATURE REVIEW AND WRITING
Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang
Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore
Presentation for ICADIWT’15, 12th February 2015
What are we concerned about?
“How to get the best set of relevant documents for a researcher’s literature review and publication purposes?”
How (Process)
+
Relevant (User-specific)
+
Literature Review & Publication (Requirement)
RELATED AREAS OF RESEARCH
Literature Review
To enumerate the different stages, steps and activities involved in a researcher’s literature review
Scientific Information Seeking
Information Behavior (IB) research has modeled user information seeking activities at an abstract level (Case, 2012)
Recommender Systems (RS)
The most relevant area, as RS can collect user requirements in a flexible manner, personalize results, use the wisdom of the crowd and provide output at any stage and in different forms (Burke, 2002)
RECOMMENDER SYSTEMS (RS)
What is a Recommender System?
“Any system that produces individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible objects” (Burke, 2002)
(Source: IMDB.com)
Why are Recommender Systems required?
•Inability of Information Retrieval (IR) systems in capturing contextual dimensions
•Inability of current systems in providing personalized outputs
USE OF RS IN SCHOLARLY DOMAIN
RS have previously been used for scholarly recommendations in the following scenarios:
•Identifying conference reviewers (Basu et al, 2001)
•Identifying topical experts (Chen et al, 2013)
•Identifying potential co-authors for a paper (Huynh et al, 2012)
•Recommending similar research papers (Liang et al, 2011)
•Recommending reading list of papers (Ekstrand et al, 2010)
Techniques used in RS
•Collaborative Filtering (CF)
•Content-based (CB) recommendation algorithm (closely related to IR)
•Hybrid versions involving CF and CB, combined with techniques such as topic models, language models, and citation graphs
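To make the CF/CB distinction concrete, here is a minimal Python sketch under simplifying assumptions: the content-based scorer uses raw term-frequency vectors and cosine similarity (no TF-IDF weighting or stemming), and the collaborative filter is a plain item-based variant over binary user-paper interactions. The function names and toy data are hypothetical and are not taken from Rec4LRW.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_based(query_text, docs):
    """CB: rank documents by term-frequency cosine similarity to the query text."""
    query = Counter(query_text.lower().split())
    scores = {doc_id: cosine(query, Counter(text.lower().split()))
              for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def item_based_cf(user, interactions):
    """CF: score unseen papers by their similarity to papers the user already has.

    interactions: {user: {paper: weight}}; two papers count as similar when
    the same users interacted with both.
    """
    columns = {}  # paper -> {user: weight}
    for u, items in interactions.items():
        for paper, weight in items.items():
            columns.setdefault(paper, {})[u] = weight
    seen = interactions[user]
    scores = {
        paper: sum(cosine(col, columns[p]) * w for p, w in seen.items())
        for paper, col in columns.items() if paper not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "citation recommendation for literature review",
    "d2": "deep learning for images",
    "d3": "literature review writing support",
}
print(content_based("literature review recommendation", docs))  # ['d1', 'd3', 'd2']

interactions = {
    "alice": {"p1": 1, "p2": 1},
    "bob": {"p1": 1, "p2": 1, "p3": 1},
    "carol": {"p2": 1, "p4": 1},
}
print(item_based_cf("alice", interactions))  # ['p3', 'p4']
```

The CB scorer needs only the paper text, while the CF scorer needs only interaction data; hybrid systems combine both signals.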
RELATED WORK
Recommending papers for information seeking tasks (McNee, 2006)
• Theoretical model – “Human Recommender Interaction (HRI)” conceptualized for recommending papers for six information seeking tasks
• Experience level connected to RS metrics through aspects (e.g. correctness, trust)
Recommending reading list of research papers
• CF recommender reinforced with graph ranking algorithms (PageRank, HITS and SALSA) (Ekstrand et al, 2010)
• Latent Dirichlet Allocation (LDA) (Jardine, 2014) and hybrid approaches based on multiple similarity measures (Bae et al, 2014)
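Ekstrand et al. reinforce CF with graph ranking algorithms such as PageRank. As an illustration only, the sketch below runs textbook power-iteration PageRank over a toy citation graph in which each paper points to the papers it cites, so frequently cited papers rise to the top. The damping factor of 0.85, the fixed iteration count and the even redistribution of rank from papers with no outgoing references are conventional choices assumed here, not details from the cited work.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each paper to the papers it cites."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}  # teleportation share
        for v in nodes:
            out = graph.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for t in out:
                    new[t] += share
            else:
                # dangling paper (no references in the data): spread its rank evenly
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))  # C
```

Paper C, cited by every other paper, receives the highest score, which is the intuition behind using graph ranking to seed a reading list.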
Finding similar papers based on a seed set of papers
• Metadata-based similarity (Martin et al, 2013) and citation-based similarity (Liang, 2011) approaches to identify relevant papers
• Data items such as title, abstract, keywords, bibliographic references and citation web are used
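The exact citation-based similarity used by Liang (2011) is not given here. As a hedged illustration, the sketch below combines two classic citation-based measures, bibliographic coupling (shared references) and co-citation (shared citing papers), to rank candidates against a seed set of papers. The equal weighting of the two measures, the function names and the toy graph are assumptions.

```python
def bibliographic_coupling(refs, a, b):
    """Number of references that papers a and b have in common."""
    return len(refs.get(a, set()) & refs.get(b, set()))


def co_citation(refs, a, b):
    """Number of papers in the collection that cite both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)


def similar_to_seed(refs, seeds, top_n=3):
    """Rank non-seed papers by summed coupling + co-citation with the seed set."""
    candidates = (set(refs) | {c for cited in refs.values() for c in cited}) - set(seeds)
    scores = {
        c: sum(bibliographic_coupling(refs, c, s) + co_citation(refs, c, s) for s in seeds)
        for c in candidates
    }
    ranked = sorted(candidates, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] > 0][:top_n]


# refs maps each paper to the set of papers it cites (toy data)
refs = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R1", "R2"},
    "P3": {"R3", "R4"},
    "P4": {"P1", "P2"},
}
print(similar_to_seed(refs, ["P1"]))  # ['P2', 'P3']
```

P2 shares two references with the seed and is co-cited with it by P4, so it ranks above P3, which shares only one reference.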
A few online stand-alone citation RSs
• RefSeer is a citation RS built on top of CiteSeer digital library data (Hwang, 2003)
• theadvisor is a recent online citation RS that recommends papers based on a seed set of papers (Küçüktunç, 2013)
• Docear is a reference management tool with an inbuilt recommendation module (Beel, 2013)
WHAT’S MISSING THEN?
Plethora of diverse techniques using different data items
⇒Difficult to replicate
⇒Lack of intermediate structure
Lack of interconnection between sequential tasks
⇒Researchers’ selection of papers evolves through tasks in a natural setting
Use of ‘Article Type’ as a contextual dimension
⇒Article type ranges from journal survey/review papers, journal case studies to conference long papers and short papers
⇒Useful in shortlisting papers for inclusion in manuscripts
CRITERIA USED IN REC4LRW (1)
First set of criteria for capturing the relations between a research paper and its bibliography
•References Count (RC)
• Could be used to set the number of recommendations in the list provided to the user
•Grey Literature Percentage (GL)
• Non-scientific references which are yet to be formally published are referred to as grey literature
• Intended to measure the extent to which grey literature references are included in papers
•Coverage (C)
• Measures the ability of the bibliography in covering the important papers for the topic(s) being addressed in the main paper
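The three bibliography-level criteria can be sketched as simple functions over a paper's reference list. This is a minimal illustration under stated assumptions: each reference carries a hypothetical `is_grey` flag (for example, set for technical reports or preprints), and Coverage is approximated as the share of a hypothetical list of important papers for the topic that the bibliography cites. How Rec4LRW actually identifies the important papers is not described here.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    ref_id: str
    is_grey: bool  # hypothetical flag, e.g. True for technical reports or preprints


def references_count(bib):
    """RC: number of entries in the bibliography."""
    return len(bib)


def grey_literature_pct(bib):
    """GL: percentage of references that are grey literature."""
    if not bib:
        return 0.0
    return 100.0 * sum(r.is_grey for r in bib) / len(bib)


def coverage(bib, important_ids):
    """C: share of the (assumed) important papers for the topic that the bibliography cites."""
    important = set(important_ids)
    if not important:
        return 0.0
    cited = {r.ref_id for r in bib}
    return len(cited & important) / len(important)


bib = [Reference("r1", False), Reference("r2", True),
       Reference("r3", False), Reference("r4", True)]
print(references_count(bib), grey_literature_pct(bib), coverage(bib, {"r1", "r3", "r9", "r10"}))
# 4 50.0 0.5
```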
CRITERIA USED IN REC4LRW (2)
Second set of criteria for capturing the relations between the research paper and each reference in its bibliography
•Recency (RE)
• Shows how recent the referenced papers are in the bibliographies of papers
• Calculated as the difference in years between the publication date of the parent paper and that of each reference
•Textual Similarity (TS)
• Calculates the topical similarity between the parent paper and the references
• Semantic Textual Similarity (STS) and Letter-pair Similarity are the preferred methods
•Specificity (S)
• A vertical characteristic, as it looks at the relations from a top-down perspective (similar to broad-narrow relations in thesauri)
• Measurement will make use of the keywords specified by the author(s) in the article metadata
•Citation Count (CC)
• Identifies the extent to which the citation count of references is given importance in the target artefact
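The reference-level criteria can likewise be sketched per parent paper. In this minimal Python illustration, Recency is the year gap between the parent paper and each reference; Textual Similarity uses the well-known letter-pair similarity (2 × shared adjacent letter pairs ÷ total letter pairs, computed within words); and Citation Count is summarised as the mean and median citation counts of the references. The summary statistics and function names are assumptions chosen for illustration, and Specificity is omitted because its exact measurement is not defined in the slides.

```python
from statistics import mean, median


def recency(parent_year, ref_years):
    """RE: difference in years between the parent paper and each reference."""
    return [parent_year - y for y in ref_years]


def letter_pairs(text):
    """Adjacent letter pairs of each word, upper-cased (pairs never span words)."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(s1, s2):
    """TS: 2 * shared pairs / (pairs(s1) + pairs(s2)); each pair is matched at most once."""
    p1, p2 = letter_pairs(s1), letter_pairs(s2)
    total = len(p1) + len(p2)
    if total == 0:
        return 0.0
    remaining = list(p2)
    shared = 0
    for pair in p1:
        if pair in remaining:
            remaining.remove(pair)
            shared += 1
    return 2.0 * shared / total


def citation_count_summary(ref_citations):
    """CC (assumed summary): mean and median citation counts of a paper's references."""
    return mean(ref_citations), median(ref_citations)


print(recency(2010, [2008, 2005, 1999]))           # [2, 5, 11]
print(letter_pair_similarity("France", "French"))  # 0.4
print(citation_count_summary([120, 4, 35]))
```

"FRANCE" and "FRENCH" each yield five letter pairs and share two (FR, NC), giving 2 × 2 ÷ 10 = 0.4.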
TASKS HANDLED IN REC4LRW
Literature Review
Task 1: Building a reading list of research papers
Task 2: Finding similar papers based on a set of papers
Manuscript Writing
Task 3: Shortlisting papers from the final reading list for inclusion in manuscript based on article type
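Task 3 could be approached in several ways. One hedged possibility, combining the References Count criterion with article type, is to size the shortlist by the typical number of references found in corpus papers of the target article type. The sketch below assumes a hypothetical per-type list of reference counts and relevance scores already produced by earlier tasks; it is an illustration, not Rec4LRW's actual procedure.

```python
from statistics import median


def shortlist(reading_list, ref_counts_by_type, article_type):
    """Keep the top-scored papers, sized by the median references count for the article type.

    reading_list: list of (paper_id, relevance_score) pairs from earlier tasks
    ref_counts_by_type: {article_type: [references count of each corpus paper of that type]}
    """
    k = int(median(ref_counts_by_type[article_type]))
    ranked = sorted(reading_list, key=lambda item: item[1], reverse=True)
    return [paper_id for paper_id, _ in ranked[:k]]


ref_counts_by_type = {"short paper": [2, 3, 4], "journal survey": [90, 120, 150]}  # toy numbers
reading_list = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.2)]
print(shortlist(reading_list, ref_counts_by_type, "short paper"))  # ['a', 'c', 'b']
```

With a survey-type target the same reading list would be returned in full, since the typical survey cites far more papers than the list contains.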
METHODOLOGY
Stages
Stage 1 (S1): Criteria Measurement for the articles in ACM Dataset
Stage 2 (S2): Building the Recommender System
Stage 3 (S3): Offline and User Evaluation
Dataset used
Prominent sources such as CiteSeer, ACL and CiteULike were considered
The ACM dataset was shortlisted as it provides an extensive set of research articles from the Computer Science discipline, along with full text for the majority of papers
Feature                                                   Periodicals   Proceedings
Count of total articles                                   77437         84111
Count of articles satisfying qualification requirement    19040         20022
Period covered                                            1954-2011     1951-2011
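As a quick check on the dataset counts, the share of articles meeting the qualification requirement follows directly from the figures: roughly a quarter in each collection. The short Python snippet below reproduces that arithmetic using only the numbers in the dataset table.

```python
# Counts taken from the ACM dataset table
total = {"Periodicals": 77437, "Proceedings": 84111}
qualifying = {"Periodicals": 19040, "Proceedings": 20022}


def qualifying_shares():
    """Percentage of articles satisfying the qualification requirement, per collection and overall."""
    shares = {kind: round(100 * qualifying[kind] / total[kind], 1) for kind in total}
    shares["Overall"] = round(100 * sum(qualifying.values()) / sum(total.values()), 1)
    return shares


print(qualifying_shares())
# {'Periodicals': 24.6, 'Proceedings': 23.8, 'Overall': 24.2}
```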
DEVELOPMENT OF REC4LRW
Technical Details
Databases for storage and basic querying
•BaseX XML Store and MySQL
Implementation of CB recommender
•Apache Lucene used for text search based retrieval process for the CB recommender
Implementation of CF recommender
•Apache Mahout used for building CF recommender system
Web application for conducting the user experiments
•A custom web application using PHP will be built so that the experiment URL can be sent to participants
CLOSING REMARKS
Application of RS in academic databases and digital libraries provides benefits for both researchers and system designers
Rec4LRW addresses:
• The whole lifecycle of scientific publication through interconnected tasks
• With flexible recommender criteria
• With customizable recommendation techniques
Offline evaluations and user evaluations will be conducted to verify the effectiveness of Rec4LRW