
REC4LRW – SCIENTIFIC PAPER RECOMMENDER SYSTEM FOR LITERATURE REVIEW AND WRITING

Aravind Sesagiri Raamkumar, Schubert Foo & Natalie Pang

Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore

Presentation for ICADIWT’15, 12th February 2015

What are we concerned about?

“How to get the best set of relevant documents for a researcher’s literature review and publication purposes?”

How (Process)

+

Relevant (User-specific)

+

Literature Review & Publication (Requirement)

RELATED AREAS OF RESEARCH

Literature Review

To enumerate the different stages, steps and activities involved in a researcher’s literature review

Scientific Information Seeking

Information Behavior (IB) research has modeled user information seeking activities at an abstract level (Case, 2012)

Recommender Systems (RS)

The most relevant area, as it can collect user requirements in a flexible manner along with personalization, use the wisdom of the crowd, and provide output at any stage and in different forms (Burke, 2002)

RECOMMENDER SYSTEMS (RS)

What is a Recommender System?

“Any system that produces individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible objects” (Burke, 2002)

(Source: IMDB.com)

Why are Recommender Systems required?

•Inability of Information Retrieval (IR) systems to capture contextual dimensions

•Inability of current systems to provide personalized outputs

USE OF RS IN SCHOLARLY DOMAIN

RS have previously been used for scholarly recommendations in the following scenarios:

•Identifying conference reviewers (Basu et al, 2001)

•Identifying topical experts (Chen et al, 2013)

•Identifying potential co-authors for a paper (Huynh et al, 2012)

•Recommending similar research papers (Liang et al, 2011)

•Recommending reading list of papers (Ekstrand et al, 2010)

Techniques used in RS

•Collaborative Filtering (CF)

•Content-based (CB) recommendation algorithm (more or less IR)

•Hybrid versions involving CF and CB, combined with techniques such as topic models, language models, and citation graphs

RELATED WORK

Recommending papers for information seeking tasks (McNee, 2006)

• Theoretical model – “Human Recommender Interaction (HRI)” conceptualized for recommending papers for six information seeking tasks

• Experience level connected to RS metrics through aspects (e.g. correctness, trust)

Recommending reading list of research papers

• CF recommender reinforced with graph ranking algorithms (PageRank, HITS and SALSA) (Ekstrand et al, 2010)

• Latent Dirichlet Allocation (LDA) (Jardine, 2014) and hybrid approaches based on multiple similarity measures (Bae et al, 2014)

Finding similar papers based on a seed set of papers

• Metadata-based similarity (Martin et al, 2013) and citation-based similarity (Liang, 2011) approaches are used to identify relevant papers

• Data items such as title, abstract, keywords, bibliographic references and citation web are used

A few online stand-alone citation RS

• RefSeer is a citation RS built on top of CiteSeer digital library data (Hwang, 2003)

• theadvisor is a recent online citation RS that recommends papers based on a seed set of papers (Küçüktunç, 2013)

• Docear is a reference management tool with an inbuilt recommendation module (Beel, 2013)

http://refseer.ist.psu.edu/

http://theadvisor.osu.edu/

http://www.docear.org/

WHAT’S MISSING THEN?

Plenitude of diverse techniques with different data items

⇒Difficult proposition for replication

⇒Lack of intermediate structure

Lack of interconnection between sequential tasks

⇒Researchers’ selection of papers evolves through tasks in a natural setting

Use of ‘Article Type’ as a contextual dimension

⇒Article type ranges from journal survey/review papers and journal case studies to conference long papers and short papers

⇒Useful in shortlisting papers for inclusion in manuscripts

INTRODUCING REC4LRW…

- A Scientific Paper RS for Literature Review and Writing

CRITERIA USED IN REC4LRW (1)

First set of criteria for capturing the relations between a research paper and its bibliography

•References Count (RC)

• This count can be used to set the number of recommendations in the list provided to the user

•Grey Literature Percentage (GL)

• Non-scientific references which are yet to be formally published are referred to as grey literature

• Intended to measure the extent to which grey literature references are included in papers

•Coverage (C)

• Measures the ability of the bibliography in covering the important papers for the topic(s) being addressed in the main paper
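The first two criteria can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Rec4LRW implementation: the `Reference` record, its `is_grey` flag and the function names are hypothetical, and the step that labels a reference as grey literature is assumed to happen elsewhere. Coverage is left out because the slides do not say how it is measured.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One bibliography entry (hypothetical record, not the ACM schema)."""
    title: str
    is_grey: bool  # assumed to be labelled upstream, e.g. by venue type


def references_count(bibliography):
    """References Count (RC): number of entries in the bibliography."""
    return len(bibliography)


def grey_literature_percentage(bibliography):
    """Grey Literature Percentage (GL): share of grey entries, in percent."""
    if not bibliography:
        return 0.0
    grey = sum(1 for ref in bibliography if ref.is_grey)
    return 100.0 * grey / len(bibliography)


# Made-up bibliography used only to exercise the two functions.
bibliography = [
    Reference("Journal article A", is_grey=False),
    Reference("Technical report B", is_grey=True),
    Reference("Conference paper C", is_grey=False),
    Reference("Unpublished manuscript D", is_grey=True),
]
print(references_count(bibliography))            # 4
print(grey_literature_percentage(bibliography))  # 50.0
```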

CRITERIA USED IN REC4LRW (2)

Second set of criteria for capturing the relations between the research paper and each reference in the bibliography

•Recency (RE)

• Shows how recent the referenced papers are in the bibliographies of papers

• Calculated by finding the difference in years between the publication dates of the parent paper and its references

•Textual Similarity (TS)

• For calculating the topical similarity between the parent paper and the references

• Semantic Textual Similarity (STS) and Letter-pair Similarity are the preferred methods

•Specificity (S)

• A vertical characteristic, as it looks at the relations from a top-down perspective (similar to broad-narrow relations in thesauri)

• Measurement will make use of the keywords specified by the author(s) in the article metadata

•Citation Count (CC)

• To identify the extent to which citation count of references is given importance in the target artefact
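Two of these criteria lend themselves to a short sketch. The code below is an illustration under stated assumptions rather than the Rec4LRW code: function names are hypothetical, Recency is returned as the raw per-reference year gap (the slides do not say how the gaps are aggregated), and Letter-pair Similarity is written as the Dice coefficient over adjacent character pairs, which is one common formulation and may differ from the variant actually used.

```python
from collections import Counter


def recency_gaps(parent_year, reference_years):
    """Years between the parent paper and each of its references."""
    return [parent_year - year for year in reference_years]


def letter_pairs(text):
    """Multiset of adjacent letter pairs, computed word by word."""
    pairs = Counter()
    for word in text.upper().split():
        for i in range(len(word) - 1):
            pairs[word[i:i + 2]] += 1
    return pairs


def letter_pair_similarity(a, b):
    """Dice coefficient over letter pairs: 2 * shared / total, in [0, 1]."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = sum(pairs_a.values()) + sum(pairs_b.values())
    if total == 0:
        return 0.0
    shared = sum((pairs_a & pairs_b).values())  # multiset intersection
    return 2.0 * shared / total


print(recency_gaps(2011, [2009, 2005, 2011]))      # [2, 6, 0]
print(letter_pair_similarity("FRANCE", "FRENCH"))  # 0.4
```

In the second call the two words share the pairs FR and NC out of five pairs each, giving 2 * 2 / 10 = 0.4.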

CRITERIA MEASUREMENT STEPS

TASKS HANDLED IN REC4LRW

Literature Review

Task 1: Building a reading list of research papers

Task 2: Finding similar papers based on a set of papers

Manuscript Writing

Task 3: Shortlisting papers from the final reading list for inclusion in manuscript based on article type

FIRST TASK IN REC4LRW

SECOND TASK IN REC4LRW

THIRD TASK IN REC4LRW

WORKFLOW OF TASKS IN REC4LRW USER INTERFACE

METHODOLOGY

Stages

Stage 1 (S1): Criteria Measurement for the articles in ACM Dataset

Stage 2 (S2): Building of Recommender System

Stage 3 (S3): Offline and User Evaluation

Dataset used

Prominent sources such as CiteSeer, ACL and CiteULike were considered

ACM dataset was shortlisted as it provides an extensive set of research articles from the Computer Science discipline along with full text for the majority of papers

Feature                                                  Periodicals   Proceedings

Count of total articles                                  77437         84111

Count of articles satisfying qualification requirement   19040         20022

Period covered                                           1954-2011     1951-2011

DEVELOPMENT OF REC4LRW

Technical Details

Databases for storage and basic querying

•BaseX XML Store and MySQL

Implementation of CB recommender

•Apache Lucene used for text search based retrieval process for the CB recommender
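The idea behind a text-search-based CB recommender can be illustrated with a small pure-Python sketch. It is a stand-in, not Lucene: it ranks papers by TF-IDF cosine similarity to a seed paper, the function names and toy abstracts are hypothetical, and the weighting scheme is an assumption that does not reproduce Lucene's scoring.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming or stop words)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Map each document id to a sparse {term: tf * idf} vector."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for terms in tokens.values():
        doc_freq.update(set(terms))
    n_docs = len(docs)
    vectors = {}
    for doc_id, terms in tokens.items():
        counts = Counter(terms)
        vectors[doc_id] = {
            # Smoothed idf, so terms found in every document keep some weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_similar(seed_id, docs, top_n=2):
    """Return the top_n documents most similar to the seed document."""
    vectors = tfidf_vectors(docs)
    seed = vectors[seed_id]
    scored = [(cosine(seed, vec), doc_id)
              for doc_id, vec in vectors.items() if doc_id != seed_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


# Toy corpus with made-up titles, used only for illustration.
docs = {
    "p1": "collaborative filtering for research paper recommendation",
    "p2": "citation graph ranking for paper recommendation",
    "p3": "xml database query optimisation",
    "p4": "collaborative filtering with matrix factorisation",
}
print(recommend_similar("p1", docs))
```

Here `p3` shares no terms with the seed, so it scores zero and falls outside the top two.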

Implementation of CF recommender

•Apache Mahout used for building CF recommender system
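For readers unfamiliar with CF, the following pure-Python sketch shows the basic mechanism. It is not Mahout and not the Rec4LRW design: it assumes user-based CF with Jaccard similarity over sets of papers, and the `library` data and function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_cf(target, library, top_n=3):
    """Score papers the target lacks by the similarity of users who hold them."""
    owned = library[target]
    scores = defaultdict(float)
    for other, papers in library.items():
        if other == target:
            continue
        sim = jaccard(owned, papers)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for paper in papers - owned:
            scores[paper] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [paper for paper, _ in ranked[:top_n]]


# Made-up reading lists, used only for illustration.
library = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p2", "p3", "p4", "p5"},
    "dave": {"p6"},
}
print(recommend_cf("alice", library))  # ['p4', 'p5']
```

In this example `p4` ranks first because both bob (similarity 0.5) and carol (similarity 0.4) hold it, while `p6` is never suggested because dave shares no papers with alice.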

Web application for conducting the user experiments

•A custom web application using PHP will be built so that the experiment URL could be sent to participants

CLOSING REMARKS

Application of RS in academic databases and digital libraries provides benefits for both researchers and system designers

Rec4LRW addresses:

• The whole lifecycle of scientific publication through interconnected tasks

• With flexible recommender criteria

• With customizable recommendation techniques

Offline evaluations and user evaluations will be conducted to verify the effectiveness of Rec4LRW

THANK YOU