Recommendation Engines for Scientific Literature
Kris Jack, PhD, Data Mining Team Lead
Summary
➔ 2 recommendation use cases
➔ literature search with Mendeley
➔ use case 1: related research
➔ use case 2: personalised recommendations
Use Cases
Two types of recommendation use cases:
1) Related Research: given 1 research article, find other related articles
2) Personalised Recommendations: given a user's profile (e.g. interests), find articles of interest to them
Literature Search Using Mendeley
My secondment (Dec-Feb): Challenge!
● Use only Mendeley to perform literature search for:
● Related research
● Personalised recommendations
Eating your own dog food...
Queries: “content similarity”, “semantic similarity”, “semantic relatedness”, “PubMed related articles”, “Google Scholar related articles”
Found (running count over successive searches): 0 → 1 → 1 → 2 → 4 → 4
Literature Search Using Mendeley
Summary of Results

Strategy                Num Docs Found   Comment
Catalogue Search        19               9 from “Related Research”
Group Search            0                Needs work
Perso Recommendations   45               Led to a group with 37 docs!

Found: 64
Eating your own dog food... Tastes good!
64 => 31 docs, read 14 so far, so what do they say...?
Use Cases
1) Related Research: given 1 research article, find other related articles
Use Case 1: Related Research
Q1/4: How are the systems evaluated? (from 7 highly relevant papers on related research for scientific articles)
● User studies (e.g. Likert scale to rate relatedness between documents) (Beel & Gipp, 2010)
● TREC collections with hand-classified 'related articles' (e.g. TREC 2005 genomics track) (Lin & Wilbur, 2007)
● Trying to reconstruct a document's reference list (Pohl, Radlinski, & Joachims, 2007; Vellino, 2009)
Use Case 1: Related Research
Q2/4: How are the systems trained? (from 7 highly relevant papers on related research for scientific articles)
● Paper reference lists (Pohl et al., 2007; Vellino, 2009)
● Usage data (e.g. PubMed, arXiv) (Lin & Wilbur, 2007)
● Document content (e.g. metadata, co-citation, bibliographic coupling) (Gipp, Beel, & Hentschel, 2009)
● Collocation in mind maps (Beel & Gipp, 2010)
Use Case 1: Related Research
Q3/4: Which techniques are applied? (from 7 highly relevant papers on related research for scientific articles)
● BM25 (Lin & Wilbur, 2007); see the scoring sketch below
● Topic modelling (Lin & Wilbur, 2007)
● Collaborative filtering (Pohl et al., 2007)
● Bespoke heuristics for feature extraction (e.g. in-text citation metrics for the same sentence or paragraph) (Pohl et al., 2007; Gipp et al., 2009)
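As a reminder of what the BM25 scoring named above computes, here is a minimal self-contained sketch in plain Python. The toy corpus and the default parameters (k1=1.2, b=0.75) are illustrative assumptions, not details taken from the cited papers.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query terms with classic BM25."""
    N = len(docs)
    doc_tfs = [Counter(doc) for doc in docs]       # term frequencies per document
    doc_lens = [len(doc) for doc in docs]
    avgdl = sum(doc_lens) / N                      # average document length
    df = {t: sum(1 for tf in doc_tfs if t in tf) for t in query_terms}  # document frequencies
    scores = []
    for tf, dl in zip(doc_tfs, doc_lens):
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)       # smoothed idf
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores

# Toy usage: rank three tokenised "abstracts" against a query document's terms
docs = [["topic", "model", "citation"], ["citation", "graph"], ["topic", "model", "abstract"]]
print(bm25_scores(["topic", "citation"], docs))
```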
Use Case 1: Related Research
Q4/4: Which techniques have most success? (from 7 highly relevant papers on related research for scientific articles)
● Topic modelling slightly improves on BM25 (MEDLINE abstracts) (Lin & Wilbur, 2007): BM25 = 0.383 precision @ 5; PMRA = 0.399 precision @ 5
● Seeding CF with usage data from arXiv won out over using citation lists (Pohl et al., 2007)
● Not yet found significant results showing whether content-based or CF methods are better for this task
Use Case 1: Related Research
Progress so far...
Q1/2 How do we evaluate our system?
Construct a non-complex data set of related research:
● include groups with 10-20 documents (i.e. topics)
● no overlaps between groups (i.e. no documents in common)
● only take documents recognised as being in English
● document metadata must be 'complete' (i.e. has title, year, author, published in, abstract, filehash, tags/keywords/MeSH terms)
→ 4,382 groups → mean size = 14 → 60,715 individual documents
Given a doc, aim to retrieve the other docs from its group:
● tf-idf with the Lucene implementation (a tf-idf retrieval sketch follows below)
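A minimal sketch of this evaluation loop, using scikit-learn's TfidfVectorizer in place of the Lucene implementation mentioned above. The field contents, the group labels and the precision @ 5 cut-off are illustrative; `texts` and `group_of` are assumed inputs, not Mendeley's actual data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def precision_at_k(query_idx, texts, group_of, k=5):
    """Retrieve the k nearest docs by tf-idf cosine similarity and measure
    how many of them belong to the query document's group."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf[query_idx], tfidf).ravel()
    sims[query_idx] = -1.0                      # never retrieve the query itself
    top_k = sims.argsort()[::-1][:k]
    hits = sum(1 for i in top_k if group_of[i] == group_of[query_idx])
    return hits / k

# Toy usage: texts could be e.g. concatenated title+abstract strings,
# and group_of[i] the (hypothetical) group id of document i
texts = ["topic models for text", "collaborative filtering ratings",
         "latent topic models", "matrix factorisation for ratings",
         "probabilistic topic modelling", "neighbourhood based filtering"]
group_of = [0, 1, 0, 1, 0, 1]
print(precision_at_k(0, texts, group_of, k=2))
```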
[Chart: Metadata Presence in Documents: % of documents in which each metadata field appears (title, year, author, publishedIn, fileHash, abstract, generalKeyword, meshTerms, keywords, tags), shown separately for the Group and Catalogue evaluation data sets]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision @ 5 per Metadata Field for the Complete Data Set; fields: abstract, title, generalKeyword, mesh-term, author, keyword, tag]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: tf-idf Precision @ 5 per Metadata Field when the Field is Available; fields: tag, abstract, mesh-term, title, general-keyword, author, keyword]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
BestCombo = abstract+author+general-keyword+tag+title
[Chart: tf-idf Precision @ 5 for Field Combos on the Complete Data Set; field(s): bestCombo, abstract, title, generalKeyword, mesh-term, author, keyword, tag]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
BestCombo = abstract+author+general-keyword+tag+title
[Chart: tf-idf Precision @ 5 for Field Combos when the Field is Available; field(s): tag, bestCombo, abstract, mesh-term, title, general-keyword, author, keyword]
Use Case 1: Related Research
Future directions...?
Evaluate multiple techniques on the same data set:
● Construct a public data set: similar to the current one but with data from only public groups; analyse the composition of the data set in detail
● Train: content-based filtering, collaborative filtering, and hybrid recommenders
● Evaluate the different systems on the same data set
...and let's brainstorm!
Use Cases
2) Personalised Recommendations: given a user's profile (e.g. interests), find articles of interest to them
Use Case 2: Perso Recommendations
Q1/4: How are the systems evaluated? (from 7 highly relevant papers on perso recs for scientific articles)
● Cross validation on user libraries (Bogers & van Den Bosch, 2009; Wang & Blei, 2011)
● User studies (McNee, Kapoor, & Konstan, 2006; Parra-Santander & Brusilovsky, 2009)
Use Case 2: Perso Recommendations
Q2/4: How are the systems trained? (from 7 highly relevant papers on perso recs for scientific articles)
● CiteULike libraries (Bogers & van Den Bosch, 2009; Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)
● Documents representing users, with their citations as the documents of interest (McNee et al., 2006)
● User search history (Kapoor et al., 2007)
Use Case 2: Perso Recommendations
Q3/4: Which techniques are applied? (from 7 highly relevant papers on perso recs for scientific articles)
● CF (Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011); a minimal user-based CF sketch follows below
● LDA (Wang & Blei, 2011)
● Hybrid of CF + LDA (Wang & Blei, 2011)
● BM25 over tags to form a user neighbourhood (Parra-Santander & Brusilovsky, 2009)
● Item-based and content-based CF (Bogers & van Den Bosch, 2009)
● User-based CF, Naïve Bayes classifier, Probabilistic Latent Semantic Indexing, and a textual tf-idf-based algorithm (uses document abstracts) (McNee et al., 2006)
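Several of the CF variants above start from a binary user-item matrix built from users' libraries. Here is a minimal user-based CF sketch on such a matrix, NumPy only; the toy matrix, the cosine weighting and the neighbourhood size k are illustrative assumptions, not any cited paper's implementation.

```python
import numpy as np

def user_based_scores(R, user, k=2):
    """Score articles for `user` via a cosine-weighted vote of the k most
    similar other users in a binary user-item matrix R (rows = users)."""
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0                       # avoid division by zero
    sims = (R @ R[user]) / (norms * norms[user])  # cosine similarity to every user
    sims[user] = -1.0                             # exclude the user themselves
    neighbours = np.argsort(sims)[::-1][:k]
    scores = sims[neighbours] @ R[neighbours]     # weighted vote over neighbours' libraries
    scores[R[user] > 0] = 0.0                     # don't re-recommend items already in the library
    return scores

# Toy usage: 4 users x 5 articles, 1 = article is in that user's library
R = np.array([[1, 1, 0, 0, 1],
              [1, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 1, 0, 0]], dtype=float)
print(user_based_scores(R, user=0))
```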
Use Case 2: Perso Recommendations
Q4/4: Which techniques have most success? (from 7 highly relevant papers on perso recs for scientific articles)
● CF is much better than topic modelling (Wang & Blei, 2011)
● A CF-topic modelling hybrid slightly outperforms CF alone (Wang & Blei, 2011)
● Content-based filtering performed slightly better than item-based filtering on a test set of 1,322 CiteULike users (Bogers & van Den Bosch, 2009)
● User-based CF and tf-idf significantly outperformed Naïve Bayes and Probabilistic Latent Semantic Indexing (McNee et al., 2006)
● BM25 gave better results than CF, but the study used just 7 CiteULike users, so it was small scale (Parra-Santander & Brusilovsky, 2009)
Use Case 2: Perso Recommendations
Q4/4: Which techniques have most success? (from 7 highly relevant papers on perso recs for scientific articles)
Content-based:
● Advantages: human-readable form of the user's profile; quickly absorbs new content without the need for ratings
● Disadvantage: tends to over-specialise
CF:
● Advantages: works on an abstract item-user level, so you don't need to 'understand' the content; tends to give more novel and creative recommendations
● Disadvantage: requires a lot of data
Use Case 2: Perso Recommendations
Our progress so far...
Q1/2 How do we evaluate our system?
Construct an evaluation data set from user libraries:
● 50,000 user libraries
● 10-fold cross validation
● libraries vary from 20-500 documents
● preference values are binary (in library = 1; 0 otherwise)
Train:
● item-based collaborative filtering recommender
Evaluate:
● train the recommender and test how well it can reconstruct the users' hidden testing libraries (see the sketch below)
● multiple similarity metrics (e.g. cooccurrence, loglikelihood)
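A minimal sketch of that evaluation loop under the same binary in-library preferences: hide part of one user's library, recommend with item-item cooccurrence counts (one of the similarity metrics mentioned above), and score precision over the hidden items. NumPy only; the matrix, the hidden set and the cut-off n are toy assumptions, not Mendeley's data or implementation.

```python
import numpy as np

def evaluate_item_based(R, user, hidden, n=10):
    """Hide the given items from `user`'s row, recommend by item-item
    cooccurrence, and return precision@n over the hidden items."""
    train = R.copy()
    train[user, hidden] = 0                 # hide part of the user's library for testing
    cooc = train.T @ train                  # item-item cooccurrence counts
    np.fill_diagonal(cooc, 0)               # an item shouldn't recommend itself
    scores = cooc @ train[user]             # aggregate evidence from items still visible
    scores[train[user] > 0] = -1            # don't recommend what the user already has
    top_n = np.argsort(scores)[::-1][:n]
    hits = len(set(top_n) & set(hidden))
    return hits / n

# Toy usage: 4 users x 5 articles; hide article 2 from user 0 and check precision@1
R = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 1],
              [0, 0, 0, 1, 1]], dtype=int)
print(evaluate_item_based(R, user=0, hidden=[2], n=1))
```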
Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
Cross validation: 0.1 precision @ 10 articles
Usage logs: 0.4 precision @ 10 articles
Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
[Chart: Precision at 10 articles vs. number of articles in user library]
Use Case 2: Perso Recommendations
Future directions...?
Evaluate multiple techniques on the same data set:
● Construct a data set: similar to the current one but with more up-to-date data; analyse the composition of the data set in detail
● Train: content-based filtering, collaborative filtering (user-based and item-based), and hybrid recommenders
● Evaluate the different systems on the same data set
...and let's brainstorm!
Conclusion
➔ 2 recommendation use cases
➔ similar problems and techniques
➔ good results so far
➔ combining CF with content would likely improve both
www.mendeley.com
References
Beel, J., & Gipp, B. (2010). Link Analysis in Mind Maps: A New Approach to Determining Document Relatedness. Citeseer. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Link+Analysis+in+Mind+Maps+:+A+New+Approach+to+Determining+Document+Relatedness#0
Bogers, T., & van Den Bosch, A. (2009). Collaborative and Content-based Filtering for Item Recommendation on Social Bookmarking Websites. ACM RecSys ’09 Workshop on Recommender Systems and the Social Web. New York, USA. Retrieved from http://ceur-ws.org/Vol-532/paper2.pdf
Gipp, B., Beel, J., & Hentschel, C. (2009). Scienstein: A research paper recommender system. Proceedings of the International Conference on Emerging Trends in Computing (ICETiC’09) (pp. 309–315). Retrieved from http://www.sciplore.org/publications/2009-Scienstein_-_A_Research_Paper_Recommender_System.pdf
Kapoor, N., Chen, J., Butler, J. T., Fouty, G. C., Stemper, J. A., Riedl, J., & Konstan, J. A. (2007). Techlens: a researcher’s desktop. Proceedings of the 2007 ACM conference on Recommender systems (pp. 183–184). ACM. doi:10.1145/1297231.1297268
Lin, J., & Wilbur, W. J. (2007). PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics, 8(1), 423. BioMed Central. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17971238
McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don’t look stupid: avoiding pitfalls when recommending research papers. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (p. 180). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1180875.1180903
Parra-Santander, D., & Brusilovsky, P. (2009). Evaluation of Collaborative Filtering Algorithms for Recommending Articles. Web 3.0: Merging Semantic Web and Social Web at HyperText ’09 (pp. 3–6). Torino, Italy. Retrieved from http://ceur-ws.org/Vol-467/paper5.pdf
Pohl, S., Radlinski, F., & Joachims, T. (2007). Recommending related papers based on digital library access records. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (pp. 418–419). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1255175.1255260
Vellino, A. (2009). The Effect of PageRank on the Collaborative Filtering Recommendation of Journal Articles. Retrieved from http://cuvier.cisti.nrc.ca/~vellino/documents/PageRankRecommender-Vellino2008.pdf
Wang, C., & Blei, D. M. (2011). Collaborative topic modeling for recommending scientific articles. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 448–456). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2020480