Posted on 19-Jul-2015
Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty
-- Evgeniy Gabrilovich
Fan Jin
4047 8963
What’s the problem and what’s the solution?
• Users of news sites do not want to read the same information over and over again; they are primarily interested in learning what’s new
• Newsjunkie is designed to rank news by its novelty
• Results are evaluated against a baseline
• Newsjunkie can be personalized to match a user’s specific requirements
A framework for comparing text collections
• Features used to represent documents: vectors of TF.IDF weights
• Distance metrics are used to identify the documents that differ most from previously read documents: Kullback-Leibler (KL) divergence
• Algorithm to rank news by novelty
R ← seedStory
for i = 1 to |D|
    d ← argmax over d ∈ D of dist(d, R)
    R ← R ∪ {d}
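The ranking loop above can be sketched in Python. This is a minimal illustration, not the paper's implementation: it assumes documents are plain strings, represents each as an additively smoothed word distribution (rather than TF.IDF vectors), and takes dist(d, R) to be the KL divergence from the pooled text of everything read so far. The names word_dist, kl_divergence, and rank_by_novelty, and the smoothing constant alpha, are hypothetical.

```python
import math
from collections import Counter


def word_dist(texts, vocab, alpha=1.0):
    """Additively smoothed word distribution over vocab (alpha is an assumed constant)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q are dicts over the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def rank_by_novelty(seed, candidates, n=None):
    """Greedy ranking: repeatedly pick the candidate farthest from what was read."""
    vocab = sorted({w for t in [seed, *candidates] for w in t.lower().split()})
    read = [seed]                  # R in the pseudocode
    remaining = list(candidates)   # D in the pseudocode
    ranked = []
    limit = len(remaining) if n is None else min(n, len(remaining))
    for _ in range(limit):
        q = word_dist(read, vocab)
        # d <- argmax over remaining documents of dist(d, R)
        best = max(remaining, key=lambda d: kl_divergence(word_dist([d], vocab), q))
        ranked.append(best)
        read.append(best)          # R <- R ∪ {d}
        remaining.remove(best)     # assumption: a ranked article is not picked again
    return ranked


if __name__ == "__main__":
    seed_story = "storm hits coast"
    articles = [
        "storm hits coast again",
        "rescue teams reach flooded towns",
        "storm hits coast",
    ]
    for article in rank_by_novelty(seed_story, articles):
        print(article)
```

In this toy run the article that shares no words with the seed story is ranked first and the verbatim repeat of the seed is ranked last, which is the behavior the novelty ranking is meant to produce.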
Evaluate results
• Data:
12 news topics, each spanning 2-9 days and containing 36-328 articles
• Baseline method
Chronological ordering of articles
• Evaluation methods
People are asked to read all documents and decide which carries the most novel information
• Hypothesis testing
Wilcoxon signed-rank test, a nonparametric alternative to the paired Student’s t-test
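To make the hypothesis test concrete, here is a minimal sketch of how the Wilcoxon signed-rank statistic is computed for paired scores. The function name wilcoxon_signed_rank is hypothetical, and the scores are made-up illustrative numbers, not results from the study.

```python
def wilcoxon_signed_rank(xs, ys):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Made-up per-topic scores for two orderings (illustrative only).
    novelty_scores = [4, 5, 3, 5, 4, 2]
    baseline_scores = [2, 3, 3, 1, 5, 1]
    print(wilcoxon_signed_rank(novelty_scores, baseline_scores))  # (13.5, 1.5)
```

The smaller of the two rank sums is then compared against a critical value or converted to a p-value; in practice a library routine such as scipy.stats.wilcoxon handles that step.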
Personalized news updates
• A single daily update
• Reporting breaking news