Newsjunkie


Transcript of Newsjunkie

Page 1: Newsjunkie

Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty

-- Evgeniy Gabrilovich

Fan Jin

4047 8963

Page 2: Newsjunkie

What’s the problem and what’s the solution?

• Users of news sites do not want to read the same information over and over again; they are primarily interested in learning what’s new

• Newsjunkie is designed to rank news by its novelty

• Results are evaluated against a baseline

• Newsjunkie can be personalized to match a user’s specific requirements

Page 3: Newsjunkie

A framework for comparing text collections

• Features used to represent documents: vectors of TF.IDF weights

• Distance metrics are used to identify the documents that differ most from previously read documents: Kullback-Leibler (KL) divergence

• Algorithm to rank news by novelty

R ← seedStory

for i = 1 to |D|

    d ← argmax over unread d in D of dist(d, R)

    R ← R ∪ {d}
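
The ranking loop above can be sketched in Python. This is only an illustration under stated assumptions, not the implementation from the paper: documents are represented as smoothed word distributions (rather than TF.IDF vectors) so that KL divergence is well defined, and the names `word_distribution`, `kl_divergence`, `rank_news_by_novelty` and the smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, alpha=0.01):
    """Smoothed unigram distribution of `text` over a fixed vocabulary.

    Additive smoothing (alpha, an assumed value) keeps every probability
    non-zero so that the KL divergence stays finite.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def rank_news_by_novelty(seed_story, candidates):
    """Greedy ranking: repeatedly pick the unread document that is
    farthest (by KL divergence) from everything read so far."""
    all_text = " ".join([seed_story] + list(candidates))
    vocabulary = sorted(set(all_text.lower().split()))
    read_text = seed_story           # R <- seedStory
    remaining = list(candidates)     # unread documents D
    ranking = []
    while remaining:
        q = word_distribution(read_text, vocabulary)
        best = max(
            remaining,
            key=lambda d: kl_divergence(word_distribution(d, vocabulary), q),
        )
        ranking.append(best)
        remaining.remove(best)
        read_text += " " + best      # R <- R ∪ {d}
    return ranking


seed = "storm hits coast"
docs = [
    "storm hits coast",
    "storm hits coast again today",
    "rescue teams reach flooded village",
]
ranking = rank_news_by_novelty(seed, docs)
```

With these toy inputs the article that shares no words with the seed story is ranked first, and the verbatim repeat of the seed is ranked last.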

Page 4: Newsjunkie

Evaluate results

• Data:

12 news topics, each spanning 2-9 days and containing 36-328 articles

• Baseline method

Chronological ordering of articles

• Evaluation methods

People are asked to read all the documents and decide which carries the most novel information

• Hypothesis testing

Wilcoxon signed-rank test, a non-parametric alternative to the paired Student’s t-test
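
To make the test concrete, the following is a minimal sketch of how the signed-rank statistic is computed from paired scores. The function name `wilcoxon_signed_rank` and the two score lists are hypothetical illustrations, not data from the study; in practice a statistics library such as SciPy (`scipy.stats.wilcoxon`) would also supply the p-value.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped and tied absolute differences
    share the average of the ranks they occupy.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical paired novelty scores for two ranking methods.
novelty_ranking = [5, 3, 4, 4, 2]
chronological = [3, 3, 5, 1, 4]
w_plus, w_minus = wilcoxon_signed_rank(novelty_ranking, chronological)
```

The smaller of the two sums is compared against the critical value for the number of non-zero differences; unlike the t-test, this does not assume the differences are normally distributed.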

Page 5: Newsjunkie

Personalized news updates

• A single daily update

• Reporting breaking news