Research Trends in Multimedia Content Services


Page 1: Research Trends in Multimedia Content Services

Research Trends in Multimedia Content Services

Data Mining and Web Search Group

Computer and Automation Research Institute

Hungarian Academy of Sciences

András A. Benczúr

Page 2: Research Trends in Multimedia Content Services

A Benczur – Research Trends in Multimedia Content Services – FuturICT 28 April 2008

Web 2.0, 3.0 …?

• Platform convergence (Web, PC, mobile, television) – information vs. recreation

• Emphasis on social content (blogs, Wikipedia, photo and video sharing)

• From search towards recommendation (query free, profile based, personalized)

• From text towards multimedia

• Glocalization (language, geography)

• Spam

Page 3: Research Trends in Multimedia Content Services


A sample service

RSS, Web 2.0

• Small screen browsing

• Recommendation based on user profile (avoid query typing)

• Read blogs, view media, …

(Diagram components: client software, recommender engine)

Page 4: Research Trends in Multimedia Content Services


The user profile

• History stored for each user:
  • Known ratings, preferences, opinions (scarce!)
  • Items read, weighted by time spent
  • Details seen, scrolling, back button
  • Terms in documents read, as a tf.idf-weighted top list
  • User language, region, current location and known sociodemographic data
• Multimedia!
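As a rough illustration of how the text part of such a profile might be assembled, the Python sketch below scores the terms of the documents a user read by tf.idf, scales each document's contribution by the time spent on it, and keeps the top k terms. The function name, the input layout (token lists paired with seconds spent, plus a document-frequency table) and the choice to multiply by reading time are assumptions for illustration, not the system described in the talk.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def profile_top_terms(
    read_docs: List[Tuple[List[str], float]],
    doc_freq: Dict[str, int],
    num_docs: int,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring terms of a user's reading history.

    read_docs: (tokens, seconds_spent) for every document the user read.
    doc_freq:  number of collection documents containing each term.
    num_docs:  collection size, used for the idf factor.
    """
    scores: Counter = Counter()
    for tokens, seconds in read_docs:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            # Unseen terms are treated as occurring in one document (assumption).
            idf = math.log(num_docs / max(1, doc_freq.get(term, 1)))
            # Assumed weighting: tf.idf scaled by the time spent on the document.
            scores[term] += seconds * tf * idf
    return scores.most_common(k)
```

A real profile would also age out old reading history and merge in the other signals listed above; neither is modeled here.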

Page 5: Research Trends in Multimedia Content Services


Same item, multiple sources

Page 6: Research Trends in Multimedia Content Services


Information vs recreation: Do not mix the two?

Page 7: Research Trends in Multimedia Content Services


Spam is increasingly annoying

Page 8: Research Trends in Multimedia Content Services


Distribution of categories

Reputable 70.0%

Spam 16.5%

Weborg 0.8%

Ad 3.7%

Non-existent 7.9%

Empty 0.4%

Alias 0.3%

Unknown 0.4%

Page 9: Research Trends in Multimedia Content Services


Effect of search result position

(Figure series: time spent looking at each result position; time of arrival at the result)

Page 10: Research Trends in Multimedia Content Services


Multimedia Information Retrieval

Page 11: Research Trends in Multimedia Content Services


(Figure panels: similar objects; segmentation)

Page 12: Research Trends in Multimedia Content Services


(Diagram, ImageCLEF Object Retrieval Task: query images; class of query image; pre-classified images; original training set; VOC2007)

Page 13: Research Trends in Multimedia Content Services


Networked relation

• Spam

• Social network analysis

• Churn

Page 14: Research Trends in Multimedia Content Services


Social networks

(Figure legend: business ADSL, business, residential ADSL, residential; residential and business customers)

Page 15: Research Trends in Multimedia Content Services


Insurance fraud in networks

Page 16: Research Trends in Multimedia Content Services


Stacked Graphical Learning

1. Predict churn p(v) of each node v
2. For a target node u, aggregate p(v) over the neighbors v of u to form a new feature f(u)
3. Rerun classification with the feature f(.) added
4. Iterate

(Figure: target node u with unknown label, and its neighbors, among them v1, v2 and v7)
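The four steps can be sketched in Python as below. The base learner is a pluggable function; the toy nearest-centroid scorer, the mean as the aggregation in step 2, and the fixed number of rounds are illustrative assumptions, not the method behind the churn results in the talk.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical base-learner interface: trains on the labeled rows
# (node index -> 0/1 churn label) and returns a churn score for every row.
FitPredict = Callable[[List[List[float]], Dict[int, int]], List[float]]


def neighbor_mean(p: Sequence[float], neighbors: Sequence[Sequence[int]]) -> List[float]:
    """Step 2: aggregate neighbor predictions p(v) into f(u); the mean is one choice."""
    return [sum(p[v] for v in nbrs) / len(nbrs) if nbrs else 0.0 for nbrs in neighbors]


def stacked_graphical_learning(
    features: List[List[float]],
    neighbors: Sequence[Sequence[int]],
    labels: Dict[int, int],
    fit_predict: FitPredict,
    rounds: int = 2,
) -> List[float]:
    p = fit_predict(features, labels)  # step 1: base churn prediction p(v)
    for _ in range(rounds):  # step 4: iterate
        f = neighbor_mean(p, neighbors)  # step 2: neighbor aggregate f(u)
        # Step 3: add f(.) to the original features and rerun the classifier.
        augmented = [row + [fu] for row, fu in zip(features, f)]
        p = fit_predict(augmented, labels)
    return p


def nearest_centroid(rows: List[List[float]], labels: Dict[int, int]) -> List[float]:
    """Toy base learner: score = relative closeness to the churner centroid."""

    def centroid(cls: int) -> List[float]:
        members = [rows[i] for i, y in labels.items() if y == cls]
        return [sum(col) / len(members) for col in zip(*members)]

    c_churn, c_stay = centroid(1), centroid(0)
    scores = []
    for row in rows:
        d_churn = math.dist(row, c_churn)
        d_stay = math.dist(row, c_stay)
        total = d_churn + d_stay
        scores.append(0.5 if total == 0 else d_stay / total)
    return scores


# Path graph 0-1-2-3; node 0 is a known churner, node 3 a known stayer.
features = [[1.0], [0.9], [0.1], [0.0]]
neighbors = [[1], [0, 2], [1, 3], [2]]
scores = stacked_graphical_learning(features, neighbors, {0: 1, 3: 0}, nearest_centroid)
```

Any classifier that outputs per-node scores can replace `nearest_centroid`; in practice the labeled nodes' neighbor features are usually computed from cross-validated predictions to avoid leaking their own labels, which this sketch does not do.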

Page 17: Research Trends in Multimedia Content Services


Why social networks are hard to analyze

Subgraphs of social networks

Medium-size dense communities attract much algorithmic work

Tentacles induce noise

Page 18: Research Trends in Multimedia Content Services


Mapping into 2D

(Figure panels: plain spectral mapping; semidefinite mapping)
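For the plain spectral mapping, one common construction uses the graph Laplacian eigenvectors of the second and third smallest eigenvalues as 2D coordinates. The NumPy sketch below assumes a connected, undirected graph given as a symmetric adjacency matrix and uses the unnormalized Laplacian; the talk does not say which Laplacian variant was used, and the semidefinite mapping is not sketched here.

```python
import numpy as np


def spectral_2d(adjacency: np.ndarray) -> np.ndarray:
    """Map each node to 2D via Laplacian eigenvectors 2 and 3.

    Assumes a connected, undirected graph (symmetric adjacency matrix).
    """
    degrees = adjacency.sum(axis=1)
    laplacian = np.diag(degrees) - adjacency
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    _, vectors = np.linalg.eigh(laplacian)
    # Column 0 is the constant vector for a connected graph, so skip it.
    return vectors[:, 1:3]


# A 6-node cycle as a small example.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
coords = spectral_2d(A)
```

On the 6-cycle every node lands at the same distance from the origin, i.e. the layout is a regular hexagon. On real social networks, the "tentacles" mentioned above tend to dominate such low eigenvectors, which is one motivation for alternatives such as the semidefinite mapping.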

Page 19: Research Trends in Multimedia Content Services


Research Highlights

• Recommenders: KDD Cup 2007 Task 1, first prize. Predict the probability that a user rated a movie in 2006, based on training data up to 2005

• Spam filtering: Web Spam Challenge 1, first place

• Churn prediction: method presented at the KDD Cup 2009 Workshop, Task XXXX

Page 20: Research Trends in Multimedia Content Services


Netflix: lessons learned and differences

Netflix Prize:
• Ratings 1–5 stars
• Predict an unseen rating
• Evaluation: RMSE
• 0.8572: $1,000,000
• Current leader: 0.8650
• Oct/07: 0.8712

KDD Cup 2007:
• Same data set
• Predict existence of a rating

Page 21: Research Trends in Multimedia Content Services


Results of two separate tasks

BellKor team report [Bell, Koren 2007]:
• Low-rank approximation
• Restricted Boltzmann machine
• Nearest neighbor

KDD Cup 2007: predict the probability that a user rated a movie in 2006:
• Given a list of 100,000 user–movie pairs
• Users and movies drawn from the Netflix Prize data set

Winner report [K, B, and our colleagues 2007]

Page 22: Research Trends in Multimedia Content Services


Evaluation and Issue 1

For a given user i and movie j, let $w_{ij} = 0$ if no rating was given and $w_{ij} = 1$ otherwise, and let $\hat{w}_{ij}$ be the predicted value. Over the N evaluated user–movie pairs,

$$\mathrm{RMSE}^2 = \frac{1}{N}\sum_{i,j} \left(w_{ij} - \hat{w}_{ij}\right)^2$$

KDD Cup example:
• Our RMSE: 0.256
• First runner-up: 0.263
• All-zeroes prediction: 0.279 (places 10–13)

But why do we use RMSE and not precision/recall?
• RMSE rewards correct probability guesses for the majority of items, which are infrequently visited
• The presence of the recommender changes usage
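The RMSE used for evaluation can be transcribed directly, with $w_{ij}$ as 0/1 targets and $\hat{w}_{ij}$ as predicted probabilities; the function name below is ours. The example also shows why calibrated probabilities beat the all-zeroes prediction: if a fraction q of the pairs was rated, all zeroes scores $\sqrt{q}$, while the constant q scores $\sqrt{q(1-q)}$. Under this definition, the all-zeroes score of 0.279 would correspond to roughly 0.279² ≈ 7.8% of the given pairs having been rated.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error over the evaluated user-movie pairs."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    squared = sum((w - w_hat) ** 2 for w, w_hat in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# One of four pairs was actually rated (w = 1).
w = [1, 0, 0, 0]
print(rmse(w, [0.0] * 4))   # all-zeroes baseline: sqrt(1/4) = 0.5
print(rmse(w, [0.25] * 4))  # constant equal to the rated fraction: sqrt(0.1875), about 0.433
```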

Page 23: Research Trends in Multimedia Content Services


Method Overview

• Probability by naive user–movie independence
  • Item frequency estimation (time series)
  • User frequency estimation
  • Reaches RMSE 0.260 by itself (still first place)
• Data mining
  • SVD
  • Item–item similarities
  • Association rules
• Combination (we used linear regression)
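The naive independence estimate and the linear-regression combination can be sketched as follows. Here n_u (ratings expected from user u in the target year), n_m (ratings expected for movie m) and n_total (all ratings expected) are assumed to come from the user and item frequency estimates; the cap at 1, the function names and the use of NumPy's least-squares solver are our assumptions, not details from the winner report.

```python
import numpy as np


def independence_prob(n_u: float, n_m: float, n_total: float) -> float:
    """P(user u rated movie m) if users and movies are paired independently.

    Each of n_total ratings picks user u with probability n_u / n_total and
    movie m with probability n_m / n_total, so the expected number of (u, m)
    ratings is n_u * n_m / n_total; capped at 1 since a pair is rated at most once.
    """
    return min(1.0, n_u * n_m / n_total)


def fit_blend(predictions: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Linear-regression weights (last entry: intercept) for combining predictors.

    predictions: shape (pairs, predictors), e.g. independence, SVD, item-item scores.
    target: observed 0/1 values on held-out pairs (assumed available).
    """
    design = np.column_stack([predictions, np.ones(len(target))])
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights


def apply_blend(predictions: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine predictor columns with fitted weights plus intercept."""
    return predictions @ weights[:-1] + weights[-1]
```

A linear blend is cheap to fit and hard to overfit with only a handful of predictor columns, which suits a setting where, as the lessons below note, the individual techniques overlap heavily.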

Page 24: Research Trends in Multimedia Content Services


Time series prediction

Interest in a movie persists over a long time range (several years)
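The slides do not say how item frequencies were extrapolated in time. As one simple possibility, the sketch below fits a least-squares linear trend to an item's yearly rating counts and predicts the next year; the linear trend, the clipping at zero and the function name are assumptions, not the method used for the KDD Cup entry.

```python
from typing import Sequence


def extrapolate_next(counts: Sequence[float]) -> float:
    """Predict next year's count from yearly counts via a least-squares line.

    counts[t] is the number of ratings in year t (t = 0, 1, ...).
    """
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one yearly count")
    if n == 1:
        return float(counts[0])  # no trend can be estimated from one point
    t_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    # Evaluate the fitted line at t = n; counts cannot be negative.
    return max(0.0, y_mean + slope * (n - t_mean))
```

A yearly trend like this only makes sense because interest in movies spans several years; it would be meaningless for items whose whole lifetime is a few days.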

Page 25: Research Trends in Multimedia Content Services


Short lifetime of online items

Origo: very different behavior in time for news articles

http://www.origo.hu/filmklub/20060124kiolte.html

(Figure annotations: publication day; next-day usage peak; third day, and gone …)

Page 26: Research Trends in Multimedia Content Services


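k-dimensional SVD: noise filtering that keeps the essence of the matrix; it optimizes

$$\mathrm{RMSE}^2 = \frac{1}{N}\sum_{i,j} \left(A_{ij} - \hat{A}_{ij}\right)^2$$

where $\hat{A}$ is the rank-k approximation of the matrix A with N entries.

• SVD explains ratings as the effect of a few linear factors
• RMSE (ℓ2 error) with 10–30 dimensions: 0.93

Issue: too many news items
• 18K Netflix movies vs. a potentially infinite set of items
• Consequence: we may recommend the data source but not the item

(Figure: SVD of the user × movie matrix vs. the user × news item matrix)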
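A k-dimensional SVD approximation and its RMSE can be sketched with NumPy as below. The sketch assumes a dense, fully observed matrix; a real rating matrix consists mostly of missing entries, which this plain SVD would treat as observed values. Function names are ours.

```python
import numpy as np


def rank_k_approx(a: np.ndarray, k: int) -> np.ndarray:
    """Rank-k truncated SVD: the best rank-k approximation in the l2 (Frobenius) sense."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # NumPy returns singular values in descending order; keep the k largest.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def matrix_rmse(a: np.ndarray, a_hat: np.ndarray) -> float:
    """RMSE over all entries of the matrix."""
    return float(np.sqrt(np.mean((a - a_hat) ** 2)))


# A rank-1 matrix is reproduced exactly by k = 1.
a = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 2.0])
print(matrix_rmse(a, rank_k_approx(a, 1)))
```

By the Eckart–Young theorem, the squared error of the truncation equals the sum of the discarded squared singular values, so dropping one unit singular value of the 3 × 3 identity gives RMSE $\sqrt{1/9} = 1/3$.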

Page 27: Research Trends in Multimedia Content Services


Lessons learned

• Content similarity might be the key feature
• Trivial estimates were relatively successful on the KDD Cup!
• Data mining techniques overlap and apparently catch similar patterns
• Precision/recall is more important than RMSE
• The solution must make heavy use of time
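Since precision/recall is judged more important than RMSE here, a minimal sketch of precision and recall at a cutoff k for one user's ranked recommendation list follows. Treating the items the user actually read afterwards as the relevant set, and the function name, are assumptions for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall of the top-k recommended items.

    ranked: recommended item ids, best first.
    relevant: items the user actually consumed (assumed ground truth).
    """
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    # If fewer than k items were recommended, divide by the list length.
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, 2))  # (0.5, 0.333...)
```

Unlike RMSE, these measures only look at the top of the list, which matches a small-screen service that can show a few recommendations at a time.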

Page 28: Research Trends in Multimedia Content Services


Future plans and ideas

• New partners and application fields: network infrastructure, new generation services, bioinformatics, …?

• Scaling our solutions to multi-core architectures

• Use our search (cross-lingual, multimedia, etc.) and recommender system capabilities in major solutions: mobile, new-generation platforms, etc.

• Expand the means of our European-level collaboration, e.g. KIC (Knowledge and Innovation Communities) participation

Page 29: Research Trends in Multimedia Content Services

Questions?

Andras A. Benczur

http://datamining.sztaki.hu