A Non-Intrusive Movie Recommendation System

20
A Non-intrusive Movie Recommendation System Tania Farinella 1 , Sonia Bergamaschi 2 , and Laura Po 2 1 vfree.tv GmbH, München, Germany 2 Department of Engineering “Enzo Ferrari” , University of Modena and Reggio Emilia, Italy The 11th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2012) 11-12 Sept 2012, Roma, Italy

description

Tania Farinella , Sonia Bergamaschi, Laura Po, A non-intrusive Movie Recommendation System. In: Meersman, R.; Panetto, H.; Dillon, T.; Rinderle-Ma, S.; Dadam, P.; Zhou, X.; Pearson, S.; Ferscha, A.; Bergamaschi, S.; Cruz, I.F. (Eds.). On the Move to Meaningful Internet Systems: OTM 2012. Roma, 10-14 September 2012, vol. 978-3-642-33614-0, p. 736-751, Heidelberg:Springer, ISBN: 9783642336140

Transcript of A Non-Intrusive Movie Recommendation System

Page 1: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System

Tania Farinella1, Sonia Bergamaschi2, and Laura Po2

1 vfree.tv GmbH, München, Germany

2 Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Italy

The 11th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2012)11-12 Sept 2012, Roma, Italy

Page 2: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 2

Recommendation systems

Recommendation systems are a type of information filtering systems that recommend products available in e-shops, entertainment items (books music, videos, Video on Demand, books, news, images, events etc.) or people (e.g. on dating sites) that are likely to be of interest to the user.

Recommendation algorithms are the basis of the targeted advertisements that account for most commercial sites’ revenues.

The DVD rental site Netflix deemed its recommendation algorithms important enough that it offered a million-dollar prize in 2009 to anyone who could improve their predictions by 10%.

Page 3: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 3

Page 4: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 4

Movie Recommendation systems

The widely utilized collaborative filtering systems focus on the analysis of user roles or user ratings of the items. These systems decrease their performance at the start-up phase and due to privacy issues, when a user hides most of his personal data.

Content-based recommendation systems compare movie features to suggest similar multimedia contents; these systems are based on less invasive observations, however they find some difficulties to supply tailored suggestions.

Page 5: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 5

Movie Recommendation systems

There are dozens of movie recommendation engines on the Web. Some require little or no input before they give you titles, while others want to find out exactly what your interests are.

Netflix asks you to rate movies to determine which films you'll want to see next.

Rotten Tomatoes - you can tell it what kind of films you enjoy, which actors you want to see, and other criteria to help it find the best movie for you.

Movielens evaluates your tastes based on ratings to films you've seen before. Once you rate 15 movies, it returns recommendations.

IMDb instead of asking you to input ratings or to tell it what movies you like, IMDb automatically recommends similar films to the movie you search for.

Jinni is able to find films based on your mood, time available, setting, or reviews

Page 6: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 6

Goal -> Plot-based Movie RS

Our goal was to develop a recommendation

system that does not need to access personal information nor preferences of the user.

We developed a plot-based recommendation system, which is based upon an evaluation of similarity among the plot of a video that was watched by the user and a large amount of plots that is stored in a movie database.

Since it is independent from the number of user ratings, it is able to propose famous and beloved movies as well as old or unheard movies or programs that are still strongly related to the content of the video the user has watched.

Page 7: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 7

Plot-based Movie RS

The similarity of the plots can be combined with the similarity of other features such as directors, genre, producers, release year, cast etc

Movies can be compared considering only plot

Local database

movie selected by the user

Page 8: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 8

Local database

Our intention was to generate an extensible and reliable collection of multimedia metadata

We studied the major movie repositories:

Internet Movie Database (IMDb)

Dbpedia

Open Movie Database (TMDb)

The choice of MongoDB - As MongoDB does not require a fixed schema, different collections in the databases may store different attributes.

Cast&Crew

Movie Person

IMDB Movie

Collection

IMDB Actor

Collection

TMDB Film

Collection

Page 9: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 9

Vector space model

Each plot is represented as a vector of keywords with associated weights

Weights depend on the distribution of keywords in the given training set of plots

Matrix keyword1

keyword2

plot a

plot b wb,2

plot c

The weight of keyword 2 according to plot b

Page 10: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 10

Compute weights

Pre-processing: plots need to be converted into vectors of keywords

stop words removal

lemmatization by TreeTagger

1st step: weights are defined as occurrences of keywords in the description

2nd step: weights are modified by tf-idf or log techniques

3th step: the matrix is trasformed by performing Latent Semantic Analysis

Page 11: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 11

TF-IDF vs LOG Two different techniques have been used for

computing keyword weights.

A weight represents the relevance of a specific keyword according to a specific text.

Weights depend on the local distribution of the keyword within the text as well as on the global distribution of the keyword in the whole corpus of descriptions.

tf-idf and log are not able to recognize neither synonyms (e.g. market and shop) nor hyponyms (e.g. Ferrari and cars); in contrast, the use of LSA emphasizes underlying semantic.

Page 12: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 12

Page 13: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 13

LSA The LSA consists of a Singular Value

Decomposition (SVD) of the vector matrix T followed by a Rank lowering. Each row and column of the resulting matrix T’can be represented as a vector combination of the eigenvectors of the matrix T’T’T

Where the coefficients of the above formula represent how strong the relationship between a keyword (or a description) and a topic eigenvector is. The eigenvectors define the so-called topic space, thus, the coefficients related to a vector v represent a topic vector.

Page 14: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 14

LSA – topic vectors The topic representation of the keywords which is used

as a natural language model in order to compare texts.

Topic vectors may be useful for three main reasons:

(1) as the number of topics that is equal to the number of non-zero eigenvectors is usually significantly lower than the number of keywords, the topic representation of the descriptions is more compact;

(2) the topic representation of the keywords makes possible to add movies that have been released after the definition of the matrix T’ without re-computing the matrix T’;

(3) to find similar movies starting from a given one, we just need to compute the topic vectors for the plot of the movie and then compare these vectors with the ones we have stored in the matrix T’finding the top relevant.

Page 15: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 15

Evaluation

We have performed several evaluations:

Manual tests to evaluate and compare the weighting techniques.

An evaluation of the computational costs of the model and its approximations

An evaluation of the user judgments on the lists of recommended movies from IMDb and LSA

Page 16: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 16

Manual tests

1 ) Results obtained by applying tf-idf and log techniques on the local database show slight differences, and the quality does not seem to be significantly different.

2 ) A noticeable quality improvement can be achieved by applying the LSA technique. The outcome of Latent Semantic Analysis is superior to other techniques such as tf-idf or log.

3 ) LSA over tf-idf and LSA over log techniques gives comparable outcomes.

Page 17: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 17

Computational costs evaluation

Given a target plot, all the other plots in the database can be ranked according to their similarity in about 42 seconds.

To further decrease similarity time consumption, three LSA models have been built using different assumptions.

Page 18: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 18

User judgement Our goal was to compare

our system recommendations with respect to the IMDb recommendations.

We asked users to judge the recommendations proposed for the 18 most famous movies.

For each title we selected 6 recommended movies with our algorithm and 6 recommended movies suggested by IMDb.

Then, we asked users to select which movies are the most similar to the target one.

Page 19: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 19

User judgement – Results

Comments on the results IMDb system advertises famous movies, while LSA proposes even old or

unknown movies -> very difficult to gain votes for LSA recommendations

IMDb uses factors such as user votes, genre, title, keywords, and, most importantly, user recommendations themselves to generate an automatic response., this system requires a great deal of human support.

By using the statistical techniques employed in NLP we attempt to find fully automatic ways of generating these strictly content-based recommendations.

While our system does not outperform the commercial approach, it can be used to make recommendations when knowledge of users’ preferences is not available.

Total number of movies = 18

4 cases = LSA selected the best recommendations

10 cases = IMDb selected the best recommendations

4 cases = the systems has the same performance

Page 20: A Non-Intrusive Movie Recommendation System

A Non-intrusive Movie Recommendation System 20

Conclusion The system developed take advantage of the NLP

techniques and is able to make recommendations without access to user preference information.

3 algorithms (tf-idf, log and LSA) have been used that take advantage of information in the natural language descriptions of the plot.

We evaluated these algorithms and found out that LSA outperformed log and tf-idf algorithms.

We compared our system to a commercial system that relies heavily on human volunteered effort. While our system does not outperform the commercial approach, it is much less labor intensive and can be ported to any domain where natural language descriptions exist.