Movie topics: efficient features for movie recommendation systems

Efficient Features for Movie Recommendation Systems: project presentation by Suvir Bhargav

Description

User-written movie reviews carry many movie-related features, such as descriptions of location, time period, genre, and characters. Using natural language processing and topic modeling techniques, it is possible to extract these features from movie reviews and find movies with similar features.

Transcript

Page 1

Efficient Features for Movie Recommendation Systems

Project presentation

Suvir Bhargav

Page 2

Outline

● Motivation, and why movie reviews?
● Problem statement
● How? The overall system
● Text preprocessing approaches
● Post-processing: movie topics from a reviews corpus
● Similarity
● Experimental setup and results

Page 3

Thanks to Sean Lind, source: http://www.silveroakcasino.com/blog/posts/netflix/what-to-watch-on-netflix.html

Motivation

Page 4

Motivation

● Movie genres are not enough.
● Classify movies by
○ keywords
○ moods
○ IMDb ratings
○ micro-genres

Page 5

Micro-genres

source: http://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/

Page 6

Why movie reviews?

Source: a sample user-written movie review from IMDb

Page 7

Problem statement

● Extract features from user reviews of movies.
● Use the extracted features to find similar movies.

Page 8

The overall system

Movie reviews corpus
● Preprocessing
○ tokenization, stopword removal, lemmatization
● Post-processing
○ topic modeling: movie topics from a reviews corpus
● Similarity measure
○ return movies with similar topic distributions

Page 9

Tokenization, stopword removal, lemmatization

Simple information extraction

Text preprocessing

Figure credit: NLTK book.
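A minimal sketch of the preprocessing step, using only the Python standard library. The tiny stopword list below is a hypothetical stand-in for nltk's much larger English list, and lemmatization is deliberately left out (with nltk, each token would additionally be passed through a lemmatizer). gensim's `simple_preprocess` performs a similar lowercase-and-tokenize step.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# the slides use nltk, whose English stopword list is much larger.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "in", "to", "this", "was", "with"}

def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text, stopwords=STOPWORDS, min_len=2):
    """Tokenize, then drop stopwords and tokens shorter than min_len.

    Lemmatization is omitted in this sketch.
    """
    return [tok for tok in tokenize(text)
            if tok not in stopwords and len(tok) >= min_len]

review = "The plot is set in 1920s Chicago, and the gangsters steal the show."
print(preprocess(review))  # ['plot', 'set', 'chicago', 'gangsters', 'steal', 'show']
```

Note that the digits in "1920s" are discarded by the alphabetic-only pattern, leaving a one-letter "s" that the length filter then removes; a real pipeline might want to keep decade tokens, since time period is one of the features the reviews carry.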

Page 10

Post-processing

Document representation: Vector Space Model (VSM)

Picture credit: pyevolve
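In the vector space model each document becomes a vector of term weights over a shared vocabulary. A minimal bag-of-words sketch, assuming raw term counts as the weights (TF-IDF is a common alternative) and two hypothetical pre-tokenized reviews:

```python
from collections import Counter

def build_vocabulary(docs):
    """Assign each distinct token, in order of first appearance, a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab

def to_vector(doc, vocab):
    """Return the term-frequency vector of a tokenized document over vocab."""
    vec = [0] * len(vocab)
    for tok, count in Counter(doc).items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[tok]] = count
    return vec

# Hypothetical pre-tokenized reviews.
docs = [["space", "alien", "ship", "alien"], ["gangster", "city", "alien"]]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
print(vectors)  # [[1, 2, 1, 0, 0], [0, 1, 0, 1, 1]]
```

These sparse count vectors are what a topic model such as LDA consumes as input.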

Page 11

Post-processing: generative model

Source: David Blei's slides

Page 12

Post-processing: LDA (latent Dirichlet allocation)

For each document in the collection, the words are generated in a two-stage process:
1) Randomly choose a distribution over topics.
2) For each word in the document:
a) Randomly choose a topic from the distribution over topics chosen in step 1.
b) Randomly choose a word from that topic's distribution over the vocabulary.

Documents exhibit multiple topics
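The two-stage generative process can be simulated directly. This is a sketch only: the two topics, the vocabulary and the Dirichlet prior alpha are hypothetical toy values, and the Dirichlet draw is implemented by normalizing independent Gamma(alpha_k, 1) samples, a standard construction. Fitting LDA (what MALLET does) runs this process in reverse, inferring topics and proportions from observed words.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, vocab, alpha, n_words, rng):
    """Generate one document following LDA's two-stage process.

    topics: K word distributions, each a list of probabilities over vocab.
    alpha:  Dirichlet prior over the K topic proportions.
    """
    # Stage 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Stage 2a: choose a topic from theta.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Stage 2b: choose a word from that topic's distribution over the vocabulary.
        words.append(rng.choices(vocab, weights=topics[z])[0])
    return theta, words

vocab = ["alien", "space", "ship", "gangster", "city", "heist"]
# Two hypothetical toy topics, roughly "sci-fi" and "crime".
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],
]
rng = random.Random(0)
theta, doc = generate_document(topics, vocab, alpha=[0.5, 0.5], n_words=10, rng=rng)
print(theta, doc)
```

Because theta mixes both topics, a single generated document can contain both sci-fi and crime words, which is the sense in which documents exhibit multiple topics.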

Page 13

Movie topics from a reviews corpus

Page 14

Similarity Measure

● Cosine similarity
● KL divergence
● Hellinger distance
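KL divergence compares two topic distributions p and q as KL(p || q) = sum_i p_i log(p_i / q_i). A minimal sketch follows; the eps clipping of zero entries in q is an assumption added to keep the result finite (the true divergence is infinite there), and the two distributions are toy values. The example shows that KL is not symmetric, so KL(query || movie) and KL(movie || query) can rank movies differently.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute nothing. Where q_i == 0 but p_i > 0 the
    true divergence is infinite; eps clips q_i to keep the result finite.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl_divergence(p, q), kl_divergence(q, p))  # about 0.368 and 0.511: not symmetric
```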

Page 15

Cosine Similarity

Similarity Measure
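Cosine similarity measures the angle between two vectors: cos(u, v) = (u · v) / (||u|| ||v||). It equals 1 for vectors pointing the same way and 0 for orthogonal ones. A plain-Python sketch (the slides mention a NumPy implementation; the standard library is used here only to keep the example self-contained), where returning 0.0 for an all-zero vector is an assumed convention:

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Since it depends only on direction, cosine similarity ignores vector length, for example review length in a term-count representation.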

Page 16

Hellinger Distance

Similarity Measure
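For probability distributions p and q, one common form of the Hellinger distance is H(p, q) = (1/√2) · sqrt(sum_i (√p_i − √q_i)²). With the 1/√2 factor it is symmetric and lies in [0, 1]: 0 for identical distributions, 1 for distributions with disjoint support. Some texts omit the factor, so treat this normalization as an assumption. A minimal sketch:

```python
import math

def hellinger(p, q):
    """H(p, q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))**2) / sqrt(2)."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)

print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint support)
```

Unlike KL divergence, Hellinger distance is symmetric and stays finite when one distribution assigns zero probability to a topic, which suits comparing sparse topic distributions.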

Page 17

The overall system: implementation

Movie reviews corpus
● Preprocessing
○ nltk and gensim's simple preprocessing.
● Post-processing
○ gensim Python wrapper to MALLET
○ index the topic distributions of the query movie, q, and of the 1k-movie corpus, C.
● Similarity measure
○ Python/NumPy implementation
○ apply the distance metric to the indexed q and C.
○ sort and pick the top 5 movies.
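The final ranking step (apply a distance to q and every movie in C, sort, keep the top 5) might look like the sketch below. Assumptions: Hellinger distance is used as the metric (the slides list three measures without saying which produced the results), the function name `most_similar` and its `exclude` parameter are hypothetical, and the three-topic corpus is made-up toy data rather than MALLET output.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)

def most_similar(query, corpus, k=5, exclude=None):
    """Return the titles of the k corpus movies whose topic distributions
    are closest to query; `exclude` skips the query movie itself."""
    scored = sorted(
        (hellinger(query, dist), title)
        for title, dist in corpus.items()
        if title != exclude
    )
    return [title for _, title in scored[:k]]

# Hypothetical topic distributions over three toy topics.
corpus = {
    "Alien": [0.8, 0.1, 0.1],
    "Aliens": [0.7, 0.2, 0.1],
    "Heat": [0.1, 0.8, 0.1],
    "Casino": [0.1, 0.7, 0.2],
    "Amelie": [0.1, 0.1, 0.8],
    "Gravity": [0.6, 0.1, 0.3],
    "Goodfellas": [0.05, 0.9, 0.05],
}
print(most_similar(corpus["Alien"], corpus, k=5, exclude="Alien"))
```

For a 1k-movie corpus a full sort is cheap; at much larger scale, `heapq.nsmallest` or a vectorized NumPy distance computation would avoid sorting every score.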

Page 18

Experimental setup

Movie reviews corpus of 1k movies

Review data source: IMDb

Page 19

Evaluation criteria

Experimental setup

Page 20

Conclusion

● Movie topics as efficient features for recommendation systems (RS)
○ They represent movies by underlying semantic patterns.
○ They are useful for capturing movie genre and mood,
○ but capture plot less well.
○ User-written movie reviews are useful movie metadata.
● The developed prototype
○ It is easy to add more movie metadata.
○ Python allows scalability.
○ Topics as an explanation need further tuning.

Page 21

Future directions

● Movie review preprocessing
○ bigrams, trigrams
○ create multi-word movie keywords, or language construction
● Building more complex topic models
○ hierarchical LDA
○ author-topic model
■ include authorship information
■ similarity between authors
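As a hedged illustration of the bigram direction, one simple approach counts adjacent token pairs across the corpus and merges pairs seen at least `min_count` times into single multi-word tokens such as "time_travel". The threshold, the function names and the toy reviews are all hypothetical; gensim's `Phrases` model offers a more principled, score-based version of the same idea.

```python
from collections import Counter

def frequent_bigrams(token_lists, min_count=2):
    """Count adjacent token pairs across documents; keep pairs seen >= min_count times."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return {pair: c for pair, c in counts.items() if c >= min_count}

def merge_bigrams(tokens, bigrams):
    """Greedily join frequent adjacent pairs into single tokens like 'time_travel'."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in bigrams:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical preprocessed reviews.
reviews = [["time", "travel", "paradox"], ["loves", "time", "travel", "stories"], ["time", "loop"]]
bigrams = frequent_bigrams(reviews)
print(merge_bigrams(reviews[1], bigrams))  # ['loves', 'time_travel', 'stories']
```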

Page 22

Questions?

Thank You

Image src: http://www.brinvy.biz/177215/batman-catching-a-ride-on-supermans-back-funny-hd-wallpaper-x.html

Page 23

Extra slides

List of extra slides and notes
● Original LDA paper
● Introduction to probabilistic topic modeling
● A. Huang's "Similarity measures for text document clustering"
● Another good LDA description
● Integrating out multinomial parameters in LDA
● Language construction in micro-genres

Page 24

LDA