Content-based Recommendation Systems Group: Tippy.

29
Content-based Recommendation Systems Group: Tippy

Transcript of Content-based Recommendation Systems Group: Tippy.

Page 1: Content-based Recommendation Systems Group: Tippy.

Content-based Recommendation SystemsGroup: Tippy

Page 2: Content-based Recommendation Systems Group: Tippy.

Group Members

•Nerin George▫Goal Models + Presentation

•Deepan Murugan▫Domain Models + Presentation

•Thach Tran▫Strategies + Presentation

Page 3: Content-based Recommendation Systems Group: Tippy.

Outline

• Introduction• Item Representation•User Profiles▫Manual Recommendation Methods

•Learning A User Model▫Classification Learning Algorithms

Decision Trees and Rule Induction Nearest Neighbour Methods

•Conclusions•Q & A

Page 4: Content-based Recommendation Systems Group: Tippy.

Introduction

• The WWW is growing exponentially. Many websites become enormous in term of size and complexity• Users need help in finding items that are

in accordance with their interests• Recommendation–Content-based recommendation:

recommend an item to a user based upon a description of the item and a profile of the user’s interests

Page 5: Content-based Recommendation Systems Group: Tippy.

Introduction

•Pazzani, M. J., & Billsus, D. (2007). Content-Based Recommendation Systems. Lecture Notes in Computer Science. (4321), 325-341.

Page 6: Content-based Recommendation Systems Group: Tippy.

Related Research•Recommender systems▫present items (e.g., movies, books, music,

images, web pages, news, etc.) that are likely of interest to the user

▫compare the user’s profile to some reference characteristics to predict whether the user would be interested in an unseen item

▫Reference characteristics Information about the unseen item content-

based approach User’s social environment collaborative filtering

approach

Page 7: Content-based Recommendation Systems Group: Tippy.

Item Representation• Items stored in a database table

• Structured data▫ Small number of attributes▫ Each item is described by the same set of attributes▫ Known set of values that the attributes may have

• Straightforward to work with▫User’s profile contains positive rating for 1001, 1002,

1003▫Would the user be interested in say Oscars (French

cuisine, table service)?

ID Name Cuisine Service Cost

1001 Mike’s Pizza Italian Counter Low

1002 Chris’s Café French Table Medium

1003 Jacques Bistro French Table High

Page 8: Content-based Recommendation Systems Group: Tippy.

Item Representation• Information about item could also be free

text; e.g., text description or review of the restaurant, or news articles

•Unstructured data▫No attribute names with well-defined values▫Natural language complexity

Same word with different meanings Different words with same meaning

•Need to impose structure on free text before it can be used in recommendation algorithm

Page 9: Content-based Recommendation Systems Group: Tippy.

TF*IDF Weighting

•First, stemming is applied to get the root forms of words▫“compute”, “computation”, “computer”,

“computes”, etc., are represented by one term

•Compute a weight for each term that represents the importance or relevance of that term

Page 10: Content-based Recommendation Systems Group: Tippy.

TF*IDF Weighting

•Term frequency tft,d of a term t in a document d

• Inverse document frequency idft of a term t

•TF*IDF weighting

kdk

dtdt n

ntf

,

,,

tt df

Nidf log

tdt idftfdtw ,,

Page 11: Content-based Recommendation Systems Group: Tippy.

TF*IDF Weighting

•The term with highest weight occur more often in that document than in other documents more central to the topic of the document

•Limitations▫This method does not capture the context in

which a word is used▫“This restaurant does not serve vegetarian

dishes”

Page 12: Content-based Recommendation Systems Group: Tippy.

User Profiles•A profile of the user’s interests is used by

most recommendation systems•This profile consists of two main types of

information▫A model of the user’s preferences. E.g., a

function that for any item predicts the likelihood that the user is interested in that item

▫User’s interaction history. E.g., items viewed by a user, items purchased by a user, search queries, etc.

Page 13: Content-based Recommendation Systems Group: Tippy.

User Profiles

•User’s history will be used as training data for a machine learning algorithm that creates a user model

•“Manual” recommending approaches▫User customisation

Provide “check box” interface that let the users construct their own profiles of interests

A simple database matching process is used to find items that meet the specified criteria and recommend these to users.

Page 14: Content-based Recommendation Systems Group: Tippy.

User Profiles• Limitations

▫ Require efforts from users▫ Cannot cope with changes

in user’s interests▫ Do not provide a way to

determine order among recommending items

Page 15: Content-based Recommendation Systems Group: Tippy.

User Profiles

•“Manual” recommending approaches▫Rule-based Recommendation

The system has rules to recommend other products based on user history

Rule to recommend sequel to a book or movie to customers who purchased the previous item in the series

Can capture common reasons for making recommendations

Page 16: Content-based Recommendation Systems Group: Tippy.

Learning a User Model•Creating a model of the user’s preference from

the user history is a form of classification learning

• The training data (i.e., user’s history) could be captured through explicit feedback (e.g., user rates items) or implicit observing of user’s interactions (e.g., user bought an item and later returned it is a sign of user doesn’t like the item)

• Implicit method can collect large amount of data but could contains noise while data collected through explicit method is perfect but the amount collected could be limited

Page 17: Content-based Recommendation Systems Group: Tippy.

Learning a User Model

•Next, a number of classification learning algorithms are reviewed

•The main goal of these classification learning algorithms is to learn a function that model the user’s interests▫Applying the function on a new item can

give the probability that a user will like this item or a numeric value indicating the degree of interest in this item

Page 18: Content-based Recommendation Systems Group: Tippy.

Decision Trees and Rule Induction•Given the history of user’s interests as

training data, build a decision tree which represents the user’s profile of interest

•Will the user like an inexpensive Mexican restaurant?

Cuisine

Service

Cost Rating

Italian Counter

Low Negative

French Table Med Positive

French Counter

Low Positive

… … … …

Page 19: Content-based Recommendation Systems Group: Tippy.

Decision Trees and Rule Induction•Well-suited for structured data• In unstructured data, the number of

attributes becomes too enormous and consequently, the tree becomes too large to provide sufficient performance

•RIPPER: a rule induction algorithm based on the same principles but provide better performance in classifying text

Page 20: Content-based Recommendation Systems Group: Tippy.

Nearest Neighbour Methods

•Simply store all the training data in memory

•To classify a new item, compare it to all stored items using a similarity function and determine the “nearest neighbour” or the k nearest neighbours.

•The class or numeric score of the previously unseen item can then be derived from the class of the nearest neighbour.

Page 21: Content-based Recommendation Systems Group: Tippy.

Nearest Neighbour Methods• unseen item needed to

be classified• positive rated items• negative rated items• k = 3: negative• k = 5: positive

Page 22: Content-based Recommendation Systems Group: Tippy.

Nearest Neighbour Methods

•The similarity function depends on the type of data

•Structured data: Euclidean distance metric•Unstructured data (i.e., free text): cosine

similarity function

Page 23: Content-based Recommendation Systems Group: Tippy.

Euclidean Distance Metric

•Distance between A and B

•Attributes which are not measured quantitatively need to be labeled by numbers representing their categories▫Cuisine attribute: 1=Frech, 2=Italian,

3=Mexican.

Item

Attr. X Attr. Y Attr. Z

A XA YA ZA

B XB YB ZB

222, BABABA zzyyxxBAd

Page 24: Content-based Recommendation Systems Group: Tippy.

Cosine Similarity Function

•Vector space model▫An item or a document d is represented as

a vector

▫wt,d is the tf*idf weight of a term t in a document d

•The similarity between two items can then be computed by the cosine of the angle between two vectors

TdNddd www ,,2,1 ,,, v

21

21

vv

vv cos

Page 25: Content-based Recommendation Systems Group: Tippy.

Nearest Neighbour Methods

•Despite the simplicity of the algorithm, its performance has been shown to be competitive with more complex algorithms

Page 26: Content-based Recommendation Systems Group: Tippy.

Other Classification Learning Algorithms•Relevance Feedback and Rocchio’s

Algorithm•Linear Classifiers•Probabilistic Methods and Naïve Bayes

Page 27: Content-based Recommendation Systems Group: Tippy.

Conclusions•Can only be effective in limited circumstances.

It is not straightforward to recognise the subtleties in content

•Depend entirely on previous selected items and therefore cannot make predictions about future interests of users

• These shortcomings can be addressed by collaborative filtering (CF) techniques

•CF is the dominant technique nowadays thanks to the popularity of Web 2.0/Social Web concept

•Many recommendation system utilise a hybrid of content-based and collaborative filtering approaches

Page 28: Content-based Recommendation Systems Group: Tippy.

Summary

•Content-based Recommendation• Item Representation•User Profiles▫Manual Recommendation Methods

•Learning A User Model Decision Trees and Rule Induction Nearest Neighbour Methods

Page 29: Content-based Recommendation Systems Group: Tippy.

Q & A