
A Comparison Study on Similarity Measures in Collaborative Filtering Algorithms for Movie Recommendation

1G. Suganeshwari and 2S.P. Syed Ibrahim

1,2School of Computing Sciences and Engineering,

Vellore Institute of Technology, Chennai Campus,

Chennai, Tamilnadu, India.

Abstract

Similarity measures are the core component used by neighborhood-based collaborative filtering algorithms to predict a user’s interest in items from the interests of other people with similar taste. Depending on the application, usage and domain, collaborative filtering has been implemented with different similarity metrics, weighting approaches and rating normalization schemes. In this paper, different similarity metrics are compared, and a set of recommendations is provided based on the influence of the similarity measures on the results of the empirical study. Item-based and user-based collaborative filtering methods are used to evaluate these similarity metrics. The experiments are conducted on the MovieLens dataset. The evaluation metrics are root mean square error and mean average precision. This paper is intended to help the research community identify a suitable similarity metric for a given dataset.

Index Terms: Cosine, Euclidean, Jaccard, Pearson, recommendation system, similarity measures.

International Journal of Pure and Applied Mathematics, Volume 119 No. 15 2018, 1495-1505. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ (Special Issue)


1. Introduction

With the advent of Web 2.0, a voluminous amount of information has become available, which has led to information overload in the e-commerce industry. Users find it difficult to select items that match their preferences and interests. A recommender system (RS) helps the user handle the information overload problem by providing personalized recommendations. RSs are popular and act as a backbone of e-commerce and social networking sites. User-item interactions (likes, purchases, reviews, ratings) act as input for generating recommendations on shopping sites. Similarly, the data on social networking sites provide valuable input for recommendation in classical collaborative filtering methods. Companies that have successfully implemented recommender systems include Amazon, Spotify and Netflix.

Collaborative filtering is one of the most widely adopted techniques in the industry. To provide recommendations, similar items or users must be found. To find similar objects, a similarity value is computed between pairs of items. For example, if a user watches the movie Inception, the recommended movies might be Avatar and The Dark Knight. But an e-commerce site with as few as 10,000 products would need on the order of 10^8 pairwise similarity computations (about 50 million distinct item pairs), which becomes infeasible at the numbers of items and users seen in recent years. Moreover, users tend to rate only a small fraction of the products in a large item catalog, so the recommender system must cope with a huge, sparse dataset. To address this key challenge in a dynamically growing dataset, the similarities have to be recomputed periodically.

Improving the accuracy of recommendations has long been a target of researchers. Since similarity metrics are the core components of CF algorithms, their performance directly influences the recommendations generated. The aim of this paper is to investigate benchmark similarity measures such as Euclidean distance, Jaccard similarity, cosine similarity and the Pearson correlation coefficient, and to evaluate their performance on a rating dataset.

The key contributions of this paper are:

• Popular similarity metrics frequently used for predicting continuous data are empirically studied and evaluated in a single framework.

• The similarity metrics are evaluated on the benchmark MovieLens dataset in terms of accuracy (RMSE) and mean average precision.

The rest of the paper is organized as follows. Section 2 presents the preliminary knowledge. Section 3 compares popular similarity metrics. Section 4 discusses the experimental results, followed by the conclusion and recommendations on the choice of metric in Section 5.


2. Preliminary Knowledge

Recommendation System and Types

Recommender systems have emerged as a powerful tool that provides useful content to users and increases the revenue of retailers. As defined in [1], an RS is a guide that directs the user toward an appropriate decision in a large space of available options. Recommender systems are broadly classified into content-based, collaborative and hybrid methods [2]. Content-based methods recommend items based on the user’s past purchases. Collaborative filtering recommends items based on similarities between users or between items. Content-based and collaborative methods can be combined in different ways to yield hybrid methods.

The most successful recommender technique implemented in the e-commerce industry is collaborative filtering (CF) [3]. It suggests items to the user based on the similarity between users or between items. CF techniques are further classified into model-based and memory-based methods. Memory-based methods use the entire utility matrix to provide recommendations. Statistical techniques are used to find the users most similar to the active user. These similar users are called neighbors; they share similar interests with the target user. Once the neighborhood is formed, different algorithms can be applied to produce an effective recommendation. Though this approach yields high accuracy, it suffers from scalability issues as the number of users and items increases: the computational complexity grows drastically, and it becomes infeasible to provide recommendations in real time. The similarity is computed between either users or items. In contrast to memory-based methods, model-based methods build a model from the users’ ratings and then generate recommendations using a probabilistic approach. Machine learning algorithms such as Bayesian networks, clustering, SVMs and rule-based approaches can be used to build the models.

Definition 1

Given a utility matrix R of size n x m for n users U = {u1, u2, …, un} and m items I = {i1, i2, …, im}, the recommendation problem is to define a utility function F → {i1, i2, …, ik}. In the prediction formulas (1) and (2), p and q denote users and x and y denote items. In the similarity formulas of Section 3, u denotes a user and i an item: rup and ruq are the ratings given by the same user u to the items p and q, and rxi and ryi are the ratings given by two users x and y to the same item i. A bar over r denotes a mean rating, so $\bar{r}_p$ and $\bar{r}_q$ are the mean ratings for p and q, and $\bar{r}_x$ and $\bar{r}_y$ are the mean ratings for x and y. The user-based CF heuristic predicts a rating by aggregating the ratings of similar users.

\[ F(p, x) = \operatorname*{aggr}_{q \in n(p)} r_{qx} \tag{1} \]

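Here n(p) is the set of users similar to user p, rqx is the rating given by the similar user q to item x, and F(p,x) is the predicted rating of item x for user p. For item-based CF, the prediction aggregates the ratings that user p gave to the items y in n(x), the set of items similar to item x:

\[ F(p, x) = \operatorname*{aggr}_{y \in n(x)} r_{py} \tag{2} \]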
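As a minimal sketch of the user-based prediction in Eq. (1), the aggregation can be taken to be a similarity-weighted average over the k most similar users who rated the item. That choice of aggregation, the function name predict_user_based, the parameter k and the toy ratings are assumptions made for illustration; the paper leaves the aggregation function unspecified.

```python
def predict_user_based(ratings, sims, p, x, k=2):
    """Predict the rating of item x for user p from similar users' ratings.

    ratings: dict user -> dict item -> rating (missing items are simply absent).
    sims:    dict (p, q) -> similarity of user q to user p (any metric).
    """
    # Candidate neighbours: users other than p who have rated item x.
    candidates = [q for q in ratings if q != p and x in ratings[q]]
    # n(p): the k candidates most similar to p.
    neighbours = sorted(candidates, key=lambda q: sims[(p, q)], reverse=True)[:k]
    norm = sum(abs(sims[(p, q)]) for q in neighbours)
    if norm == 0:
        return None  # no usable neighbour, so no prediction
    # aggr: similarity-weighted average of the neighbours' ratings r_qx.
    return sum(sims[(p, q)] * ratings[q][x] for q in neighbours) / norm


# Hypothetical toy data: u1 has not rated "Avatar".
ratings = {
    "u1": {"Inception": 5},
    "u2": {"Inception": 4, "Avatar": 4},
    "u3": {"Inception": 1, "Avatar": 2},
}
sims = {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}
prediction = predict_user_based(ratings, sims, "u1", "Avatar")
print(round(prediction, 2))
```

With these toy values the prediction is (0.9 x 4 + 0.1 x 2) / (0.9 + 0.1) = 3.8. The item-based variant of Eq. (2) has the same shape, with the roles of users and items exchanged.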

3. Similarity Metrics

The first step in neighborhood-based CF is to weight the items by their similarity to an active item. The aim is to group items that are similar and can be influential in providing recommendations. Similarity metrics are used to determine the closeness between items; each involves a statistical computation that measures how alike two objects are. Since it is generally not practical to define a single best similarity metric for the myriad applications involving a variety of data types, a comparison study can highlight the performance and behavior of each metric.

Euclidean Distance

The Euclidean distance between two objects is the length of the straight-line path connecting them. This metric can be used when the data are dense and continuous. For n dimensions it is given by

\[ Sim(p, q) = \sqrt{\sum_{j=1}^{n} (p_j - q_j)^2} \tag{3} \]

The distance computed is always greater than or equal to 0. Two objects are identical when the distance is zero, and dissimilar objects have higher distance values. Euclidean distance does not consider the average rating, and the users’ ratings are not normalized. As a result, it does not account for differences in users’ rating attitudes, such as one user rating consistently lower than another. Euclidean distance [4] is broadly used in clustering-based recommendation methods.

Jaccard Similarity

This similarity is used when objects are treated as sets. Jaccard similarity [5] is measured as the ratio of the cardinality of the intersection of the two sets to the cardinality of their union.

\[ Sim(p, q) = \frac{|p \cap q|}{|p \cup q|} \tag{4} \]

Jaccard is a measure for asymmetric binary attributes. It is used when negative matches (attributes absent from both objects) carry no useful information. When used with a rating dataset it doesn’t yield accurate results because it ignores the rating values.

Cosine Similarity

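Cosine similarity [5] measures similarity as the normalized dot product of two objects represented as vectors. The smaller the angle between the two vectors, the higher the similarity score, and vectors pointing in the same direction receive the highest score. This similarity measure yields accurate results on sparse datasets. It ranges from -1 to +1. The cosine of the angle is given by

\[ sim(p, q) = \cos(p, q) = \frac{p \cdot q}{\lVert p \rVert_2 \, \lVert q \rVert_2} \tag{5} \]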

If vectors P and Q are represented as {x1, y1} and {x2, y2}, then the cosine similarity between the vectors is given as

\[ sim(p, q) = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \, \sqrt{x_2^2 + y_2^2}} \tag{6} \]

The problem with cosine similarity is that it treats missing values as negatives and does not take the rating scale into account. This is mainly because the average rating is not considered. To address this issue, the Pearson correlation coefficient is used, which amounts to cosine similarity applied to mean-centered ratings.

Pearson Correlation Coefficient

It is a very popular similarity metric, used by GroupLens [6] and Ringo [7] for recommendations. It depends on the co-rated items and on the deviation of each rating from the average rating. The similarity between two users or two items given by the Pearson correlation is a measure of the linear relationship between the pair of objects. The similarity ranges from -1 to +1. The Pearson correlation for item-based CF between items p and q is given by

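\[ sim(p, q) = \frac{\sum_{u \in U_{pq}} (r_{up} - \bar{r}_p)(r_{uq} - \bar{r}_q)}{\sqrt{\sum_{u \in U_{pq}} (r_{up} - \bar{r}_p)^2} \, \sqrt{\sum_{u \in U_{pq}} (r_{uq} - \bar{r}_q)^2}} \tag{7} \]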

Here $u \in U_{pq}$ are the common users who have rated both items p and q, and $\bar{r}_p$ and $\bar{r}_q$ are the average ratings of items p and q. Similarly, for the user-based algorithm, the Pearson correlation between two users x and y is computed over the items that both users have rated.

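\[ sim(x, y) = \frac{\sum_{i \in I_{xy}} (r_{xi} - \bar{r}_x)(r_{yi} - \bar{r}_y)}{\sqrt{\sum_{i \in I_{xy}} (r_{xi} - \bar{r}_x)^2} \, \sqrt{\sum_{i \in I_{xy}} (r_{yi} - \bar{r}_y)^2}} \tag{8} \]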

Here Ixy represents the items rated by both users x and y, rxi and ryi are the ratings given by users x and y to item i, and $\bar{r}_x$ and $\bar{r}_y$ are the average ratings of users x and y. The Ringo music RS used the constrained Pearson correlation [7] to compute similarity weights. It is a variation of Pearson that uses the midpoint of the rating scale instead of the mean rating. Spearman correlation [8] is used when ratings are represented as ranks, and Kendall’s coefficient [9] is used when relative ranks are used instead of ranks. Pearson correlation [6] [12] was used by GroupLens for predicting ratings of Usenet news articles by computing a weighted average. A correlation computed from only a few co-rated items doesn’t yield valuable predictions and leads to low accuracy. To address this, the SigWt Pearson method is used, which multiplies the correlation value by a significance weight α/20 when α is less than 20. Here α is the number of co-rated items.
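The user-based Pearson correlation and the significance weight described above might be sketched as follows. The sketch assumes that the means are taken over the co-rated items only (the text does not say whether co-rated or all ratings are used); the function names and the toy ratings are hypothetical.

```python
import math


def pearson(ratings_x, ratings_y):
    """Pearson correlation of two users over their co-rated items.

    ratings_x, ratings_y: dicts mapping item -> rating.
    Returns (similarity, number of co-rated items).
    """
    common = set(ratings_x) & set(ratings_y)
    n = len(common)
    if n < 2:
        return 0.0, n  # correlation is undefined on fewer than two points
    # Assumption: means are computed over the co-rated items only.
    mean_x = sum(ratings_x[i] for i in common) / n
    mean_y = sum(ratings_y[i] for i in common) / n
    num = sum((ratings_x[i] - mean_x) * (ratings_y[i] - mean_y) for i in common)
    den_x = math.sqrt(sum((ratings_x[i] - mean_x) ** 2 for i in common))
    den_y = math.sqrt(sum((ratings_y[i] - mean_y) ** 2 for i in common))
    if den_x == 0 or den_y == 0:
        return 0.0, n  # constant ratings have no defined correlation
    return num / (den_x * den_y), n


def significance_weighted(sim, n_corated, threshold=20):
    """Scale sim by n/threshold when fewer than `threshold` items are co-rated."""
    return sim * min(n_corated, threshold) / threshold


x = {"a": 5, "b": 3, "c": 1}
y = {"a": 4, "b": 3, "c": 2, "d": 5}  # item "d" is not co-rated and is ignored
sim, n = pearson(x, y)
weighted = significance_weighted(sim, n)
print(round(sim, 3), n, round(weighted, 3))
```

The two toy users agree perfectly on their three co-rated items, so the raw correlation is close to 1, but the significance weight 3/20 shrinks it to about 0.15 because the evidence is thin.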

4. Experiments

Dataset

The dataset used for the experiments is the benchmark MovieLens 100k dataset [11], which includes 100,000 (1 lakh) ratings from 943 users on 1682 movies. In order to study the relationship between sparsity and the recommendations generated, the dataset is divided using cross-validation. All the experiments were implemented using Spark version 2.2.0 and performed on machines with an Intel(R) Xeon(R)2 CPU at 3.36GHz and 32GB RAM.

Performance Measure

To evaluate the existing similarity measures, two performance measures are computed: root mean square error, which measures rating-prediction accuracy, and mean average precision, which measures the quality of the top-K list.

Root Mean Square Error (RMSE)

RMSE represents the error made when estimating the rating of an unrated item for an active user. It reflects how closely the predictions match the user’s preferences and thus measures the quality of the recommendations made. RMSE is a widely used accuracy metric in recommender systems [10]. The equation is as follows

\[ RMSE = \sqrt{\frac{\sum_{(i,j) \in X} (r_{ij} - r'_{ij})^2}{|X|}} \tag{9} \]

where rij is the actual rating given by user i for an item j, r’ij is the predicted

rating and |X| is the size of the Test Set.
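A minimal sketch of the RMSE computation over a test set follows. The function name rmse and the four (actual, predicted) pairs are made up for illustration and are not taken from the experiments.

```python
import math


def rmse(test_set):
    """Root mean square error over (actual, predicted) rating pairs."""
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in test_set]
    return math.sqrt(sum(squared_errors) / len(test_set))


# Hypothetical test set X: (actual rating, predicted rating) for four user-item pairs.
test_set = [(4, 3.5), (2, 2.5), (5, 4.0), (3, 3.0)]
error = rmse(test_set)
print(round(error, 4))
```

Here the squared errors sum to 1.5 over four pairs, so the RMSE is the square root of 0.375, roughly 0.61. Lower values indicate more accurate rating predictions.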

Mean Average Precision

In most recommender systems, the items at the top of the list are more significant than the rest. Mean Average Precision (MAP) is used to evaluate these top-K items. MAP is a measure of the average relevance of the top-K documents presented in response to a query. It is the mean of the average precision at K [30], as described in Eq. (10). Higher MAP values indicate a higher-quality recommender system.

\[ MAP = \frac{\sum_{q \in Q} AP(q)}{|Q|} \tag{10} \]

AP(q) is the average precision for query q, and |Q| is the number of queries.
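The following sketch shows one common way to compute MAP at K, treating each user as a query. Normalizing the average precision by min(number of relevant items, K) is an assumed convention, since the paper does not spell out its normalization; the function names and the toy lists are hypothetical.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average precision of the top-k recommended items against a relevant set."""
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    # Assumed convention: normalize by the best achievable number of hits.
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def mean_average_precision(recommendations, relevant_items, k):
    """Mean of the per-query average precision values."""
    aps = [
        average_precision_at_k(recommendations[q], relevant_items[q], k)
        for q in recommendations
    ]
    return sum(aps) / len(aps)


# Hypothetical ranked lists and relevant sets for two users.
recommendations = {"A": ["m1", "m2", "m3"], "B": ["m4", "m5", "m6"]}
relevant_items = {"A": {"m1", "m3"}, "B": {"m5"}}
map_at_3 = mean_average_precision(recommendations, relevant_items, k=3)
print(round(map_at_3, 4))
```

User A scores (1/1 + 2/3) / 2 = 5/6 and user B scores (1/2) / 1 = 1/2, so the MAP at 3 is 2/3.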



Results and Analysis

Fig. 1 and Fig. 2 show the RMSE obtained with different similarity metrics for item-based and user-based CF methods. Comparing these similarity metrics, Pearson correlation is found to give much better results than the other similarity metrics on the MovieLens dataset.

The cosine method doesn’t provide accurate results, because it considers only the angle between the rating vectors and not the relationship between the ratings and the average ratings. For example, consider three sets of ratings r1, r2 and r3 represented by the vectors (5,5,5), (1,1,1) and (4,5,5). It is evident that r1 and r3 are more similar, but according to cosine similarity r1 and r2 are more similar, since they point in exactly the same direction. Another challenge with this method is that it treats missing values as negative. This method is mostly used for calculating the similarity between documents.
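The cosine example above can be checked numerically with a short sketch. The helper name cosine is illustrative; the three vectors are the ones given in the text.

```python
import math


def cosine(p, q):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


r1, r2, r3 = (5, 5, 5), (1, 1, 1), (4, 5, 5)
sim_12 = cosine(r1, r2)  # same direction, very different rating levels
sim_13 = cosine(r1, r3)  # similar rating levels, slightly different direction
print(round(sim_12, 4), round(sim_13, 4))
```

The pair r1, r2 scores essentially 1, while r1, r3 scores about 0.995, so cosine ranks the user who rated everything 1 as closer to r1 than the user who rated almost identically.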

Fig. 1: Comparison of RMSE for MovieLens 100k Dataset for Item-based CF

Fig. 2: Comparison of RMSE for MovieLens 100k Dataset for User-based CF

With Euclidean distance, the data must be normalized before computing the similarity value. Consider the rating sets r1 (4,4,3,3), r2 (5,5,4,4) and r3 (3,4,3,3). The r1 rating set comes from a tough rater and r2 from a normal rater, but their tastes are very similar. The distance between r1 and r2 should therefore be small, which is not the case when the Euclidean distance is used: r1 is at distance 2 from r2 but only at distance 1 from r3.
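A short sketch confirms the Euclidean distances for the three rating sets in this example. The helper name euclidean is illustrative.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


r1, r2, r3 = (4, 4, 3, 3), (5, 5, 4, 4), (3, 4, 3, 3)
d_12 = euclidean(r1, r2)  # tough rater vs normal rater with the same taste
d_13 = euclidean(r1, r3)
print(d_12, d_13)
```

Because r2 is simply r1 shifted up by one point on every item, the unnormalized distance penalizes the shift on all four items and places r2 farther from r1 than r3.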

In the experimental analysis, Jaccard similarity also doesn’t yield good results on the rating dataset. Consider the rating sets r1 (4, x, x, 5, 1), r2 (5, 5, 5, x, x) and r3 (1, x, x, 2, 5), where x represents a movie not rated by the user. According to Jaccard similarity, r1 and r2 are less similar than r1 and r3, because r1 and r3 rated exactly the same movies, even though their ratings on those movies are nearly opposite. This is due to the fact that the Jaccard similarity method ignores the rating values.
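The Jaccard example can be reproduced by reducing each rating set to the set of movie positions that were rated. In this sketch None stands for the unrated marker x, and the helper names are illustrative.

```python
def rated_items(ratings):
    """Indices of the movies a user has rated (None marks an unrated movie)."""
    return {index for index, value in enumerate(ratings) if value is not None}


def jaccard(p, q):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


r1 = [4, None, None, 5, 1]
r2 = [5, 5, 5, None, None]
r3 = [1, None, None, 2, 5]
sim_12 = jaccard(rated_items(r1), rated_items(r2))
sim_13 = jaccard(rated_items(r1), rated_items(r3))
print(sim_12, sim_13)
```

The users r1 and r2 share one rated movie out of five, giving 0.2, while r1 and r3 rated the same three movies, giving 1.0, regardless of how differently they scored them.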

The Pearson correlation method provides better results because it normalizes the ratings by subtracting the mean. This addresses the challenge of handling tough raters and normal raters. In addition, missing ratings are not treated as negative values but as average values. However, the Pearson correlation is often dominated by pairs with only a small number of co-rated objects. To address this, significance weighting can be used: when the number of co-rated items n is less than 20, the correlation value is multiplied by n/20. This compensates for the small number of co-rated items.
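The effect of subtracting the mean can be illustrated on the tough-rater and normal-rater sets r1 (4,4,3,3) and r2 (5,5,4,4) from the Euclidean discussion. This is only a sketch of the normalization step, with an illustrative helper name, not the full Pearson computation.

```python
def mean_center(ratings):
    """Subtract the user's mean rating from each of the user's ratings."""
    mean = sum(ratings) / len(ratings)
    return [value - mean for value in ratings]


r1 = [4, 4, 3, 3]  # tough rater
r2 = [5, 5, 4, 4]  # normal rater with the same taste
centered_1 = mean_center(r1)
centered_2 = mean_center(r2)
print(centered_1, centered_2)
```

After mean-centering, both users have the deviations 0.5, 0.5, -0.5, -0.5, so any measure computed on the centered ratings treats them as identical.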

Mean Average Precision

From Fig. 3 and Fig. 4, it is inferred that item-based CF provides better accuracy than user-based CF methods.

Fig. 3: Comparison of MAP for MovieLens 100k Dataset for Item-based CF

Fig. 4: Comparison of MAP for MovieLens 100k dataset for User-based CF



5. Conclusion

Item-based collaborative filtering methods provide better results than user-based collaborative filtering methods. In item-based CF the neighborhood is static, which makes it possible to precompute the similarities between items from the large sparse dataset. User-based CF is extensively used on social networking sites, and item-based CF is used in e-commerce applications such as Amazon, MovieLens and Netflix. Different similarity metrics can be used depending on the dataset. Pearson can be used for large datasets where the data isn’t normalized. Cosine is used when frequency plays an important role, and Jaccard can be used when the data is represented as binary attributes.

References

[1] Burke R., Hybrid recommender systems: Survey and experiments, User modeling and user-adapted interaction 12(4) (2002), 331-370.

[2] Adomavicius G., Tuzhilin A., Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, IEEE transactions on knowledge and data engineering 17(6) (2005), 734-749.

[3] Ren Y., Li G., Zhang J., Zhou W., Lazy collaborative filtering for data sets with missing values, IEEE transactions on cybernetics 43(6) (2013), 1822-1834.

[4] Song L., Tekin C., van der Schaar M., Online learning in large-scale contextual recommender systems, IEEE Transactions on Services Computing 9(3) (2016), 433-445.

[5] Meng S., Dou W., Zhang X., Chen J., KASR: A Keyword-Aware Service Recommendation method on MapReduce for big data applications, IEEE Transactions on Parallel and Distributed Systems 25(12) (2014), 3221-3231.

[6] Resnick P., Iacovou N., Suchak M., Bergstrom P., Riedl J., GroupLens: an open architecture for collaborative filtering of netnews, Proceedings of the ACM conference on Computer supported cooperative work (1994), 175-186.

[7] Shardanand U., Maes P., Social information filtering: algorithms for automating “word of mouth”, Proceedings of the SIGCHI conference on Human factors in computing systems (1995), 210-217.

[8] Goldberg K., Roeder T., Gupta D., Perkins C., Eigentaste: A constant time collaborative filtering algorithm, Information Retrieval 4(2) (2001), 133-151.



[9] Herlocker J.L., Konstan J.A., Terveen L.G., Riedl J.T., Evaluating collaborative filtering recommender systems, ACM Transactions on Information Systems (TOIS) 22(1) (2004).

[10] Sarwat M., Levandoski J.J., Eldawy A., Mokbel M.F., LARS*: An efficient and scalable location-aware recommender system, IEEE Transactions on Knowledge and Data Engineering 26(6) (2014), 1384-1399.

[11] Harper F.M., Konstan J.A., The movielens datasets: History and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4) (2016).

[12] Suganeshwari G., Ibrahim S.S., A survey on collaborative filtering-based recommendation system, Proceedings of the 3rd International Symposium on Big Data and Cloud Computing Challenges (2016), 503-518.


