A Scalable Solution for Personalized Recommendations in Large-scale Social Networks
A Scalable Solution for Personalized
Recommendations in Large-scale Social Networks
Sardianos Christos, Varlamis Iraklis
Harokopio University of Athens, Dept. of Informatics & Telematics
{sardianos, varlamis}@hua.gr
HAROKOPIO UNIVERSITY of ATHENS
Department of Informatics & Telematics
PCI 2014, Athens October 2-4, 2014 18th Panhellenic Conference in Informatics
Role of Recommender Systems
In many Web 2.0 applications, users interact with the application through social activity: they can express trust in another user or in another user's review.
A recommender system recommends items (e.g. products, articles) to users based on their previous activity.
With existing techniques, this can be difficult in large social and bipartite graphs.
Structure of Recommender Systems
We consider two types of entities:
• Users
• Items
Users express their preferences for some of the available items by rating them (directly or indirectly).
These preferences usually are expressed in a user rating matrix or utility matrix.
System's goal: predict each user's preference for items they have not yet rated, and recommend the k items most likely to be preferred.
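As a minimal sketch of the utility matrix and the top-k step described here: the ratings are invented, and the predictor (the mean rating of each item) is a deliberately naive placeholder, not the method used in the presentation.

```python
from collections import defaultdict

# Hypothetical ratings (user, item, rating), invented for illustration only.
ratings = [
    ("alice", "item1", 5.0), ("alice", "item2", 3.0),
    ("bob", "item1", 4.0), ("bob", "item3", 2.0),
    ("carol", "item2", 4.0), ("carol", "item3", 5.0), ("carol", "item4", 4.5),
]

def build_utility_matrix(ratings):
    """Sparse utility matrix as a dict of dicts: user -> {item: rating}."""
    matrix = defaultdict(dict)
    for user, item, value in ratings:
        matrix[user][item] = value
    return dict(matrix)

def item_means(matrix):
    """Mean rating per item, used here as a placeholder prediction."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in matrix.values():
        for item, value in row.items():
            sums[item] += value
            counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

def recommend_top_k(matrix, user, k):
    """Rank the items the user has not rated by predicted preference."""
    means = item_means(matrix)
    seen = matrix.get(user, {})
    candidates = [(item, score) for item, score in means.items() if item not in seen]
    # Highest score first; ties broken by item id for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]

matrix = build_utility_matrix(ratings)
print(recommend_top_k(matrix, "alice", 2))
```

Any real predictor (User-User, Item-Item, SVD) would replace `item_means`; the ranking of unrated items stays the same.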
Recommender Systems Approaches
Recommender system approaches can be broadly grouped into the following categories.*
Collaborative Filtering (CF)
Content-based
Hybrid Systems
* P. Melville, V. Sindhwani. "Recommender Systems", Encyclopedia of Machine Learning, Springer, 2010.
Limitations of Existing Approaches
Social networks like Facebook and Twitter have over 1.5 billion and 95 million users, respectively. Scalability is therefore a major limitation for Recommender Systems.
Generating recommendations for users about whom the system has insufficient information (cold-start users) is a known issue of Recommender Systems.
Scientific Research Question-Definition
Is it possible to achieve equally good recommendations by applying CF over subgraphs of the original graph?
Is it possible to use these subgraphs for providing a solution for the Cold-Start problem?
Proposed Solution: The creation of subgraphs based on social information content.
Proposed Approach & Tools
o Partitioning using Metis from Karypis Lab*
o CF using LensKit Recommender Toolkit (GroupLens Research**)
[Diagram: Social Graph and Bipartite Graph -> Partitioning -> Subgraphs -> Collaborative Filtering (SVD, User-User, Item-Item) -> Recommendations]
* http://glaros.dtc.umn.edu/gkhome/index.php
** http://lenskit.grouplens.org/
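A hedged sketch of how a social edge list might be prepared for Metis. It assumes the plain (unweighted) Metis graph file format: a header line with the vertex and edge counts, then one line per vertex listing its neighbours, all 1-based. Treating directed trust edges as undirected is an assumption made for this sketch; `to_metis_lines` is a hypothetical helper, not part of Metis or of the presented system.

```python
def to_metis_lines(edges):
    """Convert (u, v) user pairs into the lines of an unweighted Metis graph file.

    Returns the lines and the mapping from user id to 1-based vertex number.
    """
    # Map arbitrary user ids to consecutive 1-based vertex numbers.
    ids = sorted({uid for edge in edges for uid in edge})
    index = {uid: i + 1 for i, uid in enumerate(ids)}

    adjacency = {vertex: set() for vertex in index.values()}
    for u, v in edges:
        if u == v:
            continue  # self-loops are dropped
        a, b = index[u], index[v]
        adjacency[a].add(b)
        adjacency[b].add(a)  # assumption: direction of trust is ignored

    # Each undirected edge appears in two adjacency sets.
    num_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    lines = [f"{len(ids)} {num_edges}"]
    for vertex in range(1, len(ids) + 1):
        lines.append(" ".join(str(n) for n in sorted(adjacency[vertex])))
    return lines, index

lines, index = to_metis_lines([("a", "b"), ("b", "a"), ("b", "c"), ("c", "c")])
print("\n".join(lines))
```

Assuming the METIS 5.x command-line tool, writing these lines to `graph.txt` and running `gpmetis graph.txt 4` should produce `graph.txt.part.4`, with one partition id per vertex line.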
Description of the model functionality
Preparation of the Social Graph
Social Graph Partitioning
Bipartite Graph Partitioning
Recommendations Evaluation
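One way the bipartite partitioning step could be sketched, assuming each rating follows its rater into the rater's social partition. This reading of the step is an assumption; the slides do not spell out the mechanism, and `split_ratings_by_partition` is a hypothetical helper.

```python
from collections import defaultdict

def split_ratings_by_partition(ratings, user_partition):
    """Group (user, item, rating) triples by the social partition of the rater.

    Ratings from users with no partition assignment are returned separately.
    """
    subgraphs = defaultdict(list)
    unassigned = []
    for user, item, value in ratings:
        part = user_partition.get(user)
        if part is None:
            unassigned.append((user, item, value))  # rater absent from the social graph
        else:
            subgraphs[part].append((user, item, value))
    return dict(subgraphs), unassigned

# Hypothetical data, invented for illustration.
ratings = [("u1", "i1", 4.0), ("u2", "i1", 5.0), ("u3", "i2", 3.0), ("u9", "i2", 1.0)]
user_partition = {"u1": 0, "u2": 0, "u3": 1}
subgraphs, unassigned = split_ratings_by_partition(ratings, user_partition)
print(subgraphs, unassigned)
```

Each resulting list can then be handed to a collaborative filtering algorithm independently of the others.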
Evaluation Metrics
• MAE
• RMSE
• Aggregation variants: ByUser, ByRating

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
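A small sketch of MAE and RMSE under the two aggregation variants. It assumes "ByRating" pools all predictions before applying the metric and "ByUser" averages per-user values, which is how these names are commonly used in LensKit evaluations; the prediction triples are invented.

```python
import math
from collections import defaultdict

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(y - y_hat) for y, y_hat in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in pairs) / len(pairs))

def by_rating(metric, predictions):
    """Pool every (actual, predicted) pair, then apply the metric once."""
    return metric([(y, y_hat) for _, y, y_hat in predictions])

def by_user(metric, predictions):
    """Apply the metric per user, then average the per-user values."""
    per_user = defaultdict(list)
    for user, y, y_hat in predictions:
        per_user[user].append((y, y_hat))
    return sum(metric(pairs) for pairs in per_user.values()) / len(per_user)

# Hypothetical (user, actual, predicted) triples.
predictions = [("u1", 5.0, 4.0), ("u1", 3.0, 3.0), ("u1", 4.0, 2.0), ("u2", 2.0, 4.0)]
print(by_rating(mae, predictions), by_user(mae, predictions))
print(by_rating(rmse, predictions), by_user(rmse, predictions))
```

The two variants diverge when users have very different numbers of ratings: ByRating is dominated by heavy raters, while ByUser weights every user equally.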
Dataset Characteristics Comparison

Characteristic                       Epinions      Flixster
Social Graph
  Num. of Distinct Users              131,828       786,936
  Num. of Social Edges                841,372     7,058,819
  Average Degree                       12.765         17.94
Bipartite Graph
  Num. of Distinct Users (Raters)     120,492       147,612
  Num. of Distinct Items              755,760        48,794
  Num. of Ratings                  13,668,320     8,196,077
  Avg. outDegree/User                  113.44         10.42
  Avg. inDegree/Item                    18.09        167.97
  Evaluation scale                      1 – 5       0.5 – 5
  Precision (rating step)                 1.0           0.5
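The degree rows can be rederived from the count rows. The check below uses only the figures in the table; which denominator produced each reported value is inferred from the arithmetic and is not stated in the presentation. The reported average social degree is consistent with counting each edge for both of its endpoints. For Flixster, the reported 10.42 ratings per user matches division by all 786,936 social-graph users; dividing by the 147,612 raters would give about 55.52.

```python
# Figures copied from the dataset characteristics table.
DATASETS = {
    "Epinions": {"social_users": 131_828, "social_edges": 841_372,
                 "raters": 120_492, "items": 755_760, "ratings": 13_668_320},
    "Flixster": {"social_users": 786_936, "social_edges": 7_058_819,
                 "raters": 147_612, "items": 48_794, "ratings": 8_196_077},
}

def average_social_degree(d):
    """Average degree when every edge counts for both of its endpoints."""
    return 2 * d["social_edges"] / d["social_users"]

def ratings_per(d, denominator_key):
    """Number of ratings divided by the chosen population size."""
    return d["ratings"] / d[denominator_key]

for name, d in DATASETS.items():
    print(name,
          "social degree:", round(average_social_degree(d), 3),
          "ratings/rater:", round(ratings_per(d, "raters"), 2),
          "ratings/social user:", round(ratings_per(d, "social_users"), 2),
          "ratings/item:", round(ratings_per(d, "items"), 2))
```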
Experimental Procedure
Platform used for experiments
• Okeanos IaaS cloud, provided by the Greek Research and Technology Network (GRNET S.A.)
• Two Linux-based systems: Ubuntu Desktop 64-bit, 2 CPUs (QEMU Virtual CPU version 1.7.0, 2.1 GHz, 512 KB cache), 6 GB RAM
Experimental procedure implementation
• Model implemented in Java
• Evaluation process run through Groovy scripts
Evaluation of the Experimental Procedure
Algorithms evaluated:
o User-User
o Item-Item
o FunkSVD (SVD implementation)
We performed 5-fold cross-validation over the training and testing samples.
The numbers of subgraphs examined were s = {1, 2, 4, 8, 16, 33, 65, 125, 250, 500, 1000}, using the whole neighborhood as the k nearest neighbors.
For s = {4, 65, 1000} we examined the performance of the User-User algorithm for different neighborhood sizes (kNN), with k = {1, 3, 5, 10, 25, 50, 100, 500, 1000}.
The number of features used for training by the FunkSVD algorithm was set to FeatureCount = 100.
The list size for the Top-N nDCG metric was set to N = 5.
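For reference, a sketch of a Top-N nDCG computation with N = 5. It uses one common formulation (gain equal to the true rating, logarithmic base-2 discount); the exact variant implemented by the evaluation toolkit may differ, and the item ids and ratings are invented.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg_at_n(recommended, true_ratings, n=5):
    """nDCG of the first n recommended items against the user's true ratings.

    Items without a known rating contribute zero gain.
    """
    gains = [true_ratings.get(item, 0.0) for item in recommended[:n]]
    ideal = sorted(true_ratings.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical test ratings for one user.
true_ratings = {"a": 5.0, "b": 3.0, "c": 1.0}
print(ndcg_at_n(["a", "b", "c"], true_ratings))  # ideal order
print(ndcg_at_n(["c", "b", "a"], true_ratings))  # reversed order scores lower
```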
Evaluation Findings
o Evaluation time drops rapidly as the number of subgraphs increases.
o For s > 16 (~7,530 users per subgraph at s = 16), the Item-Item algorithm runs faster than User-User and SVD.
o Running the Item-Item and User-User algorithms over the full graph was impossible, while the SVD algorithm could not be run for s < 4 (~30,123 users per subgraph at s = 4) because of insufficient memory, owing to the way the SVD algorithm works.
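The user counts quoted in these findings agree with the Epinions rater count from the dataset table divided by s. The sketch below reproduces that arithmetic, assuming roughly balanced partitions. The pair count is added only to illustrate how quickly a pairwise similarity computation grows with subgraph size; it is not a figure from the presentation.

```python
RATERS = 120_492  # Epinions raters, from the dataset table
SUBGRAPH_COUNTS = [1, 2, 4, 8, 16, 33, 65, 125, 250, 500, 1000]

def average_users_per_subgraph(total_users, s):
    """Average subgraph size if users are spread evenly over s partitions."""
    return total_users / s

def user_pairs(n):
    """Number of distinct user pairs among n users."""
    return n * (n - 1) // 2

for s in SUBGRAPH_COUNTS:
    size = int(average_users_per_subgraph(RATERS, s))
    print(f"s={s:>4}  users/subgraph~{size:>7}  pairs/subgraph~{user_pairs(size):,}")
```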
Evaluation Findings
o The SVD and Item-Item algorithms appear to achieve normalized gain, unlike User-User, which performs poorly because of the notably large number of items per subgraph.
o The Item-Item algorithm can predict similar items (based on the ratings), while SVD creates a smaller and denser item space. Result: better performance.
Evaluation Findings
o Results are comparable to those from Epinions.
o The User-User algorithm still does not perform well, but its behavior is more stable.
o There is, however, a larger standard deviation in the performance of the User-User algorithm across subgraphs for the different values of s, unlike the Item-Item and SVD algorithms.
Conclusions
Is it possible to create a model that will take into account the social network of the users for creating personalized recommendations in large-scale social networks?
The performance of the proposed model (CF in subgraphs) is comparable to that of the traditional techniques (CF in the full graph).
In sparse bipartite graphs, the performance of this model may be reduced.
However, algorithms such as SVD can provide a solution even for sparse bipartite graphs.
The proposed approach could be utilized to implement a distributed recommender system, minimizing the execution time and producing high quality recommendations.
Future Work
Deploy the proposed model over a distributed architecture
Partitioning is fast, CF is the bottleneck
• Based on graph (and subgraph) statistics, decide whether to partition or not and decide on the number of partitions
Graph partitioning results in many CrossCluster edges, which are currently ignored
• What happens when we take these edges into account?
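As a starting point for that question, a sketch that measures how many social edges a partitioning cuts. The edge list and partition assignment are invented, and `cross_cluster_edges` is a hypothetical helper.

```python
def cross_cluster_edges(edges, partition):
    """Return the edges whose endpoints lie in different partitions,
    together with the fraction of all edges that they represent."""
    cut = [(u, v) for u, v in edges if partition[u] != partition[v]]
    fraction = len(cut) / len(edges) if edges else 0.0
    return cut, fraction

# Hypothetical social edges and partition assignment.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u4")]
partition = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
cut, fraction = cross_cluster_edges(edges, partition)
print(cut, fraction)
```

A high fraction would suggest that much of the social information is being discarded when each subgraph is processed in isolation.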
Thank you for your time.