A Scalable Solution for Personalized Recommendations in Large-scale Social Networks


Sardianos Christos, Varlamis Iraklis

Harokopio University of Athens, Dept. of Informatics & Telematics

{sardianos, varlamis}@hua.gr



PCI 2014, Athens October 2-4, 2014 18th Panhellenic Conference in Informatics

Role of Recommender Systems

In many Web 2.0 applications, users can interact with the application through social activity. They can express their trust in another user or in another user’s review.

A recommender system is responsible for recommending items (e.g. products, articles) to users, based on their previous activity.

With existing techniques, this can be a difficult process in large social and bipartite graphs.

Structure of Recommender Systems


We consider two types of entities:

• Users
• Items

Users express their preferences for some of the available items by rating them (directly or indirectly).

These preferences usually are expressed in a user rating matrix or utility matrix.

System’s goal: Predict a user’s preference for items that the user has not “rated” yet and recommend the k items most likely to be preferred.
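As an illustration of this goal, the following minimal Python sketch stores the utility matrix as a dictionary of per-user ratings and returns the k unrated items with the highest predicted score. The toy data, the item-mean predictor, and the names `ratings`, `item_means` and `top_k` are assumptions made for illustration; they are not the model evaluated in this work.

```python
from collections import defaultdict

# Toy utility matrix: user -> {item: rating}. Values are illustrative only.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0},
    "u2": {"i1": 4.0, "i3": 2.0},
    "u3": {"i2": 4.0, "i3": 5.0, "i4": 1.0},
}

def item_means(ratings):
    """Average rating of every item over the users who rated it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user_ratings in ratings.values():
        for item, value in user_ratings.items():
            totals[item] += value
            counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}

def top_k(ratings, user, k):
    """Return the k items the user has not rated, ranked by predicted score."""
    means = item_means(ratings)
    unrated = [item for item in means if item not in ratings[user]]
    # Placeholder predictor: the item's mean rating (a stand-in for CF).
    return sorted(unrated, key=lambda item: means[item], reverse=True)[:k]

print(top_k(ratings, "u1", 2))
```

A real system would replace the item-mean predictor with a collaborative filtering model; the ranking step stays the same.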

Recommender Systems Approaches

There are many Recommender Systems approaches, which can be broadly categorized into the following categories.*

Collaborative Filtering (CF)

Content-based

Hybrid Systems

* P. Melville, V. Sindhwani. "Recommender Systems", Encyclopedia of Machine Learning, Springer, 2010.


Limitations of Existing Approaches

Social networks such as Facebook and Twitter have over 1.5 billion and 95 million users, respectively. Thus, a major limitation for Recommender Systems is scalability.

Generating recommendations for users about whom the system has insufficient information (Cold-Start users) is a known issue of Recommender Systems.


Scientific Research Question-Definition

Is it possible to achieve equally good recommendations by applying CF over subgraphs of the original graph?

Is it possible to use these subgraphs for providing a solution for the Cold-Start problem?

Proposed Solution: The creation of subgraphs based on social information content.


Proposed Approach & Tools

o Partitioning using Metis from Karypis Lab*

o CF using LensKit Recommender Toolkit (GroupLens Research**)

[Figure: pipeline diagram with the stages Bipartite Graph, Social Graph, Partitioning, Subgraphs, Collaborative Filtering (SVD, User-User, Item-Item), Recommendations]

* http://glaros.dtc.umn.edu/gkhome/index.php
** http://lenskit.grouplens.org/
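The step from partitions to subgraphs can be sketched in Python as follows. The sketch assumes a partitioner (such as Metis) has already produced a mapping from user to partition id; the `membership` dictionary, the `split_ratings` helper, and the choice to skip raters that are missing from the social graph are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

def split_ratings(ratings, membership):
    """Group (user, item, rating) triples by the partition of the rating user."""
    subgraphs = defaultdict(list)
    for user, item, value in ratings:
        part = membership.get(user)
        if part is not None:  # assumption: raters absent from the social graph are skipped
            subgraphs[part].append((user, item, value))
    return dict(subgraphs)

# Hypothetical partitioner output: user -> partition id.
membership = {"u1": 0, "u2": 0, "u3": 1}
ratings = [
    ("u1", "i1", 5.0),
    ("u2", "i1", 4.0),
    ("u3", "i2", 3.0),
    ("u4", "i2", 2.0),  # u4 has no partition in this toy example
]

for part, triples in sorted(split_ratings(ratings, membership).items()):
    print(part, triples)
```

Each resulting list of triples is a smaller bipartite graph on which a CF algorithm can be trained independently.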


Description of the model functionality

Preparation of the Social Graph

Social Graph Partitioning

Bipartite Graph Partitioning

Recommendations Evaluation



Evaluation Metrics

• ByUser
• ByRating

• ByUser
• ByRating

• ByUser
• ByUser

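MAE and RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$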
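A minimal Python sketch of the two error metrics follows. It assumes that "ByRating" pools all predictions before computing the metric and that "ByUser" computes the metric per user and then averages over users; this reading of the two variants, along with the toy `predictions` list and the function names, is an assumption rather than something stated on the slides.

```python
import math
from collections import defaultdict

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(y - y_hat) for y, y_hat in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in pairs) / len(pairs))

def by_rating(metric, predictions):
    """Metric over all (user, actual, predicted) triples pooled together."""
    return metric([(y, y_hat) for _, y, y_hat in predictions])

def by_user(metric, predictions):
    """Metric computed per user, then averaged over users."""
    per_user = defaultdict(list)
    for user, y, y_hat in predictions:
        per_user[user].append((y, y_hat))
    return sum(metric(pairs) for pairs in per_user.values()) / len(per_user)

# Toy (user, actual rating, predicted rating) triples.
predictions = [("u1", 4.0, 3.0), ("u1", 2.0, 2.0), ("u2", 5.0, 3.0)]

print(by_rating(mae, predictions), by_user(mae, predictions))
print(by_rating(rmse, predictions), by_user(rmse, predictions))
```

The two variants differ when users have very different numbers of ratings: the pooled version is dominated by heavy raters, while the per-user version weights every user equally.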


Dataset Characteristics Comparison

Dataset Characteristics            Epinions      Flixster

Social Graph
Num. of Distinct Users             131,828       786,936
Num. of Social Edges               841,372       7,058,819
Average Degree                     12.765        17.94

Bipartite Graph
Num. of Distinct Users (Raters)    120,492       147,612
Num. of Distinct Items             755,760       48,794
Num. of Ratings                    13,668,320    8,196,077
Avg. outDegree/User                113.44        10.42
Avg. inDegree/Item                 18.09         167.97
Evaluation scale                   1 – 5         0.5 – 5
Precision                          1.0           0.5
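The average degree values of the social graphs are consistent with treating each social edge as undirected, so that every edge contributes to the degree of two users. The short Python check below makes that arithmetic explicit; the undirected interpretation and the function name `average_degree` are assumptions.

```python
def average_degree(num_edges, num_nodes):
    """Average degree of an undirected graph: each edge touches two nodes."""
    return 2 * num_edges / num_nodes

# Edge and user counts taken from the social graph rows of the table.
print(round(average_degree(841_372, 131_828), 3))    # Epinions
print(round(average_degree(7_058_819, 786_936), 2))  # Flixster
```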

Experimental Procedure

Platform used for experiments

Use of the Okeanos IaaS Cloud provided by the Greek Research and Technology Network (GRNET S.A.)

Two Linux-based systems:
• Ubuntu Desktop 64-bit
• 2 CPUs (QEMU Virtual CPU v. 1.7.0), 2.1 GHz CPU speed, 512 KB cache
• 6 GB RAM

Experimental procedure implementation

• Model implementation in Java
• Evaluation process run through Groovy scripts


Evaluation of the Experimental Procedure

Algorithms evaluated:
o User-User
o Item-Item
o FunkSVD (SVD Implementation)

We performed a 5-fold Cross-Validation over the Training & Testing samples.
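A 5-fold split can be sketched in Python as below. This sketch partitions individual ratings into folds, which is an assumption: the evaluation toolkit may instead split by user. The helper name `k_fold_splits` and the toy ratings are hypothetical.

```python
import random

def k_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; each rating lands in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]

# Toy (user, item, rating) triples.
ratings = [("u%d" % (n % 4), "i%d" % n, float(n % 5 + 1)) for n in range(20)]

for train, test in k_fold_splits(ratings):
    print(len(train), len(test))
```

Every rating is used for testing once and for training four times, so the reported metrics are averages over five train/test runs.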

The numbers of subgraphs examined were s = {1, 2, 4, 8, 16, 33, 65, 125, 250, 500, 1000}, using the whole neighborhood as k-nearest neighbors.

For s = {4, 65, 1000} we examined the performance of the User-User algorithm for different neighborhood sizes (kNN), with k = {1, 3, 5, 10, 25, 50, 100, 500, 1000}.

The number of features used for training by FunkSVD algorithm was set to: FeatureCount = 100.
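To show what the feature count controls, here is a simplified Python sketch of matrix factorization trained with stochastic gradient descent in the spirit of FunkSVD. It is not the LensKit implementation: it updates all features in every step rather than training one feature at a time, and the learning rate, regularization, epoch count, and all names are assumed values chosen for illustration.

```python
import random

def train_funk_svd(ratings, feature_count=100, epochs=50, lr=0.01, reg=0.02, seed=0):
    """Fit user/item factor vectors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # One small random vector of length feature_count per user and per item.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(feature_count)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(feature_count)] for i in items}
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(feature_count):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q

def predict(model, user, item):
    """Predicted rating: global mean plus the user-item factor dot product."""
    mean, p, q = model
    return mean + sum(a * b for a, b in zip(p[user], q[item]))

# Toy (user, item, rating) triples.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u2", "i3", 1.0),
    ("u3", "i2", 2.0), ("u3", "i3", 5.0),
]
model = train_funk_svd(ratings)
print(predict(model, "u1", "i1"))
```

Each user and each item is represented by `feature_count` numbers, so memory grows with the number of users and items in the subgraph multiplied by the feature count.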

The list size for the Top-N nDCG metric was set to N = 5.
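A minimal Python sketch of a Top-N nDCG computation for a single user follows. It uses the user's true ratings as gains and a log2 position discount; discount conventions vary between toolkits, so this formula is an assumption and may differ in detail from the metric used in the evaluation. The function names are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_n(recommended, true_ratings, n=5):
    """nDCG of the first n recommended items against a user's true ratings."""
    gains = [true_ratings.get(item, 0.0) for item in recommended[:n]]
    ideal = sorted(true_ratings.values(), reverse=True)[:n]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy true ratings of one user.
true_ratings = {"a": 5.0, "b": 3.0, "c": 1.0}

print(ndcg_at_n(["a", "b", "c"], true_ratings))
print(ndcg_at_n(["c", "b", "a"], true_ratings))
```

A list that ranks the user's highest-rated items first scores 1; placing them lower in the list reduces the score.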

Evaluation Findings

o Evaluation time drops rapidly as the number of subgraphs increases.
o For s>16 (~7,530 users per subgraph), the Item-Item algorithm runs faster than User-User and SVD.
o Executing the Item-Item & User-User algorithms over the full graph was impossible, while the SVD algorithm could not be executed for s<4 (~30,123 users per subgraph) because of insufficient memory, a consequence of the way the SVD algorithm works.



Evaluation Findings

o The SVD & Item-Item algorithms appear to have normalized gain, unlike User-User, which performs poorly because of the notably large number of items per subgraph.

o The Item-Item algorithm can predict similar items (based on the ratings), while SVD creates a smaller and denser item space. Better performance!


Evaluation Findings

o Results are comparable to those from Epinions.
o The User-User algorithm still doesn’t perform well, but has more stable behavior.
o There is, however, a larger standard deviation in the performance of the User-User algorithm across subgraphs for the different values of s, unlike the Item-Item & SVD algorithms.

Conclusions

Is it possible to create a model that will take into account the social network of the users for creating personalized recommendations in large-scale social networks?

In conclusion, we can say that the performance of the proposed model (CF in subgraphs) is comparable to that of the traditional techniques (CF in full graph).

In sparse bipartite graphs, the performance of this model may be reduced.

But, using algorithms such as SVD, we can provide a solution even in the case of sparse bipartite graphs.

The proposed approach could be utilized to implement a distributed recommender system, minimizing the execution time and producing high quality recommendations.



Future Work

Deploy the proposed model over a distributed architecture

Partitioning is fast, CF is the bottleneck

• Based on graph (and subgraph) statistics, decide whether to partition or not and decide on the number of partitions

Graph partitioning results in many CrossCluster edges, which are currently ignored

• What happens when we take these edges into account?
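The number of ignored edges can be measured with a short Python sketch such as the one below. It assumes the social graph is available as a list of user pairs and the partitioning as a user-to-partition mapping; both the toy data and the helper name `count_cross_cluster_edges` are hypothetical.

```python
def count_cross_cluster_edges(edges, membership):
    """Count social edges whose two endpoints fall in different partitions."""
    return sum(1 for u, v in edges if membership[u] != membership[v])

# Toy social graph and hypothetical partitioner output.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u4")]
membership = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}

cut = count_cross_cluster_edges(edges, membership)
print(cut, cut / len(edges))
```

The fraction of cut edges gives a rough indication of how much social information is lost for a given number of partitions.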

Thank you for your time.

