LSH for Prediction Problem in Recommendation

Post on 21-Feb-2017

Maruf Aytekin

PhD Student

Computer Engineering Department, Bahcesehir University

May 5, 2015

Outline

• User-based
• Item-based
• LSH
• Parameters
• Model Build Performance
• Accuracy Performance
• LSH Parameters

Data Set

Total Ratings: 100000
Number of Users: 943
Number of Items: 1682
Sparsity = 0.0630
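The 0.0630 figure can be reproduced from the other three numbers: it equals the number of ratings divided by the number of user-item cells, i.e. the fraction of the matrix that is filled. A minimal sketch of that arithmetic (the variable names are illustrative and not taken from the slides):

```python
# Data set statistics quoted on the slide.
total_ratings = 100_000
num_users = 943
num_items = 1682

# Fraction of the user-item matrix that holds a rating.
filled_fraction = total_ratings / (num_users * num_items)
print(f"{filled_fraction:.4f}")  # prints 0.0630
```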

Evaluation Methods

• We use the hold-out cross-validation method for the experiments.

• We randomly select 5% of the data for testing and 5% for validation.

• We repeat this process 3 times and average the results.
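The evaluation procedure can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `holdout_split`, the seeding scheme, and the toy ratings are hypothetical and not taken from the slides.

```python
import random

def holdout_split(ratings, test_frac=0.05, valid_frac=0.05, seed=0):
    """Randomly split ratings into train, validation, and test lists."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_valid = round(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test

# Toy (user, item, rating) triples, made up for illustration.
ratings = [(u, i, 1 + (u * i) % 5) for u in range(20) for i in range(10)]

# Repeat the random split 3 times; a metric computed on each
# test set would then be averaged over the runs.
splits = [holdout_split(ratings, seed=run) for run in range(3)]
for train, valid, test in splits:
    print(len(train), len(valid), len(test))
```

Using a different seed per run gives three independent random splits while keeping each run reproducible.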

User-based

Neighbors can have different levels of similarity.

Wuv: Similarity of users u and v.

rvi: Rating value of user v for item i.

Ni(u): Set of neighbors of user u who have rated item i.

Item-based

ruj: Rating value of user u for item j.

Nu(i): The items rated by user u that are most similar to item i.

Wij: Similarity of items i and j.
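The prediction formulas themselves did not survive in this transcript. The sketch below assumes the common similarity-weighted average over the neighbors defined by these symbols (Wuv with rvi for user-based, Wij with ruj for item-based); the function names, the dictionary layout, and the default k are illustrative assumptions, not the author's code.

```python
def weighted_average(pairs):
    """pairs: (similarity, rating) tuples -> sum(w * r) / sum(|w|), or None."""
    den = sum(abs(w) for w, _ in pairs)
    return sum(w * r for w, r in pairs) / den if den else None

def predict_user_based(u, i, ratings, user_sim, k=30):
    # N_i(u): the k users most similar to u among those who rated item i.
    raters = [v for v in ratings
              if v != u and i in ratings[v] and (u, v) in user_sim]
    neighbors = sorted(raters, key=lambda v: user_sim[(u, v)], reverse=True)[:k]
    return weighted_average([(user_sim[(u, v)], ratings[v][i]) for v in neighbors])

def predict_item_based(u, i, ratings, item_sim, k=30):
    # N_u(i): the k items rated by u that are most similar to item i.
    rated = [j for j in ratings[u] if j != i and (i, j) in item_sim]
    neighbors = sorted(rated, key=lambda j: item_sim[(i, j)], reverse=True)[:k]
    return weighted_average([(item_sim[(i, j)], ratings[u][j]) for j in neighbors])

# Toy data (made up): ratings[user][item] = rating value.
ratings = {"u": {"a": 4}, "v": {"a": 5, "b": 4}, "w": {"a": 1, "b": 2}}
user_sim = {("u", "v"): 0.9, ("u", "w"): 0.1}   # assumed precomputed W_uv
item_sim = {("b", "a"): 0.8}                     # assumed precomputed W_ij

print(predict_user_based("u", "b", ratings, user_sim))
print(predict_item_based("u", "b", ratings, item_sim))
```

The similarities are passed in precomputed because the slides do not say which similarity measure was used.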

Locality Sensitive Hashing

[Figure: users U1, U2, U3, ..., Um are hashed by hash functions H1 and H2, whose outputs in [0,1] are combined by AND-construction into bucket keys stored in hash tables (L = 2 tables, t = 1 and t = 2; K = 4). Bucket keys shown: 0101, 1110, 1101, 1001. Bucket contents shown: U7 U11 U10; U13 U39 Um; U1 U3 U9; U2 U5 U6. Candidate set for U5, C(U5): U2, U6, U1, U3, ...]
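To make the diagram concrete, here is a minimal sketch of building L hash tables with K-bit keys (AND-construction within a table) and collecting a user's candidate set across tables. The slides do not name the hash family, so random hyperplanes (sign of a random projection) are assumed here; all function names and the toy rating vectors are hypothetical.

```python
import random
from collections import defaultdict

def bucket_key(vec, table_planes):
    """K-bit key such as '0101': one sign bit per hyperplane (AND-construction)."""
    return "".join(
        "1" if sum(p * x for p, x in zip(plane, vec)) >= 0 else "0"
        for plane in table_planes
    )

def make_hash_tables(vectors, L=2, K=4, seed=0):
    """Build L hash tables, each keyed by K concatenated hash bits."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    # Assumed hash family: K random hyperplanes per table.
    planes = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(K)]
              for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for user, vec in vectors.items():
        for t in range(L):
            tables[t][bucket_key(vec, planes[t])].append(user)
    return planes, tables

def candidates(user, vectors, planes, tables):
    """Users sharing a bucket with `user` in at least one table."""
    found = set()
    for table_planes, table in zip(planes, tables):
        found.update(table.get(bucket_key(vectors[user], table_planes), []))
    found.discard(user)
    return found

# Toy rating vectors (made up); 0 stands for "not rated" in this sketch.
vectors = {"U1": [5, 3, 0, 1], "U2": [5, 3, 0, 1], "U3": [1, 0, 5, 4]}
planes, tables = make_hash_tables(vectors, L=2, K=4)
print(candidates("U1", vectors, planes, tables))
```

A larger K makes buckets more selective (fewer candidates per table), while a larger L gives each user more chances to share a bucket with a similar user.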

LSH for Prediction

L : number of hash tables (bands)

Cvi(t) : the set of candidates retrieved from hash table t that have rated item i

rvi : rating of user v (in C) on item i
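The prediction formula is not preserved in this transcript. As a hedged sketch, the code below assumes the prediction is the average of the ratings rvi over all candidates in Cvi(t), pooled across the L hash tables; the function name `predict_lsh` and the toy data are illustrative only.

```python
def predict_lsh(i, candidates_per_table, ratings):
    """Average r_vi over candidates that rated item i, pooled over all tables."""
    total, count = 0.0, 0
    for table_candidates in candidates_per_table:   # t = 1 .. L
        # C_vi(t): candidates from table t that have rated item i.
        c_vi = [v for v in table_candidates if i in ratings.get(v, {})]
        total += sum(ratings[v][i] for v in c_vi)
        count += len(c_vi)
    # No candidate rated the item: no prediction can be made.
    return total / count if count else None

# Toy data (made up): candidates of one user in L = 2 tables.
ratings = {"U2": {"i": 4}, "U6": {"i": 2}, "U1": {"i": 3}, "U3": {}}
candidates_per_table = [["U2", "U6"], ["U1", "U3"]]
print(predict_lsh("i", candidates_per_table, ratings))  # prints 3.0
```

Unlike the user-based method, this variant needs no similarity computation at prediction time, only bucket lookups and an average.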

Computational Complexity

|U|: User set size
|I|: Item set size
k: Number of neighbors used in the predictions
p: Maximum number of ratings per user
q: Maximum number of ratings per item

Parameters (CF)

LSH Parameters

Model Build Time

Results: User-based

With the optimum k = 30 and Y = 7:

• Average MAE: 0.79527
• Average running time: 9.437 seconds

We compare these results with the LSH method.
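For reference, MAE (mean absolute error) is the average absolute difference between predicted and actual ratings. A minimal sketch follows; skipping items with no prediction is an assumption of this example, not something stated in the slides.

```python
def mean_absolute_error(predicted, actual):
    """Mean of |prediction - actual|, ignoring items with no prediction (None)."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not pairs:
        raise ValueError("no predictions to evaluate")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Toy values (made up): the third item has no prediction and is skipped.
predicted = [4.0, 3.0, None, 2.0]
actual = [5, 3, 4, 4]
print(mean_absolute_error(predicted, actual))  # prints 1.0
```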

LSH & User-based: Hash Functions

LSH & User-based: Hash Tables

Conclusion

• LSH substantially improved scalability.
• Accuracy decreased, but within an acceptable range.
• Running-time performance improved considerably.
• LSH needs to be configured to balance MAE and performance according to what is expected from the system.

Source Code (User-based Prediction):

Source Code (LSH Prediction):

Q&A