Computer Based Recommendation Systems in Online Businessjzhang/CS689/PPDM-Chapter10.pdf · Most...

Computer Based Recommendation Systems in Online Business

Jun Zhang and Xiwei Wang
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0633
USA


CONTENTS

Introduction to recommendation systems

Popular algorithms used in recommendation systems

Comparison of recommendation system algorithms using real online market data

Concluding remarks and future work


BACKGROUND

The booming e‐Business sector is driving profound changes in social behavior. More and more people choose to shop online instead of going to physical stores.


BACKGROUND

Supermarkets provide blind (untargeted) recommendation information to all customers.


BACKGROUND

Online merchants provide targeted recommendation information to customers who have previously visited their website, to better market their merchandise (e.g., books, movies, CDs, web pages, newsgroup messages).


BACKGROUND – RECOMMENDER SYSTEM

What is a Recommendation System?

A recommendation system is a program that uses computer algorithms to predict users' purchase interests by profiling their shopping patterns.

Recommendation systems provide users with personalized suggestions for products or services.


BACKGROUND – RECOMMENDER SYSTEM CONT.

Many online stores provide recommendations (e.g., Amazon, eBay).

Recommendation systems have been shown to substantially increase sales at online stores.


From a business perspective, a recommendation system is viewed as part of Customer Relationship Management (CRM).

BACKGROUND – AN EXAMPLE


RECOMMENDATION SYSTEMS

Two types of recommendation systems: content‐based filtering and collaborative filtering.

Content‐based filtering builds a profile by extracting feature values from the content a user consumed in the past, and recommends new content with similar feature values.
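A minimal sketch of content‐based filtering, assuming each item is described by a numeric feature vector, the user profile is the average of the feature vectors of previously used items, and candidates are ranked by cosine similarity to that profile. The feature names, toy items and the helper `content_based_recommend` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_based_recommend(item_features, history, top_n=2):
    """Build a profile from the items in `history`, then rank the other items."""
    dim = len(next(iter(item_features.values())))
    # Profile: average feature value over the items the user used before.
    profile = [sum(item_features[i][d] for i in history) / len(history) for d in range(dim)]
    candidates = [i for i in item_features if i not in history]
    # Recommend the unseen items whose features are most similar to the profile.
    candidates.sort(key=lambda i: cosine(profile, item_features[i]), reverse=True)
    return candidates[:top_n]


# Hypothetical features: [is_book, is_movie, is_scifi]
items = {
    "book_a": [1, 0, 1],
    "book_b": [1, 0, 1],
    "movie_c": [0, 1, 1],
    "movie_d": [0, 1, 0],
}
print(content_based_recommend(items, ["book_a"]))  # ['book_b', 'movie_c']
```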


CONTENT-BASED FILTERING


COLLABORATIVE FILTERING

Most recommendation systems rely on the Collaborative Filtering (CF) technique.

In CF‐based recommendation systems, shopping history is analyzed in order to establish connections between users and products.

A profile is created by evaluating the content a user has used in the past, and recommendations are made based on users with similar profiles.


COLLABORATIVE FILTERING CONT.

Maintain a database of many users’ ratings of a variety of items.

For a given user, find other similar users whose ratings strongly correlate with the current user.

Recommend items rated highly by these similar users, but not rated by the current user.

Almost all existing commercial recommendation systems use this approach (e.g., Amazon).
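The three steps above can be sketched as user‐based collaborative filtering in plain Python. This is an illustrative sketch, not the implementation of any commercial system: similarity is assumed to be the Pearson correlation over co‐rated items, only positively correlated neighbors are kept, and the helper names and toy ratings are hypothetical.

```python
import math


def pearson(r1, r2):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in r1 if i in r2]
    if len(common) < 2:
        return 0.0
    m1 = sum(r1[i] for i in common) / len(common)
    m2 = sum(r2[i] for i in common) / len(common)
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    den = math.sqrt(sum((r1[i] - m1) ** 2 for i in common)) * \
        math.sqrt(sum((r2[i] - m2) ** 2 for i in common))
    return num / den if den else 0.0


def user_based_recommend(ratings, user, k=2, top_n=3):
    # 1. Find the k users whose ratings correlate most strongly with `user`.
    others = [(pearson(ratings[user], ratings[v]), v) for v in ratings if v != user]
    neighbors = [(s, v) for s, v in sorted(others, reverse=True)[:k] if s > 0]
    # 2. Score items the neighbors rated but `user` has not, weighted by similarity.
    scores = {}
    for s, v in neighbors:
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + s * r
    # 3. Recommend the highest-scoring unrated items.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "e": 4},
}
print(user_based_recommend(ratings, "alice"))  # ['d']
```

Here carol's ratings correlate negatively with alice's, so only bob's unrated item "d" is recommended.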


COLLABORATIVE FILTERING CONT.


EXAMPLE OF TRANSACTIONS

Each user may purchase several items, and each item could be purchased by a few users.

User   Items
u01    Bread, Milk
u02    Bread, Diaper, Beer, Eggs
u03    Milk, Diaper, Beer, Coke
u04    Bread, Milk, Diaper, Beer
u05    Bread, Milk, Diaper, Coke
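The transaction table can be turned into a user‐item matrix. This sketch assumes a binary encoding (1 if the user bought the item, 0 otherwise); the function name `to_matrix` is hypothetical.

```python
transactions = {
    "u01": ["Bread", "Milk"],
    "u02": ["Bread", "Diaper", "Beer", "Eggs"],
    "u03": ["Milk", "Diaper", "Beer", "Coke"],
    "u04": ["Bread", "Milk", "Diaper", "Beer"],
    "u05": ["Bread", "Milk", "Diaper", "Coke"],
}


def to_matrix(transactions):
    """Return (users, items, matrix); rows are user vectors, columns item vectors."""
    users = sorted(transactions)
    items = sorted({item for basket in transactions.values() for item in basket})
    # matrix[u][i] = 1 if user u bought item i, else 0 (binary encoding).
    matrix = [[1 if item in transactions[u] else 0 for item in items] for u in users]
    return users, items, matrix


users, items, matrix = to_matrix(transactions)
for user, row in zip(users, matrix):
    print(user, row)
```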


THE USER-ITEM RATING MATRIX

Rows are user vectors and columns are item vectors:

Jenny    1  1  1  0  0
Emily    2  2  2  0  0
Diane    5  5  5  0  0
Jeremy   0  0  0  2  2
Stefan   0  0  0  1  1


RECOMMENDATION MODELS

Four basic models of recommendation systems are studied:

Item Popularity‐based Model (IP)

Item Similarity‐based Model (IS)

SVD‐based Latent Factor Model (SVD)

Bipartite Graph Model (BG)


WHICH ONE IS THE BEST?

There is no definitive answer as to which recommendation algorithm is the best.

Most published comparison results are based on a few special datasets.

These datasets are distorted, i.e., not representative of real shopping data.

We need comparison results based on real commercial datasets.


ITEM POPULARITY-BASED MODEL (IP)

The most primitive model in recommendation systems (RS).

It recommends the most popular, most viewed, or best‐selling items to users.

It ignores individual users' preferences, but can be used as an auxiliary component in some recommendation systems.

A filtering step can be added to IP to improve the prediction result.
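A minimal sketch of the IP model, assuming the input is a log of (user, item) view events and popularity is the total view count; ties are broken by item id so the ranking is deterministic. `popular_items` is a hypothetical name.

```python
from collections import Counter


def popular_items(view_log, top_n=10):
    """IP model: rank items by their total view count over all users."""
    counts = Counter(item for _user, item in view_log)
    # Highest count first; ties broken by item id so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked[:top_n]]


log = [("u1", "a"), ("u2", "a"), ("u3", "a"),
       ("u1", "b"), ("u2", "b"), ("u3", "c")]
print(popular_items(log, top_n=2))  # ['a', 'b']
```

Every user receives the same list, which is exactly why IP ignores individual preferences.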


FILTERING STEP IN IP

If a user prefers to view an item just once, the algorithm should not recommend items that this user has already viewed.

If a user prefers to view an item several times, items that this user has already viewed could be presented to him/her again.

This step can also be applied to other recommendation models.
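The filtering step could be applied to any ranked list of candidates as below. The slides do not say how a user's viewing preference is detected; this sketch assumes (hypothetically) that a user who has viewed some item more than once prefers repeat views. `apply_filter` is a hypothetical name.

```python
from collections import Counter


def apply_filter(ranking, user_views, top_n=10):
    """Filtering step applied to a ranked list of candidate items.

    ASSUMPTION (not from the slides): a user who has viewed at least one
    item more than once is treated as preferring repeat views.
    """
    counts = Counter(user_views)
    if any(c > 1 for c in counts.values()):
        return ranking[:top_n]  # repeat viewer: viewed items may be shown again
    # Single-view user: drop items this user has already viewed.
    return [item for item in ranking if item not in counts][:top_n]


ranking = ["a", "b", "c", "d"]
print(apply_filter(ranking, ["a", "c"], top_n=2))       # ['b', 'd']
print(apply_filter(ranking, ["a", "a", "c"], top_n=2))  # ['a', 'b']
```

Because the function only post‐processes a ranking, the same step can sit after any of the models.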


ITEM SIMILARITY-BASED MODEL (IS)

In similarity‐based recommendation, the prediction is based on the similarity between items.

(Figure: users User1–User3 and items Item1–Item4.)

Item4 can be recommended to User1, and Item1 can be recommended to User3.

SIMILARITY MEASURE

Central to most item‐oriented approaches is a similarity measure between items.

The Pearson correlation coefficient measures the strength of linear dependence between two variables.

Here it measures the tendency of users to rate items i and j similarly.

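$$\rho_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{\sigma_i \sigma_j} = \frac{\sum_{k=1}^{d} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{d} (x_{ik} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^{d} (x_{jk} - \bar{x}_j)^2}}$$

where x_ik is the rating of item i by user k, x̄_i is the mean rating of item i, and the sums run over the d users.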
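The formula translates directly into code. In this sketch `xi` and `xj` are item vectors with one entry per user; returning 0.0 for a constant vector, where ρ_ij is undefined, is an assumption, and the toy vectors are hypothetical.

```python
import math


def pearson(xi, xj):
    """Pearson correlation rho_ij between item vectors x_i and x_j (one entry per user)."""
    d = len(xi)
    mean_i, mean_j = sum(xi) / d, sum(xj) / d
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj))
    sd_i = math.sqrt(sum((a - mean_i) ** 2 for a in xi))
    sd_j = math.sqrt(sum((b - mean_j) ** 2 for b in xj))
    # The 1/d factors of the covariance and standard deviations cancel, so they are omitted.
    return cov / (sd_i * sd_j) if sd_i and sd_j else 0.0


# Toy item vectors (one rating per user).
item1 = [1, 2, 5, 0, 0]
item2 = [1, 2, 5, 0, 0]
item4 = [0, 0, 0, 2, 1]
print(pearson(item1, item2))  # 1.0 (up to rounding)
print(pearson(item1, item4))  # negative
```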


IMPROVED SIMILARITY MEASURE

To improve the reliability of the similarity measure, a modification can be applied to the equation:

$$x'_{ik} = x_{ik}\left(1 + \frac{1}{\log nc_k}\right)$$

where nc_k denotes the number of items that user k has viewed.

THE MODEL

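We also take item popularity into account. The predicted score r̂_ui of item i for user u combines two parts:

Similarity: the similarities ρ_ij between item i and the items j in S(i;u)

Popularity: the view count of item i relative to the maximum, np_i / N

The combination includes a tunable parameter γ.

Here S(i;u) is the set of items that were viewed by user u, np_i denotes the view count of item i, and N is the global maximum view count.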
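A sketch of item similarity‐based scoring with a popularity term. The way the two parts are combined here, a linear blend γ·(mean similarity) + (1 − γ)·np_i/N, is an assumption made for illustration and not necessarily the exact form used in the slides; `item_similarity_scores` and the toy matrix are hypothetical.

```python
import math


def pearson(xi, xj):
    """Pearson correlation between two item vectors (0.0 if either is constant)."""
    d = len(xi)
    mi, mj = sum(xi) / d, sum(xj) / d
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    si = math.sqrt(sum((a - mi) ** 2 for a in xi))
    sj = math.sqrt(sum((b - mj) ** 2 for b in xj))
    return cov / (si * sj) if si and sj else 0.0


def item_similarity_scores(R, u, gamma=0.8):
    """Score items user u has not viewed. R[k][i] = view count of item i by user k.

    ASSUMPTION: score = gamma * mean similarity + (1 - gamma) * np_i / N,
    an illustrative blend rather than the slides' exact formula.
    """
    n = len(R[0])
    items = [[row[i] for row in R] for i in range(n)]  # item vectors x_i
    viewed = [j for j in range(n) if R[u][j] > 0]      # items viewed by user u
    np_ = [sum(col) for col in items]                  # view count np_i
    N = max(np_) or 1                                  # global maximum view count
    scores = {}
    for i in range(n):
        if i in viewed:
            continue
        sim = sum(pearson(items[i], items[j]) for j in viewed) / len(viewed) if viewed else 0.0
        scores[i] = gamma * sim + (1 - gamma) * np_[i] / N
    return scores


R = [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 1],
     [1, 0, 0, 0]]
scores = item_similarity_scores(R, u=3)
print(max(scores, key=scores.get))  # 1
```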


SVD-BASED LATENT FACTOR MODEL (SVD)

The latent factor model reduces the dimensionality of the user‐item rating matrix to discover a few "latent factors".

Original matrix = Factor matrix × … × Factor matrix

Original matrix: sparse, not ordered

Factor matrix: compact, ordered


SINGULAR VALUE DECOMPOSITION

A is an m‐by‐n matrix with rank(A) = r. It can be decomposed into the product of three matrices:

$$A_{m\times n} = U_{m\times m}\, D_1\, V_{n\times n}^{T}$$

U is an m‐by‐m orthogonal matrix and V is an n‐by‐n orthogonal matrix.

D_1 is the m‐by‐n singular value matrix

$$D_1 = \begin{pmatrix} D_{r\times r} & O_{r\times(n-r)} \\ O_{(m-r)\times r} & O_{(m-r)\times(n-r)} \end{pmatrix}$$

where D is a diagonal matrix with the singular values on its diagonal and each O is a zero block.

SINGULAR VALUE DECOMPOSITION CONT.

The singular values in D have the property σ1 ≥ σ2 ≥ … ≥ σr > 0:

$$D = \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_r)$$

SVD-BASED LATENT FACTOR MODEL CONT.


Singular Value Decomposition (SVD)


REDUCE THE DIMENSION

If we only focus on the r non‐trivial singular values, the effective dimensions of the SVD matrices U, D and V are m×r, r×r and n×r.

To reduce the dimension of the data, we can retain the k largest singular values of D and discard the rest:

$$A_{m\times n} \approx U_{m\times k}\, D_{k\times k}\, V_{n\times k}^{T}$$

The truncated factors are expected to capture the underlying latent structure of the original data.
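Truncation can be sketched with NumPy, assuming a dense matrix. `numpy.linalg.svd` returns the singular values already sorted in descending order, so keeping the first k entries keeps the k largest; `truncated_svd` and the random toy matrix are hypothetical.

```python
import numpy as np


def truncated_svd(A, k):
    """Return U_k (m×k), D_k (k×k), V_k (n×k) keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is in descending order
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T


rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(6, 4)).astype(float)  # toy 6×4 rating matrix
Uk, Dk, Vk = truncated_svd(A, 2)
A2 = Uk @ Dk @ Vk.T  # rank-2 approximation of A

s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A2)            # Frobenius norm by default
expected = np.sqrt(np.sum(s[2:] ** 2))  # norm of the discarded singular values
print(err, expected)
```

By the Eckart–Young theorem the two printed numbers agree: the rank‐k truncation is the best rank‐k approximation in the Frobenius norm, and its error is exactly what the discarded singular values account for.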


AN EXAMPLE

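Users Jenny, Emily and Diane (housewife) and Jeremy and Stefan (digital fans) rate three "daily use" items and two "digital" items:

$$\begin{pmatrix} 1&1&1&0&0\\ 2&2&2&0&0\\ 5&5&5&0&0\\ 0&0&0&2&2\\ 0&0&0&1&1 \end{pmatrix} \approx \begin{pmatrix} .18&0\\ .36&0\\ .91&0\\ 0&.89\\ 0&.44 \end{pmatrix} \begin{pmatrix} 9.49&0\\ 0&3.16 \end{pmatrix} \begin{pmatrix} .58&.58&.58&0&0\\ 0&0&0&.71&.71 \end{pmatrix}$$

Left factor: user–category similarity matrix (columns: housewife, digital fans).

Middle factor: the strength of each category (e.g., 3.16 is the strength of the digital fans category).

Right factor: item–category similarity matrix (rows: daily use, digital).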
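The factorization above can be checked numerically with NumPy. Singular vectors are unique only up to sign, so this sketch compares absolute values; the column layout (three daily‐use items, then two digital items) follows the matrix above.

```python
import numpy as np

# Rows: Jenny, Emily, Diane, Jeremy, Stefan.
# Columns: three "daily use" items followed by two "digital" items.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A)
print(np.round(s, 2))                 # only the first two values are non-zero
print(np.round(np.abs(U[:, :2]), 2))  # user-category weights (signs may flip)
print(np.round(np.abs(Vt[:2]), 2))    # item-category weights (signs may flip)

# Two factors reproduce A exactly, because A has rank 2.
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2]
print(np.allclose(A2, A))  # True
```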


SVD-BASED RECOMMENDATION

The SVD‐based model factorizes the user‐item matrix into two lower‐rank matrices: a "user‐factor" matrix P (m×f) and an "item‐factor" matrix Q (n×f).

Each user u and item i is represented by an f‐dimensional factor vector, p_u and q_i respectively.

The predicted rating of user u on item i is computed as

$$\hat{r}_{ui} = q_i^{T} p_u$$


SVD-BASED RECOMMENDATION CONT.

Decompose the user‐item rating matrix R into three matrices:

$$R = U D Q^{T}$$

where U and Q are orthonormal matrices and D is a diagonal matrix.

Since Q^T Q = I, it follows that

$$P = UD = RQ, \qquad R = PQ^{T}$$

SVD-BASED RECOMMENDATION CONT.

If r_u denotes the u‐th row of the rating matrix R, then the user factor vector is obtained as the product of r_u and Q (treating r_u, p_u and q_i as row vectors):

$$p_u = r_u Q$$

and the predicted rating is

$$\hat{r}_{ui} = r_u Q q_i^{T}$$
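A sketch of SVD‐based top‐N recommendation using p_u = r_u Q, assuming R is a dense NumPy array with 0 for unobserved entries and Q is taken as the first f right singular vectors of R. The helper names `svd_scores` and `recommend`, and the toy matrix, are hypothetical.

```python
import numpy as np


def svd_scores(R, f):
    """Predicted scores r̂_ui = r_u Q q_i^T for every user u and item i."""
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    Q = Vt[:f].T    # item-factor matrix Q (n×f); row i is q_i
    P = R @ Q       # user-factor matrix P = RQ (m×f); row u is p_u = r_u Q
    return P @ Q.T  # entry (u, i) is p_u q_i^T


def recommend(R, u, f, top_n=2):
    """Top-n items user u has not rated yet, ranked by predicted score."""
    scores = svd_scores(R, f)[u]
    unseen = [i for i in range(R.shape[1]) if R[u, i] == 0]
    return sorted(unseen, key=lambda i: scores[i], reverse=True)[:top_n]


# Toy matrix (hypothetical): row 0 has not rated item 2, row 4 has not rated item 4.
R = np.array([
    [1, 1, 0, 0, 0],
    [2, 2, 2, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 1, 0],
], dtype=float)
print(recommend(R, 0, f=2, top_n=1))  # [2]
print(recommend(R, 4, f=2, top_n=1))  # [4]
```

Because p_u is computed as r_u Q, a new user's row can be projected onto the existing Q without recomputing the decomposition.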


BIPARTITE GRAPH MODEL (BG)

In BG, users and items are modeled as the vertices of a bipartite graph.


BIPARTITE GRAPH MODEL (BG) CONT.

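In the Bipartite Graph model, the item nodes form a finite Markov chain with transition matrix P = (p_ij), where

$$p_{ij} = P(t_i \mid t_j) = \sum_{k=1}^{m} P(t_i \mid u_k)\, P(u_k \mid t_j)$$

$$P(t_i \mid u_k) = \frac{r_{ki}}{\sum_{j=1}^{n} r_{kj}}, \qquad P(u_k \mid t_j) = \frac{r_{kj}}{\sum_{l=1}^{m} r_{lj}}$$

Here t_i denotes item i, u_k denotes user k, r_ki is the entry of the user‐item matrix for user k and item i, and there are m users and n items. p_ij is the probability that a chain with initial node t_j ends in t_i.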
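The transition probabilities can be computed directly from a user‐item matrix with R[k][i] = r_ki. This pure‐Python sketch (hypothetical name `transition_matrix`) leaves the column of a never‐viewed item at zero; every other column sums to 1, as a transition matrix should.

```python
def transition_matrix(R):
    """p[i][j] = P(t_i | t_j) = sum_k P(t_i | u_k) * P(u_k | t_j).

    R[k][i] is the entry r_ki of the user-item matrix (m users, n items).
    """
    m, n = len(R), len(R[0])
    user_tot = [sum(R[k]) for k in range(m)]                       # sum_j r_kj
    item_tot = [sum(R[k][j] for k in range(m)) for j in range(n)]  # sum_l r_lj
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if item_tot[j] == 0:
            continue  # item j was never viewed: no chain can start there
        for i in range(n):
            p[i][j] = sum(
                (R[k][i] / user_tot[k]) * (R[k][j] / item_tot[j])  # P(t_i|u_k) P(u_k|t_j)
                for k in range(m)
                if user_tot[k] > 0
            )
    return p


R = [[1, 1, 0],
     [0, 1, 1]]
P = transition_matrix(R)
for row in P:
    print(row)
```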


PREDICTION BASED ON TRANSITION MATRIX

Given the previous click history of user u, the rating of item i can be predicted by

$$\hat{r}_{ui} = \sum_{j=1}^{n} p_{ij}\, T_u(t_j), \qquad T_u(t_j) = \frac{r_{uj}}{\sum_{l=1}^{n} r_{ul}}$$

where T_u is the initial state vector for user u in the Markov chain and T_u(t_j) is its component corresponding to item j.
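Continuing the sketch, the prediction multiplies the transition matrix by the user's initial state vector. The helper names are hypothetical, and the transition matrix is rebuilt here so the block runs on its own.

```python
def transition_matrix(R):
    """p[i][j] = sum_k P(t_i | u_k) P(u_k | t_j), with R[k][i] = r_ki."""
    m, n = len(R), len(R[0])
    user_tot = [sum(row) for row in R]
    item_tot = [sum(R[k][j] for k in range(m)) for j in range(n)]
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if item_tot[j] == 0:
            continue
        for i in range(n):
            p[i][j] = sum(R[k][i] / user_tot[k] * R[k][j] / item_tot[j]
                          for k in range(m) if user_tot[k] > 0)
    return p


def bg_scores(R, u):
    """r̂_ui = sum_j p_ij T_u(t_j) for every item i, with T_u(t_j) = r_uj / sum_l r_ul."""
    n = len(R[0])
    total = sum(R[u])
    if total == 0:
        return [0.0] * n  # no click history: nothing to propagate
    p = transition_matrix(R)
    T = [R[u][j] / total for j in range(n)]  # initial state vector T_u
    return [sum(p[i][j] * T[j] for j in range(n)) for i in range(n)]


R = [[1, 1, 0],
     [0, 1, 1]]
print(bg_scores(R, 0))  # approximately [0.375, 0.5, 0.125]
```

The scores form a probability distribution over items; combined with the filtering step, the already‐viewed items 0 and 1 would be dropped for a single‐view user, leaving item 2.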


EXPERIMENTAL STUDY

The datasets used in the experiment are click histories from several online shopping websites, for example:

pid:13505646 - siteId:9093 - uid:08097540 - date:2010-08-08
pid:16062417 - siteId:9102 - uid:95429188 - date:2010-08-08
pid:12546546 - siteId:7167 - uid:71516943 - date:2010-08-08
pid:691224 - siteId:4266 - uid:07079557 - date:2010-08-08
pid:4577421 - siteId:4266 - uid:07079557 - date:2010-08-08
…
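A minimal sketch for turning such log lines into per‐user view counts, assuming every line has the ` - `‐separated `key:value` layout shown above. IDs are kept as strings so leading zeros (as in `uid:08097540`) survive; `parse_click` is a hypothetical name.

```python
from collections import Counter


def parse_click(line):
    """Split 'pid:... - siteId:... - uid:... - date:...' into a dict of strings."""
    record = {}
    for part in line.split(" - "):
        key, _, value = part.partition(":")
        record[key.strip()] = value.strip()
    return record


log = """pid:13505646 - siteId:9093 - uid:08097540 - date:2010-08-08
pid:691224 - siteId:4266 - uid:07079557 - date:2010-08-08
pid:4577421 - siteId:4266 - uid:07079557 - date:2010-08-08"""

views = Counter()
for line in log.splitlines():
    rec = parse_click(line)
    views[(rec["uid"], rec["pid"])] += 1  # view count of item pid by user uid

print(views)
```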


STATISTICS ON DATASETS


EVALUATION STRATEGY

The dataset is divided into three sets: a training set, a test set and a last transaction set.

Our goal is to train the model on the training set and apply it to the test set to predict the last transaction of each test user.


user0: item0, item1, item2, item3


EVALUATION STRATEGY

Original data set: Xaction 1, Xaction 2, ..., Xaction 300000

Training set: Xaction 1, Xaction 2, ..., Xaction 250000

Test set: Xaction 250001, ..., Xaction 300000
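A sketch of the split under stated assumptions: transactions are (user, item) pairs in chronological order, the first `split_index` go to training (250000 of 300000 in the figure), and each test user's final test transaction becomes the last transaction set. `split_transactions` and the toy data are hypothetical.

```python
def split_transactions(transactions, split_index):
    """Split chronologically ordered (user, item) transactions.

    ASSUMPTIONS: the first `split_index` transactions are the training set; in the
    remaining test transactions, each user's final transaction is held out as the
    "last transaction set" that the model must predict.
    """
    train = transactions[:split_index]
    test = transactions[split_index:]
    last = {}
    for pos, (user, item) in enumerate(test):
        last[user] = (pos, item)  # later transactions overwrite earlier ones
    held_out = {pos for pos, _ in last.values()}
    test_history = [t for pos, t in enumerate(test) if pos not in held_out]
    last_items = {user: item for user, (_, item) in last.items()}
    return train, test_history, last_items


tx = [("u0", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"), ("u3", "b"), ("u2", "d")]
train, test_history, last_items = split_transactions(tx, split_index=2)
print(test_history)  # [('u2', 'a'), ('u2', 'c')]
print(last_items)    # {'u2': 'd', 'u3': 'b'}
```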


EVALUATION STRATEGY CONT.

The quality of the results is measured by the hit rate:

$$\text{hitRate} = \frac{\#\,\text{test users predicted correctly}}{\#\,\text{test users}}$$

Top‐10 recommendations are used to verify the models.
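A sketch of the hit‐rate computation, assuming a test user counts as predicted correctly when the held‐out last item appears in that user's top‐10 list. `hit_rate` and the toy inputs are hypothetical.

```python
def hit_rate(recommendations, last_items, n=10):
    """Share of test users whose held-out last item is in their top-n list.

    recommendations: user -> ranked list of items; last_items: user -> held-out item.
    """
    users = list(last_items)
    if not users:
        return 0.0
    hits = sum(1 for u in users if last_items[u] in recommendations.get(u, [])[:n])
    return hits / len(users)


recs = {"u2": ["d", "x"], "u3": ["x", "y"]}
print(hit_rate(recs, {"u2": "d", "u3": "b"}))  # 0.5
```

A test user with no recommendation list simply counts as a miss.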


PARAMETER STUDY

γ in the item similarity‐based model


TOP-10 PREDICTION ON DATASETS


20,471 users, 499 items

60 factors for SVD


TOP-10 PREDICTION ON DATASETS CONT.


148,409 users, 1,004 items

70 factors for SVD


TOP-10 PREDICTION ON DATASETS CONT.


70,049 users, 2,303 items

100 factors for SVD


TOP-10 PREDICTION ON DATASETS CONT.


112,738 users, 94 items

94 factors for SVD


SUMMARY

The Item Popularity‐based model is not suitable for most datasets, but it could be used as an auxiliary component.

If the dataset has few items but many users, the SVD‐based model is a good choice.

The Item Similarity‐based model and the Bipartite Graph model rest on similar ideas, so they perform very similarly. They are suitable for datasets with a "normal" number of items and users.

The filtered Bipartite Graph model is a "won't‐be‐wrong" method in most cases.


FUTURE CONSIDERATIONS – INCREMENTAL DATA

How to handle incremental data?

The amount of shopping data increases every minute, every day

How to handle the new data?

Do we recompute the model on the entire data set every day, or do we process only the new data and merge it into the existing results?

How do the strategies affect the accuracy of the recommendations?


FUTURE CONSIDERATIONS – IMPUTATION STRATEGIES

When a person did not buy an item, that does not tell us whether the person likes or dislikes the item.

Such an item usually receives a rating of 0, which is not its actual rating.

The sparse matrix must therefore be imputed, i.e., its missing values must be filled in.

Different imputation strategies lead to different results in terms of accuracy and computational efficiency.
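To illustrate how imputation choices differ, here are two simple strategies, zero filling and item‐mean filling. These are common illustrative choices, not strategies evaluated in the slides; missing entries are represented as `None`, and `impute` is a hypothetical name.

```python
def impute(R, strategy="zero"):
    """Return a copy of R with missing entries (None) filled in.

    "zero":      every missing entry becomes 0.0
    "item_mean": every missing entry becomes the mean observed value of its item
    """
    if strategy not in ("zero", "item_mean"):
        raise ValueError("unknown strategy: " + strategy)
    filled = [row[:] for row in R]
    for i in range(len(R[0])):
        observed = [row[i] for row in R if row[i] is not None]
        if strategy == "item_mean" and observed:
            fill = sum(observed) / len(observed)
        else:
            fill = 0.0
        for row in filled:
            if row[i] is None:
                row[i] = fill
    return filled


R = [[5, None],
     [3, 4],
     [None, 2]]
print(impute(R, "zero"))       # [[5, 0.0], [3, 4], [0.0, 2]]
print(impute(R, "item_mean"))  # [[5, 3.0], [3, 4], [4.0, 2]]
```

Zero filling treats "not bought" as "disliked", while item‐mean filling assumes an average opinion; the two produce different matrices and therefore different recommendations.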


FUTURE CONSIDERATIONS – DATA PRIVACY

Small merchants do not have the manpower to maintain a recommendation system

They usually buy the services from a third party

They have to provide their customer data to the third party for analysis

This may leak customers' private information or the merchant's trade secrets

How to pre‐process customer data so that data privacy is preserved? 


Professor Jun Zhang

Department of Computer Science
University of Kentucky
Lexington, KY 40506-0633, USA

E-mail: [email protected]
Web: http://www.cs.uky.edu/~jzhang
Tel: 13540021323


TOP-N PREDICTIONS


20,471 users, 499 items

60 factors for SVD


TOP-N PREDICTIONS CONT.


148,409 users, 1,004 items

70 factors for SVD


TOP-N PREDICTIONS CONT.


112,738 users, 94 items

94 factors for SVD


TOP-N PREDICTIONS CONT.


70,049 users, 2,303 items

100 factors for SVD


CONCLUSION AND FUTURE WORK

We presented the concept and background of recommender systems in e‐Business.

There are two classes of recommendation techniques: Content‐based filtering and Collaborative filtering. The latter is extensively used in popular online shopping websites.

We illustrated four basic collaborative filtering algorithms and conducted an empirical study of them on four datasets from a retargeting company.


CONCLUSION AND FUTURE WORK

The filtering step of the Item Popularity model has a good effect on the Item Similarity and Bipartite Graph models, but no effect at all on the SVD model.

We identified a strategy for choosing models according to the features of the datasets.

The models could be combined to provide better prediction accuracy.

Privacy issues in recommender systems (future work).


International Conference on Business Computing and Global Informatization

BCGIn 2013

September 13-15, 2013
Changsha, China

www.bcgin.org