
Amazon Review Rating Prediction with Text-Mining, Latent-Factor Model and Restricted Boltzmann Machine

Cheng Guo A53201515

[email protected]

Juncheng Liu A53223244

[email protected]

Zhichen Wu A53214514

[email protected]

Linghao Zhu A53203446

[email protected]

Abstract

To make recommendations, electronic commerce companies must first predict how a user will respond to a new product. To do so, they need to learn the preferences of each user as well as the features of each product. Predicting ratings from review information is therefore a crucial task. In this paper, we adopt three methods for rating prediction: a text-mining approach that uses the review text, a latent-factor model, and the RBM (Restricted Boltzmann Machine). In our experiments, we compare the performance of these three models on the Amazon Review Datasets of different product categories and find that their performance varies with the characteristics of the dataset. For datasets with dense user-item pairs (all users and items have at least several reviews), the latent-factor model performs quite well. For datasets with enough review text, the text-mining method shows strong prediction ability. RBM is an approach with great potential that is worth further exploration and research.

1 Introduction

The goal of our project is to predict ratings from review information. Online reviews play a crucial role in helping users decide between products. They are used extensively for movies, online shopping sites, restaurants, etc. Most platforms allow users to submit a text review as well as a numeric rating.

We implement a number of methods to predict ratings for the Amazon Review Dataset, including text mining, a latent-factor model and an RBM (Restricted Boltzmann Machine). These models are relatively simple but often perform well in practice. Because their performance varies across datasets with different characteristics, we also investigate which model suits a given dataset best. Specifically, since the latent-factor model is known to perform well on datasets with dense user-item pairs, we “compress” the dataset step by step and examine the performance of each model.

2 Dataset Description

The dataset we use is the Amazon Review Dataset crawled in [2], spanning May 1996 to July 2014, which contains approximately 35 million reviews in total. The dataset is further divided into 26 parts based on the top-level category of each product (e.g. books, movies).


2.1 Basic Statistics and Property

We choose the preprocessed dense 5-core dataset, in which each of the remaining users and items has at least 5 reviews. To compare the performance of the models across categories, we choose 3 categories of similar dataset size: Video Games, Health and Personal Care, and Beauty. A summary of the dataset is shown in the following table.

| Category | #Reviews | #Users | #Items | #Vocabulary | #Words | avg #Words |
|---|---|---|---|---|---|---|
| Video Games | 231780 | 24303 | 10672 | 507742 | 47.6M | 205 |
| Health & Personal Care | 346355 | 38609 | 18534 | 314105 | 32.7M | 94 |
| Beauty | 198502 | 22363 | 12101 | 162539 | 17.6M | 88 |

Table 1: Dataset statistics (number of reviews; number of users; number of items; vocabulary size; total number of words; average number of words per review)

In this dataset the vocabulary and the word counts are quite rich, so the text-mining method can be utilized to extract significant information for the rating prediction task. Dividing the review counts in Table 1 by the user and item counts also shows that each user has written roughly 9 to 10 reviews and each item has received roughly 16 to 22 reviews on average. The user-item pairs are therefore quite dense, which is the setting where the latent-factor model can perform well.
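The per-user and per-item averages can be checked directly from Table 1. The following is a small sketch with the counts copied from the table; the function name `density_summary` and the dictionary layout are illustrative only, not part of the original experiments.

```python
# Counts copied from Table 1: (reviews, users, items) per category.
stats = {
    "Video Games": (231780, 24303, 10672),
    "Health & Personal Care": (346355, 38609, 18534),
    "Beauty": (198502, 22363, 12101),
}

def density_summary(reviews, users, items):
    """Return (reviews per user, reviews per item, fraction of filled user-item cells)."""
    return reviews / users, reviews / items, reviews / (users * items)

for name, (r, u, i) in stats.items():
    per_user, per_item, fill = density_summary(r, u, i)
    print(f"{name}: {per_user:.1f} reviews/user, "
          f"{per_item:.1f} reviews/item, fill ratio {fill:.5f}")
```

The fill ratio is the share of all possible user-item pairs that actually carry a rating, which is one simple way to quantify the "density" discussed in this section.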

And for each review, the specific format is as follows.

• reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
• asin - ID of the product, e.g. 0000013714
• reviewerName - name of the reviewer
• helpful - helpfulness rating of the review, e.g. 3 of 5
• reviewText - text of the review
• overall - rating of the product
• summary - summary of the review
• unixReviewTime - time of the review (unix time)
• reviewTime - time of the review (raw)
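As an illustration of how these fields can be turned into model inputs, the sketch below parses one review record into a (user, item, rating, text) tuple. It assumes each review is stored as one JSON object per line; the helper name `parse_review` and the sample record are made up for the example.

```python
import json

def parse_review(line):
    """Turn one JSON-encoded review into a (user, item, rating, text) tuple."""
    record = json.loads(line)
    return (
        record["reviewerID"],
        record["asin"],
        float(record["overall"]),
        record.get("reviewText", ""),  # some reviews may have no text
    )

# A made-up record that uses the field names listed above.
sample = json.dumps({
    "reviewerID": "A2SUAM1J3GNN3B",
    "asin": "0000013714",
    "overall": 5.0,
    "reviewText": "Works as described.",
})
print(parse_review(sample))
```

The first three elements are all that the latent-factor and RBM models need, while the text-mining model uses the last one together with the rating.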

2.2 Exploratory Analysis

For the exploratory analysis, we first examine the rating distribution of the dataset, shown in the following figures.

(a) Health & Personal Care (b) Video Games (c) Beauty

Figure 1: Rating Distribution

From the distribution, we find that most of the ratings for all three categories are quite high; 5-star reviews account for almost half of all reviews. Therefore, learning to recognize the textual features of negative reviews should help the text-mining model improve its rating prediction performance.


We also explore the “density” of the user-item pairs in the dataset. Specifically, we plot the distribution of the number of items per user and the number of users per item.

(a) Health & Personal Care (b) Video Games (c) Beauty

Figure 2: Item Distribution for Users

(a) Health & Personal Care (b) Video Games (c) Beauty

Figure 3: User Distribution for Items

As the above figures show, because we choose the preprocessed 5-core dataset, each item is reviewed by at least 5 users and vice versa, which makes it much denser than the original raw review dataset. We therefore expect the latent-factor model to be suitable for this kind of dense dataset. We also hypothesize that if we “compress” the data (increase the k-core index) more aggressively, the performance of the model might improve; we test this in the following experiments.

3 Predictive Task Identification

Our main prediction task is to predict the rating score from the given review information with different models on different datasets. With the text-mining method and the latent-factor model, this can be framed as a regression problem where ratings are treated as continuous values from 1 to 5. With the RBM model, the problem becomes a classification problem where ratings are integers from 1 to 5, viewed as 5 different classes.

We are also interested in comparing the performance of different models on different datasets. Specifically, we “compress” the dataset by increasing the k-core index so that only users and items with a large number of reviews are kept, which makes the dataset more “dense”. We then explore how the performance of the different models changes as the dataset is compressed.

3.1 Evaluation of Model

For the prediction problem, we mainly adopt MSE (Mean Squared Error) as the metric to evaluate the performance of our models. We also consider the effect of data size on prediction performance. Furthermore, for the text-mining model, we extract the most representative words, i.e. those with the highest or lowest weights in the vocabulary, from the positive and negative reviews of each product category and check whether these words make sense. For each category, we randomly select 80% of the data as the training set and the remaining 20% as the testing set.


3.2 Relevant Baseline

Average rating: The simplest baseline predicts the average of all training ratings in the dataset. In terms of MSE, this is the best possible constant predictor on the training set, so we use it as the baseline system.
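The evaluation protocol and the baseline can be sketched in a few lines. This is a minimal illustration using only the standard library; the function names, the fixed random seed and the toy ratings are assumptions made for the example.

```python
import random

def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def split_80_20(rows, seed=0):
    """Shuffle the rows and return (80% training part, 20% testing part)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]

def average_baseline(train_ratings):
    """The constant predictor: the mean of the training ratings."""
    return sum(train_ratings) / len(train_ratings)

ratings = [5, 5, 4, 5, 3, 1, 5, 4, 2, 5]  # toy ratings, not taken from the dataset
train, test = split_80_20(ratings)
mean_rating = average_baseline(train)
print("baseline test MSE:", mse([mean_rating] * len(test), test))
```

Any model that cannot beat this constant predictor on the test set has not learned anything useful from the users, items or text.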

3.3 Data Preprocess

For the text-mining model, the features extracted from the data are text features. Specifically, we adopt the bag-of-words model with a TF-IDF weighting scheme, which is explained in a later section. To implement the TF-IDF feature extraction, we adopt the TfidfVectorizer module in sklearn, which first removes punctuation and stopwords from the raw review data and then calculates the TF-IDF score of each term in each review. For the latent-factor and RBM models, the only information we need is the rating-user-item triple, which can be easily extracted from the raw dataset.
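A minimal sketch of this feature extraction step is shown below. The exact arguments used in the experiments are not stated, so `stop_words="english"` is an assumption, and the three reviews are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great game, best purchase this year!",
    "Boring and unplayable. I want a refund.",
    "The lotion is relaxing and I will repurchase it.",
]

# The default tokenizer keeps only runs of word characters, so punctuation is
# dropped; stop_words="english" (an assumption here) removes common stopwords.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(reviews)  # sparse matrix: reviews x vocabulary

print(features.shape)
print(sorted(vectorizer.vocabulary_))
```

The result is a sparse matrix with one row per review, which is the input format expected by the regression model described later.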

For the experiments in which we “compress” the dataset by increasing the k-core index, we iteratively remove the reviews whose user or item has fewer than k reviews until the dataset no longer changes. The original 5-core data already corresponds to k=5. We then further “compress” the data by setting k=7, 9, 11, 13, which gives 5 different datasets for each category. A summary of the preprocessed datasets is as follows.

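| Category | K-Core | #Reviews | #Users | #Items | #Vocabulary |
|---|---|---|---|---|---|
| Health & Personal Care | 5 | 346355 | 38609 | 18534 | 314105 |
| Health & Personal Care | 7 | 129642 | 8965 | 5330 | 192441 |
| Health & Personal Care | 9 | 60902 | 2632 | 1449 | 133376 |
| Health & Personal Care | 11 | 52160 | 1961 | 1181 | 120290 |
| Health & Personal Care | 13 | 46651 | 1595 | 1070 | 728209 |
| Video Games | 5 | 231780 | 24303 | 10672 | 507742 |
| Video Games | 7 | 13060 | 9808 | 5641 | 375023 |
| Video Games | 9 | 71184 | 4212 | 2928 | 263050 |
| Video Games | 11 | 35891 | 1810 | 1466 | 171557 |
| Video Games | 13 | 6850 | 330 | 307 | 59867 |
| Beauty | 5 | 198502 | 22363 | 12101 | 162539 |
| Beauty | 7 | 60276 | 4322 | 2423 | 85674 |
| Beauty | 9 | 30818 | 1531 | 768 | 60259 |
| Beauty | 11 | 26983 | 1197 | 693 | 55156 |
| Beauty | 13 | 23352 | 949 | 624 | 49874 |

Table 2: K-core dataset statistics (number of reviews; number of users; number of items; vocabulary size)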

From the summary we can tell that as the dataset is compressed, the number of reviews, users, items and the vocabulary size all drop dramatically, while the density of the user-item pairs increases.
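The iterative k-core filtering described in this section can be sketched as follows. This is an illustrative implementation over (user, item) pairs, not the code used for the experiments; the name `k_core` and the toy data are assumptions.

```python
from collections import Counter

def k_core(reviews, k):
    """Iteratively drop reviews whose user or item has fewer than k reviews.

    `reviews` is an iterable of (user, item) pairs. Removing one review can
    push another user or item below the threshold, so the filter is repeated
    until a pass removes nothing.
    """
    reviews = list(reviews)
    while True:
        user_counts = Counter(user for user, _ in reviews)
        item_counts = Counter(item for _, item in reviews)
        kept = [(user, item) for user, item in reviews
                if user_counts[user] >= k and item_counts[item] >= k]
        if len(kept) == len(reviews):
            return kept
        reviews = kept

toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
print(k_core(toy, 2))
```

In the toy data, item c has a single review, so (u3, c) is removed in the first pass; that leaves u3 with one review, so (u3, a) is removed in the second pass. This cascade is why a single filtering pass is not enough.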

4 Model Design and Description

In this section, we describe in detail the three methods we adopt for the rating prediction task and the motivation for designing the models.


4.1 Latent Factor Model

We first ignore the review text and try to predict the rating based only on the userID and itemID. In this scenario, the Latent Factor Model is a natural solution. We predict the rating with the following formula:

$$
r_{u,i} = \alpha + \beta_u + \beta_i + \gamma_u \cdot \gamma_i \tag{1}
$$

We use the mean squared error to measure our model. In addition, to prevent overfitting, we add L2 regularization to control the model complexity. Since α is a base estimate, we do not penalize it. Since β and γ have different dimensions and probably different magnitudes, we penalize them with different coefficients. The loss is therefore:

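$$
E = \sum_{(u,i) \in \text{train}} \left(\alpha + \beta_u + \beta_i + \gamma_u \cdot \gamma_i - R_{u,i}\right)^2
+ \lambda_\beta \Big(\sum_u \beta_u^2 + \sum_i \beta_i^2\Big)
+ \lambda_\gamma \Big(\sum_u \|\gamma_u\|_2^2 + \sum_i \|\gamma_i\|_2^2\Big) \tag{2}
$$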

Following the loss definition, we can take its derivatives and update α, β and γ accordingly until convergence. In addition, different categories are likely to have different rating distributions, so fitting a separate model for each category is a better choice.

4.1.1 Optimization

Besides applying different models, we can also incorporate category information into the Latent Factor Model. Inspired by approaches that incorporate user information, we associate ρc, the latent factor for category c, with γi by adding it to γi before taking the product with γu. The prediction then becomes:

$$
r_{u,i} = \alpha + \beta_u + \beta_i + \gamma_u \cdot \Big(\gamma_i + \sum_{c=1}^{C} A_i(c)\,\rho_c\Big) \tag{3}
$$

in which C is the total number of categories (in our case 3), and Ai is a one-hot vector in which Ai(c) = 1 means that item i belongs to category c. Thus, the loss becomes:

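$$
E = \sum_{(u,i) \in \text{train}} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big)^2
+ \lambda_\beta \Big(\sum_u \beta_u^2 + \sum_i \beta_i^2\Big)
+ \lambda_\gamma \Big(\sum_u \|\gamma_u\|_2^2 + \sum_i \|\gamma_i\|_2^2\Big)
+ \lambda_\rho \sum_c \|\rho_c\|_2^2 \tag{4}
$$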

To minimize the loss, we take the derivative with respect to each parameter, which gives us:

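$$
\begin{aligned}
\frac{\partial E}{\partial \alpha} &= 2 \sum_{(u,i) \in \text{train}} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big) \\
\frac{\partial E}{\partial \beta_u} &= 2 \sum_{i \in I_u} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big) + 2\lambda_\beta \beta_u \\
\frac{\partial E}{\partial \beta_i} &= 2 \sum_{u \in U_i} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big) + 2\lambda_\beta \beta_i
\end{aligned} \tag{5}
$$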


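Here Iu denotes the set of items rated by user u and Ui the set of users who rated item i. For these three parameters, we can optimize by setting the derivatives to zero and solving the resulting equations.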
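Setting the derivatives in equation (5) to zero gives closed-form coordinate updates. The following is a sketch of that algebra rather than a formula quoted from the original work: q_i is shorthand introduced here for the effective item factor, N_train is the number of training pairs, I_u and U_i are the index sets appearing in equation (5), and each update holds all other parameters fixed.

$$
\begin{aligned}
q_i &= \gamma_i + \sum_c A_i(c)\,\rho_c \\
\alpha &= \frac{1}{N_{\text{train}}} \sum_{(u,i) \in \text{train}} \big(R_{u,i} - \beta_u - \beta_i - \gamma_u \cdot q_i\big) \\
\beta_u &= \frac{\sum_{i \in I_u} \big(R_{u,i} - \alpha - \beta_i - \gamma_u \cdot q_i\big)}{\lambda_\beta + |I_u|} \\
\beta_i &= \frac{\sum_{u \in U_i} \big(R_{u,i} - \alpha - \beta_u - \gamma_u \cdot q_i\big)}{\lambda_\beta + |U_i|}
\end{aligned}
$$

The regularizer appears only in the denominators of the bias updates: a user or item with few ratings has its bias shrunk toward zero, while α, which is not penalized, is a plain average of residuals.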

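$$
\begin{aligned}
\frac{\partial E}{\partial \gamma_u} &= 2 \sum_{i \in I_u} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big)\Big(\gamma_i + \sum_c A_i(c)\,\rho_c\Big) + 2\lambda_\gamma \gamma_u \\
\frac{\partial E}{\partial \gamma_i} &= 2 \sum_{u \in U_i} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big)\,\gamma_u + 2\lambda_\gamma \gamma_i \\
\frac{\partial E}{\partial \rho_c} &= 2 \sum_{(u,i) \in \text{train}} \Big(\alpha + \beta_u + \beta_i + \gamma_u \cdot \big(\gamma_i + \sum_c A_i(c)\,\rho_c\big) - R_{u,i}\Big)\,\gamma_u A_i(c) + 2\lambda_\rho \rho_c
\end{aligned} \tag{6}
$$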

For these three parameters, we can optimize them by gradient descent on the full batch of data.

However, simply updating all parameters jointly from the beginning sometimes leads in a bad direction. To reach a better local minimum, we first update α and β until convergence, then update γ and ρ until convergence, and finally update all parameters except α until convergence.
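The prediction in equation (3), the loss in equation (4) and the gradients in equation (6) can be written compactly with NumPy. The sketch below is an illustration under stated assumptions, not the authors' implementation: every item belongs to exactly one category (so the one-hot sum reduces to an index lookup), all function and variable names are invented, the data is a toy example, and only a single full-batch step on γ and ρ is shown rather than the staged schedule described above.

```python
import numpy as np

def predict(params, users, items, item_cat):
    """Equation (3): alpha + beta_u + beta_i + gamma_u . (gamma_i + rho_c)."""
    alpha, beta_u, beta_i, gamma_u, gamma_i, rho = params
    q = gamma_i[items] + rho[item_cat[items]]  # effective item factors
    return alpha + beta_u[users] + beta_i[items] + np.sum(gamma_u[users] * q, axis=1)

def loss(params, users, items, ratings, item_cat, lam_beta, lam_gamma, lam_rho):
    """Equation (4): squared error plus L2 penalties (alpha is not penalized)."""
    alpha, beta_u, beta_i, gamma_u, gamma_i, rho = params
    err = predict(params, users, items, item_cat) - ratings
    return (np.sum(err ** 2)
            + lam_beta * (np.sum(beta_u ** 2) + np.sum(beta_i ** 2))
            + lam_gamma * (np.sum(gamma_u ** 2) + np.sum(gamma_i ** 2))
            + lam_rho * np.sum(rho ** 2))

def latent_gradients(params, users, items, ratings, item_cat, lam_gamma, lam_rho):
    """Equation (6): gradients with respect to gamma_u, gamma_i and rho."""
    alpha, beta_u, beta_i, gamma_u, gamma_i, rho = params
    err = predict(params, users, items, item_cat) - ratings
    q = gamma_i[items] + rho[item_cat[items]]
    g_gamma_u = 2 * lam_gamma * gamma_u
    g_gamma_i = 2 * lam_gamma * gamma_i
    g_rho = 2 * lam_rho * rho
    # np.add.at accumulates correctly when an index occurs more than once.
    np.add.at(g_gamma_u, users, 2 * err[:, None] * q)
    np.add.at(g_gamma_i, items, 2 * err[:, None] * gamma_u[users])
    np.add.at(g_rho, item_cat[items], 2 * err[:, None] * gamma_u[users])
    return g_gamma_u, g_gamma_i, g_rho

# Toy data: 4 users, 5 items, 3 categories, 2 latent dimensions.
rng = np.random.default_rng(0)
users = np.array([0, 0, 1, 2, 3, 3])
items = np.array([0, 1, 1, 2, 3, 4])
ratings = np.array([5.0, 4.0, 3.0, 5.0, 1.0, 2.0])
item_cat = np.array([0, 0, 1, 1, 2])  # category index of each item
params = [ratings.mean(), np.zeros(4), np.zeros(5),
          0.1 * rng.standard_normal((4, 2)),
          0.1 * rng.standard_normal((5, 2)),
          0.1 * rng.standard_normal((3, 2))]

before = loss(params, users, items, ratings, item_cat, 1.0, 1.0, 1.0)
g_u, g_i, g_r = latent_gradients(params, users, items, ratings, item_cat, 1.0, 1.0)
step = 0.01  # illustrative learning rate
params[3] = params[3] - step * g_u
params[4] = params[4] - step * g_i
params[5] = params[5] - step * g_r
after = loss(params, users, items, ratings, item_cat, 1.0, 1.0, 1.0)
print(before, after)
```

Repeating the step until the loss stops decreasing corresponds to the second stage of the schedule; the first stage would use the updates for α and β instead.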

4.2 Restricted Boltzmann Machine

A Boltzmann Machine is a generative stochastic neural network that can learn a probability distribution over its set of inputs. A Restricted Boltzmann Machine restricts its connectivity by allowing only one hidden layer and no edges between hidden units. By summing over the states of the hidden units together with the weights, we can get the probability distribution over the visible units. The output can then be sampled from that distribution.

However, a traditional RBM cannot solve the rating prediction problem because its units are binary and most ratings are missing. To deal with this, we apply the RBM variant proposed by Salakhutdinov et al. [4]. In that paper, the RBM is modified to use softmax visible units. Moreover, a separate RBM is constructed for each user, while the weights between a visible unit and the hidden units are shared by all users who have rated the item corresponding to that visible unit. Unrated visible units are disconnected from the hidden units.
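To make the softmax visible units concrete, the sketch below computes the two conditional distributions of such an RBM for a single user and derives an expected rating per item. It is a hedged illustration of the general technique in [4], not a reproduction of the model trained here: the array shapes, function names, random weights and the single mean-field pass used for prediction are all assumptions.

```python
import numpy as np

def hidden_probs(V, W, b_hidden, rated):
    """p(h_j = 1 | V), using only the items this user has rated.

    V: (n_items, K) one-hot ratings, W: (n_items, K, n_hidden),
    rated: indices of the rated items (unrated items are disconnected).
    """
    activation = b_hidden + np.einsum("ik,ikj->j", V[rated], W[rated])
    return 1.0 / (1.0 + np.exp(-activation))

def visible_probs(h, W, b_visible):
    """p(v_i = k | h): a softmax over the K rating values of every item."""
    scores = b_visible + np.einsum("ikj,j->ik", W, h)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(scores)
    return expo / expo.sum(axis=1, keepdims=True)

def expected_rating(probs):
    """Expected star rating under the softmax distribution of each item."""
    return probs @ np.arange(1, probs.shape[1] + 1)

rng = np.random.default_rng(0)
n_items, K, n_hidden = 4, 5, 3
W = 0.1 * rng.standard_normal((n_items, K, n_hidden))  # untrained, random weights
b_visible = np.zeros((n_items, K))
b_hidden = np.zeros(n_hidden)

V = np.zeros((n_items, K))
V[0, 4] = 1.0  # item 0 rated 5 stars
V[2, 0] = 1.0  # item 2 rated 1 star
rated = np.array([0, 2])

p_hidden = hidden_probs(V, W, b_hidden, rated)
probs = visible_probs(p_hidden, W, b_visible)
print(expected_rating(probs))
```

Because the weights are random, the printed values carry no meaning; the point is the data flow from observed ratings to hidden activations and back to a distribution over the five rating values of every item.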

Unfortunately, we were unable to completely replicate the work in that paper, so the performance of our RBM is quite limited.

4.3 Text Mining Approach

As the review text contains rich information, we adopt a text mining approach for the rating prediction task. We extract features from the review text, specifically the tf-idf weight of each unigram in the vocabulary. The tf-idf weight is composed of two terms: the first is the normalized Term Frequency (TF), i.e. the number of times a word appears in a document divided by the total number of words in that document; the second is the Inverse Document Frequency (IDF), computed as the logarithm of the number of documents in the corpus divided by the number of documents in which the specific term appears. Because the vocabulary is large, the feature matrix extracted with TF-IDF weights is huge and sparse, so dimension reduction methods like PCA are not feasible. Also, because the feature vector is so sparse, other features such as helpfulness and time have a negligible effect on the overall regression performance, so we discard them for this task.
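The two terms of the tf-idf weight can be computed by hand on a tiny corpus. The sketch below follows the definition given in the paragraph above; note that sklearn's TfidfVectorizer uses a smoothed IDF and L2-normalizes each row by default, so its numbers differ from this textbook version. The function name and the toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / number of words in the document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights

docs = [["great", "game", "great"], ["boring", "game"]]
for weights in tf_idf(docs):
    print(weights)
```

The word "game" appears in both documents, so its IDF is log(2/2) = 0 and it receives zero weight, while "great" and "boring" each appear in only one document and keep a positive weight.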

After the feature extraction, we perform the regression with the SVR (Support Vector Regression) model. The model produced by support vector classification depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by SVR depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction. A linear SVR minimizes

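$$
\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \left(\chi_i + \chi_i^*\right)
$$

subject to

$$
\begin{aligned}
y_i - \langle w, x_i \rangle - b &\le \varepsilon + \chi_i \\
\langle w, x_i \rangle + b - y_i &\le \varepsilon + \chi_i^* \\
\chi_i,\ \chi_i^* &\ge 0
\end{aligned}
$$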

where C is a penalty parameter and ε is the width of the insensitive tube. We then perform a grid search over these hyper-parameters. Because of scalability issues, we randomly select only 50K samples from the dataset and use 3-fold cross-validation to determine the hyper-parameters, and finally choose C = 1 and ε = 0.2 as the best option. We tried both the linear kernel and the RBF kernel and found that the linear kernel performs better. Because the penalty parameter C controls the strength of regularization, the overfitting problem is alleviated.
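A grid search of this kind can be sketched with sklearn's GridSearchCV. The candidate values in the grid, the toy reviews and the use of a pipeline are assumptions made for the example; only the 3-fold cross-validation, the MSE criterion and the two tuned hyper-parameters follow the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

# Toy reviews and ratings, invented for the example.
texts = [
    "great game, best purchase", "boring and unplayable", "works fine, nothing special",
    "amazing quality, love it", "terrible, total trash", "okay product for the price",
    "best lotion ever, will repurchase", "worst purchase, very disappointing",
    "decent but a bit flimsy",
]
ratings = [5, 1, 3, 5, 1, 3, 5, 1, 3]

pipeline = make_pipeline(TfidfVectorizer(), LinearSVR(max_iter=10000, random_state=0))
param_grid = {
    "linearsvr__C": [0.1, 1.0, 10.0],        # illustrative candidates
    "linearsvr__epsilon": [0.0, 0.2, 0.5],   # illustrative candidates
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(texts, ratings)
print(search.best_params_)
```

Putting the vectorizer inside the pipeline means the TF-IDF vocabulary is refit on each training fold, so no information from the held-out fold leaks into the features.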

The strength of the text mining method is that it takes full advantage of the text information in the review. However, it requires a large amount of text data to train a decent model that makes correct predictions.

4.4 Model Comparison

Each of the three models we applied to this task has its own strengths and weaknesses.

The Latent Factor Model can deal with pure rating data without any assistance from other information, so it is the most general model for this task. However, its performance is closely tied to the density of the rating matrix. Once the matrix is too sparse, it can predict little more than an average rating.

The RBM shares almost the same strengths and weaknesses as the Latent Factor Model. In addition, it can take advantage of its hidden layer to explore more latent information. However, the RBM is hard to implement and even harder to improve, whether by tuning the parameters or by changing its network structure.

The text mining method directly exploits the information in the review text, which is a huge advantage when such information comes along with the rating. Nevertheless, it can suffer from insufficient data: if we only have a few review texts, the distribution and usage of words in the sample cannot be close to the real-world situation.

5 Related Work

For the Amazon review rating prediction task, several previous works have explored ways to obtain better performance. The Amazon review dataset is crawled from the Amazon website and is widely used in research on text mining and latent-factor models for recommender systems. The state-of-the-art methods currently employed to study this problem are therefore text mining methods and latent-factor models.

5.1 Latent-Factor Model

For the latent-factor model, the basic idea is to take the user-item pairs with their ratings and construct a model that learns latent dimensions for the rating prediction task. The feasibility of this model is built on a large quantity of user-item rating data, where we have enough observations of each specific user or item. To alleviate the cold-start problem and equip the model with better interpretability, some related works have explored approaches that combine the information in the review text with the rating information [2] [1].

In the first one [2], latent rating dimensions (such as those of latent-factor recommender systems) are combined with latent review topics (such as those learned by topic models like LDA). The second one [1] proposes a novel method that combines content-based filtering seamlessly with collaborative filtering, modeling the reviews and ratings simultaneously.


5.2 Restricted Boltzmann Machine

In paper [4], Salakhutdinov et al. show how to use Restricted Boltzmann Machines to model tabular data. By adding constraints such as shared weights and disconnected edges, they are able to extend the application of RBMs to user rating prediction problems. They also derive efficient learning rules and inference procedures for their model so that the performance can be further improved. Finally, they demonstrate that applying RBMs to the Netflix data set can reduce the RMSE by 0.005, and by even more when multiple RBM models and multiple SVD models are linearly combined.

5.3 Text Mining

For the text-mining method, the basic idea is to predict product ratings by harnessing the information present in the review text. This is especially helpful for new products and users, who may have too few ratings to model their latent factors, yet may still provide substantial information in the text of even a single review. The most intuitive approach is to adopt the N-grams model with TF-IDF feature extraction, which is what we use in our experiments. This approach is usually adopted as the baseline system for comparison with further improvements.

For instance, in the paper of Qu et al. [3], the results of the baseline system with the N-grams model are quite similar to our experimental results, which supports the feasibility of our model selection. To improve on that baseline, the paper introduces the concept of Bag-of-Opinions, where an opinion within a review consists of three components: a root word, a set of modifier words from the same sentence, and one or more negation words. Each opinion is assigned a numeric score which is learned by ridge regression. This method overcomes the sparsity problem of the N-grams model and performs better than the naive N-grams model.

6 Experiment Results and Conclusion

6.1 Latent Factor Model

The Latent Factor Model is easily influenced by the density of the dataset. If the dataset is too sparse, a new (user, item) pair cannot be predicted precisely because the given information is not enough to support the bias calculation. We therefore first conduct an experiment to show the relation between performance and the density of the dataset. In this experiment, we set the length of γ to 5, with λβ = 4 and λγ = 10 for the category “video game”, λβ = 6 and λγ = 12 for the category “health”, and λβ = 6 and λγ = 12 for the category “beauty”.

Figure 4: Accuracies vs. minimum numbers of items/users per user/item

It can be seen from the figure above that the MSE decreases in every category as the minimum number of items/users per user/item grows. In this respect, the Latent Factor Model does improve with higher density.


We also conduct an experiment to demonstrate the difference between the model with and without γ. The MSEs of the three categories over different minimum numbers of items/users per user/item are shown in the following table.

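Table 3: Comparison of MSEs with and without γ

| category | min # | without γ | with γ |
|---|---|---|---|
| video game | 5 | 1.10226624779 | 1.10130206940 |
| video game | 7 | 1.03449730014 | 1.03226650941 |
| video game | 9 | 0.96446515868 | 0.96236534211 |
| video game | 11 | 0.93984131191 | 0.93604523502 |
| video game | 13 | 0.89261020996 | 0.88768704477 |
| health | 5 | 1.06213995319 | 1.06202492522 |
| health | 7 | 0.84962904415 | 0.84894890012 |
| health | 9 | 0.73195039429 | 0.72994227025 |
| health | 11 | 0.72845785356 | 0.72587126763 |
| health | 13 | 0.72133358258 | 0.71843810841 |
| beauty | 5 | 1.16701007071 | 1.16671213344 |
| beauty | 7 | 0.91563682339 | 0.91419769846 |
| beauty | 9 | 0.71263453373 | 0.71088355486 |
| beauty | 11 | 0.69146409242 | 0.68969065324 |
| beauty | 13 | 0.69116775963 | 0.68943366486 |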

It can be seen that including γ does improve the performance, although the gain is relatively trivial. This means that some latent factors lie beneath the rating data, and that they can be expressed by an SVD-like factorization.

Besides the basic model, we also modify it by incorporating category information, so that a single model trained on a dataset with mixed categories can still capture category-specific effects. By mixing the datasets of the three categories and keeping only the users and items with at least 9 items/users, we get a new mixed dataset. Applying the basic model and the improved one, with λβ = 5, λγ = 10 and λρ = 5, we get MSEs of 0.82181075125 and 0.82144958746 respectively. There is thus a tiny improvement, which suggests that incorporating category information is feasible. In addition, since this improvement is far less significant than that of using separate models, we can infer that the differences between categories are too large to be covered by ρ alone. Using entirely separate αs, βs and γs is therefore better.

6.2 Restricted Boltzmann Machine

Because our RBM implementation is matrix-based, we cannot apply it to the original dataset. We therefore only conduct experiments on the datasets with at least 7 related items/users. We set the number of hidden units to 100, the number of epochs to 5, the batch size to 500, the learning rate to 0.1, and the momentum to 0.5. The MSEs are 1.1652970920770037, 0.9962176586621301, and 1.0767795450895149 for the categories “video game”, “health”, and “beauty” respectively. Directly applying the RBM thus performs poorly without the other optimization methods mentioned in the paper [4].

6.3 Text Mining

For the implementation of this model, we first calculate the TF-IDF weights with the TfidfVectorizer module in sklearn. For the SVR model, we directly adopt the LinearSVR module in sklearn, with the hyper-parameters set to C = 1 and ε = 0.2.

For the text mining method, we extract the TF-IDF features from each dataset and fit the SVR model. The comparison of our method with the baseline method is shown in the following figure.


Figure 5: MSE for different Datasets

From the figure we can tell that our method beats the baseline method by almost 40%. For both the baseline method and our method, the MSE decreases as the data is “compressed”. We think this result is due to the higher quality of the review text when the dataset is “compressed”. When only the users and items with a large number of reviews are left in the dataset, the size of the training data decreases, but these reviews are usually of high quality, so we can extract richer text information and thus make more accurate rating predictions. We also notice that the MSE seems to increase a little when we compress the dataset too aggressively. This may be explained by the fact that when the dataset is not large enough to provide plenty of text information for training, the performance of the text mining model is negatively affected.

To interpret our text model, we extract the words with the highest and lowest weights in the SVR model for each category, which helps explain how the review text affects the ratings of the reviews.

(a) Health & Personal Care (b) Video Games (c) Beauty

Figure 6: Positive Words in Review Text

Among the positive words, we can see some universal words that appear in all the categories, like “amazed”, “best” and “great”. There are also words that make sense specifically for each category. For instance, in the health and personal care category, the positive words include “nutritious”, “delicious” and “maintenance”. In the video games category, the positive words include “preinstalled”, “plausible” and “holy”, and in the beauty category, they include “enriching”, “repurchase” and “relaxing”.


(a) Health & Personal Care (b) Video Games (c) Beauty

Figure 7: Negative Words in Review Text

Among the negative words, some universal words like “worst”, “disappointing” and “trash” appear in all the categories. In the health category, “inconvenient”, “ineffective” and “flimsy” are keywords of negative reviews. In the video games category, the keywords are “boring”, “uninstall” and “unplayable”. In the beauty category, the keywords are “crap”, “return” and “disappointed”. These keywords are quite different for each category, so we can make more accurate predictions if we train a separate text model for each category.

6.4 Model Comparison and Conclusion

The performances of the different models on the different datasets are shown in the following table.

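| Category | K-Core | Average Baseline | Text Mining | Latent-Factor Model | RBM |
|---|---|---|---|---|---|
| Health & Personal Care | 5 | 1.2577 | 0.8458 | 1.1013 | - |
| Health & Personal Care | 7 | 1.1077 | 0.7539 | 1.0322 | 1.1653 |
| Health & Personal Care | 9 | 1.0402 | 0.6884 | 0.9624 | - |
| Health & Personal Care | 11 | 1.0587 | 0.7074 | 0.9361 | - |
| Health & Personal Care | 13 | 1.0364 | 0.7282 | 0.8876 | - |
| Video Games | 5 | 1.4484 | 0.7887 | 1.0620 | - |
| Video Games | 7 | 1.3825 | 0.7809 | 0.8489 | 0.9962 |
| Video Games | 9 | 1.3347 | 0.7345 | 0.7299 | - |
| Video Games | 11 | 1.2878 | 0.7479 | 0.7259 | - |
| Video Games | 13 | 1.2248 | 0.7521 | 0.7184 | - |
| Beauty | 5 | 1.3614 | 0.7928 | 1.1667 | - |
| Beauty | 7 | 1.1298 | 0.7191 | 0.9142 | 1.0767 |
| Beauty | 9 | 0.9712 | 0.62322 | 0.7109 | - |
| Beauty | 11 | 0.9579 | 0.6018 | 0.6897 | - |
| Beauty | 13 | 0.9628 | 0.6172 | 0.6894 | - |

Table 4: Performance Comparison of Different Methods on Datasets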

From the above table we can see that text mining is the strongest strategy overall for the rating prediction task when review text is available. It achieves the lowest MSE in most settings; the exceptions in Table 4 are the Video Games rows with k ≥ 9, where the Latent Factor Model is slightly better. Looking at the trend, the performance of the Latent Factor Model continues to improve as the dataset is compressed, while that of text mining starts to degrade. This suggests that the Latent Factor Model can match, and sometimes slightly exceed, text mining on dense datasets. We therefore conclude that for datasets with rich text information, the text mining method achieves satisfactory prediction accuracy, and for datasets with dense user-item pair information, the Latent Factor Model performs quite well. RBM is a relatively novel method with potential to be explored and improved in future research.

References

[1] Guang Ling, Michael R Lyu, and Irwin King. “Ratings meet reviews, a combined approach to recommend”. In: Proceedings of the 8th ACM Conference on Recommender systems. ACM. 2014, pp. 105–112.

[2] Julian McAuley and Jure Leskovec. “Hidden factors and hidden topics: understanding rating dimensions with review text”. In: Proceedings of the 7th ACM conference on Recommender systems. ACM. 2013, pp. 165–172.

[3] Lizhen Qu, Georgiana Ifrim, and Gerhard Weikum. “The bag-of-opinions method for review rating prediction from sparse text patterns”. In: Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics. 2010, pp. 913–921.

[4] Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. “Restricted Boltzmann machines for collaborative filtering”. In: Proceedings of the 24th international conference on Machine learning. ACM. 2007, pp. 791–798.
