1-s2.0-S0950705114000173-main


Multi-criteria collaborative filtering with high accuracy using higher order singular value decomposition and Neuro-Fuzzy system

Mehrbakhsh Nilashi , Othman bin Ibrahim, Norafida Ithnin

Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

Article info

Article history: Received 8 April 2013

Received in revised form 3 January 2014

Accepted 6 January 2014

Available online 10 January 2014

Keywords:

Neuro-Fuzzy inference system

Higher order singular value decomposition

Subtractive clustering

Sparsity

Scalability

Multi-criteria collaborative filtering

Abstract

Collaborative Filtering (CF) is the most widely used prediction technique in recommender systems. It makes recommendations based on ratings that users have assigned to items. Most current CF recommender systems maintain only a single rating per user in the user-item ratings matrix. Multi-criteria CF makes more accurate recommendations possible by considering user preferences on multiple aspects of items. However, in multi-criteria CF, user behavior regarding item features is frequently subjective, imprecise and vague. This induces uncertainty in reasoning about and representing item features that cannot be resolved exactly with crisp machine learning techniques. In contrast, fuzzy methods can handle this uncertainty better than crisp methods. In addition, fuzzy methods can predict user preferences more accurately and better alleviate the sparsity problem in overall ratings by considering user perception of item features. Apart from this, in multi-criteria CF, users rate different aspects (criteria) of an item in new dimensions, thereby aggravating the scalability problem. Appropriate dimensionality reduction techniques are thus needed to capture the high dimensions all together, without reducing them into lower dimensions, and to reveal the latent associations among the components. This study presents a new model for multi-criteria CF using an Adaptive Neuro-Fuzzy Inference System (ANFIS) combined with subtractive clustering and Higher Order Singular Value Decomposition (HOSVD). HOSVD is used for dimensionality reduction to improve scalability, and ANFIS is used for extracting fuzzy rules from the experimental dataset, alleviating the sparsity problem in overall ratings, and representing and reasoning about user behavior regarding item features. Experimental results on a real-world dataset show that the combination of the two techniques remarkably improves the predictive accuracy and recommendation quality of multi-criteria CF.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

During the last decade, the amount of information available online has increased exponentially, and information overload has become one of the major challenges faced by information retrieval and information filtering systems. Recommender systems are one solution to the information overload problem. Recommender systems emerged as an independent research area in the mid-1990s, when researchers shifted their focus to recommendation problems that explicitly rely on the user rating structure [1].

Recommender systems based on Collaborative Filtering (CF) are particularly popular and used by large online [2–4]. CF algorithms can be divided into two categories: memory-based algorithms and model-based algorithms [3,5,6]. Memory-based (or heuristic-based) methods, such as correlation analysis and vector similarity, search the user database for user profiles that are similar to the profile of the active user for whom the recommendation is made [7]. Heuristic-based approaches are divided into user-based and item-based approaches [6,8]. User-based CF has been the most popular and commonly used memory-based CF strategy [9]. It is based on the premise that similar users will like similar items. Item-based CF was first proposed in [10] as an alternative style of CF that avoids the scalability bottleneck associated with the traditional user-based algorithm. The bottleneck arises from the search for neighbors in a continuously growing population of users. In item-based CF, similarities are calculated between items rather than between users, the intuition being that a user will be interested in items similar to those he has liked in the past. Two of the most popular approaches to computing similarities between users and items are the Pearson correlation coefficient and cosine-based coefficients.
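As a rough illustration of the two similarity measures named above, the following sketch computes the cosine similarity and the Pearson correlation between two rating vectors. The function names and the sample ratings are hypothetical and are not taken from the paper; the vectors are assumed to contain only co-rated items.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical ratings of two users on the same four items.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 3]
print(round(cosine_similarity(alice, bob), 4))
print(round(pearson_correlation(alice, bob), 4))
```

The same functions apply to item-based CF if the vectors hold the ratings that two items received from the same users.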

One of the main problems in recommender systems, and specifically in CF, is the sparsity problem [11–14]. Memory-based CF approaches also suffer from the scalability problem. Therefore,

0950-7051/$ - see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.knosys.2014.01.006

Corresponding author. Tel.: +60 197608281.

E-mail address: [email protected] (M. Nilashi).

Knowledge-Based Systems 60 (2014) 82–101

Contents lists available at ScienceDirect

Knowledge-Based Systems

journal homepage: www.elsevier.com/locate/knosys

applied on tensors with more than 3 dimensions. This is one of the main advantages of HOSVD, making it a flexible and effective approach for multi-criteria CF where other traditional machine learning techniques have failed. It should be noted that the computation time of the HOSVD decomposition procedure grows as the tensor order increases. However, the decomposition can be done in the offline phase, with incremental learning used for the data approximation procedure in the online phase.

In the proposed model, ANFIS aims to extract knowledge (rules) from the users' multi-aspect ratings to be used in the overall rating prediction task. The extracted rules are employed for predicting unknown ratings, which alleviates the sparsity problem in overall ratings and reveals the real level of user preferences on item features. ANFIS provides a flexible structure for the defined problem that is suitable for generating stipulated input–output pairs using a set of induced fuzzy IF–THEN rules with appropriate and varied membership functions (MFs) [27]. With proper training, the resulting Fuzzy Inference System (FIS) serves to predict users' overall preferences about item features. The elements of this model are a fuzzy set, a neural network and data clustering. In addition, non-stochastic uncertainty emerging from vagueness and imprecision is handled using ANFIS. The MFs produced by ANFIS are used for representing and reasoning about users' rating behavior according to their perception of item features. The MFs formed by ANFIS are continuous and more accurate in representing the features of items and user feedback. Furthermore, to prevent the overfitting problem discussed in previous research [24,28], subtractive clustering is applied to fine-tune the ANFIS models, and a checking set is used alongside the training data.

In practical product recommendation settings, customers prefer to rate items or express their preferences for item features in linguistic terms, such as {low interest}, {high interest} or {no interest}. This suggests designing multi-criteria CF to be user-friendly and convenient for users when rating items. Therefore, for multi-criteria CF, fuzzy logic and fuzzy sets are more appropriate than crisp approaches for human linguistic reasoning with imprecise concepts. In addition, linguistic terms are more suitable than numerical values for assessing qualitative information, which is usually related to human perceptions, opinions and tastes. Hence, in multi-criteria CF, it is more appropriate to let users express their preferences, knowledge and personal judgments in linguistic terms [29]. From this perspective, we can define users' degrees of preference regarding the features of a particular item as a set of linguistic terms such as {low interest}, {high interest} or {no interest}. Furthermore, the fuzzy approach provides a way to quantify the non-stochastic uncertainty induced by imprecision, vagueness, and subjectivity. For this kind of uncertainty, fuzzy modeling is more reliable than traditional statistical methods such as the Bayesian method, which handle uncertainty due to randomness. Moreover, the fuzzy rules discovered from the users' ratings through ANFIS can be maintained in a rule database and reused in subsequent predictions for item recommendation. These properties promise to provide a framework for addressing the representation and inference challenges in multi-criteria CF research.

In this study, we consider the proposed method for movie recommender systems. However, the method can also be adopted for e-business and e-government recommender systems, such as those developed by Zhang et al. [30] and Shambour and Lu [31,32] for e-business and e-government applications, respectively.

Finally, we perform an in-depth experimental evaluation on multi-aspect user ratings of items obtained from the Yahoo!Movies network, and conduct several comparisons between our method and other algorithms.

Thus, in comparison with research efforts found in the literature, our work has the following differences. In this research:

• A new hybrid recommendation model using HOSVD and Neuro-Fuzzy techniques is proposed for increasing the predictive accuracy and improving the scalability of multi-criteria CF.

• The sparsity issue in overall ratings is addressed using the Neuro-Fuzzy technique.

• HOSVD is used for scalability improvement.

The remainder of this paper is organized as follows. Section 2 describes the research background and related work. The HOSVD dimensionality reduction technique, the k-Nearest Neighbor (k-NN) classifier, ANFIS and subtractive clustering are introduced in separate subsections of Section 3. Section 4 provides an overview of the research methodology. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.

2. Research background and related work

In the area of personalized web search, Sun et al. [33] proposed Cube singular value decomposition (CubeSVD) to improve Web search. With their CubeSVD analysis, which is also based on the HOSVD technique, web search activities were carried out more efficiently. They evaluated the method on MSN search engine data. In the field of recommender systems, several recommendation models have been proposed that use three-dimensional tensors for recommending music, objects and tags. Recommender models using HOSVD for dimension reduction have been proposed for recommending personalized music [22] and tags [34]. Xu et al. [35] used HOSVD to provide item recommendations. Their work was compared with a standard CF algorithm, without focusing on tag recommendations. Leginus et al. [36] utilized clustering techniques to reduce the tag space, which improved the quality of recommendations, shortened the execution time of the factorization and decreased the memory demands. Their proposed method was adaptable to HOSVD. They also introduced a heuristic method to speed up the parameter tuning process for HOSVD recommenders. Symeonidis et al. [37] introduced a recommender based on HOSVD in which each tagging activity for a given item by a particular user is represented by the value 1 in the initial tensor, and all other cases are represented by 0. Li et al. [38] presented a multi-criteria rating approach to improve personalized services in mobile commerce using Multi-linear Singular Value Decomposition (MSVD). The aim of their paper was to exploit context information about the user as well as multi-criteria ratings in the recommendation process.

Fuzzy logic has grown considerably in a number of applications across a wide variety of domains, such as semantic music recommendation [39] and product recommendation [40]. Castellano et al. [41] developed a Neuro-Fuzzy strategy combined with soft computing approaches for recommending Uniform Resource Locators (URLs) to active users. They used fuzzy clustering to create user profiles based on similar browsing behavior. de Campos et al. [42] proposed a model combining a Bayesian network, which governs the relationships between users, with fuzzy set theory, which represents the vagueness in the description of users' ratings. A conceptual framework based on fuzzy logic was proposed by Yager [43] to represent and justify recommendation rules. In the proposed framework, an internal description of the items was used that relied solely on the preferences of the active user. Carbo and Molina [44] developed a CF-based algorithm in which ratings and recommendations were considered as linguistic labels by using fuzzy sets. Pinto et al. [45] proposed a model that combined fuzzy numbers, product positioning (from marketing theory) and item-based CF. Zhang et al. [30] developed a hybrid recommendation approach combining user-based and item-based CF techniques using fuzzy set techniques and applied it to mobile product and service recommendation. They tested the prediction accuracy of their hybrid recommendation approach using the MovieLens 100K dataset.

In the case of multi-criteria CF, few studies have extended the similarity calculation of the traditional memory-based CF approach to multi-criteria ratings [23,46,47], estimating the similarities between users by aggregating traditional similarities from individual criteria or by applying multidimensional distance metrics. In the aggregation function approach of Adomavicius and Kwon [23], the overall rating $r_0$ serves as an aggregate of the multi-criteria ratings. Under this assumption, the method finds an aggregation function $f$ representing the relationship between the overall and multi-criteria ratings as:

$$r_0 = f(r_1, \ldots, r_k) \tag{1}$$
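To make Eq. (1) concrete, the sketch below fits one possible aggregation function $f$, a linear function with an intercept, by ordinary least squares. The linear form, the variable names and the rating values are assumptions made for illustration only; the source does not specify the form of $f$ at this point.

```python
import numpy as np

# Hypothetical criteria ratings (rows: rated items, columns: k = 3 criteria).
criteria = np.array([[4.0, 5.0, 3.0],
                     [2.0, 1.0, 2.0],
                     [5.0, 4.0, 5.0],
                     [3.0, 3.0, 4.0],
                     [1.0, 2.0, 1.0]])
# Hypothetical overall ratings r_0 for the same items.
overall = np.array([4.0, 2.0, 5.0, 3.0, 1.0])

# Append a column of ones so that the fit includes an intercept term.
design = np.hstack([criteria, np.ones((criteria.shape[0], 1))])
weights, *_ = np.linalg.lstsq(design, overall, rcond=None)


def aggregate(ratings):
    """Assumed linear f: weighted sum of the criteria ratings plus intercept."""
    return float(np.append(ratings, 1.0) @ weights)


print(aggregate([4.0, 4.0, 4.0]))
```

Any other regression model could take the place of the linear fit; the point is only that $f$ is learned from known (criteria, overall) rating pairs and then applied to criteria ratings whose overall rating is unknown.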

To develop the idea of Adomavicius and Kwon [23], Sahoo et al. [48,49] extended the flexible mixture model (FMM) developed by Si and Jin [50] to multi-criteria recommender systems. The assumption of FMM is that two latent variables $Z_u$ and $Z_i$ (for customers and products) provide just one rating $r$ of user $u$ on item $i$. They discovered the dependency structure between the overall rating ($r_0$) and the multi-criteria ratings ($r_0$, $r_1$, $r_2$, and $r_4$). Liu et al. [51] presented a multi-criteria recommendation approach based on the clustering of users. Their idea was that, for each user, one of the criteria is dominant, and users are grouped according to their criteria preferences. They applied linear least squares regression, assigned each user to one cluster, and evaluated different schemes for generating predictions. They applied the methods to a hotel-domain dataset with five criteria: Value, Location, Rooms, Service and Cleanliness. Zhang et al. [52] proposed two types of multi-criteria probabilistic latent semantic analysis algorithms extended from the single-rating version. First, a mixture of multivariate Gaussian distributions was assumed to be the underlying distribution of each user's multi-criteria ratings. Second, inspired by the Bayesian network and linear regression, they assumed a mixture of linear Gaussian regression models as the underlying distribution of each user's multi-criteria ratings.

Shambour and Lu [53] implemented a hybrid Multi-Criteria Semantic-enhanced CF (MC-SeCF) approach to alleviate limitations of item-based CF techniques such as sparsity and cold-start. Experimental results on the MovieLens dataset demonstrated the effectiveness of their approach in alleviating the sparsity and cold-start item problems. They achieved higher accuracy and more coverage on very sparse and new-item datasets than the benchmark item-based CF recommendation algorithms. The method proposed here, which builds a model using HOSVD and ANFIS, requires explicit ratings. However, according to Nielsen's 90-9-1 principle [54], more people will lurk in a virtual community than will participate. Hence, considering Nielsen's 90-9-1 principle, appropriate and adapted strategies need to be incorporated in multi-criteria CF, such as the method developed by Shambour and Lu [53], which uses semantic information of items. Generally, we view the MC-SeCF approach as complementary to our method. An opportunity for future work is therefore to combine the predictions of the MC-SeCF approach with our method in a hybrid approach. Given the improvements achieved by Shambour and Lu [53], major problems such as sparsity and cold-start can be remarkably alleviated. This suggests that the methods proposed by Shambour and Lu [53] and Kernel-SVD [55,56], combined with HOSVD, can be incorporated into multi-criteria CF to address the sparseness problem.

Jannach et al. [24] further improved the accuracy of multi-criteria CF by proposing a method using Support Vector Regression (SVR) for automatically detecting the relationships between detailed item ratings and the overall ratings. In addition, SVR models were learned per item and per user, and the individual predictions were finally combined in a weighted approach. Similar to our research, they evaluated their methods using the Yahoo!Movies dataset.

3. Materials and methods

3.1. Higher Order Singular Value Decomposition (HOSVD)

To represent and recognize high-dimensional data effectively, dimensionality reduction is performed on the original dataset to obtain a low-dimensional representation [57]. Easier visualization and comparison and shorter data processing time are the main advantages of dimensionality reduction techniques. HOSVD is a powerful dimensionality reduction technique for tensor decomposition proposed by Lathauwer et al. [58] as a generalization of the SVD to tensors. The HOSVD computation consists of the following steps:

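Step 1: Unfolding of the mode-d tensor $T \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, which yields matrices $A_{(1)}, \ldots, A_{(d)}$. The unfolding $A_{(n)}$ places each tensor element in row $i_n$ and in the column $j$ defined as:

$$j = i_{n+1} I_{n+2} I_{n+3} \cdots I_d I_1 I_2 \cdots I_{n-1} + i_{n+2} I_{n+3} I_{n+4} \cdots I_d I_1 I_2 \cdots I_{n-1} + \cdots + i_d I_1 I_2 \cdots I_{n-1} + i_1 I_2 I_3 \cdots I_{n-1} + \cdots + i_{n-1}, \qquad i_n = 0, 1, \ldots, I_n - 1 \tag{2}$$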

The matrix unfolding of a tensor can be defined as a matrix representation of that tensor in which all the column (row, etc.) vectors are stacked one after the other [58].

In the case of 3rd-order tensors $T \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, there exist three matrix unfoldings (see Fig. 1):

mode 1: $j = i_2 + (i_3 - 1) I_3$,

mode 2: $j = i_3 + (i_1 - 1) I_1$,

mode 3: $j = i_1 + (i_2 - 1) I_2$.
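The unfolding step can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the helper name `unfold` is hypothetical, modes are numbered from 0, and the columns are ordered in NumPy's row-major order, which may differ from the index formulas above. A different column order only permutes the columns of the unfolding, which leaves its singular values and left singular vectors unchanged.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: axis `mode` indexes the rows, all other axes are
    flattened (row-major) into the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


# Hypothetical 2 x 3 x 4 tensor filled with 0..23 for easy inspection.
T = np.arange(24).reshape(2, 3, 4)
for mode in range(T.ndim):
    print(mode, unfold(T, mode).shape)
```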

Step 2: Identifying the $d$ left singular matrices $U^{(1)}, \ldots, U^{(d)}$, obtained by:

$$A_{(n)} = U^{(n)} \Sigma^{(n)} V^{(n)T}, \qquad n = 1, \ldots, d \tag{3}$$

In Eq. (3), the matrices $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$ are the left singular matrices, and $\Sigma^{(n)} \in \mathbb{R}^{I_n \times (I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_d)}$ is a diagonal matrix containing the singular values in descending order. The matrix $V^{(n)}$ is the right singular matrix, with $V^{(n)T} V^{(n)} = I$ and $U^{(n)T} U^{(n)} = I$. These singular matrices are orthonormal.

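Step 3: Finding the core tensor $S \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ by contracting the left singular matrices $U^{(n)}$ with the original tensor $T$:

$$S = T \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_d U^{(d)T} \tag{4}$$

where the sub-tensors $S_{i_n = \alpha}$ of $S \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ are obtained by fixing the $n$th index to $\alpha$, with the ordering property:

$$\|S_{i_n = 1}\|_F = \sigma^{(n)}_1 \ge \|S_{i_n = 2}\|_F = \sigma^{(n)}_2 \ge \cdots \ge \|S_{i_n = I_n}\|_F = \sigma^{(n)}_{I_n} \ge 0 \tag{5}$$

In Eq. (5), for all possible values of $n$, $\sigma^{(n)}_i = \|S_{i_n = i}\|_F$ (Frobenius norm) is the $i$th $n$-mode singular value of tensor $T$. Fig. 2 shows pseudocode for the HOSVD algorithm.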

Procedure HOSVD (Input: Tensor T)

For HOSVD, the computation cost is calculated as shown in Table 1.
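Steps 1 to 3 can be put together in a short NumPy sketch. It is an illustration under assumptions rather than the procedure of Fig. 2: the helper names `unfold`, `mode_product` and `hosvd` are hypothetical, modes are numbered from 0, and the test tensor is random.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding with axis `mode` as rows (row-major column order)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `matrix` into axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def hosvd(tensor):
    """Return the core tensor S and the left singular matrices U^(n)."""
    factors = []
    for mode in range(tensor.ndim):
        # Step 2: left singular vectors of each unfolding, Eq. (3).
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # Step 3: contract with the transposed factors, Eq. (4).
        core = mode_product(core, u.T, mode)
    return core, factors


rng = np.random.default_rng(0)
T = rng.random((3, 4, 2))  # hypothetical user x item x criteria tensor
S, U = hosvd(T)

# Multiplying the core by the factors recovers the original tensor.
reconstruction = S
for mode, u in enumerate(U):
    reconstruction = mode_product(reconstruction, u, mode)
print(np.allclose(reconstruction, T))
```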


3.1.1. Truncated HOSVD

The truncated HOSVD is defined as a multi-rank approximation. It is taken as the first approximation of an iterative algorithm in which the matrices and core tensor are updated iteratively, starting with Eq. (4). The algorithm stops when it ceases to improve the approximation or reaches a maximum number of iterations [59]. This iterative method belongs to the family of alternating least-squares methods and is called higher-order orthogonal iteration [58].

According to Lathauwer et al. [58], for the decomposition determined by HOSVD, the following norm relation holds:

$$\|T\|_F^2 = \sum_{i=1}^{R_1} \left(\sigma^{(1)}_i\right)^2 = \cdots = \sum_{i=1}^{R_d} \left(\sigma^{(d)}_i\right)^2 = \|S\|_F^2 \tag{6}$$

where $R_n$ ($1 \le n \le d$) denotes the $n$-mode rank of tensor $T$. A tensor $\tilde{T}$ can be defined by keeping the largest $I'_n$ $n$-mode singular values and discarding the remaining ones. Because of this rank truncation, the error is bounded as shown by Lathauwer et al. [58]:

$$\|T - \tilde{T}\|^2 \le \sum_{n=1}^{d} \sum_{i_n = I'_n + 1}^{R_n} \left(\sigma^{(n)}_{i_n}\right)^2 \tag{7}$$

In practice, using a procedure analogous to the one in Fig. 2, the rank-$(R_1, R_2, \ldots, R_d)$ truncated core tensor $\tilde{S}$ can be obtained by using the $R_n$ leading left singular vectors, rather than all left singular vectors, to build the transformation matrix $\tilde{U}^{(n)}$.
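The truncation and the error bound of Eq. (7) can be checked numerically with the sketch below. The helper names, the chosen ranks and the random tensor are assumptions for illustration; the sketch performs a single truncation pass and does not include the higher-order orthogonal iteration mentioned above.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding with axis `mode` as rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `matrix` into axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)


def truncated_hosvd(tensor, ranks):
    """Keep the leading `ranks[n]` left singular vectors in every mode."""
    factors, discarded = [], 0.0
    for mode, rank in enumerate(ranks):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
        # Sum of squared discarded singular values: right-hand side of Eq. (7).
        discarded += float(np.sum(s[rank:] ** 2))
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)
    return core, factors, approx, discarded


rng = np.random.default_rng(0)
T = rng.random((6, 5, 4))  # hypothetical tensor
core, factors, approx, bound = truncated_hosvd(T, ranks=(3, 3, 2))
error = float(np.linalg.norm(T - approx) ** 2)
print(core.shape, error <= bound)
```

Because the decomposition can be computed offline, as noted in the introduction, only the small core tensor and the truncated factor matrices need to be kept for the online phase.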

3.2. k-Nearest Neighbor (k-NN) classifier

The k-Nearest Neighbor (k-NN) classifier is a well-known and powerful instance-based machine learning technique for classifying data [60]. k-NN produces its results directly from the stored training instances. The k-NN algorithm consists of two phases: a training phase and a classification phase. In the training phase, the training examples are vectors (each with a class label) in a multidimensional feature space, and the feature vectors and class labels of the training samples are stored. In the classification phase, k is a user-defined constant (see Fig. 3), and a query or test point (an unlabelled vector) is classified by assigning the label that is most frequent among the k training samples nearest to that query point. In other words, the k-NN method compares the query point (an input feature vector) with a library of reference vectors and labels it with the class of the nearest library feature vectors. This way of categorizing query points based on their distance to points in a training dataset is a simple yet effective way of classifying new points. One of the main advantages of the k-NN method is that it requires only a few parameters to tune, namely k and the distance metric, to achieve sufficiently high classification accuracy. Thus, in k-NN based implementations, choosing the best k and distance metric is an important task.

In the k-NN classifier, the Euclidean distance is usually used as the distance function when the input vectors are real-valued and the outputs are discrete classes. In this study, we use the Euclidean, City-Block and correlation distance metrics for distance calculation in k-NN.

Assume $x_1, x_2, \ldots, x_{mx}$ denote the first set of row vectors and $y_1, y_2, \ldots, y_{my}$ the second set of row vectors. The distance metrics for measuring the distance between $x_s$ and $y_t$ are defined as follows:

$$d_{st} = \sqrt{\sum_{j=1}^{n} (x_{sj} - y_{tj})^2} \tag{8}$$

[Figure: the three unfoldings $A_{(1)}$ ($I_1 \times I_2 I_3$), $A_{(2)}$ ($I_2 \times I_1 I_3$) and $A_{(3)}$ ($I_3 \times I_1 I_2$)]

Fig. 1. Unfolding of a 3rd-order tensor.

Fig. 2. Procedure for decomposing tensors via HOSVD [59].

Table 1

Computational cost for main steps in HOSVD.

Step | N-dim
Unfolding the tensor $T$ | $O(I_1 I_2 \cdots I_N)$
Constructing $A_n A_n^T$ | $O(I_n^2 I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$
Decomposing $A_n A_n^T$ to obtain $U^{(n)}$ | $O(I_n^3)$
Contracting tensor $T$ with the matrices $U^{(n)}$ to get tensor $S$ | $O(I_n^2 I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$



$$d_{st} = \sum_{j=1}^{n} |x_{sj} - y_{tj}| \tag{9}$$

$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}, \qquad \bar{x}_s = \frac{1}{n} \sum_{j} x_{sj} \ \text{and} \ \bar{y}_t = \frac{1}{n} \sum_{j} y_{tj} \tag{10}$$

where Eqs. (8)–(10) stand for the Euclidean, City-Block and correlation distance metrics, respectively.
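The three metrics of Eqs. (8)–(10) and the majority-vote classification described in this subsection can be sketched in plain Python. The function names, the sample points and their labels are hypothetical; ties in distance or in votes are broken by input order, which is a simplification of this sketch.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Eq. (8): square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    """Eq. (9): sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def correlation_distance(x, y):
    """Eq. (10): one minus the correlation of the mean-centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    if denom == 0:
        raise ValueError("correlation distance is undefined for constant vectors")
    return 1.0 - sum(a * b for a, b in zip(cx, cy)) / denom


def knn_classify(query, samples, labels, k, distance=euclidean):
    """Label the query with the most frequent class among its k nearest samples."""
    ranked = sorted(range(len(samples)), key=lambda i: distance(query, samples[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored training samples with class labels.
samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((1.1, 1.0), samples, labels, k=3))
print(knn_classify((4.0, 4.0), samples, labels, k=3, distance=city_block))
```

The correlation distance needs vectors with more than two non-constant components to be informative, so the two-dimensional toy points above are classified with the other two metrics only.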

3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)

Soft computing techniques are known for their efficiency in dealing with complicated problems when conventional analytical methods are infeasible or too expensive and only sets of operational data are available. Fuzzy logic (FL) and Fuzzy Inference Systems (FIS), first proposed by Zadeh [61], provide a solution for decision making based on vague, ambiguous, imprecise or missing data. FL represents models or knowledge using IF–THEN rules. A Neuro-Fuzzy system is functionally equivalent to a FIS. A FIS mimics the human reasoning process by implementing fuzzy sets and an approximate reasoning mechanism that uses numerical values instead of logical values. A FIS requires a domain expert to define the MFs and to determine the associated parameters in both the MFs and the reasoning section [62,63]. However, there is no standard for the knowledge acquisition process, so the results may differ if a different knowledge engineer acquires the knowledge from the experts. A Neuro-Fuzzy system can replace knowledge acquisition by humans with a training process on a set of input–output training data. Thus, instead of depending on human experts, the Neuro-Fuzzy system determines its parameters through a training process that minimizes an error criterion. A popular Neuro-Fuzzy system is ANFIS, a fuzzy system that uses Artificial Neural Network (ANN) theory to determine its properties (fuzzy sets and fuzzy rules) [64–69]. It consists of five feed-forward layers, as shown in Fig. 4.

ANFIS is functionally equivalent to the Takagi–Sugeno–Kang (TSK) fuzzy model. It can also express its knowledge in the IF–THEN rule format as follows:

Rule 1: IF $In_1$ is $A_1$ AND $In_2$ is $B_1$ THEN $f_{11} = p_{11} In_1 + q_{11} In_2 + r_{11}$

Rule 2: IF $In_1$ is $A_1$ AND $In_2$ is $B_2$ THEN $f_{12} = p_{12} In_1 + q_{12} In_2 + r_{12}$

Rule 3: IF $In_1$ is $A_2$ AND $In_2$ is $B_1$ THEN $f_{21} = p_{21} In_1 + q_{21} In_2 + r_{21}$

Rule 4: IF $In_1$ is $A_2$ AND $In_2$ is $B_2$ THEN $f_{22} = p_{22} In_1 + q_{22} In_2 + r_{22}$

where $A_1$, $A_2$, $B_1$ and $B_2$ are the labels of the MFs for the input parameters $In_1$ and $In_2$, respectively, and $p_{ij}$, $q_{ij}$ and $r_{ij}$ ($i, j = 1, 2$) denote the parameters of the output MFs.

In Fig. 4, each layer of ANFIS performs a different action, as detailed below:

Layer 1: In this layer, membership grades are provided by adaptive nodes. The outputs of this layer are obtained by:

$$O^1_{A_i} = \mu_{A_i}(In_1), \quad i = 1, 2; \qquad O^1_{B_j} = \mu_{B_j}(In_2), \quad j = 1, 2 \tag{11}$$

where $A_i$ and $B_j$ denote the MFs for the input parameters $In_1$ and $In_2$, which can be defined as triangular, trapezoidal or Gaussian functions. The Gaussian-type MFs $A_i$ and $B_j$ for the input parameters $In_1$ and $In_2$ are defined as below:

$$\mu_{A_i}(In_1; \sigma_i, c_i) = \exp\left(-\frac{(In_1 - c_i)^2}{2\sigma_i^2}\right), \quad i = 1, 2; \qquad \mu_{B_j}(In_2; \sigma_j, c_j) = \exp\left(-\frac{(In_2 - c_j)^2}{2\sigma_j^2}\right), \quad j = 1, 2 \tag{12}$$

where $\{\sigma_i, c_i\}$ and $\{\sigma_j, c_j\}$ are the parameters governing the Gaussian functions. The ANFIS parameters in this layer are usually referred to as premise parameters.

Layer 2: The second layer contains a fixed number of nodes, labeled $\Pi$. The outputs of the second layer can be defined as:

$$O^2_{ij} = W_{ij} = \mu_{A_i}(In_1)\,\mu_{B_j}(In_2), \quad i, j = 1, 2 \tag{13}$$

where $W_{ij}$ represents the weight (firing strength) of the corresponding rule.

Layer 3: In this layer, every node, labeled $N$, determines the ratio of its rule's firing strength to the sum of all rules' firing strengths as:

Fig. 3. k-NN for k = 8 and k = 5.

Fig. 4. The structure of ANFIS.


$$O^3_{ij} = \bar{W}_{ij} = \frac{W_{ij}}{\sum_{i=1}^{2} \sum_{j=1}^{2} W_{ij}}, \quad i, j = 1, 2 \tag{14}$$

where the output of this layer represents the normalized firing strengths.

Layer 4: The nodes are adaptive nodes. The output of each node in this layer is simply the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus, the outputs of this layer are given by:

$$O^4_{ij} = \bar{W}_{ij} f_{ij} = \bar{W}_{ij}\,(p_{ij} In_1 + q_{ij} In_2 + r_{ij}), \quad i, j = 1, 2 \tag{15}$$

where $\bar{W}_{ij}$ is the output of layer 3, and $\{p_{ij}, q_{ij}, r_{ij}\}$ is the parameter set.

Layer 5: There is only one single fixed node, labeled $\Sigma$. This node performs the summation of all incoming signals. Hence, the overall output of the model is given by:

$$Out = O^5 = \sum_{i=1}^{2} \sum_{j=1}^{2} \bar{W}_{ij} f_{ij} = \sum_{i=1}^{2} \sum_{j=1}^{2} \bar{W}_{ij}\,(p_{ij} In_1 + q_{ij} In_2 + r_{ij}) = \sum_{i=1}^{2} \sum_{j=1}^{2} \left[(\bar{W}_{ij} p_{ij})\, In_1 + (\bar{W}_{ij} q_{ij})\, In_2 + \bar{W}_{ij} r_{ij}\right] \tag{16}$$

where the overall output $Out$ is a linear combination of the consequent parameters when the values of the premise parameters are fixed.

3.4. Subtractive clustering

The idea in the TSK model is that each rule in a rule base describes a region of the model, which can be linear [70]. The basic TSK rule structure is as follows:

$$\text{If } f(x_1 \text{ is } A_1, x_2 \text{ is } A_2, \ldots, x_k \text{ is } A_k) \text{ then } y = g(x_1, x_2, \ldots) \tag{17}$$

where the sentences forming the condition are connected through the logical function $f$. The output $y$ is obtained from $g$, which is a function of the inputs $x_i$.

In order to establish an effective TSK model of a process, using

subtractive clustering for generating clusters of datais constructive.

The main goal of using subtractive clustering as a cluster analyser is

to partition the dataset into a number of homogeneous and natural

subsets. The subtractive clustering method assumes each datapoint

is a potential cluster center and calculates a measure of the likeli-

hood that each data point would define the cluster center, based

on the density of surrounding data points. By using it, the quantity

of calculation is in proportion to thenumber of data points which is

foreign to the dimensions of problem. However, while the actual

cluster centers are not necessarily located at one of the data points,

in most cases it is a good approximation, especially with the re-

duced computation this approach requires[71]. In this method, a

data point with the highest potential, which is a function of the dis-tance measure, is considered as a cluster center. The data points

that are close to new cluster center are penalized in order to facili-

tate the emergence of new cluster centers [72]. From the Eq.(18),

the potential cluster center Pican be obtained at a data pointxias:

Pi Xmj1

exp kxixjk

2

ra2

2 !

18

where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$ are data vectors for input and output dimensions, $r_a$ is a positive constant defining the neighborhood range of the cluster, or simply the radius of the hypersphere cluster in data space, and $\|\cdot\|$ denotes the Euclidean distance. $r_a$ is a critical parameter that determines the number of cluster centers or locations. The first cluster center $c_1$ is selected as the data point with the highest potential value, $P_{c_1}$. To determine the new density values for the second cluster center, the effect of the first cluster center is subtracted as follows:

$$P_i = P_i - P_{c_1} \exp\left(-\frac{\|x_i - x_{c_1}\|^2}{(r_b/2)^2}\right), \qquad r_b = \eta\, r_a \tag{19}$$

where $x_{c_1}$ is the location of the first cluster center, $r_b$ is a positive constant that defines the neighborhood over which the density measure is measurably reduced, and $\eta$ is a constant greater than 1 that keeps cluster centers from lying too close to one another [73]. According to Eq. (19), the potential of data points near the first cluster center $c_1$ is significantly reduced. The data point $c_2$ with the largest remaining potential is then chosen as the second cluster center.

In general, after determining the kth cluster center $c_k$, the potential is revised according to Eq. (20) as:

$$P_i = P_i - P_{c_k} \exp\left(-\frac{\|x_i - x_{c_k}\|^2}{(r_b/2)^2}\right) \tag{20}$$

where $P_{c_k}$ is the largest potential value and $x_{c_k}$ is the location of the kth cluster center $c_k$. After revising the density function, the next cluster center is selected as the point with the greatest density value. This process continues until a sufficient number of clusters is attained, at which point all data points lie within the neighborhood of a cluster center.
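The procedure of Eqs. (18)-(20) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the experiments: the function name `subtractive_clustering`, the stopping rule (`stop_ratio`, `max_centers`) and the assumption that the data are already scaled to a comparable range are ours, since the text only requires "a sufficient number of clusters".

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, eta=1.5, stop_ratio=0.15, max_centers=10):
    """Return cluster centers chosen among the rows of X (hypothetical helper).

    ra and eta follow Eqs. (18)-(19); stop_ratio and max_centers are an
    assumed stopping rule that the text does not specify.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all data points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Eq. (18): potential of every data point.
    potential = np.exp(-d2 / (ra / 2.0) ** 2).sum(axis=1)
    rb = eta * ra
    first_peak = potential.max()
    centers = []
    while len(centers) < max_centers:
        k = int(np.argmax(potential))
        peak = potential[k]
        # Assumed stopping rule: remaining potential is small relative to the first peak.
        if peak < stop_ratio * first_peak:
            break
        centers.append(k)
        # Eqs. (19)-(20): penalize points close to the newly selected center.
        potential = potential - peak * np.exp(-d2[:, k] / (rb / 2.0) ** 2)
    return X[centers]
```

With two well-separated groups of points, the sketch returns one center per group; a smaller `ra` generally yields more centers.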

4. Research methodology

Fig. 5 shows the general framework of the proposed method, which combines HOSVD for dimensionality reduction with ANFIS and subtractive clustering for discovering knowledge from user ratings and predicting overall ratings.

In the first step, we apply HOSVD for dimensionality reduction to reveal the latent associations among the components of the user-item-criteria tensor. Then, we cluster the users with cosine-based similarity to obtain groups of similar users and determine labels for the clusters. This yields the high-quality clusters needed to develop an efficient ANFIS model. ANFIS is then applied to the clusters to extract fuzzy rules and predict null values in the overall ratings. The main tasks of the dimensionality reduction process are to reduce the dimension, to obtain the best approximation of the tensor of user preferences about items on multiple aspects, and to find users with similar preferences on items and criteria. Measuring the similarity of users based on their criteria ratings makes it possible to apply a clustering method. After clustering, which provides classes of users with similar taste, ANFIS is used to extract knowledge (fuzzy rules) from the clusters. To increase the accuracy of the rule-based system, reduce the amount of data in each class and minimize overfitting on the training data, subtractive clustering is combined with ANFIS. Thus, the main steps of the proposed method for developing the model in the offline phase are:

Step 1: HOSVD is applied to the training data, stored in a third-order tensor, for dimensionality reduction to get the best approximation of the rating information.

Step 2: The data approximated by HOSVD is clustered using cosine-based similarity. In this step, a label is defined for each vector of ratings, to be used by the k-NN method in the online phase.

Step 3: ANFIS combined with subtractive clustering is trained on the clusters obtained in the previous step to extract fuzzy rules and form rule clusters.

Step 4: The fuzzy rules are used to predict the existing null values of overall ratings in the offline and online phases. It should be noted that, for predicting the unknown overall ratings, we solved the sparsity problem in the criteria using the neighborhood formation within each cluster. For predicting the unknown criteria ratings for the target item, we relied on cosine-based similarity as the similarity measure, computed on the approximated data obtained by HOSVD.

After learning the model in the offline phase, the recommender system carries out the recommendation and prediction tasks of multi-criteria CF in the online phase in 3 main steps:

Step 1: Using the k-NN method, the recommender system predicts the class label for new data.

Step 2: The recommender system refers to the corresponding fuzzy rule cluster and predicts the overall rating for the active user (see Section 4.2 for more detail).

Step 3: After the overall rating prediction, the recommender system forms the neighbors of the active user from the corresponding cluster using the cosine similarity presented in Eq. (21), and makes predictions and Top-N recommendations.

4.1. Clustering the experimental dataset using HOSVD and improving

the scalability of multi-criteria CF

Multi-criteria CF systems need to produce high-quality recommendations quickly for very large-scale problems. In this paper, we address the performance issues by scaling up the neighborhood formation process through the use of dimensionality reduction techniques. Scalability is an issue in multi-criteria CF because the data tensor is composed of multiple dimensions, each of which can itself be very large. Clustering techniques reduce sparsity and improve the scalability of recommender systems by effectively partitioning the ratings database. Previous studies [74,75] have indicated the benefits of applying clustering in recommender systems. Using HOSVD and cosine similarity, we perform the clustering task in an effective way for multi-criteria CF.

As discussed earlier, recommender systems in multi-criteria CF deal with high-dimensional data, which makes the computational cost extremely high and even infeasible for traditional dimensionality reduction techniques. Given this scalability challenge, HOSVD is used in this paper because it is able to (1) factorize large tensors efficiently, using much less time than standard methods, and (2) obtain low-rank factors that preserve the main variance of the tensors. Owing to the dimensionality reduction, the neighborhood can be formed and pre-computed more effectively, which makes prediction generation much faster in multi-criteria CF; forming neighborhoods in the low-dimensional eigenspace provides better quality and performance. In addition, after the tensor is decomposed by HOSVD, the data can be clustered effectively using cosine-based similarity. Once the clustering is complete, the performance of multi-criteria CF can be very good, since the size of the group that must be analyzed is much smaller.

To apply HOSVD, the 3-dimensional data is stored in the third-order tensor $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, where $I_1$ is the number of users, $I_2$ is the number of rated items and $I_3$ is the number of criteria. Each entry of the tensor $A$ is a number between 1 and 13. Using HOSVD, the tensor $A$, which contains the user ratings of items on four criteria, is decomposed into $A = S \times_1 U \times_2 V \times_3 W$, in which $U \in \mathbb{R}^{I_1 \times I_1}$, $V \in \mathbb{R}^{I_2 \times I_2}$ and $W \in \mathbb{R}^{I_3 \times I_3}$ are orthonormal matrices, and $S \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is a core tensor that satisfies the all-orthogonality and ordering properties. Similar to the truncated SVD for low-rank approximation and dimensionality reduction of matrices, low-rank approximation and dimensionality reduction of higher-order tensors can be done by the truncated HOSVD (but with better approximation and computation): take the first $r_1$ columns of $U$, the first $r_2$ columns of $V$, the first $r_3$ columns of $W$, and the top-left $r_1 \times r_2 \times r_3$ block of $S$. HOSVD is therefore an effective method for dimensionality reduction of 3-dimensional datasets, and a different $r$ can be chosen for each mode of the tensor. The size of the data goes down from $I_1 I_2 I_3$ to $r_1 r_2 r_3 + I_1 r_1 + I_2 r_2 + I_3 r_3$, and if $r_1 = r_2 = r_3 = r$ it goes down to $r^3 + r(I_1 + I_2 + I_3)$. If we flatten the tensor into an $I_1 \times I_2 I_3$ matrix, the size of the data only goes down to $R^2 + R(I_1 + I_2 I_3)$ with a rank-$R$ truncated SVD. The result of the HOSVD decomposition of the third-order tensor of user ratings is the matrices $U$, $V$ and $W$, which show the relations between user and user, item and item, and criterion and criterion, respectively. This decomposition is obtained without splitting the 3-dimensional space into pairwise relations. For the sake of conciseness, a very simple example with only 4 users, 6 items and 4 criteria is demonstrated in the following. Table 2 shows the users' ratings of items on the 4 criteria, and Table 3 shows the decomposition into the matrices U, V, W, S(:,:,1), S(:,:,2), S(:,:,3) and S(:,:,4).
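The truncated HOSVD described above can be sketched with NumPy as follows. This is a minimal illustration under our own assumptions, not the MATLAB code used in the study: the helper names `unfold`, `mode_product`, `truncated_hosvd` and `reconstruct` are hypothetical, and the factor matrices are taken as the leading left singular vectors of each mode unfolding.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows are indexed by the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode (T x_mode M)."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced at the front
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(A, ranks):
    """Return the core tensor and one truncated factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(U[:, :r])              # keep the first r columns
    S = A
    for mode, U in enumerate(factors):
        S = mode_product(S, U.T, mode)        # project onto the retained subspace
    return S, factors

def reconstruct(S, factors):
    """Rebuild the (approximated) tensor from the core and the factors."""
    A = S
    for mode, U in enumerate(factors):
        A = mode_product(A, U, mode)
    return A
```

For a 4 x 6 x 4 tensor such as the one in Table 2, choosing ranks (2, 2, 2) stores 8 + 8 + 12 + 8 = 36 numbers instead of 96, in line with the size formula above.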

Unew U2V3W1Sy1 0:5091 0:2729

As can be seen in Fig. 6, the users similar to the new user can be found using the cosine similarity in Eq. (21). The cosine similarity between two vectors $A$ and $B$ is defined as:

$$\text{similarity} = \cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \tag{21}$$
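As a small illustration of Eq. (21), the sketch below computes the cosine similarity of two vectors and ranks the rows of a matrix (for example, the truncated matrix U) by similarity to a query vector. The function names `cosine_similarity` and `most_similar_rows` are hypothetical, and non-zero vectors are assumed.

```python
import numpy as np

def cosine_similarity(a, b):
    """Eq. (21): cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_rows(U, query, top_n=1):
    """Indices and similarities of the rows of U closest to `query`."""
    sims = np.array([cosine_similarity(row, query) for row in np.asarray(U, dtype=float)])
    order = np.argsort(-sims)[:top_n]  # highest similarity first
    return order, sims[order]
```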

By applying this method, the system is able to cluster data based on user similarity from matrix U. Since the k-NN predictor requires supervised learning, cosine-based similarity is used to obtain clusters from the data approximated by HOSVD and to provide labels for them. From the truncated matrix U, the first row is selected and the system calculates its cosine similarity, through Eq. (21), with the second row, the third row and so on, until it reaches the last row. The row with the highest cosine similarity is clustered with the first row. Applying this method to all rows, the system obtains clusters with a small number of similar users. Given a specific number of clusters, the system can then combine close clusters by calculating cosine-based similarity. Finally, after constructing the clusters, the system assigns the category label to each vector of user ratings. In the same way, a specific number of clusters can be obtained from the matrix V.

Fig. 5. Proposed model using ANFIS and HOSVD.

4.2. ANFIS architecture of proposed method and solving the sparsity

problem in multi-criteria CF

As discussed earlier, multi-criteria CF recommender systems suffer from the sparsity problem on two sides, missing values in the overall ratings and in the criteria ratings, and the system ought to predict these missing ratings with new approaches. In this paper, we solve the problem of sparsity in overall ratings using a Neuro-Fuzzy system. Generating the proper MFs and extracting the fuzzy rules for the prediction of overall ratings are the main advantages of this method, and both can be used in the online and offline phases. Because the overall ratings in multi-criteria CF are based on users' perception of item features, the sparsity problem in the overall ratings can be better alleviated using the generated MFs and the fuzzy rules obtained from users' preferences on the item features. In addition, alleviating the sparsity problem in multi-criteria CF recommender systems improves their predictive accuracy, as shown in prior research [24,53]. Based on the experimental results, we will also demonstrate that the proposed method significantly improves the predictive accuracy of multi-criteria CF. Using ANFIS, we will see that the prediction error in overall ratings is very low, and even zero in many cases, which shows the capability of ANFIS to alleviate the sparsity problem effectively.

In this study, discovering the knowledge (fuzzy rules) from user ratings and generalizing the relationship $Y = f(X_1, X_2, \ldots, X_n)$ are the main goals of applying ANFIS for accurate prediction of overall ratings, which in turn improves the predictive accuracy of multi-criteria CF. In this relationship, $X_1, X_2, \ldots, X_n$ stand for the input variables and $Y$ stands for the output variable. In the current study, the overall rating, or the user's overall preference about an item, can be determined as a function of the item's features or criteria. Thus, we associate the variable $Y$ with the overall rating and the variables $X_1, X_2, \ldots, X_n$ with the criteria ratings. Predicting the relationship between inputs and output is one of the important tasks that ANFIS performs. Based on the experimental dataset, the input parameters of the ANFIS model under consideration are Acting (A), Directing (D), Story (S) and Visuals (V). The output is the Overall rating (O), defined as the overall preference. These attributes are naturally vague, imprecise and incomplete fuzzy terms that lead to uncertainty in user interest in item features such as Acting, Story, Visuals and Directing. Thus, in ANFIS, they can be expressed by fuzzy linguistic values (uncertainty modeling) such as {cluster 1}, {cluster 2}, {cluster 3} and {cluster 4}, which divide the domain of user interest in Acting, Directing, Story and Visuals into four regions using MFs. They are given in Fig. 7a and b for the two inputs Visuals and Directing, respectively.

The relationship between the input variables (criteria) and the output (overall rating) can be defined as

$$\text{Overall rating} = f(\text{Acting}, \text{Directing}, \text{Story}, \text{Visuals}) \tag{22}$$

In ANFIS models, the outputs are related to the inputs through a mathematical mapping expressed by fuzzy rules. Fuzzy rules play an important role in ANFIS models and are the backbone of such systems. The form of the fuzzy rules in ANFIS is defined as

Table 2
Multi-criteria ratings for 4 users and 6 items on 4 criteria.

          Ratings on criteria 1          Ratings on criteria 2
Users     I1   I2   I3   I4   I5   I6    I1   I2   I3   I4   I5   I6
U1        13   12   11   11    5    5     5    4    4   12    9    5
U2        11    1   11   12    5    5    11   11   11   11   11   10
U3         1   13    4    3   12   13    13   13    5   12    4    4
U4         1    1    0    5    4    5    13   13   13   12   13   13

          Ratings on criteria 3          Ratings on criteria 4
Users     I1   I2   I3   I4   I5   I6    I1   I2   I3   I4   I5   I6
U1         5   11   11   10   10   10    11    5    9    4    5    3
U2         4    9   11   11    4    3     3   11   11   12   13   13
U3        11    5   11   11    4   11     9    3    9    3   11    4
U4         9    3    3    9    4    9     3    3    8    4    5    3

New user ratings on four criteria C1, C2, C3 and C4

Items     C1   C2   C3   C4
I1         5    4    4   11
I2         4    5   11    4
I3         4    3    5   12
I4        13    3   12    4
I5        13   13   13   11
I6        13   12   12   13

Rule 1: IF A is A1 AND D is B1 AND S is C1 AND V is D1 THEN f1 = p1 A + q1 D + r1 S + t1 V + p1

Rule 2: IF A is A2 AND D is B2 AND S is C2 AND V is D2 THEN f2 = p2 A + q2 D + r2 S + t2 V + p2

. . .

Rule n: IF A is An AND D is Bn AND S is Cn AND V is Dn THEN fn = pn A + qn D + rn S + tn V + pn



For example, in this study, ANFIS extracts fuzzy rules from the users' movie ratings by training on the vectors of user ratings in each cluster, such as

IF the Acting of a movie is cluster1 AND Directing is cluster1 AND Story is cluster1 AND Visuals is cluster1 THEN the Overall Rating is out1cluster1.

According to the fuzzy rules extracted by ANFIS, out1cluster1 for the overall rating is obtained from the membership degrees of the 4 input variables. Also, by using subtractive clustering in ANFIS, the system improves the precision of the fuzzy rules extracted from the users' movie ratings and minimizes overfitting on the training data. It reveals the users' preferences about item features in soft clusters and divides them into fuzzy clusters, so that the system can predict the relation between each criterion and the overall rating.

To illustrate a simple ANFIS model applied to multi-criteria CF, assume the system has two criteria S and V and one output, along with two fuzzy IF-THEN rules. Fig. 8 shows the first-order Sugeno FIS, the ANFIS model with two rules.

In Fig. 8, S and V indicate the crisp inputs to node i, and $A_i$ and $B_i$ are the linguistic labels characterized by the appropriate MFs $\mu_{A_i}$ and $\mu_{B_i}$, respectively. In this study, ANFIS uses the Gaussian MF as

$$\mu_{A_i}(S) = \exp\left(-\frac{(S - b_i)^2}{2 a_i^2}\right) \tag{23}$$

$$\mu_{B_i}(V) = \exp\left(-\frac{(V - b_i)^2}{2 a_i^2}\right) \tag{24}$$

Table 3
Generated matrices after applying HOSVD on tensor of users ratings.

S(:,:,1)                                   S(:,:,2)
77.91   0.81   0.44   0.85   0.09   0.87    0.52   1.43   2.56   0.55   0.92   2.47
 0.99   1.44   1.45   0.72   6.52   1.07    5.74  13.11   2.32   0.17   1.85   0.22
 0.17   1.65   3.25   2.19   0.13   1.60   15.63   2.16   3.84   6.91   1.11   3.32
 0.50   0.55   1.39   1.06   1.33   0.95    3.96   5.15   5.08   0.01   0.41   0.28

S(:,:,3)                                   S(:,:,4)
 0.19   2.27   4.17   6.40   0.21   2.11    0.18   0.64   2.73   0.08   3.50   3.54
 5.80   3.74   1.87   4.98   2.50   3.21    0.40   9.24   6.96   1.28   3.30   1.39
 3.88   6.62   0.73   2.12   4.56   0.98    4.60   1.49   3.76   2.71   0.16   1.64
 5.85   6.28   1.63   2.45   1.91   2.11    0.70   1.80   0.77   0.59   3.14   0.26

Matrix U                       Matrix V
0.49   0.30   0.65   0.51      0.44   0.74   0.17   0.47
0.56   0.60   0.34   0.46      0.60   0.66   0.12   0.43
0.51   0.67   0.29   0.46      0.50   0.06   0.46   0.73
0.43   0.33   0.62   0.56      0.44   0.08   0.86   0.23

Matrix W
0.40   0.65   0.05   0.21   0.61   0.05
0.40   0.29   0.84   0.20   0.08   0.03
0.43   0.20   0.32   0.45   0.59   0.36
0.46   0.25   0.03   0.74   0.33   0.26
0.38   0.45   0.36   0.25   0.12   0.66
0.38   0.43   0.24   0.31   0.39   0.60

Fig. 6. 2D graph of users and items (x-axis: first column of U and V; y-axis: second column of U and V; points: users U1 to U4, items I1 to I6 and the new user Unew).

Fig. 7. Membership functions for (a) Visuals and (b) Directing (x-axis: input variable, from 10 to 13; y-axis: degree of membership; curves: Cluster 1 to Cluster 4).


where $\{a_i, b_i\}$ is the parameter set of the MFs in the premise part of the fuzzy IF-THEN rules; changing these parameters changes the shapes of the MFs. Parameters in this layer are referred to as the premise parameters.

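From the ANFIS architecture shown in Fig. 8, it can be observed that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters. In symbols, the output O can be rewritten as:

$$
\begin{aligned}
O &= \bar{w}_1 f_1 + \bar{w}_2 f_2 + \cdots + \bar{w}_n f_n \\
  &= \bar{w}_1 (p_1 A + q_1 D + r_1 S + t_1 V + p_1) + \cdots + \bar{w}_n (p_n A + q_n D + r_n S + t_n V + p_n) \\
  &= (\bar{w}_1 A) p_1 + (\bar{w}_1 D) q_1 + (\bar{w}_1 S) r_1 + (\bar{w}_1 V) t_1 + \bar{w}_1 p_1 + \cdots \\
  &\quad + (\bar{w}_n A) p_n + (\bar{w}_n D) q_n + (\bar{w}_n S) r_n + (\bar{w}_n V) t_n + \bar{w}_n p_n
\end{aligned}
\tag{25}
$$

where $\bar{w}_i = w_i / \sum_{j=1}^{n} w_j$ is the normalized firing strength of rule i. The output is linear in the consequent parameters $p_i$, $q_i$, $r_i$, $t_i$ and $p_i$. Fig. 9 shows the architecture of the implemented ANFIS, which consists of four inputs, four rules, and sixteen MFs for inputs and output.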
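To make the forward pass of Eqs. (23)-(25) concrete, the following sketch evaluates a first-order Sugeno system with Gaussian MFs. It is an illustration only: the names `gaussian_mf` and `sugeno_output`, the use of the product as the AND operator, and the layout of the parameter arrays (one row per rule, constant term stored in the last column) are our assumptions, and no parameter learning is shown.

```python
import numpy as np

def gaussian_mf(x, a, b):
    """Gaussian membership function with width a and center b (Eqs. 23-24)."""
    return np.exp(-((x - b) ** 2) / (2.0 * a ** 2))

def sugeno_output(x, widths, centers, consequents):
    """Output of a first-order Sugeno FIS for one input vector x.

    widths, centers: arrays of shape (n_rules, n_inputs) with premise parameters.
    consequents: array of shape (n_rules, n_inputs + 1); the last column holds
    the constant term of each rule.
    """
    x = np.asarray(x, dtype=float)
    widths = np.asarray(widths, dtype=float)
    centers = np.asarray(centers, dtype=float)
    consequents = np.asarray(consequents, dtype=float)
    mu = gaussian_mf(x[None, :], widths, centers)        # membership degrees per rule and input
    w = mu.prod(axis=1)                                  # firing strength (product as AND, assumed)
    w_bar = w / w.sum()                                  # normalized firing strengths
    f = consequents[:, :-1] @ x + consequents[:, -1]     # linear rule outputs f_i
    return float(w_bar @ f)                              # Eq. (25)
```

For example, with one rule centered at 11 on every criterion and another centered at 5, an input of (11, 11, 11, 11) is dominated almost entirely by the first rule. The sketch assumes that at least one rule fires with non-zero strength.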

4.2.1. Training the ANFIS and model validation using checking and testing dataset

In this study, three sets of data were used for ANFIS modeling: training, checking and testing data. ANFIS uses the training data to construct the model of the target system; the rows of the training data serve as the inputs and outputs for constructing the target model. The checking data is used to test the generalization capability of the FIS at each epoch, which prevents over-fitting and verifies the identified ANFIS. The checking and testing data have the same format as the training data, but their elements generally differ from those of the training data. Each cluster obtained using HOSVD was divided into three groups. The first group, comprising 80% of the cluster's data, was used as the training data, and the second group, comprising 10%, was used as the checking data. The remaining 10% was used as the testing data.
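The 80/10/10 division of a cluster can be sketched as follows. The function name `split_cluster`, the random shuffling and the fixed seed are assumptions made for illustration; the text does not state how rows were assigned to the three groups.

```python
import numpy as np

def split_cluster(rows, train=0.8, check=0.1, seed=0):
    """Split the rows of one cluster into training, checking and testing sets."""
    rows = np.asarray(rows)
    idx = np.random.default_rng(seed).permutation(len(rows))  # assumed random assignment
    n_train = int(round(train * len(rows)))
    n_check = int(round(check * len(rows)))
    train_rows = rows[idx[:n_train]]
    check_rows = rows[idx[n_train:n_train + n_check]]
    test_rows = rows[idx[n_train + n_check:]]                  # the remainder
    return train_rows, check_rows, test_rows
```

For a cluster of 190 rating vectors, this yields 152 training, 19 checking and 19 testing rows.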

5. Result and discussion

In order to analyse the effectiveness of the proposed method, several experiments were conducted on the Yahoo!Movies dataset provided by the Yahoo! Research Alliance Webscope program (http://webscope.sandbox.yahoo.com).

On the Yahoo!Movies network, users could rate movies in 4 dimensions (Story, Acting, Direction and Visuals) and assign an overall rating. Users used a 13-level rating scale. The four features of each movie were considered as: C1 = Acting, C2 = Story, C3 = Visuals and C4 = Directing. As can be seen in Table 4, all user ratings are measured on a quantitative scale between 1 and 13.

The original dataset contains 257,317 rating tuples from 127,829 users on 8272 movies. However, the resulting ratings tensor is extremely sparse, because many of the user-item-criteria entries are empty. The sparsity level of the dataset is about 97.57% (sparsity level = 1 − density = 1 − (257,317 × 100)/(127,829 × 8272) = 0.9757). That means not even 2.43% of all entries in the rating tensor are filled. Similar to the work by Jannach et al. [24], we pre-processed the dataset and created test datasets with different density and quality levels, and applied the proposed method to YM-20-20, YM-10-10 and YM-5-5. The description of these datasets is presented in Table 5.

5.1. Performance of HOSVD clustering

Because HOSVD is quickly calculated, it is applied to the training tensor $A \in \mathbb{R}^{1500 \times 500 \times 4}$, which corresponds to the training set. As a result, an approximation $\tilde{A}_{a_1,a_2,a_3}$ is retained. The values set for $a_1$, $a_2$ and $a_3$ determine the dimensions of the core tensor.

It should be noted that all the experiments in this study were implemented in MATLAB on a Microsoft Windows operating system with an Intel Core i5 processor running at 2.66 GHz and 4 GB of RAM.

To estimate the performance of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20 approximations, we adopt the Silhouette coefficient [76] as the standard measure of clustering quality and use it to determine the best cluster formation. The Silhouette coefficient is an internal index that measures how well the clustering fits the original data based on statistical properties of the clustered data. External indices, by contrast, measure the quality of a clustering by comparing it with an external (supervised) labeling. The Silhouette coefficient of an element i of a cluster k is defined by the average distance a(i) between i and the other elements of k (the intra-cluster distance) and the distance b(i) between i and the nearest element in the nearest cluster (the minimal inter-cluster distance).

Fig. 8. Architecture of implemented ANFIS model for two inputs, one output and

two rules.

Fig. 9. Architecture of the implemented ANFIS.

Table 4

A sample of the multi-criteria dataset from the Yahoo!Movies.

Movie ID User ID Directing Story Visual Acting Overall rating

2 1 1 2 1 2 1

13 13 11 13 13 13

9 13 13 8 9 8

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .

13 2 13 13 13 13 13

13 13 11 13 13 12

12 13 13 13 12 13

. . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . .


$$sc(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{26}$$

which can be written as:

$$sc(i) = \begin{cases} 1 - a(i)/b(i), & \text{if } a(i) < b(i) \\ 0, & \text{if } a(i) = b(i) \\ b(i)/a(i) - 1, & \text{if } a(i) > b(i) \end{cases} \tag{27}$$

An overall score for a set of $n_k$ elements (a cluster or the entire clustering) is calculated by taking the average of the Silhouette coefficients $sc(i)$ of all elements i in the set. Thus, $SC_k$ can be defined as

$$SC_k = \frac{1}{n_k} \sum_{i=1}^{n_k} sc(i) \tag{28}$$

The Silhouette coefficient takes values between −1 and 1. The closer to 1, the better the clustering fits the data. Table 6 lists a general rule of thumb for interpreting the Silhouette coefficient.
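Eqs. (26)-(28) can be illustrated with the short sketch below. It follows the definitions given in the text, with a(i) as the mean distance to the other members of the same cluster and b(i) as the distance to the nearest element of any other cluster; note that many library implementations instead use the mean distance to the nearest other cluster for b(i). The function name `silhouette`, the Euclidean distance, and the assumptions of at least two clusters and distinct points are ours.

```python
import numpy as np

def silhouette(X, labels):
    """Per-element Silhouette coefficients sc(i) following Eq. (26)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    sc = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                       # exclude the element itself
        if not same.any():                    # singleton cluster: leave sc(i) = 0
            continue
        a = d[i, same].mean()                 # intra-cluster distance a(i)
        b = d[i, labels != labels[i]].min()   # minimal inter-cluster distance b(i)
        sc[i] = (b - a) / max(a, b)
    return sc
```

Averaging the returned values over a cluster, or over all elements, gives the score $SC_k$ of Eq. (28).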

Table 7 shows the average Silhouette coefficient of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20 approximations. According to Table 7, the highest average Silhouette coefficient for HOSVD clustering, 0.867, is obtained for the rank 10 approximation. This value is reasonably good. Based on observation, lower approximation ranks do better than high approximation ranks. This supports our claim that truncated HOSVD gives better results.

5.2. Evaluation of proposed ANFIS model

After cluster analysis, the ANFIS model was applied to one of the clusters with the maximum Silhouette coefficient. In that cluster, the third cluster generated by the HOSVD method for rank approximation 12, four fuzzy clusters were determined for the given 190 user ratings. The number of fuzzy rules was equal to the number of cluster centers, each representing the characteristic of the cluster, as given in Table 8.

For evaluating the ANFIS model, several measures of accuracy were used to determine the model's capability for predicting the overall rating. The models were evaluated by four estimators: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and coefficient of determination (R²). These estimators are determined by

$$\mathrm{MSE} = \frac{\sum_{O=1}^{n} \left(\mathrm{actual}(O) - \mathrm{prediction}(O)\right)^2}{n} \tag{29}$$

$$\mathrm{MAPE} = \frac{\sum_{O=1}^{n} \left| \left(\mathrm{actual}(O) - \mathrm{prediction}(O)\right) / \mathrm{actual}(O) \right|}{n} \tag{30}$$

$$R^2 = 1 - \frac{\sum_{O=1}^{n} \left(\mathrm{actual}(O) - \mathrm{prediction}(O)\right)^2}{\sum_{O=1}^{n} \left(\mathrm{actual}(O) - \overline{\mathrm{actual}}\right)^2} \tag{31}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{O=1}^{n} \left(\mathrm{actual}(O) - \mathrm{prediction}(O)\right)^2}{n}} \tag{32}$$

where actual(O) is the real overall rating provided by the user, prediction(O) is the predicted overall rating, $\overline{\mathrm{actual}}$ is the mean of the actual overall ratings and n is the number of user ratings used.

Usually, RMSE and MSE are used in the training process to test the prediction model; in this study, two further measures, the coefficient of determination R² and MAPE, were used for a more effective performance evaluation. R² provides a value in [0, 1] about the training of the proposed network; a value closer to 1 indicates more successful learning. MAPE was used because it accurately identifies the model deviations.
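The estimators of Eqs. (29)-(32) are straightforward to compute; the sketch below is one possible NumPy version. The function name `error_metrics` is hypothetical, MAPE is returned as a fraction rather than a percentage (as in Eq. (30)), and non-zero actual ratings are assumed, which holds for a 1 to 13 scale.

```python
import numpy as np

def error_metrics(actual, prediction):
    """Return MSE, RMSE, MAPE and R^2 for two equally long rating vectors."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    residual = actual - prediction
    mse = float(np.mean(residual ** 2))                       # Eq. (29)
    mape = float(np.mean(np.abs(residual / actual)))          # Eq. (30)
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))
    r2 = 1.0 - float(np.sum(residual ** 2)) / ss_tot          # Eq. (31)
    rmse = float(np.sqrt(mse))                                # Eq. (32)
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}
```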

After implementing the ANFIS model using the Fuzzy Logic Toolbox in MATLAB 7.10.0, the training and checking data were tested for error estimation. Data from the four inputs was given to the trained ANFIS model along with the actual overall ratings. From the input values, the suitable MFs (see Fig. 7(a) and (b)) were selected to predict the overall ratings using the extracted rules (see Table 8). The fuzzy rule viewer of the established ANFIS model, shown in Fig. 10, helps visualize the process of overall rating prediction by selecting the MFs. It indicates the behavior of users as the values of all four inputs change. In the fuzzy rule viewer, when Acting is at 11, Directing at 12, Story at 12 and Visuals at 11, an overall rating of 12 is obtained.

Table 9 presents errors for a sample of the training and checking datasets. As can be seen from the errors of the nineteen samples in Table 9, the ANFIS model has been trained effectively using the training data.

Table 5

Information of Yahoo!Movies dataset.

Name #Users #Items #Overall ratings

YM-20-20 429 491 18,504

YM-10-10 1827 1471 48,026

YM-5-5 5978 3079 82,599

Table 6
Rule of thumb for the interpretation of the Silhouette coefficient.

Range        Interpretation
>0.70        Strong structure has been found
0.50–0.70    Reasonable structure has been found
0.25–0.50    The structure is weak and could be artificial


For subtractive clustering, the parameters were defined by a trial and error approach as: range of influence: accept ratio: 0.5, reject ratio: 0.15 and 0.5 and squash factor: 1.25. However, we could test the effect of the two variables $r_a$ and $r_b$, which represent a radius of neighborhood, on the training, checking and test data for overall rating prediction error. The lowest error was obtained for $r_b = 1.5\, r_a$ when varying $r_a$ from 0.3 to 0.8 for the radius of neighborhood. Fig. 11 presents the overall rating prediction error of checking and training for nineteen samples.

In this study, the average error for the checking data was equal to 0.0001904. After 200 epochs, the average RMSE, MSE, MAPE and R² were 0.02144, 0.00912, 0.18230 and 0.82460, respectively. The average error for the training data was equal to 0.000162221. After 200 epochs, the average RMSE, MSE, MAPE and R² were 0.01272, 0.00912, 0.18230 and 0.99460, respectively. Also, after 200 epochs, the average error for the testing data was equal to 0.000172361. The average RMSE, MSE, MAPE and R² were 0.01951, 0.00949, 0.10230 and 0.91150, respectively. The average training and checking errors after 200 epochs are shown in Fig. 12.

Fig. 13 illustrates, through control surfaces, the interdependency of the four input parameters and the overall rating obtained from the fuzzy rules generated by ANFIS combined with subtractive clustering. The level of overall rating can be depicted as a continuous function of its input parameters Acting, Directing, Story and Visuals. The surface plots in this figure depict the variation of overall rating based on the identified fuzzy rules. Fig. 13(a) shows the interdependency of overall rating on Directing and Acting. Fig. 13(b) depicts the interdependency of overall rating on Acting and Story. Fig. 13(c) shows the interdependency of overall rating on Visuals and Acting. Fig. 13(d) depicts the interdependency of overall rating on Story and Directing. Fig. 13(e) depicts the interdependency of overall

Fig. 10. Fuzzy rule viewer for input and output variables of ANFIS model.

Table 9
Training and checking errors for prediction overall ratings by ANFIS.

Sample #   Training data   Training ANFIS output   Training error (%)   Checking data   Checking ANFIS output   Checking error (%)
1          12              12                      0                    11              11.01                   0.01
2          10              10.0001                 0.0001               12              12.009                  0.009
3          13              13                      0                    10              10.009                  0.009
4          12              12                      0                    12              12.0001                 0.0001
5          12              12                      0                    11              11.0008                 0.0008
6          13              13                      0                    11              11.009323               0.009323
7          12              12                      0                    12              12.0004                 0.0004
8          13              13                      0                    12              12.00383                0.00383
9          12              12                      0                    12              11.998                  0.002
10         12              12                      0                    13              12.999998               0.000002
11         12              12                      0                    11              11.003                  0.003
12         10              10.0013                 0.0013               11              11.0005                 0.0005
13         13              13                      0                    12              12.00276                0.00276
14         12              12                      0                    12              12.0003                 0.0003
15         11              10.9999                 -0.0001              12              11.9917                 0.0083
16         12              12                      0                    12              11.99346                0.00654
17         12              12                      0                    11              10.99299                0.00701
18         13              13                      0                    10              10.009                  0.009
19         12              12                      0                    11              10.9901                 0.0099

Fig. 11. Training and checking error for nineteen samples in the dataset (x-axis: number of samples; y-axis: prediction error).



rating on Visuals and Directing, and Fig. 13(f) shows the interdependency of overall rating on Story and Visuals.

These surface plots show the users' perception and behavior on any two item features in the cluster of users with similar preferences. In addition, the results depicted in the surface plots are valuable for revealing user behavior about item features in multi-criteria CF. Thus, the users' preferences in any cluster of users can be modeled by ANFIS, and the recommender system can recognize which item features (criteria), at which level, are tailored to their preferences. Also, the curves presented in Fig. 14(a–d) reveal distinctly the user behavior on each item feature. As can be seen in these curves, the overall rating increases significantly more with the Story criterion than with the other criteria. It can be inferred that the Story criterion is the most important for users in that cluster.

5.3. Multi-criteria CF evaluation

In this section, we focus on multi-criteria CF recommendation using the proposed method. As mentioned before, we used k-NN for data classification, and we noted that the choice of k and of the distance metric is important for the classification accuracy of the k-NN method. Therefore, in this study, the optimal distance metric and k were chosen using cross-validation [77], so that the classifier could accurately predict the testing data. Five-fold cross-validation was applied to choose the type of distance metric and the best value of k.

Using the five-fold cross-validation approach for k = 1, k = 3, k = 5 and k = 7 and three different distance metrics (Euclidean, Correlation and City-Block), the resulting averaged classification accuracy is presented in Table 10. From Table 10, the highest averaged classification accuracy, about 98.91%, is obtained using the Euclidean distance metric for k = 5, in comparison to the City-Block (95.89%) and Correlation (96.76%) distance metrics. Also, with the Euclidean metric, the averaged classification rate is higher than with the Correlation and City-Block metrics for all values of k. Thus, based on this result, we adopted the optimal value k = 5 obtained using five-fold cross-validation, with Euclidean as the distance metric.
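The selection procedure described above can be illustrated with a minimal plain-Python sketch. It is not the implementation used in the experiments: the function names (`select_knn`, `cv_accuracy`, `knn_predict`), the round-robin assignment of samples to folds, and the definition of the correlation distance as one minus the Pearson correlation are all assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def correlation(a, b):
    # Assumed definition: 1 - Pearson correlation (1.0 if a vector is constant).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1.0 if den == 0 else 1.0 - sum(x * y for x, y in zip(da, db)) / den

def knn_predict(train_X, train_y, query, k, dist):
    # Majority vote among the k training points closest to the query.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def cv_accuracy(X, y, k, dist, folds=5):
    # Fold f holds out every sample whose index equals f modulo `folds`
    # (an assumed split; requires len(X) >= folds).
    accuracies = []
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        hits = sum(knn_predict(tX, ty, X[i], k, dist) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds  # accuracy averaged over the folds

def select_knn(X, y, ks=(1, 3, 5, 7), folds=5):
    metrics = {"euclidean": euclidean, "city_block": city_block,
               "correlation": correlation}
    scores = {(name, k): cv_accuracy(X, y, k, dist, folds)
              for name, dist in metrics.items() for k in ks}
    best = max(scores, key=scores.get)  # first entry with the highest accuracy
    return best, scores
```

Calling `select_knn(X, y)` on a list of feature vectors `X` and class labels `y` returns the best (metric, k) pair together with the full score grid, which corresponds in shape to the grid reported in Table 10.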

We determined the precision and the recall of the Top-N list for each element in the test set and took the arithmetic mean of these values. The recommender's prediction accuracy was measured by RMSE [78], which is a widely used metric for evaluating the statistical accuracy of recommendation algorithms, given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{|X|} \sum_{(u_i, o_j) \in X} \left| a_{ij} - p_{ij} \right|^2} \tag{33}$$

where $X = \{(u_i, o_j) \mid u_i \text{ had rated } o_j \text{ in the probe set}\}$. A lower value of RMSE indicates a higher accuracy of the recommendation system.

Table 11 presents the RMSE obtained from the proposed approach on YM-5-5 (each movie has at least 5 ratings), YM-10-10 (each movie has at least 10 ratings) and YM-20-20 (each movie has at least 20 ratings). Fig. 15 shows the prediction accuracy for different neighborhood sizes on the datasets YM-5-5, YM-10-10 and YM-20-20.
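As an illustration of Eq. (33), the following minimal sketch computes the RMSE over the (user, item) pairs of a probe set. The dictionary-based data layout and the function name `rmse` are assumptions made for this example, not part of the original experiments.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error over the probe set X.

    `actual` and `predicted` are dicts keyed by (user, item) pairs
    (an assumed layout); X is taken as the pairs present in both.
    """
    pairs = [key for key in actual if key in predicted]
    if not pairs:
        raise ValueError("no (user, item) pair has both a rating and a prediction")
    squared_error = sum((actual[key] - predicted[key]) ** 2 for key in pairs)
    return math.sqrt(squared_error / len(pairs))

# Toy probe set: one prediction is off by 1, the other is exact.
actual = {("u1", "o1"): 4, ("u1", "o2"): 2}
predicted = {("u1", "o1"): 3, ("u1", "o2"): 2}
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones.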

To compare the proposed method with HOSVD, truncated SVD and some state-of-the-art approaches in multi-criteria CF, we employ the recall and precision metrics, which are widely used in recommender systems to evaluate the quality of recommendations [79,80]. Precision is the ratio of relevant items recommended to the total number of items recommended. Recall is the ratio of relevant items recommended to the total number of relevant items that exist. The two measures are inversely related and depend on the length of the recommendation list. The longer the recommendation list, the easier it becomes to achieve high recall, but the more difficult it becomes to achieve good precision. The F measure is the weighted harmonic mean that combines both precision and recall [24].

$$\mathrm{Recall} = \frac{\text{Number of correctly recommended items}}{\text{Number of interesting items}} \tag{34}$$

$$\mathrm{Precision} = \frac{\text{Number of correctly recommended items}}{\text{Number of recommended items}} \tag{35}$$

where the items of interest to a customer u refer to products in the test set that were purchased by u, and correctly recommended items are items that match the items of interest. Although these measures are simple to compute and intuitively appealing, they are in conflict because increasing the size of the recommendation set improves the recall at the expense of reducing the precision [8].

The F1-metric [24,79], which combines precision and recall, is also widely used to evaluate the quality of recommendations. Specifically, this measure balances the trade-off between precision and recall by assigning equal weights to both metrics. Therefore, we use the F1-metric in our evaluation, as shown in Eq. (36).

$$F_1 = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{36}$$
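The three measures in Eqs. (34)-(36) can be sketched for a single user's Top-N list as follows. The function names and the toy item identifiers are hypothetical, and the ranked list is assumed to contain no duplicate items.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one recommendation list."""
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

def top_n_scores(ranked_items, relevant, n):
    # Evaluate only the first n items of the ranked list (the Top-N list).
    return precision_recall_f1(ranked_items[:n], relevant)

ranked = ["a", "b", "c", "d", "e"]   # hypothetical ranked recommendations
interesting = {"a", "c", "f"}        # hypothetical items of interest
for n in (2, 5):
    print(n, top_n_scores(ranked, interesting, n))
```

In this toy case, lengthening the list from N = 2 to N = 5 raises recall from 1/3 to 2/3 while precision drops from 0.5 to 0.4, which is the trade-off that the F1 measure balances.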

We ran the experiments on the YM-10-10 and YM-20-20 datasets for N equal to 1, 5, 7, 15, 25, 35 and 40, where N is the number of items to be recommended by the Top-N recommender system.

From the two F1 curves in Figs. 16 and 17, we can notice that the proposed method gives a high level of accuracy as the neighborhood size is increased versus the Top-N recommendation. This outcome demonstrates the significance of combining the HOSVD method and ANFIS with subtractive clustering for overcoming

Fig. 12. The error of each observation for checking and training data.


the problems connected to multi-criteria CF. The results above clearly reveal that the proposed method gives better results for YM-20-20. In Figs. 16 and 17, the significant change in accuracy measured by F1 between neighborhood sizes 15 and 25 indicates that higher accuracy is obtained for large neighborhoods than for small neighborhoods. According to the experiments, these outcomes are related to the results of clustering and of extracting fuzzy rules from the YM-20-20 and YM-10-10 datasets.

Fig. 13. Interdependency of overall rating on (a) Directing and Acting, (b) Acting and Story, (c) Visuals and Acting, (d) Story and Directing, (e) Visuals and Directing, and (f) Story and Visuals.



In order to compare the proposed method with previous work [23,24,52], we also evaluated our approach on YM-10-10 using an additional set of metrics. In Table 12, we report Precision@5 and Precision@7 values as well as the Mean Absolute Error (MAE). We also applied the SVD and HOSVD techniques without ANFIS and subtractive clustering on the YM-10-10 and YM-20-20 datasets; the results are presented in Table 13.

The MAE is determined as the average absolute deviation between predicted ratings and true ratings, as shown in Eq. (37).

$$\mathrm{MAE}(pred, act) = \sum_{i=1}^{N} \frac{\left| pred_{u,i} - act_{u,i} \right|}{N} \tag{37}$$

where N is the number of items on which a user u has expressed an opinion.
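A small worked example of Eq. (37) is sketched below for a single user u. The function name `mae` and the rating values are hypothetical and serve only to illustrate the arithmetic.

```python
def mae(predicted, actual):
    """Mean absolute error between two equally long rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical ratings for N = 3 items rated by user u:
# absolute deviations are 1, 0 and 2, so MAE = (1 + 0 + 2) / 3 = 1.0
print(mae([4, 3, 5], [5, 3, 3]))
```

Unlike the RMSE, the MAE weights all deviations linearly, so a single large error has less influence on the result.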

From the results, we find that the proposed method outperforms the algorithms in the previous work, as well as the methods using solely HOSVD or SVD, in precision at Top-5 and Top-7.

In order to compare the proposed method with MC-SeCF, developed by Shambour and Lu [53] and evaluated on the MovieLens dataset, we also evaluated our approach on YM-20-20 and YM-10-10 using the MAE metric for different neighborhood sizes. The MAE comparison is shown in Fig. 18. The curves in this figure show that a significant improvement in recommendation accuracy is obtained for large neighborhood sizes. We present the recommendation accuracy using MAE for YM-20-20 and YM-10-10 in Table 14. From Fig. 18 and Table 14, it can be observed that the proposed method yields slightly better recommendation accuracy on YM-20-20 for large neighborhood sizes. Compared with MC-SeCF, the MAE values of our method are slightly higher. However, quite interestingly, on the YM-20-20 dataset, better recommendation accuracy is obtained by our method. This indicates that on YM-20-20 the accuracy is relatively high because better fuzzy rules are discovered. Also, as mentioned earlier, semantic information of items can be incorporated into multi-criteria CF to obtain more accurate recommendations.

Fig. 14. Curves for revealing the relationship between overall rating and (a) Visuals, (b) Directing, (c) Story, and (d) Acting.

Table 10
Averaged classification accuracy (%) for distance metrics and values of k.

Distance metric   k = 1   k = 3   k = 5   k = 7   k = 9
Euclidean         96.63   97.34   98.91   97.67   95.56
City block        94.72   94.87   95.89   93.89   93.73
Correlation       95.28   95.38   96.76   94.88   94.87

Table 11
Coverage and RMSE for YM-5-5, YM-10-10 and YM-20-20.

Size of neighborhood   RMSE (YM-5-5)   RMSE (YM-10-10)   RMSE (YM-20-20)
5                      0.551097        0.5365            0.53158
10                     0.549707        0.5310            0.52988
15                     0.544308        0.5289            0.52039
20                     0.538909        0.5200            0.51558
25                     0.530209        0.5184            0.51129
30                     0.528609        0.5174            0.50349

[Figure: RMSE (y-axis) versus Neighborhood Size (x-axis); series: YM 5-5, YM 10-10 and YM 20-20.]

Fig. 15. RMSE and neighborhood size.


Also, according to the rank-12 approximation defined for the HOSVD decomposition, we evaluated the precision for different numbers of clusters. For the rank-12 approximation, the number of clusters was changed iteratively: starting from 3 clusters, the number of clusters was increased by 3 after each iteration until it reached 12. Fig. 19 illustrates the Precision@5 and Precision@7 values versus the number of clusters.

As can be seen in Fig. 19, the worst precision is obtained for YM-10-10 at Precision@5 with 3 clusters and the best precision is achieved for YM-20-20 at Precision@5 with 12 clusters. This result demonstrates that, for both YM-10-10 and YM-20-20, the precision increases as the number of clusters increases.

To show experimentally the effectiveness of clustering using HOSVD and cosine-based similarity, we ran experiments with the similarity-based approach developed by Adomavicius and Kwon [23] and compared it with the proposed method. They proposed different potential ways to calculate the similarity between users based on their criteria ratings. It should be noted that the Chebyshev distance metric performed best among their similarity-based approaches.

Fig. 20 presents the performance results of our experiments for the proposed method and the similarity-based approach using the Chebyshev distance metric. In Fig. 20, the throughput is plotted as a function of the number of clusters. We define the throughput of a

[Figure: F measure (y-axis) versus Top-N (x-axis); series: 5, 15, 25 and 35 Neighbors.]

Fig. 16. F1 measure and Top-N recommendation for YM-10-10.

[Figure: F measure (y-axis) versus Top-N (x-axis); series: 5, 15, 25 and 35 Neighbors.]

Fig. 17. F1 measure and Top-N recommendation for YM-20-20.

Table 12
MAE and precision at Top-5 and Top-7 of the proposed method, HOSVD and truncated SVD for YM-10-10 (neighborhood size: all users).

Algorithm                                 Precision@5   Precision@7   MAE
HOSVD                                     75.34         72.85         1.17
Truncated SVD                             74.03         72.19         1.75
HOSVD-ANFIS and subtractive clustering    81.44         80.78         0.96

Table 13
MAE and precision at Top-5 and Top-7 for the proposed method, HOSVD and truncated SVD for YM-20-20 (neighborhood size: all users).

Algorithm                                 Precision@5   Precision@7   MAE
HOSVD                                     78.57         76.43         0.95
Truncated SVD                             75.12         73.21         1.45
HOSVD-ANFIS and subtractive clustering    83.34         81.32         0.91

[Figure: MAE(%) (y-axis) versus Size of Neighbors (x-axis); series: YM-10-10 and YM-20-20.]

Fig. 18. Recommendation accuracy for different neighborhood sizes on YM-20-20 and YM-10-10.

Table 14
Recommendation accuracy using MAE for different neighborhood sizes.

Neighborhood size   MAE(%) YM-10-10   MAE(%) YM-20-20
10                  0.7325            0.7105
20                  0.7370            0.7088
30                  0.7249            0.7093
40                  0.7260            0.7015
50                  0.7184            0.6902
70                  0.7112            0.6724
90                  0.7180            0.6688

[Figure: Precision (y-axis) versus Number of Clusters (x-axis); surviving legend entry: Precision @5 YM-20-20.]

Fig. 19. Precision versus number of clusters in Precision@5 and Precision@7 for different datasets.



multi-criteria CF recommender system as the number of recommendations generated per second for k selected users (k = 5). From the curves in this plot, we see that when HOSVD and the cosine-based approach are used for clustering the high-dimensional data, the throughput is substantially higher than that of multi-criteria CF based on the similarity-based approach. This is because, with the clustered approach using HOSVD and cosine-based similarity, the prediction algorithm uses only a fraction of the neighbors. The throughput of the multi-criteria recommender system increases rapidly as the number of clusters increases and the clusters become smaller. Since multi-criteria CF based on the similarity approach has to scan through all the neighbors, the number of clusters does not affect its throughput.

We also evaluated the recommendation quality using coverage measures. Coverage measures the percentage of items for which a CF system can provide a prediction or that ever appear in a recommendation list [81]. It should be noted that a recommender system should maintain a good level of coverage so that most of the items are connected in some way to the rest of the data; otherwise they will be isolated and essentially dormant in the system.
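One of the two readings of coverage given above, namely the share of items that ever appear in a recommendation list, can be sketched as follows. The function name `catalog_coverage` and the toy item identifiers are assumptions for illustration; the sketch does not reproduce the exact coverage computation used in the experiments.

```python
def catalog_coverage(recommendation_lists, all_items):
    """Fraction of the item catalog that appears in at least one list."""
    catalog = set(all_items)
    if not catalog:
        raise ValueError("the item catalog must not be empty")
    # Union of all recommended items, restricted to known catalog items.
    recommended = set().union(*recommendation_lists) & catalog
    return len(recommended) / len(catalog)

# Hypothetical Top-2 lists for two users over a four-item catalog.
lists = [["a", "b"], ["b", "c"]]
print(catalog_coverage(lists, ["a", "b", "c", "d"]))
```

In this toy case item "d" is never recommended, so three of the four catalog items are covered.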

The curves shown in Fig. 21 present the recommendation quality of the proposed method and reveal that the coverage is strongly related to the neighborhood size. Table 15 presents the coverage obtained from the proposed method. To show experimentally the effect of clustering using HOSVD and cosine-based similarity on coverage, we also ran the experiments with the similarity-based approach, as presented in Table 15.

From Table 15, the proposed method maintains a good level of coverage relative to the similarity-based approach for different neighborhood sizes. In addition, the results also confirm that both the proposed method and the similarity-based approach have good coverage on YM-20-20.

6. Conclusion and future work

In this paper, a new method was proposed that combines HOSVD and ANFIS with subtractive clustering to improve the recommendation quality and predictive accuracy of multi-criteria CF. We proposed this method to overcome existing shortcomings in multi-criteria CF, such as predicting the overall ratings, sparsity, scalability, and the uncertainty induced by vagueness and imprecision in representing and reasoning about item features.

Using HOSVD, we effectively reduced the noise of the high-dimensional data and alleviated the scalability problem. Also, with HOSVD, we considered all factors in the third-order tensor of user, item and criteria together to reveal latent relationships between them. The results of applying the HOSVD method to the high-dimensional dataset helped us to obtain high-quality clusters using cosine-based similarity. In addition, tensor decomposition using HOSVD on the experimental dataset demonstrated its advantages for dimensionality reduction in more than two dimensions, yielding a favorable approximation of the information. From the experiments, we observed that the proposed method using HOSVD and ANFIS achieves better recommendation accuracy than the algorithms in the previous work and the methods using solely SVD or HOSVD.

The experimental results on the movie dataset clearly demonstrated the capability of ANFIS modeling using MFs and fuzzy rules without human expert intervention in multi-criteria CF. In addition, the ANFIS model combined with subtractive clustering was used to extract knowledge from user ratings and preferences on item features. This was done by incorporating the element of training into the existing Neuro-Fuzzy system. Furthermore, with the training data of ANFIS, the rules and the MFs were properly tuned to predict the unknown overall ratings and thereby alleviate the sparsity problem; this approach has advantages in terms of the simplicity of the algorithm and the speed of training convergence. Moreover, user ratings on items in multi-criteria CF accumulate over time, and the fuzzy rules can be amended and maintained in the rules database for prediction tasks. The advantage of this method is its flexibility and extensibility, in that it can be developed for any number of dimensions and criteria/features of the dataset.

We analysed the predictive accuracy of the proposed method on a real-world dataset in the domain of movie recommendation provided by Yahoo!Movie. We used popular measurement metrics: F1, RMSE, MAE and coverage. The proposed method was evaluated in terms of MAE, Precision@5 and Precision@7 using

[Figure: Throughput in Recs./Sec (y-axis) versus Number of clusters (x-axis); series: Similarity-Based Approach YM-20-20, Similarity-Based Approach YM-10-10, HOSVD and Cosine-Based YM-20-20, HOSVD and Cosine-Based YM-10-10.]

Fig. 20. Throughput of proposed method versus similarity-based approach.

[Figure: Coverage (y-axis) versus Neighborhood Size (x-axis); series: YM 20-20, YM 10-10 and YM 5-5.]

Fig. 21. Neigh
