7/21/2019 1-s2.0-S0950705114000173-main
Multi-criteria collaborative filtering with high accuracy using higher
order singular value decomposition and Neuro-Fuzzy system
Mehrbakhsh Nilashi, Othman bin Ibrahim, Norafida Ithnin
Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Article info
Article history: Received 8 April 2013
Received in revised form 3 January 2014
Accepted 6 January 2014
Available online 10 January 2014
Keywords:
Neuro-Fuzzy inference system
Higher order singular value decomposition
Subtractive clustering
Sparsity
Scalability
Multi-criteria collaborative filtering
Abstract
Collaborative Filtering (CF) is the most widely used prediction technique in recommender systems. It makes recommendations based on ratings that users have assigned to items. Most current CF recommender systems maintain only a single rating per user-item pair in the user-item rating matrix. Multi-criteria CF makes it possible to provide accurate recommendations by considering user preferences on multiple aspects of items. However, in multi-criteria CF, user behavior regarding item features is frequently subjective, imprecise and vague. This in turn induces uncertainty in the representation of and reasoning about item features that crisp machine learning techniques cannot handle precisely. In contrast, fuzzy methods can better address this uncertainty. In addition, fuzzy methods can predict users' preferences more accurately and better alleviate the sparsity problem in overall ratings by considering user perception of item features. Apart from this, in multi-criteria CF, users rate different aspects (criteria) of an item, which adds new dimensions to the data and thereby aggravates the scalability problem. Appropriate dimensionality reduction techniques are thus needed to capture all of these dimensions jointly, without first collapsing them into lower-order representations, in order to reveal the latent associations among the components. This study presents a new model for multi-criteria CF using an Adaptive Neuro-Fuzzy Inference System (ANFIS) combined with subtractive clustering and Higher Order Singular Value Decomposition (HOSVD). HOSVD is used for dimensionality reduction to improve scalability, and ANFIS is used to extract fuzzy rules from the experimental dataset, to alleviate the sparsity problem in overall ratings, and to represent and reason about users' behavior on item features. Experimental results on a real-world dataset show that the combination of the two techniques remarkably improves the predictive accuracy and recommendation quality of multi-criteria CF.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
During the last decade, the amount of information available online has increased exponentially, and the information overload problem has become one of the major challenges faced by information retrieval and information filtering systems. Recommender systems are one solution to the information overload problem. In the mid-1990s, recommender systems became an active research domain, when researchers shifted their focus to recommendation problems that explicitly rely on the user rating structure, and they emerged as an independent research area [1].
Recommender systems based on Collaborative Filtering (CF) are particularly popular and are used by large online services [2–4]. CF algorithms can be divided into two categories: memory-based algorithms and model-based algorithms [3,5,6]. Memory-based (or heuristic-based) methods, such as correlation analysis and vector similarity, search the user database for user profiles that are similar to the profile of the active user for whom the recommendation is made [7]. Heuristic-based approaches are classified into user-based and item-based approaches [6,8]. User-based CF has been the most popular and commonly used (memory-based) CF strategy [9]. It is based on the premise that similar users will like similar items.
Item-based CF was first proposed in [10] as an alternative style of CF that avoids the scalability bottleneck associated with the traditional user-based algorithm. The bottleneck arises from the search for neighbors in a continuously growing population of users. In item-based CF, similarities are calculated between items rather than between users, the intuition being that a user will be interested in items that are similar to items he or she has liked in the past. Two of the most popular approaches to computing similarities between users or between items are the Pearson correlation coefficient and cosine-based coefficients.
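As an illustration of the two similarity measures just mentioned, the following Python sketch computes them between two users. It is a minimal example, assuming each user's ratings are stored in a dictionary keyed by item ID and that both measures are computed over co-rated items only (a common convention, not one stated in this paper); the function names and toy ratings are hypothetical.

```python
import math

def cosine_similarity(ratings_u, ratings_v):
    """Cosine of the angle between two users' rating vectors over co-rated items."""
    common = ratings_u.keys() & ratings_v.keys()
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items."""
    common = ratings_u.keys() & ratings_v.keys()
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v) if den_u and den_v else 0.0

# Hypothetical ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
print(pearson_similarity(alice, bob), cosine_similarity(alice, bob))
```

Restricting both measures to co-rated items is a design choice of this sketch; other variants treat missing ratings as zeros or center each user's ratings on the mean over all of that user's ratings.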
One of the main problems in recommender systems, specifically in CF, is the sparsity problem [11–14]. Memory-based CF approaches also suffer from the scalability problem. Therefore,
0950-7051/$ - see front matter © 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.knosys.2014.01.006
Corresponding author. Tel.: +60 197608281.
E-mail address: [email protected] (M. Nilashi).
Knowledge-Based Systems 60 (2014) 82–101
Contents lists available at ScienceDirect
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys
applied to tensors with more than 3 dimensions. This is one of the main advantages of HOSVD, which makes it a flexible and effective approach for multi-criteria CF where other traditional machine learning techniques have failed. It should be noted that, with HOSVD, the computation time of the decomposition procedure becomes high as the tensor order increases. However, the decomposition can be performed in the offline phase, with incremental learning used for the data approximation procedure in the online phase.
In the proposed model, ANFIS aims to extract knowledge (rules) from users' ratings on multiple aspects, to be used in the overall rating prediction task. The extracted rules are employed to predict unknown ratings, which alleviates the sparsity problem in overall ratings and also reveals the real level of user preferences on item features.
ANFIS provides a flexible structure for the defined problem that is suitable for generating stipulated input–output pairs using a set of induced fuzzy IF–THEN rules with appropriate and varied MFs [27]. With proper training, the resulting Fuzzy Inference System (FIS) is used to predict users' overall preferences about item features. The elements of this model are fuzzy sets, a neural network and data clustering. In addition, ANFIS handles the non-stochastic uncertainty that emerges from vagueness and imprecision. The MFs produced by ANFIS are used to represent and reason about users' rating behavior according to their perception of item features. The MFs formed by ANFIS are continuous and more accurate in representing item features and user feedback. Furthermore, to prevent the overfitting problem discussed in previous research [24,28], subtractive clustering is applied to minimize overfitting by fine-tuning the ANFIS models, and a checking set is also used to address this problem in the training data.
In the context of product recommendation, in practical applications and situations, customers tend to rate items or express their preferences for item features in linguistic terms, such as {low interest}, {high interest} or {no interest}. This suggests designing multi-criteria CF to be user-friendly and convenient for users when rating items. Therefore, for multi-criteria CF, fuzzy logic and fuzzy sets are more appropriate than crisp approaches for human linguistic reasoning with imprecise concepts. In addition, linguistic terms are more suitable than numerical values for assessing qualitative information, which is usually related to human perceptions, opinions and tastes. Hence, in multi-criteria CF, it is more appropriate to let users express their preferences, knowledge and personal judgments in linguistic terms [29]. From this perspective, we can define a user's degree of preference for a particular item feature with a set of linguistic terms such as {low interest}, {high interest} or {no interest}. Furthermore, the fuzzy approach provides a way to quantify the non-stochastic uncertainty induced by imprecision, vagueness, and subjectivity. For this kind of uncertainty, modeling with a fuzzy approach is more reliable than traditional statistical methods such as the Bayesian method, which handles uncertainty due to randomness. Moreover, the fuzzy rules discovered from users' ratings through ANFIS can be maintained in a rule database and reused in subsequent predictions for item recommendation. These properties provide a framework for addressing the representation and inference challenges in multi-criteria CF research.
In this study, we apply the proposed method to recommender systems in the movie domain. However, the method can also be adopted for recommender systems in e-business and e-government applications, such as those developed by Zhang et al. [30] and Shambour and Lu [31,32] for e-business and e-government applications, respectively.
Finally, we perform an in-depth experimental evaluation on users' multi-aspect ratings of items obtained from the Yahoo!Movies network, and we conduct several comparisons between our method and other algorithms.
Thus, in comparison with research efforts found in the literature, our work differs in the following ways:
- A new hybrid recommendation model using HOSVD and Neuro-Fuzzy techniques is proposed to increase the predictive accuracy and improve the scalability of multi-criteria CF.
- The sparsity issue in overall ratings is alleviated using a Neuro-Fuzzy technique.
- HOSVD is used to improve scalability.
The remainder of this paper is organized as follows. Section 2 describes the research background and related work. The HOSVD dimensionality reduction technique, the k-Nearest Neighbor (k-NN) classifier, ANFIS and subtractive clustering are introduced in separate subsections of Section 3. Section 4 provides an overview of the research methodology. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.
2. Research background and related work
In the area of personalized web search, Sun et al. [33] proposed Cube singular value decomposition (CubeSVD) to improve Web search. With their CubeSVD analysis, which is also based on the HOSVD technique, web search activities were carried out more efficiently. They evaluated the method on MSN search engine data. In the field of recommender systems, several recommendation models have been proposed that use three-dimensional tensors for recommending music, objects and tags. Recommender models using HOSVD for dimensionality reduction have been proposed for recommending personalized music [22] and tags [34]. Xu et al. [35] used HOSVD to provide item recommendations. Their work was compared with a standard CF algorithm, without focusing on tag recommendations. Leginus et al. [36] utilized clustering techniques to reduce the tag space, which improved the quality of recommendations and the execution time of the factorization and decreased memory demands. Their proposed method was compatible with HOSVD. They also introduced a heuristic method to speed up the parameter tuning process for HOSVD recommenders. Symeonidis et al. [37] introduced a recommender based on HOSVD in which each tagging activity for a given item by a particular user is represented by the value 1 in the initial tensor, and all other cases are represented by 0. Li et al. [38] presented a multi-criteria rating approach to improve personalized services in mobile commerce using Multi-linear Singular Value Decomposition (MSVD). The aim of their paper was to exploit context information about the user as well as multi-criteria ratings in the recommendation process.
The fuzzy logic field has grown considerably in a number of applications across a wide variety of domains, such as semantic music recommendation [39] and product recommendation [40]. Castellano et al. [41] developed a Neuro-Fuzzy strategy combined with soft computing approaches for recommending Uniform Resource Locators (URLs) to active users. They used fuzzy clustering to create user profiles based on similar browsing behavior. de Campos et al. [42] proposed a model combining a Bayesian network, which governs the relationships between users, with fuzzy set theory, which represents the vagueness in the description of users' ratings. A conceptual framework based on fuzzy logic was proposed by Yager [43] to represent and justify recommendation rules. In the proposed framework, an internal description of the items was used that relied solely on the preferences of the active user. Carbo and Molina [44] developed a CF-based algorithm in which ratings and recommendations were
considered as linguistic labels using fuzzy sets. Pinto et al. [45] proposed a model that combined fuzzy numbers, product positioning (from marketing theory) and item-based CF. Zhang et al. [30] developed a hybrid recommendation approach that combines user-based and item-based CF techniques using fuzzy set techniques, and applied it to mobile product and service recommendation. They tested the prediction accuracy of their hybrid recommendation approach using the MovieLens 100K dataset.
For multi-criteria CF, a few studies have extended the similarity calculation of the traditional memory-based CF approach to handle multi-criteria ratings [23,46,47]; in these studies, similarities between users are estimated by aggregating traditional similarities computed on individual criteria or by applying multidimensional distance metrics. In the aggregation-function approach of Adomavicius and Kwon [23], the overall rating $r_0$ is assumed to be an aggregate of the multi-criteria ratings. Under this assumption, the method finds an aggregation function $f$ that represents the relationship between the overall rating and the $k$ criterion ratings:
$$r_0 = f(r_1, \ldots, r_k) \qquad (1)$$
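A simple way to instantiate the aggregation function $f$ in Eq. (1) is a linear model fitted by ordinary least squares, in the spirit of the regression-based approaches reviewed in this section. The sketch below is only illustrative: it assumes $f$ is linear with an intercept, uses NumPy's `np.linalg.lstsq`, and fits hypothetical ratings on $k = 4$ criteria whose overall rating is synthetically set to the criteria mean, so the fit should recover equal weights of 0.25 and a zero intercept.

```python
import numpy as np

def fit_linear_aggregation(criteria, overall):
    """Least-squares fit of r0 = w0 + w1*r1 + ... + wk*rk (a linear choice of f)."""
    design = np.column_stack([np.ones(len(criteria)), criteria])
    weights, *_ = np.linalg.lstsq(design, overall, rcond=None)
    return weights

def predict_overall(weights, criteria_row):
    """Apply the fitted aggregation function to one vector of criterion ratings."""
    return weights[0] + np.dot(weights[1:], criteria_row)

# Hypothetical data: each row is one user's ratings of one item on k = 4 criteria.
criteria = np.array([[5, 4, 4, 3], [2, 3, 2, 2], [4, 4, 5, 4],
                     [1, 2, 1, 3], [3, 3, 4, 2], [5, 5, 4, 5]], dtype=float)
overall = criteria.mean(axis=1)  # synthetic overall ratings, exactly linear in the criteria
w = fit_linear_aggregation(criteria, overall)
print(np.round(w, 3))
print(predict_overall(w, [4, 3, 5, 4]))
```

With real data the overall rating is rarely an exact function of the criteria, and nonlinear or per-user and per-item models (such as the SVR-based approach of Jannach et al. [24]) may fit better.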
To develop the idea of Adomavicius and Kwon [23] further, Sahoo et al. [48,49] extended the flexible mixture model (FMM) developed by Si and Jin [50] to multi-criteria recommender systems. FMM assumes that two latent variables, $Z_u$ and $Z_i$ (for users and items, respectively), determine the rating $r$ that user $u$ gives to item $i$. Sahoo et al. discovered the dependency structure between the overall rating ($r_0$) and the multi-criteria ratings ($r_1$, $r_2$, $r_3$, and $r_4$). Liu et al. [51] presented a multi-criteria recommendation approach based on clustering users. Their idea was that, for each user, one of the criteria is dominant, and users are grouped according to their criteria preferences. They applied linear least squares regression, assigned each user to one cluster, and evaluated different schemes for generating predictions. They applied the methods to a hotel-domain dataset with five criteria: Value, Location, Rooms, Service and Cleanliness. Zhang et al. [52] proposed two types of multi-criteria probabilistic latent semantic analysis algorithms extended from the single-rating version. First, a mixture of multivariate Gaussian distributions was assumed to be the underlying distribution of each user's multi-criteria ratings. Second, inspired by Bayesian networks and linear regression, they further assumed a mixture of linear Gaussian regression models as the underlying distribution of each user's multi-criteria ratings.
Shambour and Lu [53] implemented a hybrid Multi-Criteria Semantic-enhanced CF (MC-SeCF) approach to alleviate limitations of item-based CF techniques such as sparsity and cold-start. Experimental results on the MovieLens dataset demonstrated the effectiveness of their approach in alleviating the sparsity and cold-start item problems. They achieved higher accuracy and more coverage on very sparse and new-item datasets than the benchmark item-based CF recommendation algorithms. The proposed method, which builds a model using HOSVD and ANFIS, requires explicit ratings. However, according to Nielsen's 90-9-1 principle [54], more people lurk in a virtual community than participate. Hence, considering this principle, appropriate and tailored strategies need to be incorporated into multi-criteria CF, such as the method developed by Shambour and Lu [53], which uses semantic information about items. Generally, we view the MC-SeCF approach as complementary to our method. An opportunity for future work is therefore to combine the predictions of the MC-SeCF approach with our method in a hybrid approach. Given the improvements achieved by Shambour and Lu [53], major problems such as sparsity and cold-start could be remarkably alleviated in this way. This suggests that the methods proposed by Shambour and Lu [53] and Kernel-SVD [55,56], combined with HOSVD, could be incorporated into multi-criteria CF to address the sparsity problem.
Jannach et al. [24] further improved the accuracy of multi-criteria CF by proposing a method that uses Support Vector Regression (SVR) to automatically detect the existing relationships between detailed item ratings and the overall ratings. In addition, SVR models were learned per item and per user, and the individual predictions were finally combined in a weighted approach. Similar to our research, they evaluated their methods using the Yahoo!Movies dataset.
3. Materials and methods
3.1. Higher Order Singular Value Decomposition (HOSVD)
To represent and recognize high-dimensional data effectively, dimensionality reduction is applied to the original dataset to obtain a low-dimensional representation [57]. Easier visualization and comparison of data and reduced processing time are the main advantages of dimensionality reduction techniques. HOSVD, proposed by Lathauwer et al. [58], is a powerful dimensionality reduction technique for tensor decomposition. They proposed HOSVD as a generalization of the matrix SVD to tensors. Computing the HOSVD requires the following steps:
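Step 1: Unfold the d-order tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times \cdots \times I_d}$ along each mode, which yields the matrices $A^{(1)}, \ldots, A^{(d)}$. They are defined as:
$$A^{(n)}_{i_n j} = \mathcal{T}_{i_1 i_2 \ldots i_d}, \qquad j = i_{n+1} I_{n+2} I_{n+3} \cdots I_d I_1 I_2 \cdots I_{n-1} + i_{n+2} I_{n+3} I_{n+4} \cdots I_d I_1 I_2 \cdots I_{n-1} + \cdots + i_d I_1 I_2 \cdots I_{n-1} + i_1 I_2 I_3 \cdots I_{n-1} + \cdots + i_{n-1}, \qquad i_n = 0, 1, \ldots, I_n - 1 \qquad (2)$$
The matrix unfolding of a tensor is a matrix representation of that tensor in which all the column (row, etc.) vectors are stacked one after the other [58].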
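In the case of a 3rd-order tensor $\mathcal{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, there are three matrix unfoldings (see Fig. 1), whose column indices follow from Eq. (2) (indices starting at 0) as:
mode 1: $j = i_2 I_3 + i_3$,
mode 2: $j = i_3 I_1 + i_1$,
mode 3: $j = i_1 I_2 + i_2$.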
Step 2: Compute the d left singular matrices $U^{(1)}, \ldots, U^{(d)}$ from the SVD of each unfolding:
$$A^{(n)} = U^{(n)} \Sigma^{(n)} \left(V^{(n)}\right)^{T}, \qquad n = 1, \ldots, d \qquad (3)$$
In Eq. (3), $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$ contains the left singular vectors, and $\Sigma^{(n)} \in \mathbb{R}^{I_n \times (I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_d)}$ is a diagonal matrix holding the singular values in descending order. $V^{(n)}$ contains the right singular vectors. Both singular matrices are orthonormal: $(V^{(n)})^T V^{(n)} = I$ and $(U^{(n)})^T U^{(n)} = I$.
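Step 3: Compute the core tensor $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ by contracting the original tensor $\mathcal{T}$ with the transposed left singular matrices $U^{(n)}$:
$$\mathcal{S} = \mathcal{T} \times_1 \left(U^{(1)}\right)^T \times_2 \left(U^{(2)}\right)^T \cdots \times_d \left(U^{(d)}\right)^T \qquad (4)$$
where the sub-tensors $\mathcal{S}_{i_n=\alpha}$ of $\mathcal{S}$, obtained by fixing the nth index to $\alpha$, satisfy the ordering property:
$$\|\mathcal{S}_{i_n=1}\|_F = \sigma^{(n)}_1 \ge \|\mathcal{S}_{i_n=2}\|_F = \sigma^{(n)}_2 \ge \cdots \ge \|\mathcal{S}_{i_n=I_n}\|_F = \sigma^{(n)}_{I_n} \ge 0 \qquad (5)$$
In Eq. (5), for all possible values of n, the Frobenius norm $\sigma^{(n)}_i = \|\mathcal{S}_{i_n=i}\|_F$ is the ith n-mode singular value of tensor $\mathcal{T}$. Fig. 2 shows pseudocode for the HOSVD algorithm.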
Procedure HOSVD (Input: Tensor T)
The computational cost of HOSVD is summarized in Table 1.
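The three steps above can be sketched in Python with NumPy as follows. This is an illustrative sketch rather than the authors' implementation, and the helper names are hypothetical. One assumption to note: `unfold` moves mode n to the front and reshapes in NumPy's default (C) order, so its columns are ordered differently from Eq. (2); a column permutation leaves $A^{(n)}(A^{(n)})^T$ unchanged, so $U^{(n)}$ and the ordering in Eq. (5) are the same up to sign. With `full_matrices=False`, each $U^{(n)}$ is $I_n \times I_n$ as in Eq. (3) whenever $I_n$ does not exceed the product of the other dimensions.

```python
import numpy as np

def unfold(tensor, mode):
    """Step 1: mode-n unfolding; rows follow mode n, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Mode-n product (tensor x_n matrix), used for the contraction in Eq. (4)."""
    result = np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor S and the left singular matrices U^(1), ..., U^(d)."""
    factors = []
    for mode in range(tensor.ndim):
        # Step 2: left singular vectors of the mode-n unfolding (Eq. 3).
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    # Step 3: S = T x_1 U^(1)T x_2 U^(2)T ... x_d U^(d)T (Eq. 4).
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_n_product(core, u.T, mode)
    return core, factors

rng = np.random.default_rng(0)
T = rng.random((4, 3, 5))  # toy sizes, e.g. users x items x criteria
S, U = hosvd(T)

# Multiplying the core back by every U^(n) recovers the original tensor.
T_rec = S
for mode, u in enumerate(U):
    T_rec = mode_n_product(T_rec, u, mode)
print(np.allclose(T, T_rec))  # True

# Eq. (5): row norms of the mode-1 unfolding of S equal the mode-1 singular values.
print(np.linalg.norm(unfold(S, 0), axis=1))
print(np.linalg.svd(unfold(T, 0), compute_uv=False))
```

The reconstruction check works because every $U^{(n)}$ in this toy example is square and orthonormal, so each contraction in Eq. (4) can be undone by multiplying with $U^{(n)}$.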
3.1.1. Truncated HOSVD
The truncated HOSVD yields a multilinear rank approximation of a tensor. It is taken as the first approximation in an iterative algorithm, in which the matrices and the core tensor are updated iteratively, starting with Eq. (4). The algorithm stops when it ceases to improve the approximation or when it reaches a maximum number of iterations [59]. This iterative method belongs to the family of alternating least-squares methods and is called higher-order orthogonal iteration [58].
According to Lathauwer et al. [58], the decomposition determined by HOSVD satisfies the following norm identity:
$$\|\mathcal{T}\|_F^2 = \sum_{i=1}^{R_1} \left(\sigma^{(1)}_i\right)^2 = \cdots = \sum_{i=1}^{R_d} \left(\sigma^{(d)}_i\right)^2 = \|\mathcal{S}\|_F^2 \qquad (6)$$
where $R_n$ ($1 \le n \le d$) denotes the n-rank of $\mathcal{S}$, which equals the n-mode rank of tensor $\mathcal{T}$. An approximating tensor $\tilde{\mathcal{T}}$ can be defined by keeping the $I'_n$ largest n-mode singular values and discarding the remaining ones. Because of this rank truncation, the error is bounded as follows [58]:
$$\|\mathcal{T} - \tilde{\mathcal{T}}\|_F^2 \le \sum_{n=1}^{d} \sum_{i_n = I'_n + 1}^{R_n} \left(\sigma^{(n)}_{i_n}\right)^2 \qquad (7)$$
In practice, using a procedure analogous to the one in Fig. 2, the truncated core tensor $\tilde{\mathcal{S}}$ of multilinear rank $(I'_1, I'_2, \ldots, I'_d)$ can be obtained by keeping only the $I'_n$ leading left singular vectors, instead of all left singular vectors, to build each transformation matrix $\tilde{U}^{(n)}$.
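Continuing in the same spirit, the following sketch computes a truncated HOSVD by keeping only the $I'_n$ leading left singular vectors per mode, and then checks Eq. (6) and the error bound of Eq. (7) numerically on a random tensor. It is self-contained (the two helpers repeat the unfolding and mode-n product definitions), uses hypothetical function names and truncation ranks, and stops at the truncated HOSVD: the higher-order orthogonal iteration refinement is not implemented.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding (columns in NumPy's C order)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Mode-n product tensor x_n matrix."""
    result = np.tensordot(matrix, np.moveaxis(tensor, mode, 0), axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Keep the ranks[n] leading left singular vectors of each unfolding."""
    factors, sigmas = [], []
    for mode, r in enumerate(ranks):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
        sigmas.append(s)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_n_product(core, u.T, mode)  # truncated core tensor
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_n_product(approx, u, mode)  # map back to the original space
    return approx, core, factors, sigmas

rng = np.random.default_rng(1)
T = rng.random((6, 5, 4))
ranks = (3, 3, 2)  # hypothetical I'_n values
T_tilde, S_tilde, U_tilde, sigmas = truncated_hosvd(T, ranks)

# Eq. (6): for every mode, the squared singular values sum to ||T||_F^2.
for s in sigmas:
    assert np.isclose(np.sum(s ** 2), np.linalg.norm(T) ** 2)

# Eq. (7): the squared error is bounded by the discarded squared singular values.
error = np.linalg.norm(T - T_tilde) ** 2
bound = sum(np.sum(s[r:] ** 2) for s, r in zip(sigmas, ranks))
print(S_tilde.shape, error, bound)
```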
3.2. k-Nearest Neighbor (k-NN) classifier
The k-Nearest Neighbor (k-NN) classifier is a well-known and powerful instance-based machine learning technique for classifying data [60]. Because it stores all training instances, k-NN can derive its results directly from them. The k-NN algorithm consists of two phases: a training phase and a classification phase. In the training phase, the training examples are vectors (each with a class label) in a multidimensional feature space, and the feature vectors and class labels of the training samples are stored. In the classification phase, where k is a user-defined constant (see Fig. 3), a query or test point (an unlabelled vector) is classified by assigning it the label that is most frequent among the k training samples nearest to that point. In other words, the k-NN method compares the query point (an input feature vector) with a library of reference vectors, and the query point is labeled with the class of the nearest library feature vectors. Categorizing query points based on their distance to points in a training dataset is a simple yet effective way of classifying new points. One of the main advantages of the k-NN method is that it requires only a few parameters to tune, namely k and the distance metric, to achieve sufficiently high classification accuracy. Thus, in k-NN-based implementations, choosing the best k and distance metric is an important task.
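The two phases of k-NN can be sketched as follows. This is a minimal Python illustration, assuming Euclidean distance and plain majority voting; ties are resolved by `Counter.most_common`, which returns the label encountered first among equally frequent ones, an arbitrary choice of this sketch. The training data and function names are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k, distance=euclidean):
    """Label `query` with the most frequent class among its k nearest training samples.

    `train` is the list of (feature_vector, label) pairs kept in the training phase.
    """
    neighbors = sorted(train, key=lambda sample: distance(sample[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Training phase: simply store labelled feature vectors (hypothetical data).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((4.1, 3.9), "B")]

# Classification phase.
print(knn_classify(train, (1.1, 1.0), k=3))  # A
print(knn_classify(train, (3.8, 4.0), k=3))  # B
```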
In a k-NN classifier, the Euclidean distance is usually used when the input vectors are real-valued and the outputs are discrete classes. In this study, we use the Euclidean, City-Block and correlation distance metrics for distance calculation in k-NN.
Assume $x_1, x_2, \ldots, x_{m_x}$ denote the row vectors of the first data matrix and $y_1, y_2, \ldots, y_{m_y}$ denote the row vectors of the second, each with $n$ components. The distance metrics used for measuring the distance between $x_s$ and $y_t$ are defined as follows:
$$d_{st} = \sqrt{\sum_{j=1}^{n} \left(x_{sj} - y_{tj}\right)^2} \qquad (8)$$
[Fig. 1 panels: mode-1 unfolding $A^{(1)} \in \mathbb{R}^{I_1 \times I_2 I_3}$; mode-2 unfolding $A^{(2)} \in \mathbb{R}^{I_2 \times I_1 I_3}$; mode-3 unfolding $A^{(3)} \in \mathbb{R}^{I_3 \times I_1 I_2}$.]
Fig. 1. Unfolding of a 3rd-order tensor.
Fig. 2. Procedure for decomposing tensors via HOSVD [59].
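Table 1
Computational cost for main steps in HOSVD.
| Step | Cost (N-dimensional tensor) |
|---|---|
| Unfolding the tensor $\mathcal{T}$ | $O(I_1 I_2 \cdots I_N)$ |
| Constructing $A^{(n)} (A^{(n)})^T$ | $O(I_n^2\, I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$ |
| Diagonalizing $A^{(n)} (A^{(n)})^T$ to obtain $U^{(n)}$ | $O(I_n^3)$ |
| Contracting $\mathcal{T}$ with the matrices $U^{(n)}$ to obtain $\mathcal{S}$ | $O(I_n^2\, I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$ |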
$$d_{st} = \sum_{j=1}^{n} \left|x_{sj} - y_{tj}\right| \qquad (9)$$
$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}, \qquad \bar{x}_s = \frac{1}{n}\sum_j x_{sj}, \quad \bar{y}_t = \frac{1}{n}\sum_j y_{tj} \qquad (10)$$
where Eqs. (8)–(10) define the Euclidean, City-Block and correlation distance metrics, respectively.
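Eqs. (8)–(10) translate directly into code. The sketch below is a minimal pure-Python version, assuming equal-length numeric vectors; the correlation distance is undefined (division by zero) when either vector is constant, and this sketch does not guard against that case. The function names are hypothetical.

```python
import math

def euclidean(x, y):
    """Eq. (8): Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Eq. (9): City-Block (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation_distance(x, y):
    """Eq. (10): one minus the Pearson correlation of the two vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return 1.0 - num / den

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
print(euclidean(x, y), city_block(x, y), correlation_distance(x, y))
```

Unlike the other two metrics, the correlation distance ignores differences in offset and scale: a vector and any positively scaled, shifted copy of it are at distance 0.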
3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)
Soft computing techniques are known for their efficiency in dealing with complicated problems when conventional analytical methods are infeasible or too expensive and only sets of operational data are available. Fuzzy logic (FL) and Fuzzy Inference Systems (FIS), first proposed by Zadeh [61], provide a solution for decision making based on vague, ambiguous, imprecise or missing data. FL represents models or knowledge using IF–THEN rules. A Neuro-Fuzzy system is functionally equivalent to a FIS. A FIS mimics the human reasoning process by implementing fuzzy sets and an approximate reasoning mechanism that uses numerical values instead of logical values. A FIS requires a domain expert to define the MFs and to determine the associated parameters in both the MFs and the reasoning section [62,63]. However, there is no standard for this knowledge acquisition process, so the results may differ if a different knowledge engineer acquires the knowledge from the experts. A Neuro-Fuzzy system can replace this human knowledge acquisition process with a training process on an input–output training dataset. Thus, instead of depending on human experts, the Neuro-Fuzzy system determines its parameters through a training process that minimizes an error criterion. A popular Neuro-Fuzzy system is ANFIS, a fuzzy system that uses Artificial Neural Network (ANN) theory to determine its properties (fuzzy sets and fuzzy rules) [64–69]. It consists of five feed-forward layers, as shown in Fig. 4.
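ANFIS is functionally equivalent to the Takagi–Sugeno–Kang (TSK) fuzzy model. It can also express its knowledge in the IF–THEN rule format as follows:
Rule 1: IF $\mathit{In}_1$ is $A_1$ AND $\mathit{In}_2$ is $B_1$ THEN $f_{11} = p_{11}\mathit{In}_1 + q_{11}\mathit{In}_2 + r_{11}$
Rule 2: IF $\mathit{In}_1$ is $A_1$ AND $\mathit{In}_2$ is $B_2$ THEN $f_{12} = p_{12}\mathit{In}_1 + q_{12}\mathit{In}_2 + r_{12}$
Rule 3: IF $\mathit{In}_1$ is $A_2$ AND $\mathit{In}_2$ is $B_1$ THEN $f_{21} = p_{21}\mathit{In}_1 + q_{21}\mathit{In}_2 + r_{21}$
Rule 4: IF $\mathit{In}_1$ is $A_2$ AND $\mathit{In}_2$ is $B_2$ THEN $f_{22} = p_{22}\mathit{In}_1 + q_{22}\mathit{In}_2 + r_{22}$
where $A_1$, $A_2$ and $B_1$, $B_2$ are the linguistic labels of the MFs for the input parameters $\mathit{In}_1$ and $\mathit{In}_2$, respectively, and $p_{ij}$, $q_{ij}$ and $r_{ij}$ ($i, j = 1, 2$) denote the parameters of the output MFs.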
The layers of ANFIS in Fig. 4 perform the following operations:
Layer 1: The nodes in this layer are adaptive nodes that provide membership grades. The outputs of this layer are obtained by:
$$O^1_{A_i} = \mu_{A_i}(\mathit{In}_1), \quad i = 1, 2; \qquad O^1_{B_j} = \mu_{B_j}(\mathit{In}_2), \quad j = 1, 2 \qquad (11)$$
where $\mu_{A_i}$ and $\mu_{B_j}$ are the MFs associated with $A_i$ and $B_j$ for the input parameters $\mathit{In}_1$ and $\mathit{In}_2$; they can be triangular, trapezoidal or Gaussian functions. The Gaussian MFs for $A_i$ and $B_j$ are defined as:
$$\mu_{A_i}(\mathit{In}_1; \sigma_i, c_i) = \exp\left(-\frac{(\mathit{In}_1 - c_i)^2}{2\sigma_i^2}\right), \quad i = 1, 2; \qquad \mu_{B_j}(\mathit{In}_2; \sigma_j, c_j) = \exp\left(-\frac{(\mathit{In}_2 - c_j)^2}{2\sigma_j^2}\right), \quad j = 1, 2 \qquad (12)$$
where $\{\sigma_i, c_i\}$ and $\{\sigma_j, c_j\}$ are the parameters governing the Gaussian functions. The parameters of this layer are usually referred to as premise parameters.
Layer 2: The nodes in the second layer are fixed nodes labeled $\Pi$. Their outputs are:
$$O^2_{ij} = W_{ij} = \mu_{A_i}(\mathit{In}_1)\,\mu_{B_j}(\mathit{In}_2), \qquad i, j = 1, 2 \qquad (13)$$
where $W_{ij}$ represents the weight (firing strength) of the rule with premises $A_i$ and $B_j$.
Layer 3: In this layer, every node, labeled N, computes the ratio of a rule's firing strength to the sum of all rules' firing strengths:
Fig. 3. k-NN for k = 8 and k = 5.
Fig. 4. The structure of ANFIS.
$$O^3_{ij} = \bar{W}_{ij} = \frac{W_{ij}}{\sum_{k=1}^{2}\sum_{l=1}^{2} W_{kl}}, \qquad i, j = 1, 2 \qquad (14)$$
where the outputs of this layer are the normalized firing strengths.
Layer 4: The nodes in this layer are adaptive nodes. The output of each node is the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus, the outputs of this layer are given by:
$$O^4_{ij} = \bar{W}_{ij} f_{ij} = \bar{W}_{ij}\left(p_{ij}\,\mathit{In}_1 + q_{ij}\,\mathit{In}_2 + r_{ij}\right), \qquad i, j = 1, 2 \qquad (15)$$
where $\bar{W}_{ij}$ is the output of Layer 3 and $\{p_{ij}, q_{ij}, r_{ij}\}$ is the parameter set (the consequent parameters).
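Layer 5: There is a single fixed node labeled $\Sigma$, which sums all incoming signals. Hence, the overall output of the model is given by:
$$\mathit{Out} = O^5 = \sum_{i=1}^{2}\sum_{j=1}^{2} \bar{W}_{ij} f_{ij} = \sum_{i=1}^{2}\sum_{j=1}^{2} \bar{W}_{ij}\left(p_{ij}\,\mathit{In}_1 + q_{ij}\,\mathit{In}_2 + r_{ij}\right) = \sum_{i=1}^{2}\sum_{j=1}^{2}\left[\left(\bar{W}_{ij}\,\mathit{In}_1\right) p_{ij} + \left(\bar{W}_{ij}\,\mathit{In}_2\right) q_{ij} + \bar{W}_{ij}\, r_{ij}\right] \qquad (16)$$
where the overall output $\mathit{Out}$ is a linear combination of the consequent parameters when the values of the premise parameters are fixed.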
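To make the five layers concrete, the following sketch traces one forward pass of the two-input, four-rule first-order Sugeno ANFIS described above, with Gaussian MFs as in Eq. (12). It is a hypothetical illustration: the premise parameters, the consequent parameters and the reading of $\mathit{In}_1$, $\mathit{In}_2$ as two criterion ratings are made up, and training (for example, the hybrid least-squares and gradient-descent procedure commonly used with ANFIS) is not shown.

```python
import math

def gaussian_mf(x, sigma, c):
    """Gaussian membership function of Eq. (12)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(in1, in2, premise_a, premise_b, consequents):
    """One forward pass of a 2-input, 4-rule first-order Sugeno ANFIS.

    premise_a / premise_b: [(sigma, c), (sigma, c)] for A1, A2 and B1, B2.
    consequents[i][j] = (p, q, r) for the rule 'IF In1 is A_i AND In2 is B_j'.
    """
    # Layer 1: membership grades (Eq. 11)
    mu_a = [gaussian_mf(in1, sigma, c) for sigma, c in premise_a]
    mu_b = [gaussian_mf(in2, sigma, c) for sigma, c in premise_b]
    # Layer 2: firing strengths W_ij (Eq. 13)
    w = [[mu_a[i] * mu_b[j] for j in range(2)] for i in range(2)]
    # Layer 3: normalized firing strengths (Eq. 14)
    total = sum(sum(row) for row in w)
    w_bar = [[w[i][j] / total for j in range(2)] for i in range(2)]
    # Layers 4 and 5: weighted first-order rule outputs, summed (Eqs. 15 and 16)
    out = 0.0
    for i in range(2):
        for j in range(2):
            p, q, r = consequents[i][j]
            out += w_bar[i][j] * (p * in1 + q * in2 + r)
    return out

# Hypothetical parameters: two criterion ratings on a 1-5 scale as inputs.
premise_a = [(1.0, 1.0), (1.0, 5.0)]  # "low" and "high" for In1
premise_b = [(1.0, 1.0), (1.0, 5.0)]  # "low" and "high" for In2
consequents = [[(0.6, 0.4, 0.0), (0.3, 0.7, 0.0)],
               [(0.7, 0.3, 0.0), (0.5, 0.5, 0.2)]]
print(anfis_forward(4.0, 2.0, premise_a, premise_b, consequents))
```

Because the normalized firing strengths of Layer 3 sum to 1, giving all four rules the same consequent polynomial makes the output equal to that polynomial, which is a quick sanity check of Eqs. (14)–(16).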
3.4. Subtractive clustering
The idea of the TSK model is that each rule in the rule base represents a local region of the model, within which the model can be linear [70]. In its basic form, a TSK rule is structured as follows:
$$\text{If } f(x_1 \text{ is } A_1, x_2 \text{ is } A_2, \ldots, x_k \text{ is } A_k) \text{ then } y = g(x_1, x_2, \ldots, x_k) \qquad (17)$$
where the conditions are connected through the logical function $f$, and the output $y$ is obtained by $g$, a function of the inputs $x_i$.
To establish an effective TSK model of a process, it is useful to generate clusters of the data with subtractive clustering. The main goal of using subtractive clustering as a cluster analyser is to partition the dataset into a number of homogeneous and natural subsets. The subtractive clustering method assumes that each data point is a potential cluster center and calculates a measure of the likelihood that each data point defines a cluster center, based on the density of the surrounding data points. With this method, the amount of computation is proportional to the number of data points and independent of the dimension of the problem. Although the actual cluster centers are not necessarily located at data points, in most cases this is a good approximation, especially given the reduced computation the approach requires [71]. In this method, the data point with the highest potential, which is a function of the distance measure, is taken as a cluster center. Data points close to the new cluster center are penalized in order to facilitate the emergence of new cluster centers [72]. The potential $P_i$ of a data point $x_i$ as a cluster center is obtained from Eq. (18):
$$P_i = \sum_{j=1}^{m} \exp\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right) \qquad (18)$$
where $m$ is the number of data points, $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$ are data vectors spanning the input and output dimensions, $r_a$ is a positive constant defining the neighborhood range of a cluster, i.e., the radius of a hypersphere cluster in data space, and $\|\cdot\|$ denotes the Euclidean distance. $r_a$ is a critical parameter because it determines the number and locations of the cluster centers. The data point with the highest potential value, $P_{c_1}$, is selected as the first cluster center $c_1$. To find the second cluster center, the density values are updated by subtracting the influence of the first cluster center as follows:
$$P_i \leftarrow P_i - P_{c_1} \exp\left(-\frac{\|x_i - c_1\|^2}{(r_b/2)^2}\right), \qquad r_b = \eta\, r_a \qquad (19)$$
where $r_b$ is a positive constant defining the neighborhood within which the density measure is measurably reduced, and $\gamma$ is a constant greater than 1 that keeps cluster centers from lying too close to one another [73]. According to Eq. (19), the potential of the data points near the first cluster center $c_1$ is significantly reduced. The data point with the largest remaining potential is then chosen as the second cluster center $c_2$.
In general, after the kth cluster center $c_k$ has been determined, the potential is revised according to Eq. (20):
$$P_i \leftarrow P_i - P_{c_k} \exp\left(-\frac{\|x_i - x_{c_k}\|^2}{(r_b/2)^2}\right) \qquad (20)$$
where $P_{c_k}$ is the largest potential value at step k and $x_{c_k}$ is the location of the kth cluster center. After the potentials have been revised, the point with the greatest remaining potential is selected as the next cluster center. This process continues until a sufficient number of cluster centers has been found, so that every data point lies within the neighborhood of some cluster center.
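The loop in Eqs. (18)–(20) can be sketched in NumPy as follows. This is a minimal illustration, assuming data scaled to [0, 1]; the function name `subtractive_clustering`, its defaults and the stopping rule (stop when the best remaining potential falls below `reject_ratio` times the first center's potential) are our assumptions and are simpler than the accept/reject logic of Chiu's original method or of MATLAB's implementation.

```python
import numpy as np


def subtractive_clustering(X, ra=0.5, gamma=1.5, reject_ratio=0.15):
    """Return indices of the data points chosen as cluster centers.

    X: (m, n) array of data vectors, assumed scaled to [0, 1].
    ra: neighbourhood radius r_a; rb = gamma * ra with gamma > 1, Eq. (19).
    reject_ratio: simplified stopping threshold (an assumption, see text).
    """
    X = np.asarray(X, dtype=float)
    rb = gamma * ra
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs of points.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Eq. (18): initial potential of every point.
    P = np.exp(-d2 / (ra / 2.0) ** 2).sum(axis=1)
    first_potential = P.max()
    centers = []
    while True:
        k = int(np.argmax(P))
        # Stop once the best remaining potential is small relative to P_c1.
        if P[k] < reject_ratio * first_potential:
            break
        centers.append(k)
        # Eqs. (19)-(20): subtract the new center's influence; P[k] becomes 0.
        P = P - P[k] * np.exp(-d2[:, k] / (rb / 2.0) ** 2)
    return centers
```

Because each chosen center has its own potential set to zero and potentials never increase, the loop terminates after at most m iterations.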
4. Research methodology
Fig. 5 shows the general framework of the proposed method, which combines HOSVD for dimensionality reduction with ANFIS and subtractive clustering for discovering knowledge from users' ratings and predicting overall ratings.
In the first step, we apply HOSVD for dimensionality reduction to reveal the latent associations among the components of the user-item-criteria tensor. We then use cosine-based similarity to cluster the data into groups of similar users and to determine labels for the clusters. In this way, we obtain high-quality clusters, which are necessary for developing an efficient ANFIS model. ANFIS is then applied to the clusters to extract fuzzy rules and predict the missing values of the overall ratings. The main tasks of the dimensionality reduction process are to reduce the dimension, to obtain the best approximation of the tensor of user preferences on multiple aspects of items, and to find users with similar preferences on items and criteria. Measuring user similarity based on criteria ratings makes it possible to apply a clustering method. Once clustering has produced classes of users with similar tastes, ANFIS is used to extract knowledge (fuzzy rules) from each cluster. To increase the accuracy of the rule-based system, reduce the amount of data in each class and minimize overfitting to the training data, subtractive clustering is combined with ANFIS. Thus, the main steps of the proposed method for developing the model in the offline phase are:
Step 1: HOSVD is applied to the training data, arranged as a third-order tensor, for dimensionality reduction to obtain the best approximation of the rating information.
Step 2: The data approximated by HOSVD are clustered using cosine-based similarity. In this step, a label is defined for each rating vector, to be used by the k-NN method in the online phase.
Step 3: ANFIS combined with subtractive clustering is trained on the data in the clusters obtained in the previous step to extract fuzzy rules and form rule clusters.
Step 4: The fuzzy rules are used to predict the missing overall ratings in the offline and online phases. It should be noted that, in order to predict the unknown overall ratings, we first alleviated the sparsity of the criteria ratings using neighborhood formation within each cluster. To predict the unknown criteria ratings for the target item, we used cosine-based similarity as the similarity measure, computed on the approximated data obtained by HOSVD.
After the model has been learned in the offline phase, the recommender system performs the recommendation and prediction tasks of multi-criteria CF in the online phase in three main steps:
Step 1: Using the k-NN method, the recommender system predicts the class label of the new data.
Step 2: The recommender system refers to the corresponding fuzzy rule cluster and predicts the overall rating for the active user (see Section 4.2 for more detail).
Step 3: After the overall rating has been predicted, the recommender system forms the neighborhood of the active user within the corresponding cluster using the cosine similarity in Eq. (21), and makes predictions and Top-N recommendations.
4.1. Clustering the experimental dataset using HOSVD and improving
the scalability of multi-criteria CF
Multi-criteria CF systems need to produce high-quality recommendations quickly for very large-scale problems. In this paper, we address these performance issues by scaling up the neighborhood formation process through dimensionality reduction. Scalability is an issue in multi-criteria CF because the data tensor has multiple dimensions, each of which can itself be very large. Clustering techniques can reduce sparsity and improve the scalability of recommender systems by effectively partitioning the ratings database, and previous studies [74,75] have indicated the benefits of applying clustering in recommender systems. Using HOSVD and cosine similarity, we perform the clustering task effectively for multi-criteria CF.
As discussed earlier, recommender systems for multi-criteria CF deal with high-dimensional data, which makes the computational cost extremely high and can even make traditional dimensionality reduction techniques infeasible. Given this scalability challenge, we use HOSVD because it is able to (1) factorize large tensors efficiently, in much less time than standard methods, and (2) obtain low-rank factors that preserve the main variance of the tensor. Thanks to the dimensionality reduction, neighborhoods can be better formed and precomputed, which makes prediction generation in multi-criteria CF much faster, and forming neighborhoods in the low-dimensional eigenspace also provides better quality and performance. In addition, after the tensor has been decomposed by HOSVD, clustering the data with cosine-based similarity is efficient, and once clustering is complete, multi-criteria CF can perform very well because the group that must be analyzed is much smaller.
To apply HOSVD, the three-dimensional data are stored in the tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, where $I_1$ is the number of users, $I_2$ is the number of rated items and $I_3$ is the number of criteria used. Each entry of $\mathcal{A}$ is a number between 1 and 13. Using HOSVD, the tensor $\mathcal{A}$, which contains the users' ratings of items on four criteria, is decomposed into
$$\mathcal{A} = \mathcal{S} \times_1 U \times_2 V \times_3 W,$$
in which $U \in \mathbb{R}^{I_1 \times I_1}$, $V \in \mathbb{R}^{I_2 \times I_2}$ and $W \in \mathbb{R}^{I_3 \times I_3}$ are orthonormal matrices, and $\mathcal{S} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is a core tensor satisfying the all-orthogonality and ordering properties.
Similar to truncated SVD for the low-rank approximation and dimensionality reduction of matrices, low-rank approximation and dimensionality reduction of higher-order tensors can be performed by truncated HOSVD, which we expect to give better approximation and computation than treating the data as a matrix: take the first $r_1$ columns of $U$, the first $r_2$ columns of $V$, the first $r_3$ columns of $W$, and the top-left $r_1 \times r_2 \times r_3$ block of $\mathcal{S}$. HOSVD is therefore an effective method for the dimensionality reduction of three-dimensional datasets. It is also flexible, since a different rank $r$ can be chosen for each mode of the tensor. The size of the data goes down from $I_1 I_2 I_3$ to $r_1 r_2 r_3 + I_1 r_1 + I_2 r_2 + I_3 r_3$, and, if $r_1 = r_2 = r_3 = r$, to $r^3 + r(I_1 + I_2 + I_3)$. If we instead flatten the tensor into an $I_1 \times I_2 I_3$ matrix, a rank-$R$ truncation only reduces the size of the data to $R^2 + R(I_1 + I_2 I_3)$. The HOSVD of the third-order tensor of users' ratings thus yields the matrices $U$, $V$ and $W$, which capture the user-user, item-item and criterion-criterion relations, respectively. This decomposition is obtained without splitting the three-dimensional space into pairwise relations. For the sake of conciseness, a very simple example with only 4 users, 6 items and 4 criteria is demonstrated below. Table 2 shows the users' ratings of the items on the 4 criteria, and Table 3 shows their decomposition into the matrices $U$, $V$, $W$ and the core slices $\mathcal{S}(:,:,1)$, $\mathcal{S}(:,:,2)$, $\mathcal{S}(:,:,3)$ and $\mathcal{S}(:,:,4)$.
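A NumPy sketch of truncated HOSVD for a third-order tensor may help make the truncation and the storage count concrete. Each factor matrix is taken from the leading left singular vectors of the corresponding mode unfolding, and the core is obtained by projecting the tensor onto the factors. This is the textbook HOSVD of De Lathauwer et al., not necessarily the authors' MATLAB code; the helper names `unfold`, `truncated_hosvd`, `reconstruct` and `storage` are hypothetical.

```python
import numpy as np


def unfold(A, mode):
    """Mode-n unfolding: rows are indexed by the chosen mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)


def truncated_hosvd(A, ranks):
    """Return the core S and the factor matrices [U, V, W] of a truncated HOSVD."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading r left singular vectors of the mode-n unfolding.
        U_n, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(U_n[:, :r])
    U, V, W = factors
    # Core tensor S = A x1 U^T x2 V^T x3 W^T.
    S = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)
    return S, factors


def reconstruct(S, factors):
    """Low-rank approximation A_hat = S x1 U x2 V x3 W."""
    U, V, W = factors
    return np.einsum('abc,ia,jb,kc->ijk', S, U, V, W)


def storage(shape, ranks):
    """Stored values r1*r2*r3 + I1*r1 + I2*r2 + I3*r3 of the truncated model."""
    (I1, I2, I3), (r1, r2, r3) = shape, ranks
    return r1 * r2 * r3 + I1 * r1 + I2 * r2 + I3 * r3
```

With full ranks the reconstruction is exact; for a 4 × 6 × 4 toy tensor such as the one in Table 2, keeping (r1, r2, r3) = (2, 2, 2) stores 8 + 8 + 12 + 8 = 36 values instead of 96.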
Unew U2V3W1Sy1 0:5091 0:2729
As can be seen in Fig. 6, the users most similar to the new user can be found using the cosine similarity in Eq. (21). The cosine similarity between two vectors A and B is defined as:
$$\text{similarity} = \cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (21)$$
By applying this method, the system is able to cluster the data based on user similarity computed from matrix U. Since the k-NN predictor requires supervised learning (labelled data), cosine-based similarity is used to cluster the data approximated by HOSVD and thereby provide labels for them. From the truncated matrix U, the first row is selected and the system computes its cosine similarity (Eq. (21)) with the second row, the third row, and so on, up to the last row. The row with the highest cosine similarity is clustered with the first row. Applying this procedure to all rows, the system obtains clusters containing small numbers of similar users. Given a specific target number of clusters, the system then merges close clusters, again using cosine-based similarity. Finally, after the clusters have been constructed, the system assigns a category label to each vector of users' ratings. A specific number of clusters can be obtained from matrix V by the same procedure.

Fig. 5. Proposed model using ANFIS and HOSVD. (Components: multi-criteria dataset with Criteria 1 to Criteria k and Overall Rating; dimensionality reduction using HOSVD; clustering into Cluster 1 to Cluster n; ANFIS combined with subtractive clustering for extracting fuzzy rules; fuzzy rules database with IF–THEN rules for each cluster; overall ratings prediction.)
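The labelling procedure described above can be sketched as follows. This is our reading of the text, with several assumptions: each row of the truncated U is first joined with its most cosine-similar row (Eq. (21)), the resulting groups are then merged greedily, using the cosine similarity of their mean vectors, until the target number of clusters remains. The function names and the mean-vector merge rule are hypothetical; if the pairing step already yields fewer groups than requested, they are returned unchanged.

```python
import numpy as np


def cosine(a, b):
    """Eq. (21): cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_cluster_labels(U, n_clusters):
    """Assign a cluster label to every row of U (hypothetical procedure)."""
    m = U.shape[0]
    # Pairwise cosine similarities between rows, ignoring self-similarity.
    norms = np.linalg.norm(U, axis=1)
    sim = (U @ U.T) / np.outer(norms, norms)
    np.fill_diagonal(sim, -np.inf)
    # Step 1: join each row with its most similar row (union-find on pairs).
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(m):
        j = int(np.argmax(sim[i]))
        parent[find(i)] = find(j)
    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())
    # Step 2: merge the two most similar clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        means = [U[c].mean(axis=0) for c in clusters]
        best, pair = -np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(means[a], means[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(m, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels
```

The resulting labels are what a supervised k-NN classifier would later be trained on in the online phase.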
4.2. ANFIS architecture of proposed method and solving the sparsity
problem in multi-criteria CF
As discussed earlier, multi-criteria CF recommender systems suffer from sparsity on two sides, namely missing values in both the overall ratings and the criteria ratings, and the system should predict these missing ratings with new approaches. In this paper, we address the sparsity of the overall ratings using a Neuro-Fuzzy system. Generating proper MFs and extracting fuzzy rules for predicting overall ratings are the main advantages of this method, and both can be used in the online and offline phases. Because in multi-criteria CF the overall ratings are based on users' perception of item features, the MFs and fuzzy rules generated from users' preferences on item features can better alleviate the sparsity of the overall ratings. In addition, prior research [24,53] has shown that alleviating sparsity improves the predictive accuracy of multi-criteria CF recommender systems. Our experimental results also show that the proposed method significantly improves the predictive accuracy of multi-criteria CF. With ANFIS, the prediction error for the overall ratings is very low, and even zero in many cases, which shows the capability of ANFIS to alleviate the sparsity problem accurately and effectively.

In this study, the main goal of applying ANFIS is to discover knowledge (fuzzy rules) from users' ratings and to generalize the relationship $Y = f(X_1, X_2, \ldots, X_n)$ for accurate prediction of overall ratings, which in turn improves the predictive accuracy of multi-criteria CF. In this relationship, $X_1, X_2, \ldots, X_n$ are the input variables and $Y$ is the output variable. Here, the overall rating, i.e. the user's overall preference for an item, can be determined as a function of the item's features, or criteria. Thus, we associate $Y$ with the overall rating and $X_1, X_2, \ldots, X_n$ with the criteria ratings. Predicting the relationship between inputs and output is one of the important tasks that ANFIS performs. Based on the experimental dataset, the input parameters of the ANFIS model are Acting (A), Directing (D), Story (S) and Visuals (V), and the output, Overall rating (O), is defined as the overall preference. These attributes are naturally vague, imprecise and incomplete fuzzy terms, which leads to uncertainty in user interest in item features such as Acting, Story, Visuals and Directing. In ANFIS, they can therefore be expressed by fuzzy linguistic values (uncertainty modeling) such as {cluster 1}, {cluster 2}, {cluster 3} and {cluster 4}, which divide the domain of user interest in Acting, Directing, Story and Visuals into four regions using MFs. These MFs are given in Fig. 7(a) and (b) for the two inputs Visuals and Directing, respectively.
The relationship between the input variables (criteria) and the output (overall rating) can be defined as
$$\text{Overall rating} = f(\text{Acting}, \text{Directing}, \text{Story}, \text{Visuals}) \qquad (22)$$
In ANFIS models, the output is related to the inputs through a mathematical mapping defined by fuzzy rules. Fuzzy rules play an important role in ANFIS models and are the backbone of such systems. The form of the fuzzy rules in ANFIS is defined as
Table 2
Multi-criteria ratings for 4 users and 6 items on 4 criteria.

        Ratings on criterion 1 (items I1–I6)   Ratings on criterion 2 (items I1–I6)
U1      13 12 11 11  5  5                       5  4  4 12  9  5
U2      11  1 11 12  5  5                      11 11 11 11 11 10
U3       1 13  4  3 12 13                      13 13  5 12  4  4
U4       1  1  0  5  4  5                      13 13 13 12 13 13

        Ratings on criterion 3 (items I1–I6)   Ratings on criterion 4 (items I1–I6)
U1       5 11 11 10 10 10                      11  5  9  4  5  3
U2       4  9 11 11  4  3                       3 11 11 12 13 13
U3      11  5 11 11  4 11                       9  3  9  3 11  4
U4       9  3  3  9  4  9                       3  3  8  4  5  3

New user ratings on the four criteria C1, C2, C3 and C4
        C1  C2  C3  C4
I1       5   4   4  11
I2       4   5  11   4
I3       4   3   5  12
I4      13   3  12   4
I5      13  13  13  11
I6      13  12  12  13
Rule 1: IF A is $A_1$ AND D is $B_1$ AND S is $C_1$ AND V is $D_1$ THEN $f_1 = p_1 A + q_1 D + r_1 S + t_1 V + \pi_1$
Rule 2: IF A is $A_2$ AND D is $B_2$ AND S is $C_2$ AND V is $D_2$ THEN $f_2 = p_2 A + q_2 D + r_2 S + t_2 V + \pi_2$
...
Rule n: IF A is $A_n$ AND D is $B_n$ AND S is $C_n$ AND V is $D_n$ THEN $f_n = p_n A + q_n D + r_n S + t_n V + \pi_n$
For example, in this study, ANFIS is trained on the vectors of users' movie ratings in each cluster and extracts fuzzy rules such as:
IF the Acting of a movie is cluster1 AND Directing is cluster1 AND Story is cluster1 AND Visuals is cluster1 THEN the Overall Rating is out1cluster1.
According to the fuzzy rules extracted by ANFIS, the output out1cluster1 for the overall rating is obtained from the membership degrees of the 4 input variables. Using subtractive clustering within ANFIS also improves the precision of the fuzzy rules extracted from users' movie ratings and reduces overfitting to the training data. It reveals users' preferences on item features as soft (fuzzy) clusters, so that the system can predict the relation between each criterion and the overall rating.
To illustrate a simple ANFIS model applied to multi-criteria CF, assume that the system has two criteria, S and V, one output and two fuzzy IF–THEN rules. Fig. 8 shows the corresponding first-order Sugeno FIS, i.e. the ANFIS model with two rules. In Fig. 8, S and V are the crisp inputs to node i, and $A_i$ and $B_i$ are the linguistic labels characterized by the MFs $\mu_{A_i}$ and $\mu_{B_i}$, respectively. In this study, ANFIS uses the Gaussian MF:
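$$\mu_{A_i}(S) = \exp\left(-\frac{(S - b_i)^2}{2a_i^2}\right) \qquad (23)$$
$$\mu_{B_i}(V) = \exp\left(-\frac{(V - b_i)^2}{2a_i^2}\right) \qquad (24)$$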
Table 3
Generated matrices after applying HOSVD on tensor of users ratings.
S(:,:, 1) S(:,:, 2)
77.91 0.81 0.44 0.85 0.09 0.87 0.52 1.43 2.56 0.55 0.92 2.47
0.99 1.44 1.45 0.72 6.52 1.07 5.74 13.11 2.32 0.17 1.85 0.22
0.17 1.65 3.25 2.19 0.13 1.60 15.63 2.16 3.84 6.91 1.11 3.32
0.50 0.55 1.39 1.06 1.33 0.95 3.96 5.15 5.08 0.01 0.41 0.28
S(:,:, 3) S(:,:, 4)
0.19 2.27 4.17 6.40 0.21 2.11 0.18 0.64 2.73 0.08 3.50 3.54
5.80 3.74 1.87 4.98 2.50 3.21 0.40 9.24 6.96 1.28 3.30 1.39
3.88 6.62 0.73 2.12 4.56 0.98 4.60 1.49 3.76 2.71 0.16 1.64
5.85 6.28 1.63 2.45 1.91 2.11 0.70 1.80 0.77 0.59 3.14 0.26
Matrix U Matrix V
0.49 0.30 0.65 0.51 0.44 0.74 0.17 0.47
0.56 0.60 0.34 0.46 0.60 0.66 0.12 0.43
0.51 0.67 0.29 0.46 0.50 0.06 0.46 0.73
0.43 0.33 0.62 0.56 0.44 0.08 0.86 0.23
Matrix W
0.40 0.65 0.05 0.21 0.61 0.05
0.40 0.29 0.84 0.20 0.08 0.03
0.43 0.20 0.32 0.45 0.59 0.36
0.46 0.25 0.03 0.74 0.33 0.26
0.38 0.45 0.36 0.25 0.12 0.66
0.38 0.43 0.24 0.31 0.39 0.60
Fig. 6. 2D graph of users and items. (Scatter plot of users U1–U4, items I1–I6 and the new user Unew; x-axis: first column of U and V; y-axis: second column of U and V.)
Fig. 7. Membership functions for (a) Visuals and (b) Directing. (Each panel plots the degree of membership, from 0 to 1, against the input value, from 10 to 13, for four MFs labelled Cluster 1 to Cluster 4.)
where $\{a_i, b_i\}$ is the parameter set of the MFs in the premise part of the fuzzy IF–THEN rules; these parameters (the width $a_i$ and the center $b_i$ of each Gaussian) change the shapes of the MFs. The parameters in this layer are referred to as premise parameters.
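Eqs. (23)–(24) are plain Gaussians, and the sketch below evaluates them for one criterion. The four centers (10 to 13) and the common width a = 0.5 are made-up values chosen to resemble the axis range of Fig. 7, not the MFs actually learned by ANFIS; `gaussian_mf` is a hypothetical helper name.

```python
import numpy as np


def gaussian_mf(x, a, b):
    """Degree of membership exp(-(x - b)^2 / (2 a^2)), as in Eqs. (23)-(24).

    a: width (standard deviation) of the MF; b: its center.
    """
    return np.exp(-((np.asarray(x, dtype=float) - b) ** 2) / (2.0 * a ** 2))


# Hypothetical MFs for one criterion: four "clusters" centered on 10..13.
centers = np.array([10.0, 11.0, 12.0, 13.0])
degrees = gaussian_mf(12.0, a=0.5, b=centers)
```

A rating of 12 then belongs fully to the third MF and only partially to its neighbours, which is what lets a single crisp rating fire several rules at once.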
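From the ANFIS architecture shown in Fig. 8, it can be observed that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters. In symbols, the output O can be rewritten as:
$$\begin{aligned}
O &= \sum_{i=1}^{n} \bar{w}_i f_i, \qquad \bar{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j} \\
  &= \bar{w}_1 (p_1 A + q_1 D + r_1 S + t_1 V + \pi_1) + \cdots + \bar{w}_n (p_n A + q_n D + r_n S + t_n V + \pi_n) \\
  &= (\bar{w}_1 A) p_1 + (\bar{w}_1 D) q_1 + (\bar{w}_1 S) r_1 + (\bar{w}_1 V) t_1 + \bar{w}_1 \pi_1 + \cdots \\
  &\quad + (\bar{w}_n A) p_n + (\bar{w}_n D) q_n + (\bar{w}_n S) r_n + (\bar{w}_n V) t_n + \bar{w}_n \pi_n
\end{aligned} \qquad (25)$$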
which is linear in the consequent parameters $p_i$, $q_i$, $r_i$, $t_i$ and $\pi_i$. Fig. 9 shows the architecture of the implemented ANFIS, which consists of four inputs, four rules, sixteen MFs for the inputs, and one output.
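The linearity in Eq. (25) is what allows the consequent parameters to be fitted by least squares once the premise parameters are fixed, which is the forward half of ANFIS hybrid learning. The NumPy sketch below illustrates this for the four criteria A, D, S and V. It assumes one Gaussian MF per input per rule and the product t-norm for AND (the usual ANFIS choice); the function names, array layouts and the random data in the usage check are hypothetical and are not the authors' Fuzzy Logic Toolbox model.

```python
import numpy as np


def firing_strengths(X, centers, widths):
    """Rule firing strengths w_i: product of Gaussian MF degrees.

    X: (N, 4) ratings for Acting, Directing, Story, Visuals.
    centers, widths: (n_rules, 4) premise parameters (b_i and a_i).
    """
    diff = X[:, None, :] - centers[None, :, :]
    mf = np.exp(-diff ** 2 / (2.0 * widths[None, :, :] ** 2))
    return mf.prod(axis=2)  # (N, n_rules)


def design_matrix(X, centers, widths):
    """Columns [wbar_i*A, wbar_i*D, wbar_i*S, wbar_i*V, wbar_i] per rule, Eq. (25)."""
    w = firing_strengths(X, centers, widths)
    wbar = w / w.sum(axis=1, keepdims=True)  # normalized firing strengths
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append the constant term
    return (wbar[:, :, None] * X1[:, None, :]).reshape(X.shape[0], -1)


def predict(X, centers, widths, theta):
    """Overall rating O = sum_i wbar_i * f_i; theta holds (p, q, r, t, pi) per rule."""
    return design_matrix(X, centers, widths) @ theta


def fit_consequents(X, y, centers, widths):
    """Least-squares estimate of the consequent parameters (premises fixed)."""
    theta, *_ = np.linalg.lstsq(design_matrix(X, centers, widths), y, rcond=None)
    return theta
```

Because the normalized strengths sum to one, setting every rule's consequent to the same constant yields that constant as the overall rating, a quick sanity check on the weighting.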
4.2.1. Training the ANFIS and model validation using checking and testing datasets
In this study, three sets of data were used for ANFIS modeling: training, checking and testing data. ANFIS uses the training data to construct the model of the target system; the rows of the training data are used as inputs and outputs for constructing the target model. The checking data are used at each epoch to test the generalization capability of the FIS, which prevents overfitting and validates the identified ANFIS. The checking and testing data have the same format as the training data, but their elements generally differ from those of the training data. Each cluster obtained using HOSVD was divided into three groups: the first group, containing 80% of the cluster's data, was used as training data; the second group, containing 10%, was used as checking data; and the remaining 10% was used as testing data.
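A sketch of the per-cluster 80/10/10 split, assuming the rows are assigned at random (the text does not say how rows were allocated); `split_cluster` and its parameters are hypothetical names.

```python
import numpy as np


def split_cluster(vectors, train=0.8, check=0.1, seed=0):
    """Split one cluster's rating vectors into training/checking/testing sets.

    Random shuffling is an assumption; the remaining rows form the testing set.
    """
    vectors = np.asarray(vectors)
    idx = np.random.default_rng(seed).permutation(len(vectors))
    n_train = int(round(train * len(vectors)))
    n_check = int(round(check * len(vectors)))
    return (vectors[idx[:n_train]],
            vectors[idx[n_train:n_train + n_check]],
            vectors[idx[n_train + n_check:]])
```

For a cluster of 190 rating vectors, the size of the cluster used in Section 5.2, this yields 152 training, 19 checking and 19 testing vectors.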
5. Result and discussion
In order to analyse the effectiveness of the proposed method, several experiments were conducted on the Yahoo!Movies dataset provided by the Yahoo! Research Alliance Webscope program (http://webscope.sandbox.yahoo.com).
On the Yahoo!Movies network, users could rate movies on 4 dimensions (Story, Acting, Direction and Visuals) and assign an overall rating, using a 13-level rating scale. The four features of each movie were taken as C1 = Acting, C2 = Story, C3 = Visuals and C4 = Directing. As can be seen in Table 4, all users' ratings are measured as values between 1 and 13 on a quantitative scale.
The original dataset contains 257,317 rating tuples from 127,829 users on 8272 movies. However, the resulting rating tensor is extremely sparse, because many of the user-item-criteria entries are empty. The sparsity level of the dataset is about 97.57% (sparsity level = 1 − density = 1 − (257,317 × 100)/(127,829 × 8272) = 0.9757); that is, only about 2.43% of all entries in the rating tensor are filled. Following Jannach et al. [24], we pre-processed the data, created test datasets with different density and quality levels, and applied the proposed method to YM-20-20, YM-10-10 and YM-5-5. The datasets are described in Table 5.
5.1. Performance of HOSVD clustering
Because HOSVD is fast to compute, it is applied to the training tensor $\mathcal{A} \in \mathbb{R}^{1500 \times 500 \times 4}$, which corresponds to the training set. As a result, an approximation $\tilde{\mathcal{A}}_{a_1,a_2,a_3}$ is retained, where the values set for $a_1$, $a_2$ and $a_3$ determine the dimensions of the core tensor. All experiments in this study were implemented in MATLAB on Microsoft Windows, on an Intel Core i5 processor running at 2.66 GHz with 4 GB RAM.
To estimate the performance of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20 approximations, we adopt the Silhouette coefficient [76] as the standard measure of clustering quality and use it to determine the best cluster formation. The Silhouette coefficient is an internal index: it measures how well a clustering fits the original data based on statistical properties of the clustered data. External indices, by contrast, measure the quality of a clustering by comparing it with an external (supervised) labeling. The Silhouette coefficient of an element i of a cluster k is defined using the average distance a(i) between i and the other elements of k (the intra-cluster distance) and the distance b(i) between i and the nearest cluster, i.e. the smallest average distance between i and the elements of any other cluster (the minimal inter-cluster distance).
Fig. 8. Architecture of implemented ANFIS model for two inputs, one output and
two rules.
Fig. 9. Architecture of the implemented ANFIS.
Table 4
A sample of the multi-criteria dataset from Yahoo!Movies.
Movie ID  User ID  Directing  Story  Visual  Acting  Overall rating
2         1        1          2      1       2       1
          13       13         11     13      13      13
          9        13         13     8       9       8
...       ...      ...        ...    ...     ...     ...
13        2        13         13     13      13      13
          13       13         11     13      13      12
          12       13         13     13      12      13
...       ...      ...        ...    ...     ...     ...
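$$s_c(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \qquad (26)$$
which can be written as:
$$s_c(i) = \begin{cases} 1 - a(i)/b(i), & \text{if } a(i) < b(i) \\ 0, & \text{if } a(i) = b(i) \\ b(i)/a(i) - 1, & \text{if } a(i) > b(i) \end{cases} \qquad (27)$$
An overall score for a set of $n_k$ elements (a cluster or the entire clustering) is calculated by taking the average of the Silhouette coefficients $s_c(i)$ of all elements i in the set. Thus, $SC_k$ can be defined as
$$SC_k = \frac{1}{n_k} \sum_{i=1}^{n_k} s_c(i) \qquad (28)$$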
The Silhouette coefficient takes values between −1 and 1; the closer it is to 1, the better the clustering fits the data. Table 6 lists a general rule of thumb for interpreting the Silhouette coefficient.
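Eqs. (26)–(28) can be computed directly from pairwise distances, as sketched below. We assume Euclidean distance and the standard definition of b(i) as the smallest mean distance to the elements of another cluster; singleton clusters get $s_c(i) = 0$, a common convention. The function names are hypothetical, and at least two clusters are required.

```python
import numpy as np


def silhouette_values(X, labels):
    """s_c(i) = (b(i) - a(i)) / max{a(i), b(i)} for every element, Eq. (26).

    Uses Euclidean distances; needs at least two clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():
            continue  # singleton cluster: s_c(i) = 0 by convention
        a = D[i, own].mean()  # mean intra-cluster distance a(i)
        # b(i): smallest mean distance to the elements of another cluster.
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def cluster_score(s, labels, k):
    """SC_k, Eq. (28): average silhouette over the n_k elements of cluster k."""
    return float(np.mean(s[np.asarray(labels) == k]))
```

Averaging `silhouette_values` over all elements gives the score for the entire clustering, which can then be read against the thresholds in Table 6.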
Table 7 shows the average Silhouette coefficient of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20 approximations. According to Table 7, the highest average Silhouette coefficient for HOSVD clustering, 0.867, was obtained for rank 10 approximation. By the rule of thumb in Table 6, this value indicates that a strong structure has been found. We also observe that lower approximation ranks perform better than higher ones, which supports our claim that truncated HOSVD gives better results.
5.2. Evaluation of proposed ANFIS model
After the cluster analysis, the ANFIS model was applied to one of the clusters with the maximum Silhouette coefficient, namely the third cluster generated by HOSVD with rank 12 approximation, which contains 190 users' ratings. Four fuzzy clusters were determined in that cluster. The number of fuzzy rules equals the number of cluster centers, each rule representing the characteristics of one cluster, as given in Table 8.
To evaluate the ANFIS model, several accuracy measures were used to determine the model's capability to predict the overall rating. The models were evaluated with four estimators: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Square Error (MSE) and the coefficient of determination (R²). These estimators are defined by
$$MSE = \frac{1}{n} \sum_{O=1}^{n} \left(\text{actual}(O) - \text{prediction}(O)\right)^2 \qquad (29)$$
$$MAPE = \frac{1}{n} \sum_{O=1}^{n} \left|\frac{\text{actual}(O) - \text{prediction}(O)}{\text{actual}(O)}\right| \qquad (30)$$
$$R^2 = 1 - \frac{\sum_{O=1}^{n} \left(\text{actual}(O) - \text{prediction}(O)\right)^2}{\sum_{O=1}^{n} \left(\text{actual}(O) - \overline{\text{actual}}\right)^2} \qquad (31)$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{O=1}^{n} \left(\text{actual}(O) - \text{prediction}(O)\right)^2} \qquad (32)$$
where actual(O) is the real overall rating provided by the user, prediction(O) is the predicted overall rating, $\overline{\text{actual}}$ is the mean of the actual overall ratings, and n is the number of user ratings used.
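The four estimators in Eqs. (29)–(32) translate directly into NumPy. This sketch returns MAPE as a fraction (multiply by 100 for a percentage) and assumes that no actual rating is zero, which holds on the 1–13 scale; `regression_metrics` is a hypothetical name.

```python
import numpy as np


def regression_metrics(actual, predicted):
    """MSE, MAPE, R^2 and RMSE as in Eqs. (29)-(32)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    mape = np.mean(np.abs(err / actual))  # fraction, not percent
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    return {"MSE": mse, "MAPE": mape, "R2": r2, "RMSE": np.sqrt(mse)}
```

Note that RMSE is always the square root of MSE computed on the same data, so the pair can serve as a consistency check when reporting both.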
RMSE and MSE are usually used to test a prediction model during training; in this study, two further performance measures, the coefficient of determination R² and MAPE, were used for a more thorough evaluation. The coefficient of determination R² gives a value between 0 and 1 that describes how well the proposed network has been trained; a value closer to 1 indicates successful learning. MAPE was used because it accurately identifies the model's deviations.
After implementing the ANFIS model using the Fuzzy Logic Toolbox in MATLAB 7.10.0, the training and checking data were tested for error estimation. Data for the four inputs were given to the trained ANFIS model along with the actual overall ratings. From the input values, the suitable MFs (see Fig. 7(a) and (b)) were selected to predict the overall ratings using the extracted rules (see Table 8). The fuzzy rule viewer of the established ANFIS model, shown in Fig. 10, visualizes how overall ratings are predicted by selecting MFs, and indicates how users' overall ratings respond to changes in the values of all four inputs. For example, in the fuzzy rule viewer, when Acting is 11, Directing is 12, Story is 12 and Visuals is 11, an overall rating of 12 is obtained.
Table 9 presents the errors for a sample of the training and checking data. As the errors for the nineteen samples in Table 9 show, the ANFIS model was trained effectively on the training data.
Table 5
Information of Yahoo!Movies dataset.
Name #Users #Items #Overall ratings
YM-20-20 429 491 18,504
YM-10-10 1827 1471 48,026
YM-5-5 5978 3079 82,599
Table 6
Rule of thumb for the interpretation of the Silhouette coefficient.
Range       Interpretation
> 0.70      Strong structure has been found
0.50–0.70   Reasonable structure has been found
0.25–0.50   The structure is weak and could be artificial
For subtractive clustering, the parameters were set by trial and error as follows: range of influence: accept ratio: 0.5, reject ratio: 0.15 and 0.5, and squash factor: 1.25. We also tested the effect of the two variables $r_a$ and $r_b$, which represent neighborhood radii, on the overall rating prediction error for the training, checking and test data, varying $r_a$ from 0.3 to 0.8; the error was lowest for $r_b = 1.5 r_a$. Fig. 11 presents the overall rating prediction error on the checking and training data for nineteen samples.
In this study, the average error for the checking data was 0.0001904; after 200 epochs, the average RMSE, MSE, MAPE and R² were 0.02144, 0.00912, 0.18230 and 0.82460, respectively. The average error for the training data was 0.000162221; after 200 epochs, the average RMSE, MSE, MAPE and R² were 0.01272, 0.00912, 0.18230 and 0.99460, respectively. After 200 epochs, the average error for the testing data was 0.000172361, and the average RMSE, MSE, MAPE and R² were 0.01951, 0.00949, 0.10230 and 0.91150, respectively. The average training and checking errors after 200 epochs are shown in Fig. 12.
Fig. 13 illustrates, through control surfaces, the interdependency between the four input parameters and the overall rating obtained from the fuzzy rules generated by ANFIS combined with subtractive clustering. The overall rating can be depicted as a continuous function of its input parameters Acting, Directing, Story and Visuals, and the surface plots depict the variation of the overall rating based on the identified fuzzy rules. Fig. 13(a) shows the interdependency of overall rating on Directing and Acting, Fig. 13(b) on Acting and Story, Fig. 13(c) on Visuals and Acting, and Fig. 13(d) on Story and Directing. Fig. 13(e) depicts the interdependency of overall
Fig. 10. Fuzzy rule viewer for input and output variables of ANFIS model.
Table 9
Training and checking errors for predicting overall ratings by ANFIS.
Sample #  Training data  Training ANFIS output  Training error (%)  Checking data  Checking ANFIS output  Checking error (%)
1 12 12 0 11 11.01 0.01
2 10 10.0001 0.0001 12 12.009 0.009
3 13 13 0 10 10.009 0.009
4 12 12 0 12 12.0001 0.0001
5 12 12 0 11 11.0008 0.0008
6 13 13 0 11 11.009323 0.009323
7 12 12 0 12 12.0004 0.0004
8 13 13 0 12 12.00383 0.00383
9 12 12 0 12 11.998 0.002
10 12 12 0 13 12.999998 0.000002
11 12 12 0 11 11.003 0.003
12 10 10.0013 0.0013 11 11.0005 0.0005
13 13 13 0 12 12.00276 0.00276
14 12 12 0 12 12.0003 0.0003
15 11 10.9999 -0.0001 12 11.9917 0.0083
16 12 12 0 12 11.99346 0.00654
17 12 12 0 11 10.99299 0.00701
18 13 13 0 10 10.009 0.009
19 12 12 0 11 10.9901 0.0099
Fig. 11. Training and checking error for nineteen samples in the dataset. (x-axis: number of samples, 0 to 20; y-axis: prediction error, −0.01 to 0.015; series: training error and checking error.)
rating on Visuals and Directing, and Fig. 13(f) shows the interdependency of overall rating on Story and Visuals.
These surface plots show the perception and behavior of users regarding any two item features within a cluster of users with similar preferences, and are therefore valuable for revealing user behavior regarding item features in multi-criteria CF. Thus, the preferences of the users in each cluster can be modeled by ANFIS, and the recommender system can recognize which item features (criteria), and at which levels, match their preferences. The curves in Fig. 14(a–d) also reveal user behavior on each item feature separately. As these curves show, the overall rating increases more strongly with the Story criterion than with the other criteria, which suggests that Story is the most important criterion for users in that cluster.
5.3. Multi-criteria CF evaluation
In this section, we completely focus on multi-criteria CF recom-
mendation using proposed method. As mentioned before, we used
k-NN for classification data and also we stated that selectingk and
distance metric are important ink-NN method for accuracy of clas-
sification. Therefore, in this study, the optimal distance metric and
k were chosen using cross-validation [77]. Thus, classifier could
accurately predict the testing data. Five-fold cross-validation
method has been applied to choice the type of distance metric
and bestk value.
Table 10 presents the averaged classification accuracy obtained with five-fold cross-validation for k = 1, 3, 5, 7 and 9 and three distance metrics (Euclidean, correlation and City-Block). The highest averaged classification accuracy, about 98.91%, is obtained using the Euclidean distance metric with k = 5. For the same k, City-Block reaches 95.89% and Correlation reaches 96.76%. Also, with the Euclidean metric, the averaged classification rate is higher than with the Correlation and City-Block metrics for all values of k. Based on this result, we set k = 5 and the Euclidean distance metric, as selected by five-fold cross-validation, for classification.
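The selection procedure above can be sketched in Python as a grid search over k and the distance metric, scored by five-fold cross-validation. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `cv_accuracy` and `select_k_and_metric`, the shuffled strided fold split, plain majority voting, and defining the correlation distance as one minus the Pearson correlation are all illustrative choices.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation(a, b):
    # Assumed definition: 1 - Pearson correlation; constant vectors count as uncorrelated.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0
    return 1.0 - cov / (sa * sb)


METRICS = {"euclidean": euclidean, "cityblock": city_block, "correlation": correlation}


def knn_predict(train_X, train_y, x, k, dist):
    # Majority vote among the k training points closest to x.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, dist, folds=5, seed=0):
    # Hypothetical helper: mean accuracy over `folds` shuffled, strided folds.
    # The fixed seed gives every (metric, k) pair the same folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        trX = [X[i] for i in train_idx]
        trY = [y[i] for i in train_idx]
        hits = sum(knn_predict(trX, trY, X[i], k, dist) == y[i] for i in test_idx)
        accs.append(hits / len(test_idx))
    return sum(accs) / folds


def select_k_and_metric(X, y, ks=(1, 3, 5, 7, 9)):
    # Grid search over (metric, k) as in Table 10; returns the best pair and all scores.
    scores = {(name, k): cv_accuracy(X, y, k, fn) for name, fn in METRICS.items() for k in ks}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `select_k_and_metric(X, y)` on a labelled feature matrix returns the best `(metric, k)` pair together with a score table laid out like Table 10.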
We computed the precision and the recall of the Top-N list for each element in the test set and took the arithmetic mean of these values. The recommender's prediction accuracy was measured by RMSE [78], a widely used metric for evaluating the statistical accuracy of recommendation algorithms, given by
$$ \mathrm{RMSE} = \sqrt{\frac{1}{|X|}\sum_{(u_i, o_j) \in X} \left| a_{ij} - p_{ij} \right|^2} \qquad (33) $$
where X = {(u_i, o_j) | u_i had rated o_j in the probe set}, and a_ij and p_ij are the actual and predicted ratings of user u_i on item o_j. A lower RMSE indicates a higher accuracy of the recommendation system.
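Eq. (33) translates directly into code. This is a minimal sketch that assumes ratings are stored in dictionaries keyed by `(user, item)` pairs and that the probe set X is exactly the set of keys of `actual`. The function name `rmse` and this data layout are illustrative.

```python
import math


def rmse(actual, predicted):
    """RMSE over the probe set X, Eq. (33).

    actual, predicted: dicts mapping (user, item) -> rating. The probe set X is
    assumed to be the keys of `actual`; every key must also appear in `predicted`.
    """
    if not actual:
        raise ValueError("probe set X is empty")
    squared = sum((actual[key] - predicted[key]) ** 2 for key in actual)
    return math.sqrt(squared / len(actual))


actual = {("u1", "o1"): 4, ("u1", "o2"): 3, ("u2", "o1"): 5}
predicted = {("u1", "o1"): 4.5, ("u1", "o2"): 3.0, ("u2", "o1"): 4.0}
print(rmse(actual, predicted))  # equals sqrt(1.25 / 3)
```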
Table 11 presents the RMSE obtained with the proposed approach on YM-5-5 (each movie has at least 5 ratings), YM-10-10 (each movie has at least 10 ratings) and YM-20-20 (each movie has at least 20 ratings). Fig. 15 shows the prediction accuracy for different neighborhood sizes on the YM-5-5, YM-10-10 and YM-20-20 datasets.
To compare the proposed method with HOSVD, truncated SVD and some state-of-the-art approaches in multi-criteria CF, we employ the recall and precision metrics, which are widely used in recommender systems to evaluate the quality of recommendations [79,80]. Precision is the ratio of relevant items recommended to the total number of items recommended. Recall is the ratio of relevant items recommended to the total number of relevant items that exist. The two measures are inversely related and depend on the length of the recommendation list. The longer the list, the easier it becomes to achieve high recall, but the more difficult it becomes to achieve good precision. The F measure is the weighted harmonic mean that combines both precision and recall [24].
$$ \mathrm{Recall} = \frac{\text{Number of correctly recommended items}}{\text{Number of interesting items}} \qquad (34) $$
$$ \mathrm{Precision} = \frac{\text{Number of correctly recommended items}}{\text{Number of recommended items}} \qquad (35) $$
Here, the items of interest to a customer u are the products in the test set that were purchased by u, and the correctly recommended items are the recommended items that match the items of interest. These measures are simple to compute and intuitively appealing. However, they conflict, because increasing the size of the recommendation set improves recall at the expense of precision [8].
The F1-metric [24,79], which combines precision and recall, is also widely used to evaluate the quality of recommendations. Specifically, this measure balances the trade-off between precision and recall by assigning equal weights to both metrics. Therefore, we use the F1-metric in our evaluation, as shown in Eq. (36).
$$ F_1 = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (36) $$
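Eqs. (34)-(36) can be combined in a short routine that scores one Top-N list and then averages over test users. This sketch rests on several assumptions. Item identifiers are taken to be hashable values. `mean_top_n_scores` averages precision and recall over users before computing F1 from those means, which is one reading of the averaging described earlier. The function names and the toy data are hypothetical.

```python
def f1(precision, recall):
    # Harmonic mean with equal weights, Eq. (36); defined as 0 when both terms are 0.
    total = precision + recall
    return 2 * recall * precision / total if total else 0.0


def precision_recall_f1(recommended, relevant, n):
    """Precision (Eq. 35), recall (Eq. 34) and F1 (Eq. 36) for one Top-N list.

    recommended: ranked list of item ids; relevant: collection of items of interest.
    """
    top_n = recommended[:n]
    hits = len(set(top_n) & set(relevant))  # correctly recommended items
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, f1(precision, recall)


def mean_top_n_scores(cases, n):
    """Average precision and recall over test users, then F1 of the means (an assumption)."""
    per_user = [precision_recall_f1(rec, rel, n) for rec, rel in cases]
    p = sum(s[0] for s in per_user) / len(per_user)
    r = sum(s[1] for s in per_user) / len(per_user)
    return p, r, f1(p, r)


cases = [(["a", "b", "c", "d", "e"], {"a", "c", "x", "y"}), (["x", "y", "z"], {"z"})]
for n in (1, 5, 7):
    print(n, mean_top_n_scores(cases, n))
```

Sweeping `n` over values such as 1, 5, 7, 15, 25, 35 and 40 yields one point per Top-N setting, which is how curves like those in Figs. 16 and 17 are built up.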
We ran the experiments on the YM-10-10 and YM-20-20 datasets for N equal to 1, 5, 7, 15, 25, 35 and 40, where N is the number of items recommended by the Top-N recommender system.
The two F1 curves in Figs. 16 and 17 show that the proposed method achieves a high level of accuracy as the neighborhood size increases, across the Top-N recommendation lengths. This outcome demonstrates the value of combining the HOSVD method and ANFIS with subtractive clustering to overcome the problems of multi-criteria CF. These results also show that the proposed method performs better on YM-20-20. In Figs. 16 and 17, the significant change in F1 between neighborhood sizes 15 and 25 indicates that higher accuracy is obtained with large neighborhoods than with small ones. According to the experiments, these outcomes are related to the results of clustering and extracting fuzzy rules from the YM-20-20 and YM-10-10 datasets.
Fig. 12. The error of each observation for checking and training data.
M. Nilashi et al. / Knowledge-Based Systems 60 (2014) 82101 95
Fig. 13. Interdependency of overall rating on (a) Directing and Acting, (b) Acting and Story, (c) Visuals and Acting, (d) Story and Directing, (e) Visuals and Directing, and (f)
Story and Visuals.
To compare the proposed method with previous work [23,24,52], we also evaluated our approach on YM-10-10 using an additional set of metrics. In Table 12, we report Precision@5 and Precision@7 values as well as the Mean Absolute Error (MAE). We also applied the SVD and HOSVD techniques without ANFIS and subtractive clustering to the YM-10-10 and YM-20-20 datasets; the results are presented in Tables 12 and 13.
The MAE is the average absolute deviation between predicted ratings and true ratings, as shown in Eq. (37).
$$ \mathrm{MAE}(pred, act) = \frac{1}{N}\sum_{i=1}^{N} \left| pred_{u,i} - act_{u,i} \right| \qquad (37) $$
where N is the number of items on which a user u has expressed an opinion.
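Eq. (37) for a single user can be written as follows. This is a minimal sketch that assumes the predicted and actual ratings are given as two aligned lists over the user's N rated items. `system_mae`, which averages the per-user values, is a hypothetical aggregation that shows one common convention. The source does not state how per-user values were combined.

```python
def mae(pred, act):
    """MAE of Eq. (37) for one user u: mean of |pred_{u,i} - act_{u,i}| over N items."""
    if not pred or len(pred) != len(act):
        raise ValueError("pred and act must be non-empty and of equal length N")
    return sum(abs(p - a) for p, a in zip(pred, act)) / len(pred)


def system_mae(per_user):
    """Hypothetical aggregate: mean of per-user MAE values over (pred, act) pairs."""
    values = [mae(pred, act) for pred, act in per_user]
    return sum(values) / len(values)


print(mae([4.5, 3.0, 4.0], [4, 3, 5]))  # 0.5
```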
The results show that the Top-5 and Top-7 precision of the proposed method is higher than that of the algorithms from previous work and of the methods using HOSVD or SVD alone.
To compare the proposed method with MC-SeCF, which was developed by Shambour and Lu [53] and evaluated on the MovieLens dataset, we also evaluated our approach on YM-20-20 and YM-10-10 using the MAE metric for different neighborhood sizes. The MAE comparison is shown in Fig. 18. The curves in this figure show a significant improvement in recommendation accuracy at large neighborhood sizes. Table 14 presents the recommendation accuracy in terms of MAE for YM-20-20 and YM-10-10. From Fig. 18 and Table 14, the proposed method achieves slightly better recommendation accuracy on YM-20-20 than on YM-10-10, particularly for large neighborhood sizes. The MAE values of our method are slightly higher than those of MC-SeCF. However, quite interestingly, our method obtains better recommendation accuracy on the YM-20-20 dataset. This indicates that the relatively high accuracy on YM-20-20 is due to the discovery of better fuzzy rules. Also, as mentioned earlier, semantic information about items can be incorporated into multi-criteria CF to obtain more accurate recommendations.
Fig. 14. Curves for revealing the relationship between overall rating and (a) Visuals, (b) Directing, (c) Story, and (d) Acting.
Table 10
Averaged classification accuracy (%) for distance metrics and values of k.
Distance metric   k = 1   k = 3   k = 5   k = 7   k = 9
Euclidean         96.63   97.34   98.91   97.67   95.56
City block        94.72   94.87   95.89   93.89   93.73
Correlation       95.28   95.38   96.76   94.88   94.87
Table 11
Coverage and RMSE for YM-5-5, YM-10-10 and YM-20-20.
Size of neighborhood   RMSE (YM-5-5)   RMSE (YM-10-10)   RMSE (YM-20-20)
5                      0.551097        0.5365            0.53158
10                     0.549707        0.5310            0.52988
15                     0.544308        0.5289            0.52039
20                     0.538909        0.5200            0.51558
25                     0.530209        0.5184            0.51129
30                     0.528609        0.5174            0.50349
[Plot: RMSE (y-axis) versus Neighborhood Size (x-axis, 5 to 55) for YM-5-5, YM-10-10 and YM-20-20]
Fig. 15. RMSE and neighborhood size.
Using the rank-12 approximation defined for the HOSVD decomposition, we also measured precision for different numbers of clusters. The number of clusters was changed iteratively: starting from 3 clusters, it was increased by 3 after each iteration until it reached 12. Fig. 19 illustrates the Precision@5 and Precision@7 values versus the number of clusters.
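A rank-truncated HOSVD of a third-order user × item × criteria tensor can be sketched with NumPy as shown below. This is a minimal illustration that assumes a dense tensor and one truncation rank per mode, for instance 12 for the user and item modes. The helper names `unfold`, `mode_dot`, `truncated_hosvd` and `reconstruct` are hypothetical, and the paper's exact truncation and clustering steps may differ.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """n-mode product: contract axis `mode` of `tensor` with axis 1 of `matrix` (shape J x I_n)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors) keeping the leading `ranks[n]` left singular vectors per unfolding."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project each mode onto its factor subspace
    return core, factors


def reconstruct(core, factors):
    """Low-rank approximation: core x_1 U1 x_2 U2 x_3 U3."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_dot(approx, u, mode)
    return approx


# Toy user x item x criteria tensor (hypothetical sizes), truncated to rank 12 on users and items.
ratings = np.random.default_rng(0).integers(1, 6, size=(30, 40, 5)).astype(float)
core, factors = truncated_hosvd(ratings, (12, 12, 5))
user_factors = factors[0]  # 30 x 12 user representation that could be fed to clustering
print(core.shape, user_factors.shape)  # (12, 12, 5) (30, 12)
```

Keeping every mode at full rank reproduces the tensor exactly. Lowering the ranks trades reconstruction fidelity for a smaller latent representation.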
As Fig. 19 shows, the worst precision is obtained for YM-10-10 at Precision@5 with 3 clusters, and the best precision is achieved for YM-20-20 at Precision@5 with 12 clusters. For both YM-10-10 and YM-20-20, precision increases as the number of clusters increases.
To experimentally show the effectiveness of clustering using HOSVD and cosine-based similarity, we ran experiments with the similarity-based approach developed by Adomavicius and Kwon [23] and compared it with the proposed method. They proposed several potential ways to calculate the similarity between users based on their criteria ratings. Among their similarity-based approaches, the Chebyshev distance metric performed best.
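A similarity-based baseline of this kind can be sketched as follows. For each item that both users rated, it takes the Chebyshev distance between their criteria-rating vectors, averages these distances, and maps the average d to a similarity 1 / (1 + d). This aggregate form is our reading of the approach in [23] and should be treated as an assumption. The function name `multi_criteria_similarity`, the zero similarity for users without co-rated items, and the toy ratings are all hypothetical.

```python
def chebyshev(r1, r2):
    # Largest absolute difference across the criteria ratings of one item.
    return max(abs(a - b) for a, b in zip(r1, r2))


def multi_criteria_similarity(ratings_u, ratings_v):
    """Similarity of two users from their multi-criteria ratings.

    ratings_u, ratings_v: dicts mapping item -> tuple of criteria ratings.
    Assumed form: sim = 1 / (1 + mean Chebyshev distance over co-rated items).
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # assumption: no co-rated items means no similarity evidence
    dist = sum(chebyshev(ratings_u[i], ratings_v[i]) for i in common) / len(common)
    return 1.0 / (1.0 + dist)


u = {"m1": (4, 3, 5, 4), "m2": (2, 2, 3, 1)}
v = {"m1": (4, 5, 5, 3), "m2": (2, 2, 3, 1), "m3": (1, 1, 1, 1)}
print(multi_criteria_similarity(u, v))  # 0.5
```

Because each candidate neighbor requires this pairwise computation, a baseline built on it scans every user when forming a neighborhood.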
Fig. 20 presents the performance results of our experiments for the proposed method and the similarity-based approach using the Chebyshev distance metric, with throughput plotted as a function of the number of clusters. We define the throughput of a
[Plot: F measure (y-axis) versus Top-N (x-axis: 1, 5, 7, 15, 25, 35, 40) for 5, 15, 25 and 35 neighbors]
Fig. 16. F1 measure and Top-N recommendation for YM-10-10.
[Plot: F measure (y-axis) versus Top-N (x-axis: 1, 5, 7, 15, 25, 35, 40) for 5, 15, 25 and 35 neighbors]
Fig. 17. F1 measure and Top-N recommendation for YM-20-20.
Table 12
MAE, precision at Top-5 and Top-7 of the proposed method, HOSVD and truncated SVD for YM-10-10 (neighborhood size: all users).
Algorithm                                Precision@5 (%)   Precision@7 (%)   MAE
HOSVD                                    75.34             72.85             1.17
Truncated SVD                            74.03             72.19             1.75
HOSVD-ANFIS and subtractive clustering   81.44             80.78             0.96
Table 13
MAE, precision at Top-5 and Top-7 for the proposed method, HOSVD and truncated SVD for YM-20-20 (neighborhood size: all users).
Algorithm                                Precision@5 (%)   Precision@7 (%)   MAE
HOSVD                                    78.57             76.43             0.95
Truncated SVD                            75.12             73.21             1.45
HOSVD-ANFIS and subtractive clustering   83.34             81.32             0.91
[Plot: MAE (y-axis) versus Size of Neighbors (x-axis: 10, 20, 30, 40, 50, 70, 90) for YM-10-10 and YM-20-20]
Fig. 18. Recommendation accuracy for different neighborhood sizes on YM-20-20
and YM-10-10.
Table 14
Recommendation accuracy using MAE for different neighborhood sizes.
Neighborhood size   MAE (YM-10-10)   MAE (YM-20-20)
10                  0.7325           0.7105
20                  0.7370           0.7088
30                  0.7249           0.7093
40                  0.7260           0.7015
50                  0.7184           0.6902
70                  0.7112           0.6724
90                  0.7180           0.6688
[Plot: Precision (y-axis) versus Number of Clusters (x-axis) for Precision@5 and Precision@7 on YM-20-20 and YM-10-10]
Fig. 19. Precision versus number of clusters in Precision@5 and Precision@7 for different datasets.
multi-criteria CF recommender system as the number of recommendations generated per second for k selected users (k = 5). The curves in this plot show that using HOSVD and cosine-based similarity to cluster the high-dimensional data makes the throughput substantially higher than that of the multi-criteria CF based on the similarity-based approach. The reason is that, in the clustered approach using HOSVD and cosine-based similarity, the prediction algorithm uses only a fraction of the neighbors. The throughput of the multi-criteria recommender system increases rapidly as the number of clusters grows and the clusters become smaller. The multi-criteria CF based on the similarity-based approach has to scan through all the neighbors, so the number of clusters does not affect its throughput.
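The throughput gain described above comes from restricting the neighbor search to one cluster. The sketch below shows that idea under stated assumptions. The cluster centroids and memberships are assumed to come from clustering the HOSVD-reduced user representations. The helper names and the toy 2-D vectors are hypothetical, and cosine similarity is used both to pick the cluster and to rank neighbors.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_neighbors(target, candidates, vectors, k):
    # Rank candidate users by cosine similarity to the target, excluding the target itself.
    others = [u for u in candidates if u != target]
    others.sort(key=lambda u: cosine(vectors[target], vectors[u]), reverse=True)
    return others[:k]


def clustered_neighbors(target, vectors, centroids, members, k):
    """Search only the cluster whose centroid is most cosine-similar to the target.

    Returns the neighbors and the number of candidates scanned.
    """
    best = max(centroids, key=lambda c: cosine(vectors[target], centroids[c]))
    return top_k_neighbors(target, members[best], vectors, k), len(members[best])


vectors = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.8, 0.2), "u4": (0.0, 1.0), "u5": (0.1, 0.9)}
centroids = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
members = {"A": ["u1", "u2", "u3"], "B": ["u4", "u5"]}
print(clustered_neighbors("u1", vectors, centroids, members, k=2))  # (['u2', 'u3'], 3)
print(top_k_neighbors("u1", list(vectors), vectors, k=2))  # full scan over all 5 users
```

In this toy case the clustered search scans 3 candidates instead of 5 and returns the same neighbors. With more and smaller clusters, the number of candidates per query keeps shrinking, which is consistent with the throughput trend in Fig. 20.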
We also evaluated the recommendation quality using coverage measures. Coverage measures the percentage of items for which a CF system can provide a prediction, or that ever appear in a recommendation list [81]. A recommender system should maintain a good level of coverage so that most items are connected in some way to the rest of the data. Otherwise, those items will be isolated and essentially dormant in the system.
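One way to compute prediction coverage for a neighborhood-based recommender is sketched below. It is computed over unrated (user, item) pairs rather than over items alone, and a pair counts as predictable when at least one of the user's neighbors has rated the item. Both choices are assumptions for illustration. The function name `coverage` and the toy data are hypothetical.

```python
def coverage(target_users, items, ratings, neighbors):
    """Fraction of unrated (user, item) pairs for which a prediction is possible.

    ratings: dict user -> set of rated items; neighbors: dict user -> list of users.
    Assumption: a pair is predictable if at least one neighbor has rated the item.
    """
    pairs = covered = 0
    for u in target_users:
        for item in items:
            if item in ratings[u]:
                continue  # already rated, nothing to predict
            pairs += 1
            if any(item in ratings[v] for v in neighbors[u]):
                covered += 1
    return covered / pairs if pairs else 0.0


ratings = {"u1": {"a"}, "u2": {"a", "b"}, "u3": {"c"}}
items = ["a", "b", "c"]
print(coverage(["u1"], items, ratings, {"u1": ["u2"]}))        # 0.5
print(coverage(["u1"], items, ratings, {"u1": ["u2", "u3"]}))  # 1.0
```

The toy case shows why coverage tends to grow with neighborhood size: each extra neighbor can only add predictable pairs, never remove them.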
The curves in Fig. 21 show the recommendation quality of the proposed method and reveal that coverage is strongly related to neighborhood size. Table 15 presents the coverage obtained with the proposed method. To experimentally show the effect of clustering with HOSVD and cosine-based similarity on coverage, we also ran the experiments with the similarity-based approach; these results are also presented in Table 15.
Table 15 shows that the proposed method maintains a good level of coverage relative to the similarity-based approach across different neighborhood sizes. The results also confirm that both the proposed method and the similarity-based approach have good coverage on YM-20-20.
6. Conclusion and future work
In this paper, we proposed a new method that combines HOSVD and ANFIS with subtractive clustering to improve the recommendation quality and predictive accuracy of multi-criteria CF. The method addresses existing shortcomings of multi-criteria CF: predicting the overall ratings, sparsity, scalability, and the uncertainty induced by vagueness and imprecision when representing and reasoning about item features.
Using HOSVD, we effectively reduced the noise in the high-dimensional data and alleviated the scalability problem. HOSVD also let us consider all factors of the third-order user-item-criteria tensor together to reveal the latent relationships between them. Applying HOSVD to the high-dimensional dataset helped us obtain high-quality clusters using cosine-based similarity. In addition, tensor decomposition using HOSVD on the experimental dataset demonstrated its advantages for dimensionality reduction in more than two dimensions while still giving a favorable approximation of the information. In the experiments, the proposed method using HOSVD and ANFIS achieved better recommendation accuracy than the algorithms from previous work and the methods using SVD or HOSVD alone.
The experimental results on the movie dataset clearly demonstrated that ANFIS can model MFs and fuzzy rules in multi-criteria CF without the intervention of a human expert. In addition, ANFIS combined with subtractive clustering was used to extract knowledge from user ratings and preferences on item features. This was done by incorporating a training element into the existing Neuro-Fuzzy system. Furthermore, the ANFIS training data were used to tune the rules and MFs to predict the unknown overall ratings, which alleviates the sparsity problem. This approach has advantages in terms of the simplicity of the algorithm and the speed of training convergence. Moreover, users' ratings on items in multi-criteria CF accumulate over time, so fuzzy rules can be amended and maintained in a rule database for prediction tasks. The advantage of this method is its flexibility and extensibility: it can be developed for any number of dimensions and criteria/features in the dataset.
We analysed the predictive accuracy of the proposed method on a real-world movie recommendation dataset provided by Yahoo!Movie. We used popular measurement metrics: F1, RMSE, MAE and coverage. The proposed method was evaluated in terms of MAE, Precision@5 and Precision@7 using
[Plot: Throughput in Recs./Sec (y-axis) versus Number of clusters (x-axis: 3, 6, 9, 12) for the Similarity-Based Approach and the HOSVD and Cosine-Based method, each on YM-20-20 and YM-10-10]
Fig. 20. Throughput of proposed method versus similarity-based approach.
[Plot: Coverage (y-axis) versus Neighborhood Size (x-axis, 5 to 55) for YM-20-20, YM-10-10 and YM-5-5]
Fig. 21. Neigh