    Multi-criteria collaborative filtering with high accuracy using higher order singular value decomposition and Neuro-Fuzzy system

    Mehrbakhsh Nilashi, Othman bin Ibrahim, Norafida Ithnin

    Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia

    Article info

    Article history: Received 8 April 2013

    Received in revised form 3 January 2014

    Accepted 6 January 2014

    Available online 10 January 2014

    Keywords:

    Neuro-Fuzzy inference system

    Higher order singular value decomposition

    Subtractive clustering

    Sparsity

    Scalability

    Multi-criteria collaborative filtering

    Abstract

    Collaborative Filtering (CF) is the most widely used prediction technique in recommender systems. It makes recommendations based on ratings that users have assigned to items. Most current CF recommender systems maintain only a single rating per user and item in the user-item ratings matrix. Multi-criteria CF can provide more accurate recommendations by considering user preferences on multiple aspects of items. However, in multi-criteria CF, user behavior regarding item features is frequently subjective, imprecise and vague. This induces uncertainty in reasoning about and representing item features that cannot be handled exactly by crisp machine learning techniques. In contrast, fuzzy methods can better handle this uncertainty. In addition, fuzzy methods can predict user preferences more accurately and better alleviate the sparsity problem in overall ratings by considering user perception of item features. Apart from this, in multi-criteria CF, users rate different aspects (criteria) of an item in new dimensions, which aggravates the scalability problem. Appropriate dimensionality reduction techniques are thus needed that capture all the dimensions together, without first reducing them to lower dimensions, to reveal the latent associations among the components. This study presents a new model for multi-criteria CF using an Adaptive Neuro-Fuzzy Inference System (ANFIS) combined with subtractive clustering and Higher Order Singular Value Decomposition (HOSVD). HOSVD is used for dimensionality reduction to address the scalability problem, and ANFIS is used for extracting fuzzy rules from the experimental dataset, alleviating the sparsity problem in overall ratings, and representing and reasoning about user behavior regarding item features. Experimental results on a real-world dataset show that the combination of the two techniques remarkably improves the predictive accuracy and recommendation quality of multi-criteria CF.

    © 2014 Elsevier B.V. All rights reserved.

    1. Introduction

    During the last decade, the amount of information available online has increased exponentially, and information overload has become one of the major challenges faced by information retrieval and information filtering systems. Recommender systems are one solution to the information overload problem. In the mid-1990s, recommender systems emerged as an independent research area when researchers shifted their focus to recommendation problems that explicitly rely on the user rating structure [1].

    Recommender systems based on Collaborative Filtering (CF) are particularly popular and widely used online [2–4]. CF algorithms can be divided into two categories: memory-based algorithms and model-based algorithms [3,5,6]. Memory-based (or heuristic-based) methods, such as correlation analysis and vector similarity, search the user database for user profiles that are similar to the profile of the active user for whom the recommendation is made [7]. Heuristic-based approaches are classed into user-based and item-based approaches [6,8]. User-based CF has been the most popular and commonly used memory-based CF strategy [9]. It is based on the premise that similar users will like similar items. Item-based CF was first proposed by [10] as an alternative style of CF that avoids the scalability bottleneck associated with the traditional user-based algorithm. The bottleneck arises from the search for neighbors in a continuously growing population of users. In item-based CF, similarities are calculated between items rather than between users, the intuition being that a user will be interested in items that are similar to items he or she has liked in the past. Two of the most popular measures for computing similarities between users or between items are the Pearson correlation coefficient and the cosine-based coefficient.
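    As an illustration of the two similarity measures named above, the following minimal sketch computes the cosine-based and Pearson coefficients for two rating vectors. The function names and the ratings are hypothetical and are not taken from the paper; the sketch also assumes that both vectors contain ratings from the same users, so missing ratings are not handled.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if one has zero norm)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_similarity(a, b):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

# Hypothetical ratings of two items by the same five users (1-5 scale).
item_a = [5, 4, 1, 2, 5]
item_b = [4, 5, 2, 1, 4]
print("cosine :", cosine_similarity(item_a, item_b))
print("pearson:", pearson_similarity(item_a, item_b))
```

    Mean-centring makes the Pearson coefficient insensitive to users who rate systematically high or low, which is one common reason for preferring it to the plain cosine measure.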

    One of the main problems in recommender systems, and specifically in CF, is the sparsity problem [11–14]. Memory-based CF approaches also suffer from the scalability problem. Therefore,

    0950-7051/$ - see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.knosys.2014.01.006

    Corresponding author. Tel.: +60 197608281.

    E-mail address: [email protected] (M. Nilashi).

    Knowledge-Based Systems 60 (2014) 82–101


    applied on tensors with more than 3 dimensions. This is one of the main advantages of HOSVD, which makes it a flexible and effective approach for multi-criteria CF where other traditional machine learning techniques have failed. It should be noted that the computation time of the HOSVD decomposition grows as the tensor order increases. However, the decomposition can be performed in the offline phase, with incremental learning used for data approximation in the online phase.

    In the proposed model, ANFIS extracts knowledge (rules) from the users' multi-aspect ratings for use in the overall rating prediction task. The extracted rules are employed to predict unknown ratings, which alleviates the sparsity problem in overall ratings and reveals the real level of user preferences regarding item features. ANFIS provides a flexible structure for the defined problem that is suitable for generating stipulated input–output pairs using a set of induced fuzzy IF–THEN rules with appropriate and varied membership functions (MFs) [27]. With proper training, the produced Fuzzy Inference System (FIS) serves to predict users' overall preferences regarding item features. The elements of this model are a fuzzy set, a neural network and data clustering. In addition, non-stochastic uncertainty emerging from vagueness and imprecision is handled using ANFIS. The MFs produced by ANFIS are used for representing and reasoning about users' rating behavior according to their perception of item features. The MFs formed by ANFIS are continuous and more accurate in representing the features of items and user feedback. Furthermore, to prevent the overfitting problem discussed in previous research [24,28], subtractive clustering is applied to fine-tune the ANFIS models, and a checking set is also used to address this problem in the training data.

    In the context of product recommendation, in practical applications customers are interested in rating items or expressing their preferences in linguistic terms, such as {low interest}, {high interest} or {no interest}, for the item features. This suggests designing multi-criteria CF to be user-friendly and convenient for users when rating items. Therefore, for multi-criteria CF, fuzzy logic and fuzzy sets are more appropriate than crisp approaches for human linguistic reasoning with imprecise concepts. In addition, linguistic terms are more suitable than numerical values for assessing qualitative information, which is usually related to human perceptions, opinions and tastes. Hence, in multi-criteria CF it is more appropriate to let users express their preferences, knowledge and personal judgments in linguistic terms [29]. From this perspective, we can define users' degrees of preference regarding a particular item as a set of linguistic terms such as {low interest}, {high interest} or {no interest} for the item features. Furthermore, the fuzzy approach provides a way to quantify the non-stochastic uncertainty induced by imprecision, vagueness and subjectivity. For this kind of uncertainty, modeling with the fuzzy approach is more reliable than traditional statistical methods such as the Bayesian method, which handles uncertainty due to randomness. Moreover, the fuzzy rules discovered from the users' ratings through ANFIS can be maintained in a rule database for use in subsequent predictions for item recommendation. These properties promise to provide a framework for addressing the representation and inference challenges in multi-criteria CF research.

    In this study, we consider the proposed method for movie-domain recommender systems. However, the method can also be adopted for e-business and e-government recommender systems, such as those developed by Zhang et al. [30] and Shambour and Lu [31,32] for e-business and e-government applications, respectively.

    Finally, we perform an in-depth experimental evaluation, in which the users' multi-aspect ratings of items are obtained from the Yahoo!Movies network, and several comparisons are conducted between our method and other algorithms.

    Thus, in comparison with research efforts found in the literature, our work has the following differences. In this research:

    • A new hybrid recommendation model using HOSVD and Neuro-Fuzzy techniques is proposed for increasing the predictive accuracy and improving the scalability of multi-criteria CF.

    • The sparsity issue in overall ratings is solved using the Neuro-Fuzzy technique.

    • HOSVD is used for scalability improvement.

    The remainder of this paper is organized as follows. In Section 2, research background and related work are described. The HOSVD dimensionality reduction technique, the k-Nearest Neighbor (k-NN) classifier, ANFIS and subtractive clustering are introduced in separate subsections of Section 3. Section 4 provides an overview of the research methodology. Section 5 presents the results and discussion. Finally, conclusions and future work are presented in Section 6.

    2. Research background and related work

    In the area of personalized web search, Sun et al. [33] proposed Cube Singular Value Decomposition (CubeSVD) to improve web search. With their CubeSVD analysis, which is also based on the HOSVD technique, web search activities were carried out more efficiently. They evaluated the method on MSN search engine data. In the field of recommender systems, several recommendation models have been proposed that use three-dimensional tensors for recommending music, objects and tags. Recommender models using HOSVD for dimensionality reduction have been proposed for recommending personalized music [22] and tags [34]. Xu et al. [35] used HOSVD to provide item recommendations. Their work was compared with a standard CF algorithm, without focusing on tag recommendations. Leginus et al. [36] utilized clustering techniques to reduce the tag space, which improved the quality of recommendations, shortened the execution time of the factorization and decreased the memory demands. Their proposed method was adaptable to HOSVD. They also introduced a heuristic method to speed up the parameter tuning process for HOSVD recommenders. Symeonidis et al. [37] introduced a recommender based on HOSVD in which each tagging activity for a given item by a particular user is represented by the value 1 in the initial tensor, and all other cases are represented by 0. Li et al. [38] presented a multi-criteria rating approach to improve personalized services in mobile commerce using Multi-linear Singular Value Decomposition (MSVD). The aim of their paper was to exploit context information about the user as well as multi-criteria ratings in the recommendation process.

    Fuzzy logic has been applied in a growing number of applications across a wide variety of domains, such as semantic music recommendation [39] and product recommendation [40]. Castellano et al. [41] developed a Neuro-Fuzzy strategy combined with soft computing approaches for recommending Uniform Resource Locators (URLs) to active users. They used fuzzy clustering to create user profiles that group similar browsing behavior. de Campos et al. [42] proposed a model combining a Bayesian network, which governs the relationships between users, with fuzzy set theory, which represents the vagueness in the description of users' ratings. A conceptual fuzzy-logic-based framework was proposed by Yager [43] to represent and justify recommendation rules. In that framework, an internal description of the items was used that relied solely on the preferences of the active user. Carbo and Molina [44] developed a CF-based algorithm in which ratings and recommendations were treated as linguistic labels by using fuzzy sets. Pinto et al. [45] proposed a model that combined fuzzy numbers, product positioning (from marketing theory) and item-based CF. Zhang et al. [30] developed a hybrid recommendation approach combining user-based and item-based CF techniques using fuzzy set techniques and applied it to mobile product and service recommendation. They tested the prediction accuracy of their hybrid approach on the MovieLens 100K dataset.

    In the case of multi-criteria CF, little research has been conducted on extending the similarity calculation of the traditional memory-based CF approach to multi-criteria ratings [23,46,47]; in these works, the similarities between users are estimated by aggregating traditional similarities from the individual criteria or by applying multidimensional distance metrics. In the aggregation function approach of Adomavicius and Kwon [23], the overall rating $r_0$ serves as an aggregate of the multi-criteria ratings. Under this assumption, the method finds an aggregation function $f$ representing the relationship between the overall and multi-criteria ratings:

    $$r_0 = f(r_1, \ldots, r_k) \tag{1}$$
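    To make Eq. (1) concrete, the sketch below fits one possible aggregation function, a linear one, by ordinary least squares. The linear form, the helper names and the toy ratings are assumptions chosen for illustration; the cited work does not prescribe this particular choice of $f$.

```python
import numpy as np

# Hypothetical data: each row holds k = 4 criteria ratings (r1..r4) given to
# one item; overall[i] is the overall rating r0 given to the same item.
criteria = np.array([[5, 4, 5, 4],
                     [2, 3, 1, 2],
                     [4, 4, 3, 5],
                     [1, 2, 2, 1],
                     [3, 5, 4, 4],
                     [5, 5, 4, 5]], dtype=float)
overall = np.array([5, 2, 4, 1, 4, 5], dtype=float)

def fit_linear_aggregation(criteria, overall):
    """Least-squares fit of r0 ~ w0 + w1*r1 + ... + wk*rk (assumed linear f)."""
    design = np.hstack([np.ones((criteria.shape[0], 1)), criteria])
    weights, *_ = np.linalg.lstsq(design, overall, rcond=None)
    return weights

def predict_overall(weights, criteria_ratings):
    """Apply the fitted aggregation function to one or more rating vectors."""
    return weights[0] + np.asarray(criteria_ratings, dtype=float) @ weights[1:]

weights = fit_linear_aggregation(criteria, overall)
print("weights:", weights)
print("predicted r0 for (4, 4, 4, 4):", predict_overall(weights, [4, 4, 4, 4]))
```

    A linear $f$ is only the simplest candidate; any regression model that maps the $k$ criteria ratings to $r_0$ could take its place.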

    To develop the idea of Adomavicius and Kwon [23], Sahoo et al. [48,49] extended the flexible mixture model (FMM) developed by Si and Jin [50] to multi-criteria recommender systems. The assumption of the FMM is that two latent variables $Z_u$ and $Z_i$ (for customers and products) determine the single rating $r$ of user $u$ on item $i$. They discovered the dependency structure between the overall rating ($r_0$) and the multi-criteria ratings ($r_1$, $r_2$, $r_3$ and $r_4$). Liu et al. [51] presented a multi-criteria recommendation approach based on the clustering of users. Their idea was that for each user one of the criteria is dominant, and users are grouped according to their criteria preferences. They applied linear least squares regression, assigned each user to one cluster, and evaluated different schemes for generating predictions. They applied the methods to a hotel-domain dataset with five criteria: Value, Location, Rooms, Service and Cleanliness. Zhang et al. [52] proposed two multi-criteria probabilistic latent semantic analysis algorithms extended from the single-rating version. In the first, a mixture of multivariate Gaussian distributions was assumed to be the underlying distribution of each user's multi-criteria ratings. In the second, inspired by the Bayesian network and linear regression, a mixture of linear Gaussian regression models was assumed as the underlying distribution of each user's multi-criteria ratings.

    Shambour and Lu [53] implemented a hybrid Multi-Criteria Semantic-enhanced CF (MC-SeCF) approach to alleviate limitations of item-based CF techniques such as sparsity and cold-start. The experimental results on the MovieLens dataset demonstrated the effectiveness of their approach in alleviating the sparsity and cold-start item problems. On very sparse and new-item datasets, they achieved higher accuracy and more coverage than the benchmark item-based CF recommendation algorithms. The method proposed here, which builds a model using HOSVD and ANFIS, requires explicit ratings. However, according to Nielsen's 90-9-1 principle [54], more people will lurk in a virtual community than will participate. Hence, in view of this principle, appropriate and tailored strategies need to be incorporated in multi-criteria CF, such as the method developed by Shambour and Lu [53], which uses semantic information about items. Generally, we view the MC-SeCF approach as complementary to our method. An opportunity for future work is therefore to combine the predictions of the MC-SeCF approach with our method in a hybrid approach. The improvements achieved by Shambour and Lu [53] indicate that major problems such as sparsity and cold-start can be remarkably alleviated. This suggests that the methods proposed by Shambour and Lu [53] and Kernel-SVD [55,56], combined with HOSVD, can be incorporated into multi-criteria CF to address the sparseness problem.

    Jannach et al. [24] further improved the accuracy of multi-criteria CF by proposing a method using Support Vector Regression (SVR) to automatically detect the relationships between detailed item ratings and the overall ratings. In addition, SVR models were learned per item and per user, and the individual predictions were finally combined in a weighted approach. Similar to our research, they evaluated their methods on the Yahoo!Movies dataset.

    3. Materials and methods

    3.1. Higher Order Singular Value Decomposition (HOSVD)

    To represent and recognize high-dimensional data effectively, dimensionality reduction is conducted on the original dataset to obtain a low-dimensional representation [57]. Easier visualization and comparison of data and reduced processing time are the main advantages of dimensionality reduction techniques. HOSVD is a powerful dimensionality reduction technique for tensor decomposition, proposed by Lathauwer et al. [58] as a generalization of the SVD to tensors.

    The HOSVD is computed in the following steps:

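    Step 1: Unfold the order-$d$ tensor $T \in \mathbb{R}^{I_1 \times \cdots \times I_d}$, which yields the matrices $A^{(1)}, \ldots, A^{(d)}$. The element $t_{i_1 i_2 \ldots i_d}$ is placed in row $i_n$ and column $j$ of $A^{(n)}$, where:

    $$j = i_{n+1} I_{n+2} I_{n+3} \cdots I_d I_1 I_2 \cdots I_{n-1} + i_{n+2} I_{n+3} I_{n+4} \cdots I_d I_1 I_2 \cdots I_{n-1} + \cdots + i_d I_1 I_2 \cdots I_{n-1} + i_1 I_2 I_3 \cdots I_{n-1} + \cdots + i_{n-1}, \quad i_n = 0, 1, \ldots, I_n - 1 \tag{2}$$

    A matrix unfolding of a tensor is a matrix representation of that tensor in which all the column (row, etc.) vectors are stacked one after the other [58].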

    In the case of a 3rd-order tensor $T \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, there exist three matrix unfoldings (see Fig. 1):

    mode 1: $j = i_2 + (i_3 - 1) I_3$,

    mode 2: $j = i_3 + (i_1 - 1) I_1$,

    mode 3: $j = i_1 + (i_2 - 1) I_2$.
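    The unfolding step can be sketched in a few lines of NumPy. The helper names `unfold` and `fold` are hypothetical, modes are numbered from 0 as is usual in Python, and the columns are ordered by NumPy's row-major layout of the remaining indices. That column ordering is an assumption and may differ from the one used in the text; a different ordering only permutes the columns of the unfolded matrix and leaves its left singular vectors unchanged.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: rows follow index i_mode, columns all other indices."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor with the given shape."""
    moved_shape = (shape[mode],) + tuple(s for i, s in enumerate(shape) if i != mode)
    return np.moveaxis(matrix.reshape(moved_shape), 0, mode)

T = np.arange(24).reshape(2, 3, 4)   # toy tensor with I1 = 2, I2 = 3, I3 = 4
for n in range(T.ndim):
    print(f"mode-{n + 1} unfolding has shape", unfold(T, n).shape)
```

    For the toy tensor the three unfoldings have $I_1$, $I_2$ and $I_3$ rows respectively, and `fold` recovers the original tensor from any of them.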

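    Step 2: Identify the $d$ left singular matrices $U^{(1)}, \ldots, U^{(d)}$, obtained from:

    $$A^{(n)} = U^{(n)} \Sigma^{(n)} V^{(n)T}, \quad n = 1, \ldots, d \tag{3}$$

    In Eq. (3), $U^{(n)} \in \mathbb{R}^{I_n \times I_n}$, and $\Sigma^{(n)} \in \mathbb{R}^{I_n \times I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_d}$ is a diagonal matrix holding the singular values in descending order. $V^{(n)}$ is the matrix of right singular vectors. These singular matrices are orthonormal: $V^{(n)T} V^{(n)} = I$ and $U^{(n)T} U^{(n)} = I$.

    Step 3: Find the core tensor $S \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ by contracting the left singular matrices $U^{(n)}$ with the original tensor $T$:

    $$S = T \times_1 U^{(1)T} \times_2 U^{(2)T} \cdots \times_d U^{(d)T} \tag{4}$$

    where the sub-tensors $S_{i_n = \alpha}$ of $S \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_d}$ are obtained by fixing the $n$th index to $\alpha$ and satisfy the ordering property:

    $$\|S_{i_n=1}\|_F = \sigma_1^{(n)} \geq \|S_{i_n=2}\|_F = \sigma_2^{(n)} \geq \cdots \geq \|S_{i_n=I_n}\|_F = \sigma_{I_n}^{(n)} \geq 0 \tag{5}$$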

    In Eq. (5), for all possible values of $n$, the Frobenius norm $\sigma_i^{(n)} = \|S_{i_n=i}\|_F$ is the $i$th $n$-mode singular value of the tensor $T$. Fig. 2 shows pseudo code for the HOSVD algorithm.

    Procedure HOSVD (Input: Tensor T)

    The computational cost of HOSVD is shown in Table 1.
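    Steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the procedure in Fig. 2: the function names are hypothetical, the unfolding uses NumPy's row-major column ordering, and the random tensor merely stands in for a user × item × criterion rating tensor.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding (rows follow index i_mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """n-mode product: apply `matrix` along axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor S and the left singular matrices U(1..d)."""
    factors = []
    for mode in range(tensor.ndim):
        # Step 2: left singular vectors of the mode-n unfolding, Eq. (3).
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=True)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # Step 3: contract T with U(n)^T in every mode, Eq. (4).
        core = mode_multiply(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core tensor back with every U(n)."""
    tensor = core
    for mode, u in enumerate(factors):
        tensor = mode_multiply(tensor, u, mode)
    return tensor

rng = np.random.default_rng(0)
T = rng.random((4, 3, 5))            # hypothetical 3rd-order rating tensor
S, U = hosvd(T)
print("exact reconstruction:", np.allclose(reconstruct(S, U), T))
print("norm preserved      :", np.isclose(np.linalg.norm(S), np.linalg.norm(T)))
```

    Because every $U^{(n)}$ is orthogonal, the full decomposition reconstructs $T$ exactly and preserves its Frobenius norm; dimensionality reduction only arises once singular vectors are discarded.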


    3.1.1. Truncated HOSVD

    The truncated HOSVD is defined as a multi-rank approximation. It is taken as the first approximation of an iterative algorithm in which the matrices and core tensor are updated iteratively, starting with Eq. (4). The algorithm stops when it ceases to improve the approximation or reaches a maximum number of iterations [59]. This iterative method belongs to the family of alternating least-squares methods and is called higher-order orthogonal iteration [58].

    According to Lathauwer et al. [58], the decomposition determined by HOSVD satisfies the following norm property:

    $$\|T\|_F^2 = \sum_{i=1}^{R_1} \left(\sigma_i^{(1)}\right)^2 = \cdots = \sum_{i=1}^{R_d} \left(\sigma_i^{(d)}\right)^2 = \|S\|_F^2 \tag{6}$$

    where $R_n$ ($1 \leq n \leq d$) denotes the $n$-mode rank of the tensor $T$, which is also the $n$-rank of $S$. A tensor $\tilde{T}$ can be defined by keeping the largest $I'_n$ $n$-mode singular values and ignoring the remaining ones. Because of this rank truncation, the error is bounded as follows (Lathauwer et al. [58]):

    $$\|T - \tilde{T}\|_F^2 \leq \sum_{n=1}^{d} \sum_{i_n = I'_n + 1}^{R_n} \left(\sigma_{i_n}^{(n)}\right)^2 \tag{7}$$

    In practice, using a procedure analogous to the one in Fig. 2, the rank-$(R_1, R_2, \ldots, R_d)$ truncated core tensor $\tilde{S}$ can be obtained by using only the $R_n$ leading left singular vectors, rather than all of them, to build the transformation matrix $\tilde{U}^{(n)}$.
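    The truncation and the error bound of Eq. (7) can be checked numerically with the sketch below. The function names, the random tensor and the chosen ranks are illustrative assumptions; the sketch computes only the initial truncated HOSVD and does not perform the higher-order orthogonal iteration refinement.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding (rows follow index i_mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """n-mode product: apply `matrix` along axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)

def truncated_hosvd(tensor, ranks):
    """Keep the ranks[n] leading left singular vectors in every mode n."""
    factors, discarded = [], 0.0
    for mode, rank in enumerate(ranks):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
        discarded += float(np.sum(s[rank:] ** 2))   # right-hand side of Eq. (7)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_multiply(core, u.T, mode)        # truncated core tensor
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_multiply(approx, u, mode)      # approximation of T
    return core, factors, approx, discarded

rng = np.random.default_rng(1)
T = rng.random((6, 5, 4))                            # hypothetical rating tensor
core, factors, T_approx, bound = truncated_hosvd(T, ranks=(3, 3, 2))
error = np.linalg.norm(T - T_approx) ** 2
print("core shape:", core.shape)
print("squared error within bound:", error <= bound)
```

    The truncated core is much smaller than the original tensor, which is the source of the scalability gain, while the discarded singular values give an upper limit on the information lost.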

    3.2. k-Nearest Neighbor (k-NN) classifier

    The k-Nearest Neighbor (k-NN) classifier is a well-known and powerful instance-based machine learning technique for classifying data [60]. It learns simply by storing all training instances and derives its results directly from them. The k-NN algorithm consists of two phases: a training phase and a classification phase. In the training phase, the training examples are vectors (each with a class label) in a multidimensional feature space; the feature vectors and class labels of the training samples are stored. In the classification phase, k is a user-defined constant (see Fig. 3), and a query or test point (an unlabelled vector) is classified by assigning the label that is most frequent among the k training samples nearest to that query point. In other words, the k-NN method compares the query point (an input feature vector) with a library of reference vectors, and the query point is labeled with the nearest class of library feature vectors. Categorizing query points based on their distance to points in a training dataset is a simple yet effective way of classifying new points. One of the main advantages of the k-NN method is that it requires only a few parameters to tune, namely k and the distance metric, to achieve sufficiently high classification accuracy. Thus, in k-NN based implementations, choosing the best k and distance metric is an important task.
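    The two phases described above can be sketched as follows. The function name, the toy training set and the use of the Euclidean distance are assumptions made for illustration; the training phase amounts to keeping `train_x` and `train_y` in memory.

```python
from collections import Counter

import numpy as np

def knn_classify(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    train_x = np.asarray(train_x, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.linalg.norm(train_x - query, axis=1)   # Euclidean distance
    nearest = np.argsort(distances)[:k]                   # indices of k nearest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training set with two classes.
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_x, train_y, [1.1, 1.0], k=3))
print(knn_classify(train_x, train_y, [3.9, 4.1], k=3))
```

    Changing `k` or swapping the distance computation are the only two tuning decisions, which matches the small parameter set mentioned in the text.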

    In the k-NN classifier, the Euclidean distance is usually used as the distance function when the input vectors are real-valued and the outputs are discrete classes. In this study, we use the Euclidean, City-Block and correlation distance metrics for distance calculation in k-NN.

    Let $x_1, x_2, \ldots, x_{m_x}$ denote the first set of row vectors and $y_1, y_2, \ldots, y_{m_y}$ the second set of row vectors. The distance metrics for measuring the distance between $x_s$ and $y_t$ are defined as follows:

    $$d_{st} = \sqrt{\sum_{j=1}^{n} (x_{sj} - y_{tj})^2} \tag{8}$$

    Fig. 1. Unfolding of a 3rd-order tensor into the mode-1, mode-2 and mode-3 matrices $A^{(1)}$ ($I_1 \times I_2 I_3$), $A^{(2)}$ ($I_2 \times I_1 I_3$) and $A^{(3)}$ ($I_3 \times I_1 I_2$).

    Fig. 2. Procedure for decomposing tensors via HOSVD [59].

    Table 1

    Computational cost for main steps in HOSVD.

    Step | N-dim

    Unfolding the tensor $T$ | $O(I_1 I_2 \cdots I_N)$

    Constructing $A^{(n)} A^{(n)T}$ | $O(I_n^2 I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$

    Decomposing $A^{(n)} A^{(n)T}$ to obtain $U^{(n)}$ | $O(I_n^3)$

    Contracting tensor $T$ with the matrices $U^{(n)}$ to get tensor $S$ | $O(I_n^2 I_1 I_2 \cdots I_{n-1} I_{n+1} \cdots I_N)$


    $$d_{st} = \sum_{j=1}^{n} |x_{sj} - y_{tj}| \tag{9}$$

    $$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}, \quad \bar{x}_s = \frac{1}{n} \sum_j x_{sj} \ \text{and} \ \bar{y}_t = \frac{1}{n} \sum_j y_{tj} \tag{10}$$

    where Eqs. (8)–(10) are the Euclidean, City-Block and correlation distance metrics, respectively.
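    The three metrics of Eqs. (8)–(10) translate directly into NumPy. The function names and the sample vectors are hypothetical, and both arguments are assumed to be one-dimensional NumPy arrays of equal length.

```python
import numpy as np

def euclidean(x, y):
    """Eq. (8): square root of the summed squared differences."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def city_block(x, y):
    """Eq. (9): sum of absolute differences."""
    return float(np.sum(np.abs(x - y)))

def correlation_distance(x, y):
    """Eq. (10): one minus the sample correlation of the two vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(1.0 - (xc @ yc) / (np.sqrt(xc @ xc) * np.sqrt(yc @ yc)))

x_s = np.array([1.0, 2.0, 3.0, 4.0])
y_t = np.array([2.0, 4.0, 6.0, 9.0])
print("Euclidean  :", euclidean(x_s, y_t))
print("City-Block :", city_block(x_s, y_t))
print("Correlation:", correlation_distance(x_s, y_t))
```

    The correlation distance ignores offsets and scale differences between the two vectors, so it can rank neighbors quite differently from the other two metrics on the same data.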

    3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)

    Soft computing techniques are known for their efficiency in dealing with complicated problems when conventional analytical methods are infeasible or too expensive and only sets of operational data are available. Fuzzy logic (FL) and Fuzzy Inference Systems (FIS), first proposed by Zadeh [61], provide a solution for decision making based on vague, ambiguous, imprecise or missing data. FL represents models or knowledge using IF–THEN rules. A Neuro-Fuzzy system is functionally equivalent to a FIS. A FIS mimics the human reasoning process by implementing fuzzy sets and an approximate reasoning mechanism that uses numerical values instead of logical values. A FIS requires a domain expert to define the MFs and to determine the associated parameters in both the MFs and the reasoning section [62,63]. However, there is no standard for the knowledge acquisition process, so the results may differ if a different knowledge engineer acquires the knowledge from the experts. A Neuro-Fuzzy system can replace knowledge acquisition by humans with a training process on a set of input–output training data. Thus, instead of depending on human experts, the Neuro-Fuzzy system determines its parameters through a training process that minimizes an error criterion. A popular Neuro-Fuzzy system is ANFIS, a fuzzy system that uses Artificial Neural Network (ANN) theory to determine its properties (fuzzy sets and fuzzy rules) [64–69]. It consists of five feed-forward layers, as shown in Fig. 4.

    ANFIS is functionally equivalent to the Takagi–Sugeno–Kang (TSK) fuzzy model. It can also express its knowledge in the IF–THEN rule format as follows:

    Rule 1: IF $\mathrm{In}_1$ is $A_1$ AND $\mathrm{In}_2$ is $B_1$ THEN $f_{11} = p_{11} \mathrm{In}_1 + q_{11} \mathrm{In}_2 + r_{11}$

    Rule 2: IF $\mathrm{In}_1$ is $A_1$ AND $\mathrm{In}_2$ is $B_2$ THEN $f_{12} = p_{12} \mathrm{In}_1 + q_{12} \mathrm{In}_2 + r_{12}$

    Rule 3: IF $\mathrm{In}_1$ is $A_2$ AND $\mathrm{In}_2$ is $B_1$ THEN $f_{21} = p_{21} \mathrm{In}_1 + q_{21} \mathrm{In}_2 + r_{21}$

    Rule 4: IF $\mathrm{In}_1$ is $A_2$ AND $\mathrm{In}_2$ is $B_2$ THEN $f_{22} = p_{22} \mathrm{In}_1 + q_{22} \mathrm{In}_2 + r_{22}$

    where $A_1$, $A_2$, $B_1$ and $B_2$ are the labels of the MFs for the inputs $\mathrm{In}_1$ and $\mathrm{In}_2$, respectively, and $p_{ij}$, $q_{ij}$ and $r_{ij}$ ($i, j = 1, 2$) are the parameters of the output MFs.

    In Fig. 4, the layers of ANFIS perform different actions, detailed below:

    Layer 1: In this layer, membership grades are provided by the nodes, which are adaptive nodes. The outputs of this layer are obtained by:

    $$O^1_{A_i} = \mu_{A_i}(\mathrm{In}_1), \quad i = 1, 2; \qquad O^1_{B_j} = \mu_{B_j}(\mathrm{In}_2), \quad j = 1, 2 \tag{11}$$

    where $A_i$ and $B_j$ indicate appropriate MFs for the inputs $\mathrm{In}_1$ and $\mathrm{In}_2$, which can be defined as triangular, trapezoidal or Gaussian functions. The Gaussian-type MFs for $A_i$ and $B_j$ and the inputs $\mathrm{In}_1$ and $\mathrm{In}_2$ are defined as:

    $$\mu_{A_i}(\mathrm{In}_1; \sigma_i, c_i) = \exp\left(-\frac{(\mathrm{In}_1 - c_i)^2}{2\sigma_i^2}\right), \quad i = 1, 2; \qquad \mu_{B_j}(\mathrm{In}_2; \sigma_j, c_j) = \exp\left(-\frac{(\mathrm{In}_2 - c_j)^2}{2\sigma_j^2}\right), \quad j = 1, 2 \tag{12}$$

    where $\{\sigma_i, c_i\}$ and $\{\sigma_j, c_j\}$ are the parameters of the MFs that govern the Gaussian functions. The ANFIS parameters in this layer are usually called premise parameters.
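    As a small worked instance of the Gaussian MF in Eq. (12), the sketch below evaluates the membership grade of one rating in two fuzzy sets. The linguistic labels, the premise parameter values and the 1 to 5 rating scale are hypothetical choices for illustration, not values learned by ANFIS in the paper.

```python
import numpy as np

def gaussian_mf(x, sigma, c):
    """Gaussian membership grade exp(-(x - c)^2 / (2 * sigma^2)) of Eq. (12)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Hypothetical premise parameters (sigma, c) for two labels on a 1-5 rating.
labels = {"low interest": (1.0, 1.0), "high interest": (1.0, 5.0)}
rating = 4.0
for name, (sigma, c) in labels.items():
    print(f"{name}: {float(gaussian_mf(rating, sigma, c)):.4f}")
```

    A rating of 4 belongs strongly to the set centred at 5 and only weakly to the set centred at 1, which is how a crisp rating acquires graded membership in several linguistic terms at once.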

    Layer 2: The second layer contains a fixed number of nodes, labeled $\Pi$. The outputs of the second layer are defined as:

    $$O^2_{ij} = W_{ij} = \mu_{A_i}(\mathrm{In}_1)\, \mu_{B_j}(\mathrm{In}_2), \quad i, j = 1, 2 \tag{13}$$

    where $W_{ij}$ represents the weight (firing strength) of a rule.

    Layer 3: In this layer, every node labeled N determines the ratio of the corresponding rule's firing strength to the sum of all rules' firing strengths as:

    Fig. 3. k-NN for k = 8 and k = 5.

    Fig. 4. The structure of ANFIS.


    $$O^3_{ij} = \bar{W}_{ij} = \frac{W_{ij}}{\sum_{i=1}^{2} \sum_{j=1}^{2} W_{ij}}, \quad i, j = 1, 2 \tag{14}$$

    where the output of this layer represents the normalized firing strengths.

    Layer 4: The nodes are adaptive nodes. The output of each node

    in this layer is simply the product of the normalized firingstrength and a first-order polynomial (for a first-order Sugeno

    model). Thus, the outputs of this layer are given by:

    O4ij Wijfij WijpijIn1qijIn2rij; i;j 1; 2 15

    where Wij is the output of layer 3, and {pij, qij, rij} is the parameter

    set.

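    Layer 5: There is only one single fixed node, labeled with $\Sigma$. This node performs the
    summation of all incoming signals. Hence, the overall output of the model is given by:

    $$Out = O^5 = \sum_{i=1}^{2}\sum_{j=1}^{2} \overline{W}_{ij} f_{ij} = \sum_{i=1}^{2}\sum_{j=1}^{2} \overline{W}_{ij}\,(p_{ij} In_1 + q_{ij} In_2 + r_{ij})$$

    $$= \sum_{i=1}^{2}\sum_{j=1}^{2} \left[\overline{W}_{ij}\, p_{ij}\, In_1 + \overline{W}_{ij}\, q_{ij}\, In_2 + \overline{W}_{ij}\, r_{ij}\right] \qquad (16)$$

    where the overall output $Out$ is a linear combination of the consequent parameters when the
    values of the premise parameters are fixed.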
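The five layers above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`gaussian_mf`, `anfis_forward`) and the layout of the parameter arrays are hypothetical, and the network is the small two-input, 2 x 2 rule case described by Eqs. (12) to (16).

```python
import numpy as np

def gaussian_mf(x, sigma, c):
    """Gaussian membership degree of x, in the form of Eq. (12)."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(in1, in2, prem_a, prem_b, conseq):
    """Forward pass of a two-input first-order Sugeno ANFIS with 2 x 2 rules.

    prem_a, prem_b: lists of (sigma, c) pairs for A_1, A_2 and B_1, B_2
                    (hypothetical layout of the premise parameters).
    conseq:         array of shape (2, 2, 3) holding (p_ij, q_ij, r_ij).
    Returns the crisp output and the normalized firing strengths.
    """
    # Layer 1: membership degrees of both inputs
    mu_a = np.array([gaussian_mf(in1, s, c) for s, c in prem_a])
    mu_b = np.array([gaussian_mf(in2, s, c) for s, c in prem_b])
    # Layer 2: firing strengths W_ij = mu_A_i(In1) * mu_B_j(In2), Eq. (13)
    w = np.outer(mu_a, mu_b)
    # Layer 3: normalized firing strengths, Eq. (14)
    w_bar = w / w.sum()
    # Layer 4: first-order polynomial of each rule, Eq. (15)
    f = conseq[..., 0] * in1 + conseq[..., 1] * in2 + conseq[..., 2]
    # Layer 5: weighted sum of all rule outputs, Eq. (16)
    return float((w_bar * f).sum()), w_bar
```

Because the normalized firing strengths sum to one, a model whose rules all output the same constant returns exactly that constant, which is a convenient sanity check for the sketch.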

    3.4. Subtractive clustering

    The idea in the TSK model is that each rule in a rule base indicates an area for a model,
    which can be linear [70]. The basic form of the TSK rule structure is as follows:

    $$\text{If } f(x_1 \text{ is } A_1, x_2 \text{ is } A_2, \ldots, x_k \text{ is } A_k) \text{ then } y = g(x_1, x_2, \ldots) \qquad (17)$$

    where the sentences forming the condition are connected through the logical function $f$. The
    output $y$ is obtained by $g$, which is a function of the inputs $x_i$.

    In order to establish an effective TSK model of a process, it is constructive to use
    subtractive clustering for generating clusters of data. The main goal of using subtractive
    clustering as a cluster analyser is to partition the dataset into a number of homogeneous and
    natural subsets. The subtractive clustering method assumes each data point is a potential
    cluster center and calculates a measure of the likelihood that each data point would define the
    cluster center, based on the density of surrounding data points. With this method, the amount
    of computation is proportional to the number of data points and independent of the dimension
    of the problem. Although the actual cluster centers are not necessarily located at one of the
    data points, in most cases this is a good approximation, especially given the reduced
    computation this approach requires [71]. In this method, the data point with the highest
    potential, which is a function of the distance measure, is considered as a cluster center. The
    data points that are close to a new cluster center are penalized in order to facilitate the
    emergence of new cluster centers [72]. From Eq. (18), the potential $P_i$ of a data point
    $x_i$ as a cluster center can be obtained as:

    $$P_i = \sum_{j=1}^{m} \exp\left(-\frac{\|x_i - x_j\|^2}{(r_a/2)^2}\right) \qquad (18)$$

    where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]$ and $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$ are data vectors
    for input and output dimensions, $r_a$ is a positive constant defining the neighborhood range
    of the cluster, or simply the radius of the hypersphere cluster in data space, and $\|\cdot\|$
    indicates the Euclidean distance. $r_a$ is a critical parameter that determines the number of
    cluster centers or locations. The first cluster center $c_1$ is selected as the data point with
    the highest potential value, $P_{c_1}$. For the second cluster center, the new density values are
    determined by subtracting the effect of the first cluster center as follows:

    $$P_i \Leftarrow P_i - P_{c_1} \exp\left(-\frac{\|x_i - x_{c_1}\|^2}{(r_b/2)^2}\right), \quad r_b = \eta\, r_a \qquad (19)$$

    where $r_b$ is a positive constant that defines the neighborhood in which the density measure
    is measurably reduced, and $\eta$ is a constant greater than 1 that keeps cluster centers from
    being placed in too close proximity [73]. According to Eq. (19), the potential of data points
    near the first cluster center $c_1$ is significantly reduced. The data point $c_2$ with the largest
    remaining potential is then chosen as the second cluster center.

    In general, after the $k$th cluster center $c_k$ has been determined, the potential is revised
    according to Eq. (20) as:

    $$P_i \Leftarrow P_i - P_{c_k} \exp\left(-\frac{\|x_i - x_{c_k}\|^2}{(r_b/2)^2}\right) \qquad (20)$$

    where $P_{c_k}$ is the largest potential (density) value and $x_{c_k}$ denotes the location of the
    $k$th cluster center. After revising the density function, the next cluster center is selected
    as the point having the greatest density value. This process continues until a sufficient
    number of clusters is attained, at which point all data points lie within the neighborhood of a
    cluster center.
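The procedure of Eqs. (18) to (20) can be sketched as follows. This is a hedged illustration rather than the routine used in the study: the function name, the `stop_ratio` stopping rule (stop once the best remaining potential falls below a fraction of the first peak) and the `max_centers` cap are assumptions introduced here, and the data are assumed to be scaled so that a radius such as 0.5 is meaningful.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, eta=1.5, stop_ratio=0.15, max_centers=20):
    """Pick cluster centers among the rows of X by subtractive clustering.

    ra:          neighborhood radius r_a of Eq. (18)
    eta:         factor with r_b = eta * r_a, as in Eq. (19)
    stop_ratio:  hypothetical stopping rule (fraction of the first peak)
    max_centers: hypothetical upper bound on the number of centers
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between all data points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    rb = eta * ra
    # Eq. (18): potential of every point as a candidate center
    P = np.exp(-d2 / (ra / 2.0) ** 2).sum(axis=1)
    first_peak = P.max()
    centers = []
    while len(centers) < max_centers:
        k = int(np.argmax(P))
        if P[k] < stop_ratio * first_peak:
            break  # remaining potentials are too small to justify a new center
        centers.append(k)
        # Eqs. (19)-(20): subtract the influence of the newly chosen center
        P = P - P[k] * np.exp(-d2[:, k] / (rb / 2.0) ** 2)
    return X[centers]
```

With two well-separated groups of points, the first center lands in one group, the subtraction step suppresses the potential of its neighbors, and the second center is taken from the other group.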

    4. Research methodology

    Fig. 5 shows the general framework of the proposed method, which combines HOSVD for
    dimensionality reduction with ANFIS and subtractive clustering for discovering knowledge from
    users' ratings and predicting overall ratings.

    In the first step, we apply HOSVD for dimensionality reduction to reveal the latent
    associations among the components of the user-item-criteria tensor. Then, we use cosine-based
    similarity for clustering to obtain groups of similar users and determine labels for the
    clusters. In this way, high-quality clusters are obtained, which is necessary for developing an
    efficient ANFIS model. Then, ANFIS is applied to the clusters for extracting fuzzy rules and
    predicting null values in the overall ratings. The main tasks of the dimensionality reduction
    process are reducing the dimension, obtaining the best approximation of the data in the tensor
    of user preferences about items on multiple aspects, and finding users with similar preferences
    on items and criteria. Measuring the similarity of users based on their ratings on criteria
    makes it possible to apply a clustering method. After the clustering method has provided the
    classes of users with similar taste, ANFIS is used to extract knowledge (fuzzy rules) from the
    determined clusters. To increase the accuracy of the rule-based system, reduce the amount of
    data in any class and minimize overfitting on the training data, subtractive clustering is
    combined with ANFIS. Thus, the main steps of the proposed method for developing the model in
    the offline phase are:

    Step 1: HOSVD is applied to the training data in a third-order tensor for dimensionality
    reduction to get the best approximation of the rating information.

    Step 2: The data approximated by HOSVD is clustered using cosine-based similarity. In this
    step, a label for each vector of ratings is defined to be used by the k-NN method in the online
    phase.

    Step 3: ANFIS combined with subtractive clustering is trained on the data in the clusters
    obtained in the previous step for extracting fuzzy rules and forming rule clusters.

    Step 4: The fuzzy rules are used for predicting existing null values of overall ratings in the
    offline and online phases. It should be noted that for predicting the unknown overall ratings, we


    solved the sparsity problem in criteria using the neighborhood formation within each cluster.
    For predicting the unknown criteria ratings for the target item, we relied on cosine-based
    similarity as the similarity measure, computed on the approximated data obtained by HOSVD.

    After learning the model in the offline phase, in the online phase the recommender system
    carries out the recommendation and prediction tasks of multi-criteria CF recommender systems
    in 3 main steps:

    Step 1: Using the k-NN method, the recommender system predicts the class label for new data.

    Step 2: The recommender system refers to the corresponding fuzzy rule cluster and predicts the
    overall rating for the active user (see Section 4.2 for more detail).

    Step 3: After overall rating prediction, the recommender system forms the neighbors of the
    active user from the corresponding cluster using the cosine similarity presented in Eq. (21)
    and makes predictions and Top-N recommendations.
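Step 1 of the online phase can be illustrated with a small k-NN label predictor. This is only a sketch under assumptions: the text does not state which distance the k-NN step uses, so Euclidean distance and a simple majority vote are assumed here, and the function name and arguments are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict_label(x_new, X_train, labels, k=5):
    """Predict the cluster label of x_new by majority vote of its k nearest rows.

    X_train: labeled rating vectors (one row per vector)
    labels:  cluster label of each row, e.g. as assigned in the offline phase
    Euclidean distance is an assumption of this sketch.
    """
    X_train = np.asarray(X_train, dtype=float)
    labels = np.asarray(labels)
    # Distance from the new vector to every labeled training vector
    dist = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    # Indices of the k closest training vectors
    nearest = np.argsort(dist, kind="stable")[:k]
    # Majority vote among the labels of the neighbors
    votes = Counter(labels[nearest].tolist())
    return votes.most_common(1)[0][0]
```

The predicted label then selects which fuzzy rule cluster is consulted in Step 2.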

    4.1. Clustering the experimental dataset using HOSVD and improving

    the scalability of multi-criteria CF

    Multi-criteria CF systems need to produce high-quality recommendations quickly for very
    large-scale problems. In this paper, we address the performance issues by scaling up the
    neighborhood formation process through the use of dimensionality reduction techniques.
    Scalability is an issue in multi-criteria CF because the data tensor is composed of multiple
    dimensions, each of which can itself be very large. Clustering techniques reduce the sparsity
    and improve the scalability of recommender systems by effectively partitioning the ratings
    database. Previous studies [74,75] have indicated the benefits of applying clustering in
    recommender systems. Using HOSVD and cosine-similarity approaches, we perform the clustering
    task in an effective way for multi-criteria CF.

    As discussed earlier, for the recommendation task in multi-criteria CF, recommender systems
    deal with high-dimensional data, which makes the computational cost extremely high and even
    infeasible for traditional dimensionality reduction techniques. Given this scalability
    challenge, HOSVD is used in this paper because it is able to (1) factorize large tensors
    efficiently, using much less time than standard methods, and at the same time (2) obtain
    low-rank factors that preserve the main variance of the tensors. Thus, owing to the
    dimensionality reduction, we can better form and pre-compute the neighborhood, which makes
    prediction generation much faster in multi-criteria CF; forming neighborhoods in the
    low-dimensional eigenspace provides better quality and performance. In addition, after the
    tensor has been decomposed by HOSVD, the data is clustered effectively using cosine-based
    similarity, and once the clustering is complete, the performance of multi-criteria CF can be
    very good, since the size of the group that must be analyzed is much smaller.

    For applying HOSVD, the 3-dimensional data is stored in the 3-dimensional tensor
    $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, where $I_1$ corresponds to the number of users, $I_2$
    corresponds to the number of items which were rated and $I_3$ is the number of criteria used.
    Each entry of the tensor $A$ is a number between 1 and 13. Using HOSVD, the tensor
    $A \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ that contains the user ratings of items on four criteria was
    decomposed into $A = S \times_1 U \times_2 V \times_3 W$, in which $U \in \mathbb{R}^{I_1 \times I_1}$,
    $V \in \mathbb{R}^{I_2 \times I_2}$ and $W \in \mathbb{R}^{I_3 \times I_3}$ are orthonormal matrices, and
    $S \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ is a core tensor which satisfies the all-orthogonality and
    ordering properties. Similar to the truncated SVD for low-rank approximation and
    dimensionality reduction of matrices, low-rank approximation and dimensionality reduction of
    higher order tensors can be done by the truncated HOSVD (but with better approximation and
    computation), that is, by taking the first $r_1$ columns of $U$, the first $r_2$ columns of $V$, the
    first $r_3$ columns of $W$, and the top-left $r_1 \times r_2 \times r_3$ block of $S$. HOSVD is therefore an
    effective method for dimensionality reduction of 3-dimensional datasets. It is flexible in
    that a different $r$ can be chosen for each mode of a tensor. The size of the data goes down to
    $r_1 r_2 r_3 + I_1 r_1 + I_2 r_2 + I_3 r_3$ from $I_1 I_2 I_3$, and if $r_1 = r_2 = r_3 = r$ the size of the
    data goes down to $r^3 + r(I_1 + I_2 + I_3)$. If we flatten the tensor into an $I_1 \times I_2 I_3$
    matrix, the size of the data only goes down to $R^2 + R(I_1 + I_2 I_3)$ for a rank-$R$ truncated
    SVD. The results of the HOSVD decomposition of the third-order tensor of users' ratings are
    therefore the matrices $U$, $V$ and $W$, which show the relations between user and user, item and
    item, and criterion and criterion, respectively. This decomposition is obtained without
    splitting the 3-dimensional space into pair relations. For the sake of conciseness, a very
    simple example with only 4 users, 6 items and 4 criteria is demonstrated in the following.
    Table 2 shows the ratings of items by users based on 4 criteria, and its decomposition into
    the matrices U, V, W, S(:,:,1), S(:,:,2), S(:,:,3) and S(:,:,4) is shown in Table 3.
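The truncated HOSVD described above can be sketched with NumPy. This is a minimal illustration under assumptions, not the code used in the study (which was written in MATLAB): the helper names `unfold`, `mode_product`, `truncated_hosvd` and `reconstruct` are hypothetical, and the core is obtained by projecting the tensor onto the truncated factor matrices, which yields the top-left block of the full core tensor.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by the chosen mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Mode-n product T x_n M (M multiplies the chosen mode of T)."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted mode moves to axis 0
    return np.moveaxis(out, 0, mode)

def truncated_hosvd(A, ranks):
    """Return the truncated core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        # Left singular vectors of the mode-n unfolding, first r columns kept
        U, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = A
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the (approximated) tensor from the core and the factors."""
    A_hat = core
    for mode, U in enumerate(factors):
        A_hat = mode_product(A_hat, U, mode)
    return A_hat
```

For a 4 x 6 x 4 tensor, as in the small example of Table 2, choosing ranks (2, 3, 2) stores 2·3·2 + 4·2 + 6·3 + 4·2 = 46 numbers instead of 96, in line with the storage count given above.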

    Unew U2V3W1Sy1 0:5091 0:2729

    As can be seen in Fig. 6, the users similar to the new user can be found using the cosine
    similarity in Eq. (21). The cosine similarity between two vectors A and B can be defined as:

    $$\text{similarity} = \cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (21)$$

    By applying this method, the system is able to cluster data based on user similarity from
    matrix U. Since the k-NN predictor requires supervised learning, cosine-based similarity is
    selected to obtain clusters

    Fig. 5. Proposed model using ANFIS and HOSVD. Blocks shown: multi-criteria dataset (criteria 1
    to k and overall rating); dimensionality reduction using HOSVD; clustering (cluster 1 to
    cluster n); ANFIS combined with subtractive clustering for extracting fuzzy rules (IF ... THEN
    per cluster); fuzzy rules database; overall ratings prediction.


    from the data approximated by HOSVD and to provide labels for them. From the truncated matrix
    U, the first row of the matrix is selected and the system calculates its cosine similarity
    through Eq. (21) with the second row, the third row and so on, until it reaches the last row.
    The row with the highest cosine similarity is clustered with the first row. Applying this
    method to the rows, the system obtains clusters with a small number of similar users. Given a
    specific number of clusters, the system can combine close clusters by calculating cosine-based
    similarity. Finally, after constructing the clusters, the system assigns the category label to
    each vector of users' ratings. Following the same procedure, we can obtain a specific number
    of clusters from the matrix V.
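The row-pairing idea described above can be sketched as follows: Eq. (21) is evaluated for every pair of rows, and each row is matched with its most similar other row. The function names are hypothetical, rows are assumed to be non-zero, and the later merging of close clusters into a fixed number of clusters is not covered by this sketch.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two vectors, as in Eq. (21)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar_rows(U):
    """For each row of U, return the index of the other row with the
    highest cosine similarity (rows are assumed to be non-zero)."""
    U = np.asarray(U, dtype=float)
    unit = U / np.linalg.norm(U, axis=1, keepdims=True)
    sim = unit @ unit.T             # all pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)  # a row must not be paired with itself
    return sim.argmax(axis=1)
```

Rows that select each other form the small initial clusters that the text describes.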

    4.2. ANFIS architecture of proposed method and solving the sparsity

    problem in multi-criteria CF

    As discussed earlier, multi-criteria CF recommender systems suffer from the sparsity problem
    on two sides, namely missing values in the overall ratings and in the criteria ratings, and
    the system ought to predict these missing ratings with new approaches. In this paper, we solve
    the problem of sparsity in overall ratings using a Neuro-Fuzzy system. Generating the proper
    MFs and extracting the fuzzy rules for the prediction of overall ratings are the main
    advantages of this method, and both can be used in the online and offline phases. Because in
    multi-criteria CF the overall ratings are based on users' perception of items' features, we
    can better alleviate the sparsity problem in the overall ratings using the generated MFs and
    the fuzzy rules obtained from users' preferences on the items' features. In addition,
    alleviating the sparsity problem in multi-criteria CF recommender systems improves the
    predictive accuracy of these systems, as has been shown in prior research [24,53]. Based on
    the experimental results, we will also demonstrate that the proposed method significantly
    improves the predictive accuracy of multi-criteria CF. Using ANFIS, we will see that the
    prediction error in overall ratings is very low and even zero in many cases, which shows the
    capability of ANFIS in alleviating the sparsity problem in an exact and effective way.

    In this study, discovering the knowledge (fuzzy rules) from users' ratings and generalizing
    the relationship $Y = f(X_1, X_2, \ldots, X_n)$ are the main goals of applying ANFIS for accurate
    prediction of overall ratings, which accordingly leads to improved predictive accuracy in
    multi-criteria CF. In this relationship, $X_1, X_2, \ldots, X_n$ stand for the input variables and
    $Y$ stands for the output variable. In the current study, the overall rating, or the user's
    overall preference about items, can be determined as a function of the items' features or
    criteria. Thus, we associate the $Y$ variable with the overall rating and the
    $X_1, X_2, \ldots, X_n$ variables with the criteria ratings. Predicting the relationship between
    inputs and output is one of the important tasks that ANFIS performs. Based on the experimental
    dataset, the input parameters of the ANFIS model under consideration are Acting (A), Directing
    (D), Story (S) and Visuals (V). Overall rating (O) stands for the output, which is defined as
    the overall preference. These attributes are naturally vague, imprecise and incomplete fuzzy
    terms that lead to uncertainty in user interest about items' features such as Acting, Story,
    Visuals and Directing. Thus, in ANFIS, they can be introduced and expressed by fuzzy
    linguistic values (uncertainty modeling) such as {cluster 1}, {cluster 2}, {cluster 3} and
    {cluster 4} that divide the domain of user interest in Acting, Directing, Story and Visuals
    into four regions using MFs. They are given in Fig. 7a and b for two inputs, Visuals and
    Directing, respectively.

    The relationship between the input variables (criteria) and the output (overall rating) can
    be defined as

    $$\text{Overall rating} = f(\text{Acting}, \text{Directing}, \text{Story}, \text{Visuals}) \qquad (22)$$

    In ANFIS models, the outputs are related to the inputs through a mathematical mapping
    expressed by fuzzy rules. Fuzzy rules play an important role in ANFIS models and are the
    backbone of such systems. The form of the fuzzy rules in ANFIS is defined as

    Table 2
    Multi-criteria ratings for 4 users and 6 items on 4 criteria.

    Users   Ratings on criteria 1 (items I1 to I6)   Ratings on criteria 2 (items I1 to I6)
    U1      13  12  11  11   5   5                    5   4   4  12   9   5
    U2      11   1  11  12   5   5                   11  11  11  11  11  10
    U3       1  13   4   3  12  13                   13  13   5  12   4   4
    U4       1   1   0   5   4   5                   13  13  13  12  13  13

    Users   Ratings on criteria 3 (items I1 to I6)   Ratings on criteria 4 (items I1 to I6)
    U1       5  11  11  10  10  10                   11   5   9   4   5   3
    U2       4   9  11  11   4   3                    3  11  11  12  13  13
    U3      11   5  11  11   4  11                    9   3   9   3  11   4
    U4       9   3   3   9   4   9                    3   3   8   4   5   3

    New user ratings on four criteria C1, C2, C3 and C4
    Item   C1   C2   C3   C4
    I1      5    4    4   11
    I2      4    5   11    4
    I3      4    3    5   12
    I4     13    3   12    4
    I5     13   13   13   11
    I6     13   12   12   13

    Rule 1: IF A is A1 AND D is B1 AND S is C1 AND V is D1 THEN f1 = p1 A + q1 D + r1 S + t1 V + p1

    Rule 2: IF A is A2 AND D is B2 AND S is C2 AND V is D2 THEN f2 = p2 A + q2 D + r2 S + t2 V + p2

    ...

    Rule n: IF A is An AND D is Bn AND S is Cn AND V is Dn THEN fn = pn A + qn D + rn S + tn V + pn


    For example, in this study, from the users' ratings of movies, ANFIS, by training on the
    vectors of users' ratings in any cluster, extracts fuzzy rules such as

    IF the Acting of a movie is cluster1 AND Directing is cluster1 AND Story is cluster1 AND
    Visuals is cluster1 THEN the Overall Rating is out1cluster1.

    According to the fuzzy rules extracted by ANFIS, out1cluster1 for the overall rating is
    obtained from the membership degrees of the 4 input variables. Also, by using subtractive
    clustering in ANFIS, the system improves the precision of the fuzzy rules extracted from
    users' ratings of movies and minimizes overfitting during training. It reveals the users'
    preferences about items' features in soft clusters and divides the user preferences on items'
    features into fuzzy clusters, so that the system can accurately predict the relation between
    each criterion and the overall rating.

    To illustrate a simple model of ANFIS applied to multi-criteria CF, assume the system has two
    criteria, S and V, and one output, along with two fuzzy IF-THEN rules. Fig. 8 shows the
    first-order Sugeno FIS, the ANFIS model with two rules.

    In Fig. 8, S and V indicate the crisp inputs to node $i$, and $A_i$ and $B_i$ are the linguistic
    labels characterized by the appropriate MFs $\mu_{A_i}$ and $\mu_{B_i}$, respectively. In this
    study, ANFIS uses the Gaussian MF as

    $$\mu_{A_i}(S) = \exp\left(-\frac{(S - b_i)^2}{2a_i^2}\right) \qquad (23)$$

    $$\mu_{B_i}(V) = \exp\left(-\frac{(V - b_i)^2}{2a_i^2}\right) \qquad (24)$$

    Table 3
    Generated matrices after applying HOSVD on tensor of users ratings.

    S(:,:,1)                                   S(:,:,2)
    77.91   0.81   0.44   0.85   0.09   0.87    0.52   1.43   2.56   0.55   0.92   2.47
     0.99   1.44   1.45   0.72   6.52   1.07    5.74  13.11   2.32   0.17   1.85   0.22
     0.17   1.65   3.25   2.19   0.13   1.60   15.63   2.16   3.84   6.91   1.11   3.32
     0.50   0.55   1.39   1.06   1.33   0.95    3.96   5.15   5.08   0.01   0.41   0.28

    S(:,:,3)                                   S(:,:,4)
     0.19   2.27   4.17   6.40   0.21   2.11    0.18   0.64   2.73   0.08   3.50   3.54
     5.80   3.74   1.87   4.98   2.50   3.21    0.40   9.24   6.96   1.28   3.30   1.39
     3.88   6.62   0.73   2.12   4.56   0.98    4.60   1.49   3.76   2.71   0.16   1.64
     5.85   6.28   1.63   2.45   1.91   2.11    0.70   1.80   0.77   0.59   3.14   0.26

    Matrix U                      Matrix V
    0.49   0.30   0.65   0.51     0.44   0.74   0.17   0.47
    0.56   0.60   0.34   0.46     0.60   0.66   0.12   0.43
    0.51   0.67   0.29   0.46     0.50   0.06   0.46   0.73
    0.43   0.33   0.62   0.56     0.44   0.08   0.86   0.23

    Matrix W
    0.40   0.65   0.05   0.21   0.61   0.05
    0.40   0.29   0.84   0.20   0.08   0.03
    0.43   0.20   0.32   0.45   0.59   0.36
    0.46   0.25   0.03   0.74   0.33   0.26
    0.38   0.45   0.36   0.25   0.12   0.66
    0.38   0.43   0.24   0.31   0.39   0.60

    Fig. 6. 2D graph of users and items (horizontal axis: first column of U and V; vertical axis:
    second column of U and V; plotted points: users U1 to U4, items I1 to I6 and the new user Unew).

    Fig. 7. Membership functions for (a) Visuals and (b) Directing (horizontal axis: input
    variable, from 10 to 13; vertical axis: degree of membership, from 0 to 1; curves: Cluster 1 to
    Cluster 4).


    where {ai, bi, ci} is the parameter set of the MFs in the premise part of the fuzzy IF-THEN
    rules, which changes the shapes of the MFs. Parameters in this layer are referred to as the
    premise parameters.

    From the ANFIS architecture shown in Fig. 8, it can be observed that when the values of the
    premise parameters are fixed, the overall output can be expressed as a linear combination of
    the consequent parameters. In symbols, the output O can be rewritten as:

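    $$O = \frac{w_1}{w_1 + \cdots + w_n} f_1 + \frac{w_2}{w_1 + \cdots + w_n} f_2 + \cdots + \frac{w_n}{w_1 + \cdots + w_n} f_n$$

    $$= \overline{w}_1 (p_1 A + q_1 D + r_1 S + t_1 V + p_1) + \cdots + \overline{w}_n (p_n A + q_n D + r_n S + t_n V + p_n)$$

    $$= (\overline{w}_1 A) p_1 + (\overline{w}_1 D) q_1 + (\overline{w}_1 S) r_1 + (\overline{w}_1 V) t_1 + \overline{w}_1 p_1 + \cdots + (\overline{w}_n A) p_n + (\overline{w}_n D) q_n + (\overline{w}_n S) r_n + (\overline{w}_n V) t_n + \overline{w}_n p_n \qquad (25)$$

    which is linear in the consequent parameters $p_i$, $q_i$, $r_i$, $t_i$ and $p_i$. Fig. 9 shows the
    architecture of the implemented ANFIS, which consists of four inputs, four rules and sixteen
    MFs for inputs and output.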
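Since the output is linear in the consequent parameters once the premise parameters are fixed, those parameters can be estimated by ordinary least squares. The sketch below illustrates that idea only; it is not taken from the study, the function names are hypothetical, and the normalized firing strengths `w_bar` are assumed to have been computed beforehand from fixed membership functions.

```python
import numpy as np

def fit_consequents(X, y, w_bar):
    """Least-squares estimate of the consequent parameters.

    X:     (m, d) criteria ratings (for example Acting, Directing, Story, Visuals)
    y:     (m,)   overall ratings
    w_bar: (m, n) normalized firing strengths of the n rules (premise part fixed)
    Returns an (n, d + 1) array: d linear coefficients plus a constant per rule.
    """
    m, d = X.shape
    n = w_bar.shape[1]
    X1 = np.hstack([X, np.ones((m, 1))])  # append a column for the constant term
    # Each regressor is a normalized firing strength times an input (or 1)
    Phi = (w_bar[:, :, None] * X1[:, None, :]).reshape(m, n * (d + 1))
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta.reshape(n, d + 1)

def anfis_output(X, w_bar, theta):
    """Overall output: sum over rules of w_bar_k times the rule polynomial f_k."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return (w_bar * (X1 @ theta.T)).sum(axis=1)
```

In a full training loop this step would typically alternate with an update of the premise parameters; only the linear part is shown here.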

    4.2.1. Training the ANFIS and model validation using checking and testing dataset

    In this study, three sets of data were used for ANFIS modeling: training, checking and
    testing data. ANFIS uses the training data for constructing the model of the target system.
    The rows of the training data are used as inputs and outputs for constructing the target
    model. The checking data is used for testing the generalization capability of the FIS at each
    epoch, which prevents over-fitting of the network and verifies the identified ANFIS. The
    formats of the checking and testing data are the same as that of the training data, but
    generally their elements are different from those of the training data. Each cluster obtained
    using HOSVD was divided into three groups. The first group, comprising 80% of the cluster's
    data, was used as the training data, and the second group, comprising 10%, was used as the
    checking data. The remaining 10% of the cluster's data was used as the testing data.
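The 80/10/10 division of a cluster can be sketched as below. The text does not say how rows were assigned to the three groups, so the random shuffle, the fixed seed and the function name are assumptions of this sketch.

```python
import numpy as np

def split_cluster(rows, seed=0):
    """Split the rows of one cluster into training (80%), checking (10%)
    and testing (remaining rows) sets after a random shuffle (assumed)."""
    rows = np.asarray(rows)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(rows))
    n_train = (8 * len(rows)) // 10  # integer arithmetic avoids rounding surprises
    n_check = len(rows) // 10
    train = rows[idx[:n_train]]
    check = rows[idx[n_train:n_train + n_check]]
    test = rows[idx[n_train + n_check:]]
    return train, check, test
```

For a cluster of 190 rating vectors this gives 152 training, 19 checking and 19 testing rows.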

    5. Result and discussion

    In order to analyse the effectiveness of the proposed method, several experiments were
    conducted on the Yahoo!Movies dataset provided by the Yahoo! Research Alliance Webscope
    program (http://webscope.sandbox.yahoo.com).

    On the Yahoo!Movies network, users could rate movies in 4 dimensions (Story, Acting,
    Direction and Visuals) and assign an overall rating. Users used a 13-level rating scale for
    ratings. The four features of any movie were considered as: C1 = Acting, C2 = Story,
    C3 = Visuals and C4 = Directing. As can be seen in Table 4, all users' ratings are measured as
    a value between 1 and 13 on a quantitative scale.

    In the experimental dataset, there are 257,317 rating tuples in the original dataset, with
    127,829 users and 8272 movies. However, the resulting ratings tensor is extremely sparse,
    because many of the user-item-criteria entries are just empty fields. The sparsity level of
    the dataset is about 97.57% (sparsity level = 1 − density = 1 − (257,317 × 100)/(127,829 × 8272)
    = 0.9757). That means not even 2.43% of all entries in the rating tensor are filled. Similar
    to the work by Jannach et al. [24], we pre-processed the datasets and created test datasets
    with different density and quality levels, and applied the proposed method to YM-20-20,
    YM-10-10 and YM-5-5. The description of these datasets is presented in Table 5.

    5.1. Performance of HOSVD clustering

    Because HOSVD is quickly calculated, HOSVD is applied to the training tensor
    $A \in \mathbb{R}^{1500 \times 500 \times 4}$, which corresponds to the training set. As a result, an
    approximation $\tilde{A}_{a_1, a_2, a_3}$ is retained. The values set for $a_1$, $a_2$ and $a_3$
    determine the dimensions of the core tensor. It should be noted that all the experiments in
    this study were implemented in MATLAB on a Microsoft Windows operating system with an Intel
    Core i5 processor running at 2.66 GHz and 4 GB of RAM.

    For estimating the performance of HOSVD clustering for rank 2, 4, 8, 12, 16 and 20
    approximations, we adopted the Silhouette coefficient [76] as the standard measure of
    clustering quality and used it to determine the best cluster formation. The Silhouette
    coefficient is an internal index that measures how well the clustering fits the original data
    based on statistical properties of the clustered data. External indices, by contrast, measure
    the quality of a clustering by comparing it with an external (supervised) labeling. The
    Silhouette coefficient of an element $i$ of a cluster $k$ is defined in terms of the average
    distance $a(i)$ between $i$ and the other elements of $k$ (the intra-cluster distance), and the
    distance $b(i)$ between $i$ and the nearest element in the nearest cluster (the minimal
    inter-cluster distance).

    Fig. 8. Architecture of implemented ANFIS model for two inputs, one output and

    two rules.

    Fig. 9. Architecture of the implemented ANFIS.

    Table 4

    A sample of the multi-criteria dataset from the Yahoo!Movies.

    Movie ID User ID Directing Story Visual Acting Overall rating

    2 1 1 2 1 2 1

    13 13 11 13 13 13

    9 13 13 8 9 8

    . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . .

    13 2 13 13 13 13 13

    13 13 11 13 13 12

    12 13 13 13 12 13

    . . . . . . . . . . . . . . . . . .

    . . . . . . . . . . . . . . . . . .


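    $$sc_i = \frac{b_i - a_i}{\max\{a_i, b_i\}} \qquad (26)$$

    which can be written as:

    $$sc_i = \begin{cases} 1 - a_i/b_i, & \text{if } a_i < b_i \\ 0, & \text{if } a_i = b_i \\ b_i/a_i - 1, & \text{if } a_i > b_i \end{cases} \qquad (27)$$

    An overall score for a set of $n_k$ elements (a cluster or the entire clustering) is calculated
    by taking the average of the Silhouette coefficients $sc_i$ of all elements $i$ in the set. Thus,
    $SC_k$ can be defined as

    $$SC_k = \frac{1}{n_k}\sum_{i=1}^{n_k} sc_i \qquad (28)$$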

    The Silhouette coefficient takes values between −1 and 1. The closer to 1, the better the
    clustering fits the data. Table 6 lists a general rule of thumb for interpreting the
    Silhouette coefficient.
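Eqs. (26) and (28) can be sketched as follows. One assumption should be noted: this sketch uses the common definition in which b(i) is the smallest mean distance from element i to the elements of another cluster, which may differ from the nearest-element variant described above. The function name is hypothetical, Euclidean distance is assumed, and at least two clusters are required.

```python
import numpy as np

def silhouette_scores(X, labels):
    """Silhouette coefficient sc_i of every element, following Eq. (26).

    a(i): mean distance to the other elements of the same cluster
    b(i): smallest mean distance to the elements of another cluster (assumed)
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    sc = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster: score left at 0 by convention
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] = 0 is excluded by the divisor
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        sc[i] = (b - a) / max(a, b)
    return sc
```

The cluster-level score of Eq. (28) is then simply the mean of the returned values over the elements of interest, for example `silhouette_scores(X, labels).mean()`.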

    Table 7 shows the average Silhouette coefficient of HOSVD clustering for rank 2, 4, 8, 12, 16
    and 20 approximations. According to Table 7, the highest average Silhouette coefficient
    obtained for HOSVD clustering was 0.867, for the rank 10 approximation. This value is
    reasonably good. Based on observation, lower approximation ranks do better than high
    approximation ranks. This supports our claim that truncated HOSVD gives better results.

    5.2. Evaluation of proposed ANFIS model

    After cluster analysis, the ANFIS model was applied to one of the clusters with the maximum
    Silhouette coefficient. In that cluster, four fuzzy clusters were determined for the given 190
    users' ratings in the third cluster generated by the HOSVD method for rank approximation 12.
    The number of fuzzy rules was equal to the number of cluster centers, each representing the
    characteristic of the cluster, as given in Table 8.

    For evaluating the ANFIS model, several measures of accuracy were used to determine the
    model's capability for predicting the overall rating. The models were evaluated by four
    estimators: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Square
    Error (MSE) and coefficient of determination (R2). These estimators are determined by

    $$MSE = \frac{\sum_{O=1}^{n} \left(actual(O) - prediction(O)\right)^2}{n} \qquad (29)$$

    $$MAPE = \frac{\sum_{O=1}^{n} \left|\dfrac{actual(O) - prediction(O)}{actual(O)}\right|}{n} \qquad (30)$$

    $$R^2 = 1 - \frac{\sum_{O=1}^{n} \left(actual(O) - prediction(O)\right)^2}{\sum_{O=1}^{n} \left(actual(O) - \overline{actual(O)}\right)^2} \qquad (31)$$

    $$RMSE = \sqrt{\frac{\sum_{O=1}^{n} \left(actual(O) - prediction(O)\right)^2}{n}} \qquad (32)$$

    where actual (O) indicates the real overall rating provided by user,

    prediction (O) implies the predicted overall rating value and n cor-

    responds to the number of used users ratings.
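The four estimators of Eqs. (29) to (32) can be computed as in the sketch below. The function name and the dictionary layout are illustrative choices, and MAPE is returned as a fraction rather than a percentage, matching the form of Eq. (30).

```python
import numpy as np

def evaluation_metrics(actual, prediction):
    """MSE, MAPE, R2 and RMSE of predicted overall ratings, Eqs. (29)-(32)."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    err = actual - prediction
    mse = np.mean(err ** 2)                # Eq. (29)
    mape = np.mean(np.abs(err / actual))   # Eq. (30); ratings are assumed non-zero
    r2 = 1.0 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)  # Eq. (31)
    rmse = np.sqrt(mse)                    # Eq. (32)
    return {"MSE": mse, "MAPE": mape, "R2": r2, "RMSE": rmse}
```

Because ratings on the 13-level scale start at 1, the division in MAPE is well defined for this dataset.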

    Usually, the RMSE and MSE measures are used in the training process to test the prediction
    model; in this study, however, two further performance measures, the coefficient of
    determination R2 and MAPE, were used for a more effective performance evaluation. The
    coefficient of determination R2 provides a value in [0, 1] about the training of the proposed
    network. A value closer to 1 stands for the success of learning. MAPE was also used in this
    study because it identifies the model deviations accurately.

    After implementing the ANFIS model using the Fuzzy Logic Toolbox in MATLAB 7.10.0, the
    training and checking data were tested for error estimation. Data from the four inputs was
    given to the trained ANFIS model along with the actual overall ratings. From the input values,
    the suitable MFs (see Fig. 7(a) and (b)) were selected to predict the overall ratings using the
    extracted rules (see Table 8). The fuzzy rule viewer of the established ANFIS model, shown in
    Fig. 10, helps to visualize the process of overall rating prediction by selecting the MFs. It
    indicates the behavior of users as the values of all four inputs change for the overall
    rating. In the fuzzy rule viewer, when the input parameter Acting is at 11, Directing at 12,
    Story at 12 and Visuals at 11, an output of 12 is obtained for the overall rating.

    Table 9 presents errors for a sample of the training and checking datasets. As can be seen
    from the errors of the nineteen samples in Table 9, the ANFIS model has been trained
    effectively using the training data.

    Table 5

    Information of Yahoo!Movies dataset.

    Name #Users #Items #Overall ratings

    YM-20-20 429 491 18,504

    YM-10-10 1827 1471 48,026

    YM-5-5 5978 3079 82,599

    Table 6
    Rule of thumb for the interpretation of the Silhouette coefficient.

    Range        Interpretation
    >0.70        Strong structure has been found
    0.50–0.70    Reasonable structure has been found
    0.25–0.50    The structure is weak and could be artificial


    For subtractive clustering, the parameters were defined by a trial
    and error approach as: range of influence: 0.5, accept ratio: 0.5,
    reject ratio: 0.15 and squash factor: 1.25. We also tested the effect
    of the two variables r_a and r_b, which represent the radius of the
    neighborhood, on the overall rating prediction error for the
    training, checking and test data. The lowest error was obtained for
    r_b = 1.5 r_a, with r_a varied between 0.3 and 0.8. Fig. 11 presents
    the overall rating prediction error of the checking and training
    data for nineteen samples.
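
    The role of the two radii and of the accept and reject ratios can be
    illustrated with a minimal sketch of subtractive clustering. The
    paper's own implementation relies on the MATLAB toolbox and is not
    reproduced here; the update rules below follow the commonly cited
    formulation of the algorithm, and the function name, the toy data and
    the assumption that the inputs are already scaled to the unit
    hypercube are ours.

```python
import math


def subtractive_clustering(points, ra=0.5, squash=1.5,
                           accept_ratio=0.5, reject_ratio=0.15):
    """Return cluster centers chosen among `points` (assumed scaled to [0, 1]).

    ra     : neighborhood radius used to compute the potential of each point
    squash : rb = squash * ra is the radius used to reduce the potential
             around an accepted center (1.5 mirrors rb = 1.5 ra in the text)
    Both ratios are assumed to lie strictly between 0 and 1.
    """
    rb = squash * ra
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of a point: a density measure over all data points.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)
    centers = []

    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak > accept_ratio * first_peak:
            accept = True
        elif peak < reject_ratio * first_peak:
            break  # remaining potential is too low: stop
        else:
            # Grey zone: accept only if the candidate is far from known centers.
            dmin = min(math.sqrt(sqdist(points[k], c)) for c in centers)
            accept = dmin / ra + peak / first_peak >= 1.0
        if accept:
            centers.append(points[k])
            # Reduce the potential of every point near the new center.
            for i, p in enumerate(points):
                potential[i] -= peak * math.exp(-beta * sqdist(p, points[k]))
        else:
            potential[k] = 0.0  # discard this candidate and try the next peak
    return centers


# Hypothetical toy data: two tight groups of points in the unit square.
toy_points = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12),
              (0.9, 0.9), (0.88, 0.9), (0.9, 0.88)]
toy_centers = subtractive_clustering(toy_points, ra=0.5)
```

    A smaller r_a generally yields more, smaller clusters, and therefore
    more fuzzy rules, which is why the prediction error is sensitive to
    this choice.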

    In this study, the average error for the checking data was
    0.0001904. After 200 epochs, the average RMSE, MSE, MAPE and R2 were
    0.02144, 0.00912, 0.18230 and 0.82460, respectively. The average
    error for the training data was 0.000162221; after 200 epochs, the
    average RMSE, MSE, MAPE and R2 were 0.01272, 0.00912, 0.18230 and
    0.99460, respectively. For the testing data, the average error after
    200 epochs was 0.000172361, and the average RMSE, MSE, MAPE and R2
    were 0.01951, 0.00949, 0.10230 and 0.91150, respectively. The
    average training and checking errors after 200 epochs are shown in
    Fig. 12.
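
    For reference, the four measures reported above can be computed as in
    the following sketch. The paper does not restate its formulas for
    MAPE and R2 in this section, so the standard textbook definitions are
    assumed here (MAPE as a fraction rather than a percentage, R2 as one
    minus the ratio of residual to total sum of squares); the function
    name and the sample values are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAPE and R2 for two equally long sequences.

    Assumes every actual value is non-zero (needed for MAPE) and that the
    actual values are not all identical (needed for R2).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mape": mape, "r2": r2}


# Hypothetical example: three actual ratings and three model outputs.
example = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

    Under these definitions RMSE is always the square root of MSE, which
    gives a quick consistency check on any reported pair of values.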

    Fig. 13 illustrates, through control surfaces, the interdependency
    of the four input parameters and the overall rating obtained from
    the fuzzy rules generated by ANFIS combined with subtractive
    clustering. The level of overall rating can be depicted as a
    continuous function of its input parameters: Acting, Directing, Story
    and Visuals. The surface plots in this figure depict the variation
    of the overall rating based on the identified fuzzy rules. Fig. 13(a)
    shows the interdependency of overall rating on Directing and Acting.
    Fig. 13(b) depicts the interdependency of overall rating on Acting
    and Story. Fig. 13(c) shows the interdependency of overall rating on
    Visuals and Acting. Fig. 13(d) depicts the interdependency of overall
    rating on Story and Directing. Fig. 13(e) depicts the interdependency
    of overall

    Fig. 10. Fuzzy rule viewer for input and output variables of ANFIS model.

    Table 9
    Training and checking errors for prediction of overall ratings by ANFIS.

    Sample #  Training  Training ANFIS  Training    Checking  Checking ANFIS  Checking
              data      output          error (%)   data      output          error (%)
    1         12        12              0           11        11.01           0.01
    2         10        10.0001         0.0001      12        12.009          0.009
    3         13        13              0           10        10.009          0.009
    4         12        12              0           12        12.0001         0.0001
    5         12        12              0           11        11.0008         0.0008
    6         13        13              0           11        11.009323       0.009323
    7         12        12              0           12        12.0004         0.0004
    8         13        13              0           12        12.00383        0.00383
    9         12        12              0           12        11.998          0.002
    10        12        12              0           13        12.999998       0.000002
    11        12        12              0           11        11.003          0.003
    12        10        10.0013         0.0013      11        11.0005         0.0005
    13        13        13              0           12        12.00276        0.00276
    14        12        12              0           12        12.0003         0.0003
    15        11        10.9999         -0.0001     12        11.9917         0.0083
    16        12        12              0           12        11.99346        0.00654
    17        12        12              0           11        10.99299        0.00701
    18        13        13              0           10        10.009          0.009
    19        12        12              0           11        10.9901         0.0099

    Fig. 11. Training and checking error for nineteen samples in the dataset.
    [Plot: prediction error versus number of samples; series: training error, checking error.]

    94 M. Nilashi et al. / Knowledge-Based Systems 60 (2014) 82–101


    rating on Visuals and Directing, and Fig. 13(f) shows the
    interdependency of overall rating on Story and Visuals.

    These surface plots show the users' perception and behavior with
    respect to any two item features within a cluster of users with
    similar preferences. In addition, the results depicted in the surface
    plots are valuable for revealing users' behavior toward item features
    in multi-criteria CF. Thus, the users' preferences in any cluster of
    users can be modeled by ANFIS, and the recommender system can
    recognize which item features (criteria), and at which level, are
    tailored to their preferences. The curves presented in Fig. 14(a–d)
    also reveal the user behavior on each item feature. As can be seen
    in these curves, the overall rating increases more markedly with the
    Story criterion than with the other criteria. It can be inferred that
    the Story criterion is the most important for users in that cluster.

    5.3. Multi-criteria CF evaluation

    In this section, we focus on multi-criteria CF recommendation using
    the proposed method. As mentioned before, we used k-NN to classify
    the data, and the choice of k and of the distance metric is
    important for the classification accuracy of the k-NN method.
    Therefore, in this study, the optimal distance metric and k were
    chosen using cross-validation [77], so that the classifier could
    accurately predict the testing data. Five-fold cross-validation was
    applied to choose the type of distance metric and the best value of
    k.

    Table 10 presents the averaged classification accuracy obtained
    with five-fold cross-validation for k = 1, k = 3, k = 5 and k = 7
    and three distance metrics (Euclidean, Correlation and City-Block).
    From Table 10, the highest averaged classification accuracy, about
    98.91%, is obtained using the Euclidean distance metric for k = 5,
    compared with the City-Block (95.89%) and Correlation (96.76%)
    distance metrics. Also, with the Euclidean metric, the averaged
    classification rate is higher than with the Correlation and
    City-Block metrics for all values of k. Thus, based on this result,
    we adopted k = 5 and the Euclidean distance metric, as selected by
    five-fold cross-validation.
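
    The selection procedure described above can be sketched as a small
    grid search over k and the distance metric, scored by five-fold
    cross-validation. This is an illustrative reimplementation, not the
    authors' code: the function names, the interleaved fold assignment
    and the toy rating vectors are assumptions, and the correlation
    distance is taken to be one minus the Pearson correlation.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def correlation(p, q):
    """Correlation distance, assumed here to be 1 - Pearson correlation."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    if sp == 0 or sq == 0:
        return 1.0  # undefined correlation: treat as uncorrelated
    return 1.0 - cov / (sp * sq)


def knn_predict(train_x, train_y, x, k, dist):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(xs, ys, k, dist, folds=5):
    """Average accuracy over `folds` interleaved folds."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        train_idx = [i for i in range(n) if i not in test_idx]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k, dist) == ys[i]:
                correct += 1
    return correct / n


def select_knn(xs, ys, ks, metrics, folds=5):
    """Score every (metric name, k) pair and return the best pair and all scores."""
    scores = {(name, k): cv_accuracy(xs, ys, k, dist, folds)
              for name, dist in metrics.items() for k in ks}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: four criteria ratings per user, two user clusters.
toy_x = [(1, 2, 1, 2), (2, 1, 2, 1), (1, 1, 2, 2), (2, 2, 1, 1), (1, 2, 2, 1),
         (12, 13, 12, 13), (13, 12, 13, 12), (12, 12, 13, 13),
         (13, 13, 12, 12), (12, 13, 13, 12)]
toy_y = [0] * 5 + [1] * 5
toy_metrics = {"euclidean": euclidean, "city_block": city_block,
               "correlation": correlation}
best_pair, all_scores = select_knn(toy_x, toy_y, ks=(1, 3, 5, 7),
                                   metrics=toy_metrics)
```

    Each entry of all_scores plays the role of one cell of Table 10; the
    pair with the highest averaged accuracy is the one retained.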

    We determined the precision and the recall of the Top-N list for
    each element in the test set and computed the arithmetic mean of
    these values. The recommender's prediction accuracy was measured by
    RMSE [78], which is a widely used metric for evaluating the
    statistical accuracy of recommendation algorithms, given by

    $$\mathrm{RMSE} = \sqrt{\frac{1}{|X|} \sum_{(u_i, o_j) \in X} \left| a_{ij} - p_{ij} \right|^2} \tag{33}$$

    where X = {(u_i, o_j) | u_i had rated o_j in the probe set}. A lower
    value of RMSE indicates a higher accuracy of the recommendation
    system. Table 11 presents the RMSE obtained with the proposed
    approach on YM-5-5 (each movie has at least 5 ratings), YM-10-10
    (each movie has at least 10 ratings) and YM-20-20 (each movie has at
    least 20 ratings). Fig. 15 shows the prediction accuracy for
    different neighborhood sizes on the YM-5-5, YM-10-10 and YM-20-20
    datasets.

    To compare the proposed method with HOSVD, truncated SVD and some
    state-of-the-art approaches in multi-criteria CF, we employ the
    recall and precision metrics, which are widely used in recommender
    systems to evaluate the quality of recommendations [79,80].
    Precision is the ratio of relevant items recommended to the total
    number of items recommended. Recall is the ratio of relevant items
    recommended to the total number of relevant items that exist. The
    two measures are inversely related and depend on the length of the
    recommendation list: the longer the list, the easier it becomes to
    achieve high recall, but the more difficult it becomes to achieve
    good precision. The F measure is the weighted harmonic mean that
    combines both precision and recall [24].

    $$\mathrm{Recall} = \frac{\text{Number of correctly recommended items}}{\text{Number of interesting items}} \tag{34}$$

    $$\mathrm{Precision} = \frac{\text{Number of correctly recommended items}}{\text{Number of recommended items}} \tag{35}$$

    where items of interest to a customer u refer to products in the
    test set that were purchased by u, and correctly recommended items
    are items that match the items of interest. Although these measures
    are simple to compute and intuitively appealing, they are in
    conflict, because increasing the size of the recommendation set
    improves the recall at the expense of reducing the precision [8].

    The F1-metric [24,79], which combines precision and recall, is also
    widely used to evaluate the quality of recommendations.
    Specifically, this measure balances the trade-off between precision
    and recall by assigning equal weights to both metrics. Therefore, we
    use the F1-metric in our evaluation, as shown in Eq. (36).

    $$F_1 = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \tag{36}$$
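
    Eqs. (34)-(36) translate directly into a few lines of code. The
    sketch below is illustrative: the function name and the item
    identifiers are hypothetical, the recommendation list is assumed to
    contain no duplicates, and zero is returned when a ratio would be
    undefined, a convention the paper does not specify.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one Top-N list.

    recommended : the Top-N items produced by the recommender (no duplicates)
    relevant    : the items of interest to the user in the test set
    """
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical Top-5 list with two hits among four items of interest.
top5 = ["m1", "m2", "m3", "m4", "m5"]
interesting = {"m2", "m4", "m9", "m10"}
p, r, f1 = precision_recall_f1(top5, interesting)
# p is 0.4 (2 of 5), r is 0.5 (2 of 4), f1 is 0.4 / 0.9, roughly 0.444
```

    Lengthening the list in this example could only keep or raise the
    recall, while the precision would fall unless the added items are
    also relevant, which is the trade-off that F1 balances.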

    We ran the experiments on the YM-10-10 and YM-20-20 datasets for N
    equal to 1, 5, 7, 15, 25, 35 and 40, where N is the number of items
    to be recommended by the Top-N recommender system.

    From the two F1 curves in Figs. 16 and 17, we can see that the
    proposed method gives a high level of accuracy across the Top-N
    recommendations when the number of neighbors is increased. This
    outcome demonstrates the significance of combining the HOSVD
    method and ANFIS with subtractive clustering for overcoming

    Fig. 12. The error of each observation for checking and training data.


    the problems connected to multi-criteria CF. The results above
    reveal that the proposed method gives better results for YM-20-20.
    In Figs. 16 and 17, the marked change in accuracy measured by F1
    between neighborhood sizes 15 and 25 indicates that higher accuracy
    is obtained for large neighborhoods than for small ones. According
    to the experiments, these outcomes are related to the result of
    clustering and extracting fuzzy rules from the YM-20-20 and YM-10-10
    datasets.

    Fig. 13. Interdependency of overall rating on (a) Directing and Acting, (b) Acting and Story, (c) Visuals and Acting, (d) Story and Directing, (e) Visuals and Directing, and (f)

    Story and Visuals.


    In order to compare the proposed method with previous work
    [23,24,52], we also evaluated our approach on YM-10-10 using an
    additional set of metrics. In Table 12, we report the Precision@5
    and Precision@7 values as well as the Mean Absolute Error (MAE). We
    also applied the SVD and HOSVD techniques without ANFIS and
    subtractive clustering to the YM-10-10 and YM-20-20 datasets; the
    results are presented in Table 13.

    The MAE is determined as the average absolute deviation between
    predicted ratings and true ratings, as shown in Eq. (37).

    $$\mathrm{MAE}(pred, act) = \sum_{i=1}^{N} \frac{\left| pred_{u,i} - act_{u,i} \right|}{N} \tag{37}$$

    where N is the number of items on which a user u has expressed an
    opinion.
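
    Because ratings are sparse, both error measures are evaluated only
    over the (user, item) pairs that actually carry a rating. The sketch
    below shows one way to compute the RMSE of Eq. (33) over a probe set
    and the per-user MAE of Eq. (37); the dictionary layout keyed by
    (user, item), the function names and the sample ratings are
    assumptions made for illustration.

```python
import math


def rmse_over_probe(probe, predicted):
    """Eq. (33): RMSE over every (user, item) pair rated in the probe set.

    probe and predicted map (user, item) tuples to ratings; predicted is
    assumed to contain an entry for every key of probe.
    """
    squared = [(actual - predicted[key]) ** 2 for key, actual in probe.items()]
    return math.sqrt(sum(squared) / len(squared))


def mae_for_user(user, probe, predicted):
    """Eq. (37): mean absolute error over the items rated by one user."""
    errors = [abs(predicted[(u, item)] - actual)
              for (u, item), actual in probe.items() if u == user]
    return sum(errors) / len(errors)


# Hypothetical probe set with three ratings from two users.
probe_ratings = {("u1", "m1"): 4.0, ("u1", "m2"): 2.0, ("u2", "m1"): 5.0}
model_output = {("u1", "m1"): 3.0, ("u1", "m2"): 2.0, ("u2", "m1"): 3.0}
overall_rmse = rmse_over_probe(probe_ratings, model_output)
mae_u1 = mae_for_user("u1", probe_ratings, model_output)
```

    RMSE penalizes the single large error of user u2 more heavily than
    MAE would, which is one reason the two measures are often reported
    side by side.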

    The results show that the precision at Top-5 and Top-7 of the
    proposed method is higher than that of the algorithms in the
    previous work and of the methods using solely HOSVD or SVD.

    In order to compare the proposed method with MC-SeCF, developed by
    Shambour and Lu [53] and evaluated on the MovieLens dataset, we also
    evaluated our approach on YM-20-20 and YM-10-10 using the MAE
    metric for different neighborhood sizes. The MAE comparison is
    shown in Fig. 18. The curves in this figure show that the
    significant improvement in recommendation accuracy is obtained for
    large neighborhood sizes. We present the recommendation accuracy
    using MAE for YM-20-20 and YM-10-10 in Table 14. From Fig. 18 and
    Table 14, it can be observed that the proposed method improves
    recommendation accuracy slightly more on YM-20-20 for large
    neighborhood sizes. Compared with MC-SeCF, the MAE values for our
    method are slightly higher than the MAE values of MC-SeCF. However,
    quite interestingly, on the YM-20-20 dataset, better recommendation
    accuracy is obtained by our method. This indicates that on YM-20-20
    the accuracy is relatively high because better fuzzy rules are
    discovered. Also, as mentioned earlier, semantic information of
    items can be incorporated into multi-criteria CF to obtain more
    accurate recommendations.

    Fig. 14. Curves for revealing the relationship between overall rating and (a) Visuals, (b) Directing, (c) Story, and (d) Acting.

    Table 10
    Averaged classification accuracy for distance metrics and values of k.

    Distance metric   k = 1   k = 3   k = 5   k = 7   k = 9
    Euclidean         96.63   97.34   98.91   97.67   95.56
    City block        94.72   94.87   95.89   93.89   93.73
    Correlation       95.28   95.38   96.76   94.88   94.87

    Table 11
    Coverage and RMSE for YM-5-5, YM-10-10 and YM-20-20.

    Size of neighborhood   RMSE (YM-5-5)   RMSE (YM-10-10)   RMSE (YM-20-20)
    5                      0.551097        0.5365            0.53158
    10                     0.549707        0.5310            0.52988
    15                     0.544308        0.5289            0.52039
    20                     0.538909        0.5200            0.51558
    25                     0.530209        0.5184            0.51129
    30                     0.528609        0.5174            0.50349

    Fig. 15. RMSE and neighborhood size.
    [Plot: RMSE versus neighborhood size; series: YM 5-5, YM 10-10, YM 20-20.]


    Also, according to the rank-12 approximation defined for the HOSVD
    decomposition, we measured the precision for different numbers of
    clusters. For the rank-12 approximation, the number of clusters was
    changed iteratively: starting from 3 clusters, the number of
    clusters was increased by 3 after each iteration until it reached
    12. Fig. 19 illustrates the Precision@5 and Precision@7 values
    versus the number of clusters.

    As can be seen in Fig. 19, the worst precision is obtained for
    YM-10-10 at Precision@5 with three clusters, and the best precision
    is achieved for YM-20-20 at Precision@5 with twelve clusters. This
    result shows that, for both YM-10-10 and YM-20-20, the precision
    increases with the number of clusters.

    To show experimentally the effectiveness of clustering using HOSVD
    and cosine-based similarity, we ran experiments on the
    similarity-based approach developed by Adomavicius and Kwon [23] and
    compared it with the proposed method. They proposed different
    potential ways to calculate the similarity between users based on
    their criteria ratings. It should be noted that the Chebyshev
    distance metric performed best among their similarity-based
    approaches.

    Fig. 20 presents the performance results of our experiments for the
    proposed method and the similarity-based approach using the
    Chebyshev distance metric. In Fig. 20, the throughput is plotted as
    a function of the number of clusters. We define the throughput of a

    Fig. 16. F1 measure and Top-N recommendation for YM-10-10.
    [Plot: F measure versus Top-N; series: 5, 15, 25 and 35 neighbors.]

    Fig. 17. F1 measure and Top-N recommendation for YM-20-20.
    [Plot: F measure versus Top-N; series: 5, 15, 25 and 35 neighbors.]

    Table 12
    MAE, precision at Top-5 and Top-7 of proposed method, HOSVD and truncated SVD for
    YM-10-10 (neighborhood size: all users).

    Algorithm                                Precision@5   Precision@7   MAE
    HOSVD                                    75.34         72.85         1.17
    Truncated SVD                            74.03         72.19         1.75
    HOSVD-ANFIS and subtractive clustering   81.44         80.78         0.96

    Table 13
    MAE, precision at Top-5 and Top-7 for proposed method, HOSVD and truncated SVD
    for YM-20-20 (neighborhood size: all users).

    Algorithm                                Precision@5   Precision@7   MAE
    HOSVD                                    78.57         76.43         0.95
    Truncated SVD                            75.12         73.21         1.45
    HOSVD-ANFIS and subtractive clustering   83.34         81.32         0.91

    Fig. 18. Recommendation accuracy for different neighborhood sizes on YM-20-20
    and YM-10-10.
    [Plot: MAE(%) versus size of neighbors; series: YM-10-10, YM-20-20.]

    Table 14
    Recommendation accuracy using MAE for different neighborhood size.

    Neighborhood size   MAE(%) YM-10-10   MAE(%) YM-20-20
    10                  0.7325            0.7105
    20                  0.7370            0.7088
    30                  0.7249            0.7093
    40                  0.7260            0.7015
    50                  0.7184            0.6902
    70                  0.7112            0.6724
    90                  0.7180            0.6688

    Fig. 19. Precision versus number of clusters in Precision@5 and Precision@7 for
    different datasets.
    [Plot: precision versus number of clusters; series: Precision@5 and Precision@7 on YM-20-20 and YM-10-10.]


    multi-criteria CF recommender system as the number of
    recommendations generated per second for k selected users (k = 5).
    The curves in this plot show that, when HOSVD and cosine-based
    similarity are used to cluster the high-dimensional data, the
    throughput is substantially higher than that of multi-criteria CF
    based on the similarity-based approach. The reason is that, with
    the clustered approach using HOSVD and cosine-based similarity, the
    prediction algorithm uses only a fraction of the neighbors. The
    throughput of the multi-criteria recommender system increases
    rapidly as the number of clusters increases and their sizes become
    smaller. Since multi-criteria CF based on the similarity approach
    has to scan through all the neighbors, the number of clusters does
    not affect its throughput.

    We also evaluated the recommendation quality using coverage
    measures. Coverage measures the percentage of items for which a CF
    system can provide a prediction or that ever appear in a
    recommendation list [81]. A recommender system should maintain a
    good level of coverage so that most of the items are connected in
    some way to the rest of the data; otherwise they will be isolated
    and essentially dormant in the system.
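
    The two readings of coverage given in this definition, items for
    which a prediction can be made and items that ever appear in a
    recommendation list, can be sketched as follows. The function names,
    the predicate used to decide whether an item is predictable and the
    toy catalog are assumptions for illustration; the paper does not
    state which variant underlies its reported figures.

```python
def prediction_coverage(items, can_predict):
    """Fraction of catalog items for which the system can produce a prediction.

    can_predict is a caller-supplied predicate (hypothetical here), for
    example "the item has at least one rated neighbor".
    """
    items = list(items)
    covered = sum(1 for item in items if can_predict(item))
    return covered / len(items)


def catalog_coverage(items, recommendation_lists):
    """Fraction of catalog items that appear in at least one recommendation list."""
    catalog = set(items)
    recommended = set()
    for rec_list in recommendation_lists:
        recommended.update(rec_list)
    return len(recommended & catalog) / len(catalog)


# Hypothetical catalog of five movies.
catalog = ["m1", "m2", "m3", "m4", "m5"]
predictable = {"m1", "m2", "m3", "m4"}
pred_cov = prediction_coverage(catalog, lambda item: item in predictable)
cat_cov = catalog_coverage(catalog, [["m1", "m2"], ["m2", "m3"]])
# pred_cov is 0.8 (4 of 5 items); cat_cov is 0.6 (3 of 5 items)
```

    With a larger neighborhood, more items tend to have at least one
    rated neighbor, so the first measure can only stay the same or rise.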

    The curves shown in Fig. 21 present the quality of the
    recommendations of the proposed method and reveal that the coverage
    is strongly related to the neighborhood size. Table 15 presents the
    coverage obtained with the proposed method. To show experimentally
    the effect of clustering using HOSVD and cosine-based similarity on
    coverage, we also ran the experiments with the similarity-based
    approach, as presented in Table 15.

    From Table 15, the proposed method maintains a good level of
    coverage relative to the similarity-based approach for different
    neighborhood sizes. In addition, the results confirm that both the
    proposed method and the similarity-based approach have good
    coverage on YM-20-20.

    6. Conclusion and future work

    In this paper, a new method was proposed that combines HOSVD and
    ANFIS with subtractive clustering to improve the recommendation
    quality and predictive accuracy of multi-criteria CF. We proposed
    this method to overcome existing shortcomings in multi-criteria CF,
    such as predicting the overall ratings, sparsity, scalability, and
    the uncertainty induced by vagueness and imprecision in
    representing and reasoning about item features.

    Using HOSVD, we effectively reduced the noise of the
    high-dimensional data and alleviated the scalability problem. Also,
    with HOSVD, we considered all factors in the third-order tensor of
    user, item and criteria together to reveal the latent relationships
    between them. Applying HOSVD to the high-dimensional dataset helped
    us obtain high-quality clusters using cosine-based similarity. In
    addition, tensor decomposition using HOSVD on the experimental
    dataset demonstrated its advantages for dimensionality reduction in
    more than two dimensions, yielding a favorable approximation of the
    information. From the experiments, we observed that the proposed
    method using HOSVD and ANFIS achieves better recommendation accuracy
    than the algorithms in the previous work and the methods using
    solely SVD or HOSVD.

    The experimental results on the movie dataset demonstrated the
    capability of ANFIS modeling using MFs and fuzzy rules, without
    human expert intervention, in multi-criteria CF. Besides, the ANFIS
    model combined with subtractive clustering was used to extract
    knowledge from user ratings and preferences on item features. This
    was done by incorporating the element of training into the existing
    Neuro-Fuzzy system. Furthermore, with the training data of ANFIS,
    the rules and the MFs were tuned to predict the unknown overall
    ratings and so alleviate the sparsity problem, with advantages in
    terms of the simplicity of the algorithm and the speed of training
    convergence. Moreover, as users' ratings on items in multi-criteria
    CF accumulate over time, the fuzzy rules can be amended and
    maintained in a rule database for prediction tasks. An advantage of
    this method is its flexibility and extensibility: it can be
    developed for any number of dimensions and criteria/features of the
    dataset.

    We analysed the predictive accuracy of the proposed method on a
    real-world dataset in the domain of movie recommendation provided
    by Yahoo!Movies. We used popular measurement metrics: F1, RMSE, MAE
    and coverage. The proposed method was evaluated in terms of MAE,
    Precision@5 and Precision@7 using

    Fig. 20. Throughput of proposed method versus similarity-based approach.
    [Plot: throughput (recs./sec) versus number of clusters; series: similarity-based approach and HOSVD with cosine-based similarity, each on YM-20-20 and YM-10-10.]

    [Plot: coverage versus neighborhood size; series: YM 20-20, YM 10-10, YM 5-5.]

    Fig. 21. Neigh