[IEEE 2010 10th Symposium on Neural Network Applications in Electrical Engineering (NEUREL 2010) -...



978-1-4244-8820-9/10/$26.00 ©2010 IEEE

Abstract—Nonnegative tensor factorization (NTF) is a recent

multiway (multilinear) extension of nonnegative matrix factorization (NMF), where nonnegativity constraints are mainly imposed on the CANDECOMP/PARAFAC model and, recently, also on the Tucker model. Nonnegative tensor factorization algorithms have many potential applications, including multiway clustering, multi-sensory or multidimensional data analysis, and nonnegative neural sparse coding.

In this paper we present a new approach to NTF based on the CANDECOMP/PARAFAC model. The proposed method is simple, computationally efficient, easily extensible to higher-dimensional tensors, can handle some problems related to rank-deficient tensors, and can be used for the analysis of higher-dimensional tensors than most of the known algorithms for NTF.

Index Terms—nonnegative tensor factorization, PARAFAC model

I. INTRODUCTION

Multiway data analysis captures multilinear structures in higher-order datasets, where data have more than two modes. Standard two-way methods commonly applied to matrices often fail to find the underlying structures in multiway arrays. Multiway data analysis, originating in psychometrics back in the sixties [1], is the extension of two-way data analysis to higher-order datasets. Multiway analysis is often used for extracting hidden structures and capturing underlying correlations between variables in a multiway array. For example, multi-channel electroencephalogram (EEG) data are commonly represented as an m × n matrix containing signals recorded for m time samples at n electrodes. In order to discover hidden brain dynamics, the frequency content of the signals, for instance the signal power at p particular frequencies, often also needs to be considered. In that case, EEG data can be arranged as an m × p × n three-way dataset [2]. Multiway analysis of a three-way EEG array then enables us to extract the signatures of brain dynamics in the time, frequency and electrode domains. As opposed to two-way analysis, which shows brain activities at certain time periods at certain electrodes, multiway analysis can differentiate between the brain

M. V. Jankovic is on leave from the Institute of Electrical Engineering “Nikola Tesla”, 11000 Belgrade, Serbia (phone: +381-3690-548; fax: +381-3691-423; e-mail: elmarkoni@ieent.org).

B. Reljin is with the School of Electrical Engineering, Belgrade University, Kralja Aleksandra 73a, 11000 Belgrade, Serbia (e-mail: [email protected]).

activities with different spectral signatures. It has been shown in numerous research areas, including social networks [3], neuroscience [4], and process analysis [5], that the underlying information content of the data may not be captured accurately or identified uniquely by two-way analysis methods. Two-way analysis methods, e.g. factor models, suffer from rotational freedom unless specific constraints such as statistical independence or orthogonality are enforced. On the other hand, such constraints, which require prior knowledge or rest on unrealistic assumptions, are often unnecessary for multiway models. For example, in fluorescence spectroscopy, a Parallel Factor Analysis (PARAFAC) model can uniquely identify the pure spectra of chemicals from measurements of mixtures of chemicals. Consequently, multiway analysis, with advantages over two-way analysis in terms of uniqueness, robustness to noise, and ease of interpretation, has become a popular exploratory analysis tool in a variety of application areas.

II. MULTIWAY ARRAYS

The first difference between two-way and multiway data analysis is the format of the data being analyzed. Multiway arrays, often referred to as tensors, are higher-order generalizations of vectors and matrices. A higher-order array is represented as $X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where the order of $X$ is $N$ ($N > 2$), while a vector and a matrix are arrays of order 1 and 2, respectively. Higher-order arrays have a different terminology compared to that of two-way datasets. Each dimension of a multiway array is called a mode (or a way), and the number of variables in each mode indicates the dimensionality of that mode. For instance, $X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is a multiway array with $N$ modes (called an $N$-way array or $N$th-order tensor) with dimensions $I_1, I_2, \ldots$ in the first, second, etc. mode, respectively. Each entry of $X$ is denoted by $x_{i_1 i_2 \ldots i_N}$. The higher-order analogues of matrix rows and columns are called fibers. A matrix column is a mode-1 fiber and a matrix row is a mode-2 fiber. For a third-order tensor ($X \in \mathbb{R}^{I_1 \times I_2 \times I_3}$), we have column, row, and tube fibers, denoted by $x_{:jk}$, $x_{i:k}$, and $x_{ij:}$, respectively; see Figure 2. Then $x_{i_1 i_2 i_3}$ denotes the entry in the $i_1$-th row, $i_2$-th column and $i_3$-th tube of $X$. For orders higher than three, the fibers no longer have special names. We always assume that fibers are column vectors.

When an index is fixed in one mode and the indices vary in the two other modes, this data partition is called a slice (or a slab) in higher-order terminology. For example, when the

Nonnegative Contraction/Averaging Tensor Factorization

Marko V. Jankovic, Senior Member, IEEE, and Branimir Reljin, Senior Member, IEEE



$i$th row of $X$ is fixed, then it is a horizontal slice of size $I_2 \times I_3$; similarly, if the $j$th column of $X$ is fixed, it is a lateral slice of size $I_1 \times I_3$, etc.
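The fiber and slice conventions above map directly onto array indexing. As a minimal NumPy sketch (an illustration, not part of the paper, assuming 0-based indexing with the third mode as the tube direction):

```python
import numpy as np

# A synthetic 3-way array X of size I1 x I2 x I3 = 4 x 3 x 2.
X = np.arange(24).reshape(4, 3, 2)

# Mode-1 (column) fiber x_{:jk}: fix the column and tube indices.
col_fiber = X[:, 1, 0]        # length I1 = 4

# Mode-2 (row) fiber x_{i:k} and mode-3 (tube) fiber x_{ij:}.
row_fiber = X[2, :, 0]        # length I2 = 3
tube_fiber = X[2, 1, :]       # length I3 = 2

# Slices: fix one index, let the other two vary.
horizontal = X[2, :, :]       # horizontal slice (fix i), size I2 x I3
lateral = X[:, 1, :]          # lateral slice (fix j), size I1 x I3
frontal = X[:, :, 0]          # frontal slice (fix k), size I1 x I2
```

Each fiber is a vector and each slice a matrix, which is exactly the partitioning the slice notation of Section IV relies on.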

III. MODELS

We have briefly introduced the type of data analyzed by multiway analysis techniques. These types of data require extensions of the analysis methods already available for two-way data. In general, multiway data analysis methods are generalizations of two-way analysis techniques based on the idea of factor models.

A model, which is an approximation of the data, consists of two parts: a structural part describing the structure in the data and a residual part expressing the part of the data that cannot be captured by the structural part. Using bilinear or multilinear models, factors (or components, loadings), which are linear combinations of variables, are extracted. These factors are later used to interpret the underlying information content of the data.

While most multiway analysis techniques preserve the multiway nature of the data, some techniques such as Tucker1 [1, 6] are based on matricization of a multiway array, i.e., transforming a third- or higher-order array into a two-way dataset. Matricization (or unfolding, flattening) has multiple definitions in the literature [7, 8], but the definition in [8] is commonly followed. Once a three-way array is flattened and arranged as a two-way dataset, two-way analysis methods, e.g. Singular Value Decomposition (SVD) [10] and other factor models [9], can be employed to understand the structure in the data.

Rearranging multiway arrays as two-way datasets and analyzing them with two-way methods, though, may result in information loss and misinterpretation, especially if the data are noisy. An intuitive example is often given on a sensory dataset, where eight judges evaluate ten breads based on eleven attributes [15]. When this dataset is modeled using a PARAFAC model, the model assumes that there is a common sense of evaluation among the judges and that each judge adheres to this sense of evaluation to a different degree. On the other hand, when the sensory data are unfolded in the bread mode and modeled using a two-way factor model, no assumption is made about a common sense of evaluation: every judge may behave completely independently. In such a scenario, a two-way factor model may extract as many factors as possible to explain the variation in the data, whereas a PARAFAC model can only explain the variation that follows the basic assumption. Extra variation captured by a two-way factor model might actually explain noise rather than genuine structure. Thus, multiway models are more advantageous in terms of interpretation and accuracy compared to two-way models. Multilinear models (i.e. PARAFAC [11], Tucker [1, 6] and their derivatives) capture the multilinear structure in data. Multilinearity of the model means that the model is linear in each mode and that the factors extracted from each mode are linear combinations of the variables in that mode. A component matrix, whose columns are the factors determined by the model, is then constructed to summarize the structure in each mode. These

models have been applied to various datasets shown to contain multilinear structure, e.g. three-way fluorescence spectroscopic datasets with modes samples × emission × excitation [13], or wavelet-transformed multi-channel EEG arranged as a three-way array with modes frequency × time samples × electrodes [2, 12].

IV. THE PARAFAC DECOMPOSITION

PARAFAC [11] is an extension of bilinear factor models to multilinear data. It is based on Cattell's principle of Parallel Proportional Profiles [14]. The idea behind Parallel Proportional Profiles is that if the same factors are present in two samples under different conditions, then each factor in the first sample is expected to show the same pattern in the second sample, but scaled depending on the conditions. Mathematically, a PARAFAC model can be represented as the decomposition of a tensor into a linear combination of rank-1 tensors (an $N$th-order rank-1 tensor is a tensor that can be written as the outer product of $N$ vectors).

The PARAFAC decomposition of $X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is given by

$$X = [\![\, A^{(1)}, A^{(2)}, \ldots, A^{(N)} \,]\!].$$

Here $A^{(n)} \in \mathbb{R}^{I_n \times R}$, for $n = 1, \ldots, N$. The PARAFAC decomposition can also be expressed as:

• Elementwise:

$$x_{i_1 i_2 \cdots i_N} = \sum_{r=1}^{R} a^{(1)}_{i_1 r}\, a^{(2)}_{i_2 r} \cdots a^{(N)}_{i_N r},$$

• Sum of outer products:

$$X = \sum_{r=1}^{R} a^{(1)}_{:r} \circ a^{(2)}_{:r} \circ \cdots \circ a^{(N)}_{:r}, \quad \text{or}$$

• Slice notation (three-way only): If $X = [\![\, A, B, C \,]\!] \in \mathbb{R}^{I \times J \times K}$, then we can, for example, write each frontal slice as

$$X_{::k} = A D^{(k)} B^T, \qquad (1)$$

where the $R \times R$ diagonal matrix $D^{(k)}$ is defined by $D^{(k)} = \mathrm{diag}(c_{k:})$. Slice notation can be used in the other directions as well:

$$X_{i::} = B\, \mathrm{diag}(a_{i:})\, C^T, \quad i = 1, \ldots, I, \qquad X_{:j:} = A\, \mathrm{diag}(b_{j:})\, C^T, \quad j = 1, \ldots, J. \qquad (2)$$

If we introduce the nonnegativity constraint, we have $A^{(n)} \in \mathbb{R}_+^{I_n \times R}$, or in the 3D case $A \in \mathbb{R}_+^{I \times R}$, $B \in \mathbb{R}_+^{J \times R}$, $C \in \mathbb{R}_+^{K \times R}$. The motivation behind PARAFAC is to obtain a unique solution, such that the component matrices are determined uniquely up to a permutation and scaling of columns. It is this uniqueness property that makes PARAFAC a popular technique in various fields. For example, in fluorescence spectroscopic data analysis [13], a unique PARAFAC model allows us to find physically and chemically meaningful factors directly from measurements of mixtures of chemicals. Uniqueness is achieved by the restrictions imposed by the model; the most significant restriction is that factors in different modes can only interact factorwise.
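As an illustrative sketch (not the authors' code), the frontal-slice identity (1) can be checked numerically on a small synthetic nonnegative PARAFAC tensor; NumPy's einsum is used here to build the sum of rank-1 terms:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 5, 4, 3, 2

# Nonnegative factor matrices A (I x R), B (J x R), C (K x R).
A, B, C = (rng.random((n, R)) for n in (I, J, K))

# Build X = [[A, B, C]]: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Frontal-slice identity (1): X_::k = A diag(c_{k:}) B^T.
k = 1
slice_check = A @ np.diag(C[k, :]) @ B.T
print(np.allclose(X[:, :, k], slice_check))  # True
```

The same pattern, with `diag` taken over rows of $A$ or $B$, verifies the horizontal and lateral forms in (2).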


V. PROPOSED METHOD

For tensors of large dimensions, many of the known algorithms [18] for nonnegative tensor decomposition can be time consuming. Moreover, if the size of the tensor grows beyond 200×200×200, such tensors become intractable on common PC workstations (see [17]). With that in mind, we propose here an alternative, simple approach that converts the complex problem of NTF into less complex sub-problems. We will call it Nonnegative Collapsing/Averaging Tensor Factorization (NCATF). In this way we reduce the computational burden and make the analysis of much bigger tensors possible. In the case of a 3D tensor, we reduce the problem to two NMF problems (or, in the case of rank deficiency, three NMF problems). The proposed idea is, in a sense, an extension of the ideas used in [16] for the simple and fast decomposition of a super-symmetric 3D tensor.

In order to introduce the new solution, we will assume that we want to decompose a 3D tensor. Based on (1) and (2), we can write the following equations:

$$X_{avk} = A C_{avk} B^T + N_{avk}, \qquad (3)$$

$$X_{avj} = A B_{avj} C^T + N_{avj}, \qquad (4)$$

$$X_{avi} = B A_{avi} C^T + N_{avi}, \qquad (5)$$

where $X_{avk}$, $X_{avj}$ and $X_{avi}$ represent the mean frontal, mean lateral and mean horizontal slices of the given 3D tensor, given by

$$X_{avk} = \frac{1}{K} \sum_{k=1}^{K} X_{::k}, \quad X_{avj} = \frac{1}{J} \sum_{j=1}^{J} X_{:j:}, \quad X_{avi} = \frac{1}{I} \sum_{i=1}^{I} X_{i::},$$

and where $C_{avk}$, $B_{avj}$ and $A_{avi}$ are defined as

$$C_{avk} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{diag}(c_{k:}), \quad B_{avj} = \frac{1}{J} \sum_{j=1}^{J} \mathrm{diag}(b_{j:}), \quad A_{avi} = \frac{1}{I} \sum_{i=1}^{I} \mathrm{diag}(a_{i:}).$$

The terms $N_{avk}$, $N_{avj}$ and $N_{avi}$ (representing the noise or residuals) are defined analogously to $X_{avk}$, $X_{avj}$ and $X_{avi}$. In other words, we have performed a collapsing of the 3D tensor along one of the modes (using the language of [17]).
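In the noise-free case, the collapsing step and equation (3) can be verified on a small synthetic tensor. This is only an illustrative sketch, with randomly generated nonnegative factors (names and sizes are our choice, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 5, 4, 3, 2
A, B, C = (rng.random((n, R)) for n in (I, J, K))

# Noise-free PARAFAC tensor X = [[A, B, C]].
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Collapse (average) over the third mode: X_avk = (1/K) sum_k X_::k.
X_avk = X.mean(axis=2)                 # shape (I, J)

# C_avk = (1/K) sum_k diag(c_{k:}), i.e. the diagonal of C's column means.
C_avk = np.diag(C.mean(axis=0))

# Equation (3) with N_avk = 0: X_avk = A C_avk B^T.
print(np.allclose(X_avk, A @ C_avk @ B.T))  # True
```

Averaging over the other two modes verifies (4) and (5) in the same way; the collapsed matrix in each case is a product of two tall factor matrices, which is exactly an NMF-shaped problem.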

Equations (3)–(5) can be written in the following form (if we ignore the noise/residuals):

$$X_{avk} = A_k B_k^T, \qquad (6)$$

$$X_{avj} = A_j C_j^T, \qquad (7)$$

$$X_{avi} = B_i C_i^T, \qquad (8)$$

where

$$A_k = A D_{1k}, \quad B_k = B D_{2k}, \quad \text{s.t.} \quad D_{1k} D_{2k} = C_{avk},$$
$$A_j = A D_{1j}, \quad C_j = C D_{2j}, \quad \text{s.t.} \quad D_{1j} D_{2j} = B_{avj},$$
$$B_i = B D_{1i}, \quad C_i = C D_{2i}, \quad \text{s.t.} \quad D_{1i} D_{2i} = A_{avi},$$

and where $D_{xy}$ ($x \in \{1, 2\}$ and $y \in \{i, j, k\}$) are diagonal matrices.

We can see that the set of equations (6)–(8) represents three separate NMF problems. Thus, our initial problem of tensor decomposition is converted into three (or two) NMF problems, where the common factors can be extracted by any of the standard NMF algorithms. Obviously, the factors will be extracted up to permutation and scaling ambiguities.

Of course, after the factors are extracted by the second NMF algorithm, it is necessary to align the new factors with the previously extracted ones: the permutation relative to the factors already extracted in the first NMF must be identified, and the newly extracted set of factors rearranged accordingly. It is also necessary to perform additional rescaling in order to make the new factors consistent with the already identified ones.
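The alignment and rescaling step can be sketched as follows. This is a hypothetical illustration: `A` stands for the shared factor from the first NMF, and `A2` simulates the same factor as a second NMF run might recover it, i.e. up to permutation and scaling; cosine similarity between columns recovers the permutation:

```python
import numpy as np

rng = np.random.default_rng(2)
I, R = 6, 3
A = rng.random((I, R))                    # shared factor from the first NMF (simulated)

# A second NMF run recovers the same columns, but permuted and rescaled
# (the inherent NMF ambiguities); perm and scales are arbitrary here.
perm = np.array([2, 0, 1])
scales = np.array([0.5, 2.0, 1.5])
A2 = A[:, perm] * scales

# Align: match each column of A2 to its most cosine-similar column of A.
An = A / np.linalg.norm(A, axis=0)
A2n = A2 / np.linalg.norm(A2, axis=0)
match = np.argmax(An.T @ A2n, axis=0)     # match[r]: column of A paired with A2[:, r]

# Rescale each aligned column of A2 back onto the scale of A.
rescale = np.linalg.norm(A[:, match], axis=0) / np.linalg.norm(A2, axis=0)
A2_aligned = np.zeros_like(A)
A2_aligned[:, match] = A2 * rescale

print(np.allclose(A2_aligned, A))  # True
```

The same permutation and rescaling must then be applied to the factor that accompanies `A2` in its NMF pair, so that all three extracted factors refer to the same ordering of components.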

It is not difficult to see that the proposed technique can be applied to some rank-deficient problems. If the different modes have different ranks, and if there are at least two modes sharing the largest rank, then it is possible to extract all factors correctly without introducing any prior knowledge (as is the case in PARALIND [15]). Also, if we know that there is no rank deficiency in the given tensor, and if we do not want to check the consistency of the PARAFAC model, then it is necessary to perform only two NMF decompositions.

VI. EXPERIMENTAL RESULTS

In order to show the effectiveness of the proposed method, we perform the decomposition of a synthetically generated 3D tensor. The following figure depicts the factors used for the creation of the 3D tensor.


Fig. 1. Factors used for the creation of the 3D tensor

By applying an NMF algorithm after averaging over dimension 1, we obtain the factors depicted in the following figure.

By applying an NMF algorithm after averaging over dimension 2, we obtain the factors depicted in the following figure.

We can see that the proposed method gives good results: based on two NMF problems, we can extract all three factors.

VII. CONCLUSION

In this paper we presented a new approach to NTF based on the CANDECOMP/PARAFAC model. The proposed method is simple, computationally efficient, suitable for implementation on multicore architectures, can handle some problems related to rank-deficient tensors, and can be used for the analysis of higher-dimensional tensors than most of the known algorithms for NTF. The method can easily be extended to higher-dimensional tensors. The efficiency of the proposed method when there is a significant level of correlated noise between slices should be further investigated.


REFERENCES

[1] L. R. Tucker, “The extension of factor analysis to three-dimensional matrices,” In Contributions to Mathematical Psychology . Holt, Rinehart and Winston, New York, 1964, pp. 110-182.

[2] F. Miwakeichi, E. Martínez-Montes, P. Valdés-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “Decomposing EEG data into space-time-frequency components using parallel factor analysis,” NeuroImage, vol. 22, no. 3, pp. 1035-1045, 2004.

[3] E. Acar, S. A. Camtepe, M. Krishnamoorthy, and B. Yener, “Modeling and multiway analysis of chatroom tensors,” In Proc. of IEEE International Conference on Intelligence and Security Informatics. Springer, Germany, pp. 256-268, 2005.

[4] F. Estienne, N. Matthijs, D. L. Massart, P. Ricoux, and D. Leibovici, “Multi-way modelling of high-dimensionality electroencephalographic data,” Chemometrics Intell. Lab. Systems, vol. 58, no. 1, pp. 59-72, 2001.

[5] S. Gourvénec, I. Stanimirova, C. A. Saby, C. Y. Airiau, and D. L. Massart, “Monitoring batch processes with the statistics approach,” J. of Chemometrics, vol. 19, pp. 288-300, 2005.

[6] L. R. Tucker, “Some mathematical notes on three- mode factor analysis,” Psychometrika, vol. 31, pp. 279-311, 1966.

[7] L. D. Lathauwer, B. D. Moor, and J.Vandewalle, “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl., vol. 21, no. 4, pp. 1253-1278, 2000.

[8] H. A. L. Kiers, “Towards a standardized notation and terminology in multiway analysis,” J. of Chemometrics, vol. 14, no. 3, pp. 105-122, 2000.

[9] J. Kim, and C. W. Mueller, Introduction to factor analysis: What it is and how to do it. Sage Publications, Newbury Park, CA, 1978.

[10] G. H. Golub, and C. F. V. Loan, Matrix Computations. The Johns Hopkins University Press, Baltimore, MD, 1996.

[11] R. A. Harshman, “Foundations of the PARAFAC procedure: models and conditions for an 'explanatory' multi-modal factor analysis,” UCLA working papers in phonetics, vol. 16, pp. 1-84, 1970.

[12] E. Acar, C. A. Bingol, H. Bingol, R. Bro, and B. Yener, “Multiway analysis of epilepsy tensors,” Technical Report 07-04, Rensselaer Polytechnic Institute, 2007.

[13] C. M. Andersen, and R. Bro, “Practical aspects of parafac modeling of fluorescence excitation-emission data,” J. of Chemometrics, vol. 17, no. 4, pp. 200-215, 2003.

[14] R. B. Cattell, “Parallel proportional profiles and other principles for determining the choice of factors by rotation,” Psychometrika 9, pp. 267-283, 1944.

[15] R. Bro, “Multi-way analysis in the food industry: models, algorithms, and applications,” Ph.D. thesis, University of Amsterdam, Amsterdam, Holland, 1998.

[16] A. Cichocki, M. Jankovic, R. Zdunek, and S. Amari, "Sparse super symmetric tensor factorization," in Proc. 14th International Conference on Neural Information Processing, ICONIP07, Kitakyushu, Japan, November 13-16, 2007, Lecture Notes in Computer Science, Springer, 2008.

[17] T. G. Kolda and B. W. Bader, “Tensor decompositions and applications,” Technical Report Number SAND2007-6702, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, November 2007.

[18] A. Cichocki, R. Zdunek, A. Phan, and S-I. Amari, Non-negative matrix and tensor factorizations: Applications to exploratory multiway data analysis and blind source separation. John Wiley & Sons, Ltd, New York, 2009.