Separation of human and animal seismic signatures using non-negative matrix factorization

Pattern Recognition Letters 33 (2012) 2085–2093


Asif Mehmood ⇑, Thyagaraju Damarla, James Sabatier

US Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783, United States

Article info

Article history:
Received 7 November 2011
Available online 7 July 2012

Communicated by S. Sarkar

Keywords:
Non-negative matrix factorization
Dimensionality reduction
Sparsity
Single channel source separation
Spectrogram

0167-8655/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.patrec.2012.06.015

This work was completed with the support of the Oak Ridge Associated Universities (ORAU) postdoctoral program.

⇑ Corresponding author. Tel.: +1 301 394 0818; fax: +1 301 394 5410.
E-mail address: [email protected] (A. Mehmood).

Abstract

Systems based on seismic footstep detection can be employed for homeland security applications such as perimeter protection and border security. This paper reports an approach based on non-negative matrix factorization (NMF) for separating seismic footstep signals in a single channel recording. A supervised NMF technique is employed to separate human footstep signatures from horse footstep signatures. The proposed algorithm is applied to the spectrograms of human and horse footstep signals. The spectrograms of these signals are represented as a sum of components, each having a fixed spectrum and a time-varying gain. The main benefit of the proposed technique is its ability to decompose a complex signal automatically into objects that have a meaningful interpretation. In this paper, a sparsity-based NMF algorithm is developed and implemented on seismic data of human and horse footsteps. The experimental results demonstrate that the performance of this method is very promising.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Personnel detection is an important aspect of intelligence, surveillance, and reconnaissance (ISR). It plays a vital role in perimeter and camp protection and in curtailing illegal border crossings (Damarla and Ufford, 2007; Sabatier et al., 2010). All these applications involve deploying sensors for a prolonged time, often camouflaged so that they are not noticeable by an intruder's visual inspection (Berger et al., 2008; Berger et al., 2009).

Human motion detection based on footstep signatures using seismic/acoustic sensors is an active research topic. Bland (2006) discussed the use of autoregressive (AR) coefficients in designing a footstep detection scheme from acoustic and seismic sensors. Gampert et al. (2001) used the kurtosis statistic to detect footsteps. Berger et al. (2007) considered the problem of detecting and classifying perimeter intrusion using geophones. They used a neural network approach to classify vehicular and human intrusion and used footsteps as the signature for the detection of human intrusion. Iyengar et al. (2007) fused acoustic and seismic signals for footstep detection. Their work discusses a novel approach based on canonical correlation analysis and copula theory to establish a likelihood ratio test. Subramanian et al. (2009) proposed a data-driven personnel detection scheme. Their method employed an empirical mode decomposition (EMD)-based signal processing scheme for the detection of footsteps. Houston and McGaffigan (2003) proposed using cadence features for the detection of footsteps. However, most of these methods are prone to false alarms because humans and animals have similar walking mechanisms and generate similar rhythmic temporal seismic patterns. Furthermore, a quadruped ambling with a slow cadence can generate the same cadence frequency as a human, and a fast-moving human can generate the same cadence frequency as a quadruped. Therefore, it is imperative to develop a method that can differentiate quadrupeds from humans based on features of their walk, such as seismic footstep signatures. To address the problem of discriminating humans from animals based on their walk, a source separation technique can be employed. The source separation procedure can separate a human-generated signal from a horse-generated signal by capturing their footstep signatures using seismic sensors and then classifying them automatically. In this paper, we develop and implement a single channel source separation algorithm based on non-negative matrix factorization (NMF).

NMF is a recent method for factorizing a matrix as the product of two matrices, in which all elements are non-negative (Lee and Seung, 1999). It is based on the fact that in many data processing tasks negative numbers are physically meaningless. Many physical signals, such as pixel intensities, amplitude spectra, and occurrence counts, are naturally represented by non-negative numbers. In the analysis of mixtures of such data, non-negativity of the individual components is a reasonable constraint. NMF has found widespread application in many different areas including pattern recognition, clustering, dimensionality reduction, and spectral analysis (Sajda et al., 2003, 2004). A decade ago, a very simple algorithm (Lee and Seung, 2000) for computing the NMF was introduced. This initiated much research aimed at developing more robust and efficient algorithms, and efforts have been made to enhance the quality of the NMF by adding further constraints to the decomposition, such as sparsity (Hoyer, 2002).

In addition to other applications, NMF is widely used in source separation. It is typically applied to the magnitude spectrogram of a signal. The main benefit of non-negative spectrogram factorization techniques is their ability to decompose a complex signal automatically into objects that have a meaningful interpretation. In recent years, source separation of linearly mixed signals using NMF has attracted a wide range of researchers. The source separation problem becomes more challenging when only a single channel recording is available. The single channel separation problem has been studied extensively within the signal processing and machine learning communities, and different parametric and non-parametric signal models have been proposed. Hidden Markov models (HMM) are quite powerful for modeling a single source. Roweis (2000, 2003) suggested using a factorial HMM to separate mixed speech. Another suggestion by Roweis is to use a factorial-max vector quantizer (Roweis, 2003). Jang et al. (2003) use independent component analysis (ICA) to learn a dictionary for sparse encoding, which optimizes an independence measure across the encoding of the different sources. Pearlmutter and Olsson (2006) generalize these results to overcomplete dictionaries, where the number of dictionary elements is allowed to exceed the dimensionality of the data. Other methods learn spectral dictionaries based on different types of NMF (Lee and Seung, 1999). One idea is to assume a convolutive sum mixture, allowing the basis functions to capture time-frequency (TF) structures (Smaragdis, 2004; Schmidt and Morup, 2006). In the work by Ellis and Weiss (2006), careful consideration is given to the representation of the signals so that the perceived quality of the separation is maximized.

All the above-mentioned approaches are applied to speech or microphone data, and to our knowledge NMF methods have not yet been applied to separate two or more seismic sources. We therefore propose to use sparse non-negative matrix factorization (SNMF) (Schmidt and Olsson, 2006) as a computationally attractive approach to separate a human's footsteps from a horse's footsteps. To our knowledge, sparsity-based NMF has not been applied in this manner to the separation of human and animal footstep signatures. As a first step, overcomplete dictionaries are estimated for both sources, i.e., human and horse footsteps, to give sparse representations of the signals. Separation of the source signals is achieved by merging the dictionaries pertaining to the sources in the mixture and then computing the sparse decomposition (Donoho and Stodden, 2003; King and Atlas, 2010). We explore the significance of the degree of sparseness and the number of basis vectors employed. The results are presented for experimental data taken at a horse farm. We then compare the proposed SNMF with a non-sparse NMF and find that SNMF outperforms the conventional NMF method. This paper is organized as follows: Section 2 describes the methods, NMF and SNMF. An overview of the framework is given in Section 3. Results and discussion are presented in Section 4. The conclusion is presented in Section 5.

2. Methodology

Before we discuss the method that we employed to extract individual sources from a mixture, it is important to discuss feature extraction. The goal of feature extraction is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories. This leads to the idea of seeking distinguishing features that are invariant to irrelevant transformations of the input. All physics-based algorithms exploit features to extract valuable information about an event. For example, the seismic signatures of the footsteps of a horse and of a human contain different frequencies and amplitudes because of the sizes and shapes of their feet. We want to incorporate this information in our algorithm by extracting their footstep spectral features.

A general problem in many applications is extracting features that can be used to separate different sources from a mixture, such as horse footstep signatures from those of a human. Based on the processing domain, current feature extraction methods for seismic signals can be classified into three categories: time domain methods, frequency domain methods, and TF domain methods. Time-domain analysis is particularly vulnerable to interfering noise, complicated waveforms, and variation of the terrain. As we are dealing with seismic signals that are non-stationary in nature, we use a TF representation to extract the features of each source in the mixture. TF representations are widely used in the analysis and feature extraction of seismic signals. The spectrogram, in terms of time and frequency, captures the spectral content well enough for discrimination. The spectrogram is computed from the short time Fourier transform (STFT) of the signal and displays the magnitude spectral density at each chosen frequency. The STFT can be expressed mathematically as:

$$\mathrm{STFT}(t, f) = \int S(t + \tau)\, w(\tau)\, \exp(-j 2 \pi f \tau)\, d\tau, \qquad (1.0.1)$$

where $S(t)$ is the signal under consideration, $w(t)$ is a sliding window function (e.g., a Hamming window), $t$ is time, and $f$ is frequency.
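As an illustration of Eq. (1.0.1) in discrete form, the following minimal NumPy sketch frames a signal with a sliding Hamming window and takes the magnitude of the FFT of each frame. The function name `magnitude_spectrogram`, the window length, the overlap, and the toy 40 Hz input are assumptions chosen for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def magnitude_spectrogram(signal, fs, win_len=256, overlap=0.5):
    """Magnitude STFT of a 1-D signal using a sliding Hamming window."""
    hop = int(win_len * (1.0 - overlap))            # samples between frame starts
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Each column is one windowed frame of the signal.
    frames = np.stack(
        [signal[k * hop : k * hop + win_len] * window for k in range(n_frames)],
        axis=1,
    )
    spec = np.abs(np.fft.rfft(frames, axis=0))      # shape: (win_len // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)    # frequency of each row in Hz
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers in s
    return spec, freqs, times

# Toy input (illustrative only): a 40 Hz tone sampled at 2 kHz.
fs = 2000
t = np.arange(0, 2.0, 1.0 / fs)
X, freqs, times = magnitude_spectrogram(np.sin(2 * np.pi * 40 * t), fs)
peak_hz = freqs[np.argmax(X.mean(axis=1))]          # bin with the most energy
```

The resulting matrix `X` is non-negative by construction, which is what makes it a suitable input for the factorizations discussed next.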

In the following, we discuss the NMF method and its sparsity-based version, SNMF, which is used for source separation.

2.1. Non-negative matrix factorization

NMF has been proposed (Lee and Seung, 2000) as a novel subspace method that can achieve a parts-based representation of objects by imposing non-negativity constraints. Given a matrix X, it is possible to approximate it by the product of two matrices, B and W, enforcing the constraint that all matrices are non-negative. The non-negativity constraints make the NMF representation purely additive (allowing no subtractions), in contrast to other linear representations such as principal component analysis (PCA) and independent component analysis (ICA) (Delac et al., 2004; Draper et al., 2003). These conventional factorization techniques allow the basis vectors to comprise both positive and negative terms, and the interaction between them, as specified by the components of B, to be both positive and negative. In practice, neither data sets such as matrices that represent sequences of magnitude spectral vectors nor the bases derived from them can be negative, since a negative magnitude carries no physical meaning.

In the formulation of NMF, the data are first converted into a magnitude spectrogram using Eq. (1.0.1) and represented by an $M \times N$ matrix $X$, which is a non-negative real-valued matrix. The NMF algorithm represents $X$ as the product of two non-negative matrices:

$$X \approx \hat{X} = B W \quad \text{s.t.} \quad B, W \geq 0, \qquad (1.1.1)$$

where the $M \times R$ matrix $B$ contains the basis vectors and the $R \times N$ matrix $W$ contains the weights required to approximate the corresponding columns of the matrix $X$ as linear combinations of the columns of $B$.
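To make Eq. (1.1.1) concrete, here is a minimal sketch of NMF using the well-known multiplicative updates for the Euclidean cost attributed to Lee and Seung. The function name `nmf`, the iteration count, the small constant `eps`, and the synthetic rank-3 test matrix are assumptions for illustration only.

```python
import numpy as np

def nmf(X, R, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative M x N matrix X by B (M x R) times W (R x N)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    B = rng.random((M, R)) + eps     # non-negative random initialization
    W = rng.random((R, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        W *= (B.T @ X) / (B.T @ B @ W + eps)
        B *= (X @ W.T) / (B @ W @ W.T + eps)
    return B, W

# Synthetic non-negative matrix of exact rank 3 (illustrative data only).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 50))
B, W = nmf(X, R=3)
rel_err = np.linalg.norm(X - B @ W) / np.linalg.norm(X)
```

Because the updates only multiply by non-negative ratios, `B` and `W` stay non-negative throughout, which is the purely additive property described above.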


The number of basis vectors (R), which is also the rank of the factorization, is usually chosen (Lee and Seung, 1999) so that $(M + N)R < MN$, in which case the product of B and W can be considered a compressed form of the original data matrix X. The smaller the value of R, the greater the dimensionality reduction. The question then arises of how small the rank, i.e., the number of bases used, should be. This problem can be addressed by enforcing sparsity using techniques such as the $\ell_1$ norm. By enforcing sparsity of the weighting matrix W, it is possible to separate X into its sources if the bases are diverse enough. As a consequence, two connected tasks have to be solved: (1) learning source-specific dictionaries that yield sparse codes, and (2) computing the sparse decompositions for separation. We use the SNMF method proposed by Schmidt and Olsson (2006) to accomplish these tasks.
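The compression condition $(M + N)R < MN$ can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, `max_compressive_rank`, and hypothetical matrix dimensions (129 frequency bins by 1000 frames) that are not taken from the paper.

```python
def max_compressive_rank(M, N):
    """Largest integer R that still satisfies (M + N) * R < M * N."""
    return (M * N - 1) // (M + N)

# Hypothetical spectrogram size: 129 frequency bins by 1000 time frames.
M, N = 129, 1000
R_max = max_compressive_rank(M, N)
stored_factored = (M + N) * R_max   # entries in B and W together
stored_full = M * N                 # entries in X
```

Any rank above `R_max` stores more numbers in B and W than in X itself. That regime is the overcomplete case, where the sparsity constraint rather than the rank is what keeps the representation compact.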

2.2. Sparse non-negative matrix factorization

NMF computes the decomposition in Eq. (1.1.1) subject to the constraint that all matrices are non-negative, leading to solutions that are parts-based or sparse (Hoyer, 2002). However, basic NMF does not provide a well-defined solution in the case of overcomplete dictionaries, where the non-negativity constraints are not sufficient to obtain a sparse solution. SNMF instead minimizes the following cost function:

$$E = \| X - B W \|_F^2 + \lambda \sum_{ij} W_{ij} \quad \text{s.t.} \quad B, W \geq 0 \qquad (1.2.1)$$

E is a linear combination of the reconstruction term and the sparsity term. However, Eq. (1.2.1) has a scaling problem. If we scale a basis vector $B_j$ by a constant $b_j$ and the corresponding weight vector $W_j$ by $1/b_j$, we obtain the same reconstruction cost but a different sparsity cost; that is, we can always scale up the bases and scale down the weights to lower the cost function. The optimization would then be driven toward weights that approach zero and bases that grow without bound. In other words, the sparsity term would no longer constrain the solution. This problem can be circumvented by normalizing the bases. A detailed description of the normalization of the bases can be found in Eggert and Korner (2004). Using normalized bases, the cost function given in Eq. (1.2.1) can be reformulated as

$$E = \| X - \bar{B} W \|_F^2 + \lambda \sum_{ij} W_{ij} \quad \text{s.t.} \quad B, W \geq 0 \qquad (1.2.2)$$

where $\bar{B}_j := B_j / \| B_j \|$.
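The scaling problem and its fix can be verified numerically. In the sketch below, the random matrices, the scale factor of 10, and the value of `lam` are arbitrary choices for illustration. Scaling the bases up and the weights down leaves the reconstruction unchanged but lowers the cost of Eq. (1.2.1), whereas the normalized cost of Eq. (1.2.2) is unaffected by the scale of B.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((6, 3))
W = rng.random((3, 8))
X = B @ W
lam = 0.1   # illustrative sparsity weight

def cost(X, B, W, lam):
    """Cost of Eq. (1.2.1): squared Frobenius error plus L1 penalty on W."""
    return np.linalg.norm(X - B @ W, "fro") ** 2 + lam * W.sum()

def cost_normalized(X, B, W, lam):
    """Cost of Eq. (1.2.2): same, but with unit-norm basis vectors."""
    B_bar = B / np.linalg.norm(B, axis=0, keepdims=True)
    return np.linalg.norm(X - B_bar @ W, "fro") ** 2 + lam * W.sum()

scale = 10.0
B_scaled, W_scaled = B * scale, W / scale

c_original = cost(X, B, W, lam)
c_rescaled = cost(X, B_scaled, W_scaled, lam)       # same product, smaller penalty
n_original = cost_normalized(X, B, W, lam)
n_rescaled = cost_normalized(X, B_scaled, W, lam)   # unchanged by scaling B
```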

From an encoding viewpoint, we can consider the bases matrix as a codebook or dictionary and the coefficient matrix as the encoding coefficients. Each column $\bar{B}_j$ of $\bar{B}$ is a normalized basis vector. The parameter $\lambda$ used in Eq. (1.2.2) controls the degree of sparseness in the code matrix. The concept of "sparse coding" refers to a representational scheme where only a few units (out of a large population) are effectively used to represent typical data vectors (Hoyer, 2002). In effect, this implies that most units take values close to zero while only a few take significantly non-zero values. SNMF can be computed by alternating updates of B and W using the following rules (Eggert and Korner, 2004; Schmidt and Morup, 2006; Schmidt and Olsson, 2006):

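$$W_{ij} \leftarrow W_{ij} \bullet \frac{X_i^T B_j}{P_i^T B_j + \lambda} \qquad (1.2.3)$$

$$B_j \leftarrow B_j \bullet \frac{\sum_i W_{ij} \left[ X_i + (P_i^T B_j) B_j \right]}{\sum_i W_{ij} \left[ P_i + (X_i^T B_j) B_j \right]} \qquad (1.2.4)$$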

where $P = BW$, and the bold operators indicate point-wise multiplication and division. We first apply SNMF to learn dictionaries of the individual seismic signals. To separate seismic mixtures, we keep the dictionary fixed and update only the code matrix, W. The seismic signal pertaining to each source is then separated by computing the reconstruction from the parts of the sparse decomposition pertaining to each of the bases used. The main idea in this paper is to use SNMF as a means of separating human footstep signatures from horse footstep signatures.
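The alternating updates can be sketched in matrix form as follows. This is one vectorized reading of Eqs. (1.2.3) and (1.2.4) with column-normalized bases, not the authors' code. The function names, the `B_fixed` argument used to freeze the dictionary, the default parameter values, and the toy data are all assumptions for illustration.

```python
import numpy as np

def normalize_columns(B, eps=1e-12):
    """Scale each column (basis vector) of B to unit Euclidean norm."""
    return B / (np.linalg.norm(B, axis=0, keepdims=True) + eps)

def snmf(X, n_bases, lam=0.1, n_iter=200, B_fixed=None, eps=1e-12, seed=0):
    """Sparse NMF; if B_fixed is given, only the weights W are updated."""
    rng = np.random.default_rng(seed)
    learn_bases = B_fixed is None
    B = normalize_columns(rng.random((X.shape[0], n_bases)) if learn_bases else B_fixed)
    W = rng.random((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        # Weight update, cf. Eq. (1.2.3), applied to all entries at once.
        W *= (B.T @ X) / (B.T @ (B @ W) + lam + eps)
        if learn_bases:
            # Basis update, cf. Eq. (1.2.4), applied to all columns at once.
            P = B @ W
            XWt, PWt = X @ W.T, P @ W.T
            num = XWt + B * np.sum(PWt * B, axis=0, keepdims=True)
            den = PWt + B * np.sum(XWt * B, axis=0, keepdims=True)
            B = normalize_columns(B * num / (den + eps))
    return B, W

# Toy training data (illustrative only), then weight-only fitting on a subset.
rng = np.random.default_rng(1)
X_train = rng.random((12, 4)) @ rng.random((4, 60))
B, W = snmf(X_train, n_bases=8, lam=0.05)
B_again, W_test = snmf(X_train[:, :10], n_bases=8, lam=0.05, B_fixed=B)
```

Calling `snmf` with `B_fixed` corresponds to the separation stage described above, where the learned dictionary is held constant and only the code matrix is estimated.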

3. Overview of the framework

The approach employed in this paper can be generalized to a framework consisting of two steps: decomposition, and separation and reconstruction. First, we took several segments of data from each source as training data. We have two sources that generate seismic signals: a walking human and a walking horse. After computing the spectrogram of each source in the training set, an overcomplete set of bases in the feature space was estimated using SNMF. These bases can be seen as a source-dependent, non-parametric, generative model, i.e., the observations for a specific source are generated as non-negative linear combinations of the elements of these bases (Schmidt and Morup, 2006). The bases thus obtained represent the underlying features of the signal, which means that the sets of basis components are different for different signals. Therefore, we can exploit this characteristic of the basis vectors to separate different signals from a mixture. In our work, we intend to separate the seismic signals representing human footstep signatures and the seismic signals representing horse footstep signatures from the mixed seismic signal. To this end, we concatenate the bases of each source signal to form a joint bases matrix. Two test signals, one representing each source, are added together to make a mixture, and the spectrogram of the mixture is computed. The spectrogram of the mixture is then decomposed using the joint bases matrix and randomly initialized weights. During the separation procedure, the observed features are mapped onto the concatenated bases of the two sources in the mixture. The bases are kept fixed and the weights are updated during the iterative process of SNMF until the algorithm meets a predefined convergence criterion. The criterion can be either an error value or the number of iterations to be performed. When the SNMF algorithm converges, the values of the bases matrix and weighting matrix found at that point are grouped by source. The separation is then possible because the basis indices are known for each source. This gives an estimate of each source in the feature space, which is mapped back to the spectrogram space. Here, smoothed time-varying Wiener filters (Schmidt and Olsson, 2006) are computed and used to filter the original mixture, giving the final result. In the following, we discuss in more detail the steps involved in separating two signals from a mixture: decomposition, and separation and reconstruction.
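The stopping rule mentioned above (an error-based tolerance or a maximum number of iterations) might be implemented along the following lines. This is a minimal sketch: the function name `fit_weights`, the relative-change tolerance, and the toy data are assumptions, and the paper does not state which threshold values were used.

```python
import numpy as np

def fit_weights(X, B, lam=0.1, tol=1e-5, max_iter=1000, eps=1e-12, seed=0):
    """Update W for fixed bases B until the cost stops changing or max_iter is hit."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1]))        # random initial weights
    prev_cost = np.inf
    for n_iter in range(1, max_iter + 1):
        W *= (B.T @ X) / (B.T @ (B @ W) + lam + eps)
        cost = np.linalg.norm(X - B @ W, "fro") ** 2 + lam * W.sum()
        # Stop when the relative change in cost falls below the tolerance.
        if abs(prev_cost - cost) <= tol * max(cost, eps):
            break
        prev_cost = cost
    return W, cost, n_iter

# Toy data (illustrative only): unit-norm random bases and a synthetic mixture.
rng = np.random.default_rng(2)
B = rng.random((10, 4))
B /= np.linalg.norm(B, axis=0)
X = B @ rng.random((4, 30))
W, final_cost, iterations_used = fit_weights(X, B)
```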

3.1. Decomposition of seismic signals using SNMF

In order to decompose the data matrix X, we can rewrite Eq. (1.1.1) as

$$X \approx \hat{X} = \sum_{j=1}^{R} B_j W_j \qquad (2.1.1)$$

where $X$ is the magnitude spectrogram of the mixture. The vector $B_j$ is the $j$th column of $B = \{B_1, B_2, \ldots, B_R\}$, the set of basis functions, and the corresponding weights of each basis are the rows of $W = \{W_1, W_2, \ldots, W_R\}^T$.

In our work, we employ a supervised SNMF, which means that we learn the bases of each source separately from the training data and then use them when testing the mixed data that are to be separated. Here, we have two sources, $S_1$ and $S_2$, representing human footsteps and horse footsteps, respectively, and the data matrices containing their spectrograms are $X_1$ and $X_2$. We can decompose these data matrices using the following expressions:

Fig. 1. Sensors layout in the barn and the path trajectory of the subjects.

$$X_1 \approx \hat{X}_1 = B_1 W_1$$

$$X_2 \approx \hat{X}_2 = B_2 W_2 \qquad (2.1.2)$$

where $B_1$ and $B_2$ are the bases matrices belonging to $S_1$ and $S_2$, respectively. As mentioned above, the bases obtained from the training data can be employed on the mixture data for the separation. We now concatenate the bases from each source to form a larger bases matrix $B$, as given in the following expression:

$$B = [B_1 \; B_2] = [B_{11} \; B_{12} \; \ldots \; B_{1p} \; B_{21} \; B_{22} \; \ldots \; B_{2q}] \qquad (2.1.3)$$

where $p$ and $q$ represent the number of bases learned from $X_1$ and $X_2$, respectively. In order to separate the individual signals from the mixture, we decompose the spectrogram of the mixed signal using the SNMF algorithm, as we did with the training data. However, during the decomposition of the mixture spectrogram, the bases are kept fixed and only the weights are updated during each iteration. The final values of the matrices B and W are combined to perform the separation and reconstruction, which is explained in the next section.
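The concatenation in Eq. (2.1.3) and the weight-only decomposition can be sketched as follows. The toy dictionaries below are built with disjoint frequency supports so the result is easy to inspect. Real footstep dictionaries would overlap in frequency, and all names, sizes, and parameter values here are illustrative assumptions.

```python
import numpy as np

def sparse_weights(X, B, lam=0.1, n_iter=300, eps=1e-12, seed=0):
    """Weight-only multiplicative updates for fixed, column-normalized bases B."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ (B @ W) + lam + eps)
    return W

# Toy dictionaries: "source 1" occupies the low bins, "source 2" the high bins.
M, p, q = 16, 3, 3
rng = np.random.default_rng(0)
B1 = np.zeros((M, p)); B1[:8] = rng.random((8, p))
B2 = np.zeros((M, q)); B2[8:] = rng.random((8, q))
B1 /= np.linalg.norm(B1, axis=0)
B2 /= np.linalg.norm(B2, axis=0)

# Synthetic source spectrograms and their mixture.
X1 = B1 @ rng.random((p, 20))
X2 = B2 @ rng.random((q, 20))
X_mix = X1 + X2

B = np.hstack([B1, B2])                 # joint dictionary, cf. Eq. (2.1.3)
W = sparse_weights(X_mix, B, lam=0.01)  # bases fixed, weights updated
W1, W2 = W[:p], W[p:]                   # rows grouped by known basis indices
Y1, Y2 = B1 @ W1, B2 @ W2               # per-source magnitude estimates
rel_err_1 = np.linalg.norm(X1 - Y1) / np.linalg.norm(X1)
```

Because the basis indices of each source are known in advance, splitting `W` by rows is all that is needed to attribute the mixture's energy to each source.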

3.2. Separation and reconstruction of individual sources

In order to separate the individual source signals from the mixture, the basis vectors $B_j$ learnt during the decomposition of the training data need to be grouped as shown in Eq. (2.1.3). In the supervised SNMF decomposition, this step is straightforward since the pre-defined bases B are already grouped and do not change during processing. In the separation process, the feature space representation of the mixture is mapped onto the joint bases B of the seismic sources using Eq. (1.2.3) to compute the appropriate weights.

The estimate of the mixed signal is thus given by the following equation:

$$Y = [B_1 \; B_2] \begin{bmatrix} W_1 \\ W_2 \end{bmatrix} \qquad (2.2.1)$$

and $B = [B_1 \; B_2]$. The estimate of each individual source signal can now be computed by multiplying the columns of B belonging to that source with the corresponding rows of W. In conjunction with refiltering, we obtain the magnitude spectrogram of each source signal as given in Eqs. (2.2.2) and (2.2.3):

$$Y_1 = B_1 W_1 \qquad (2.2.2)$$

and

$$Y_2 = B_2 W_2 \qquad (2.2.3)$$

where $Y_1$ and $Y_2$ are the magnitude spectrograms of the estimated source signals. The idea of refiltering (Roweis, 2000) is to separate the seismic sources by filtering the mixed signal with a time-varying filter, $W_{fact}$, designed to preserve TF regions containing only the signal of interest and attenuate regions containing the interfering signal. Assuming that each TF bin in an STFT representation is dominated by the human footstep signatures, a filter often used is the time-varying Wiener filter, where each TF bin is multiplied by the estimated ratio of the target to the mixed signal:

$$| W^{i}_{fact} |^2 = \frac{| Y_i |^2}{| Y_i |^2 + | Y_j |^2} \qquad (2.2.4)$$

The output of separation using SNMF is an estimate of the magnitude spectrogram, obtained by multiplying $W_{fact}$ with $Y_i$, where $Y_i$ is the spectrogram of the signal to be separated and $Y_j$ is the spectrogram of the interfering signal. To reconstruct the original sources, i.e., to get the time domain representation of each source signal, we use the inverse short time Fourier transform (ISTFT). The time domain signal can then easily be converted to a wave file so that we can listen to the separated signal. This also serves as an evaluation of how well the SNMF algorithm performed for single channel seismic source separation.
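A minimal sketch of the filter in Eq. (2.2.4) is given below. It computes one gain per TF bin for each source and, as one common refiltering choice, applies the gains to a complex mixture STFT so that the mixture phase is retained before inversion. The random arrays stand in for the SNMF estimates and the mixture STFT, and the function name `wiener_gains` is an assumption.

```python
import numpy as np

def wiener_gains(Y1, Y2, eps=1e-12):
    """Per-bin gains whose squares follow Eq. (2.2.4) for each of two sources."""
    denom = Y1 ** 2 + Y2 ** 2 + eps     # eps avoids division by zero in silent bins
    return np.sqrt(Y1 ** 2 / denom), np.sqrt(Y2 ** 2 / denom)

# Stand-ins for the per-source magnitude estimates and the complex mixture STFT.
rng = np.random.default_rng(0)
Y1 = rng.random((129, 40))
Y2 = rng.random((129, 40))
Z_mix = rng.standard_normal((129, 40)) + 1j * rng.standard_normal((129, 40))

G1, G2 = wiener_gains(Y1, Y2)
S1_hat = G1 * Z_mix   # filtered STFT of source 1 (mixture phase retained)
S2_hat = G2 * Z_mix   # filtered STFT of source 2
```

An inverse STFT with the same window and overlap as the forward transform would then return each filtered spectrogram to the time domain.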

4. Experimental results and discussion

This section begins with a brief overview of data acquisition, followed by a detailed analysis of the results for two scenarios, an evaluation of the proposed algorithm, and a comparison of the proposed SNMF with the baseline NMF algorithm.

4.1. Data acquisition

The data were collected while the subjects (human(s) and horse(s)) walked past the sensors in a barn at a horse farm, following a path shaped like the number "8". The layout of the barn is shown in Fig. 1. The shape of the path was chosen based on the space available in the barn and has no other significance. It neither affects the propagation of seismic waves nor has any impact on the performance of the proposed algorithm. The seismic sensors, along with ultrasonic sensors, were placed inside the "8" shaped path. There were six seismic sensors, and as the subject(s) passed by these sensors they captured the footstep signatures of the subject(s). These seismic footstep signatures of the targets (subjects) were collected using a Bruel & Kjaer (B & K) 12-channel, 24-bit data acquisition system. The data were initially sampled at 32 kHz and then down-sampled to 2.00 kHz. The objective of the experiment and the subsequent analysis of the data was to distinguish the signature of a walking human's footsteps from that of a walking horse. In order to extract the features of human footsteps or horse footsteps, the spectrogram of each individual signal was computed. The parameters used in the spectrogram calculations were a Hamming window of 125 ms (256 samples) and an overlap of 50%. These parameter values give a good TF resolution for the footstep signals.

Fig. 2. Spectrograms of (a) human training data and (b) horse training data. [Axes: time (s) versus frequency (Hz).]

Fig. 3. Human and horse data bases. [Axes: number of bases versus frequency (Hz).]
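As a quick check on the time-frequency resolution implied by these settings, the following worked arithmetic assumes the 2 kHz down-sampled rate and the 256-sample window quoted in the text. The symbols $f_s$, $N_w$, $N_h$, $T_w$ and $T_h$ are introduced here only for illustration.

```latex
\begin{aligned}
T_w &= \frac{N_w}{f_s} = \frac{256}{2000\ \text{Hz}} = 0.128\ \text{s}, \\
\Delta f &= \frac{f_s}{N_w} = \frac{2000\ \text{Hz}}{256} \approx 7.81\ \text{Hz}, \\
N_h &= N_w (1 - 0.5) = 128\ \text{samples}
  \quad\Rightarrow\quad T_h = \frac{N_h}{f_s} = 0.064\ \text{s}.
\end{aligned}
```

Under these assumptions the window spans about 0.128 s, close to the quoted 125 ms, with frequency bins roughly 7.8 Hz apart and a new spectrogram column every 64 ms. A 180 s record would then yield on the order of 2800 columns.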

In order to perform source separation using SNMF, we first took two training data sets of 180 s each. The first training data set comprised the seismic signatures of walking human footsteps, and the second set contained the seismic signatures of a walking horse. It is worth mentioning here that the seismic signals of the same subject, such as a human, differ for different surface types (e.g., dirt or wet soil, snow, ice) (Lacombe et al., 2006). This is because the type of surface affects the propagation of the seismic waves. Therefore, the training data and test data must be collected from the same soil type, although they need not be from the same location. The spectrograms are first computed for the training data as a preprocessing step, which also gives us the spectral features of the footstep signatures. The spectrograms of the human training data and the horse training data are shown in Figs. 2(a) and 2(b).

The data matrices representing the human spectrogram and the horse spectrogram were then used to compute the bases characterizing the human footstep signatures and the horse footstep signatures using SNMF. The bases thus obtained from the training data were used for source separation. The choice of the number of bases to be used in SNMF is an open research topic. It is chosen heuristically and can be between 1 and the total number of columns in the data matrix. The signal to error ratio (SER) can be computed for each set of bases used in the source separation problem, and the number of bases that gives the best SER is selected. However, in SNMF we can use all basis vectors and then, by employing sparsity, neutralize the contribution of those bases that are insignificant. To give a better understanding of the bases, a representation of 15 bases from the spectrogram of each source, concatenated together, is depicted in Fig. 3. In Fig. 3, the first 15 bases correspond to the human data and the remaining 15 correspond to the horse data.
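The heuristic of picking the number of bases by SER can be sketched as follows. The SER definition used here (signal energy divided by squared-error energy, in dB) is an assumed standard form, and the synthetic "estimates" stand in for separation results obtained with different numbers of bases. None of the names or values below come from the paper.

```python
import numpy as np

def ser_db(reference, estimate):
    """Signal-to-error ratio in dB (assumed definition: signal energy / error energy)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def best_num_bases(reference, estimates_by_rank):
    """Return the candidate rank whose estimate has the highest SER, plus all scores."""
    scores = {R: ser_db(reference, est) for R, est in estimates_by_rank.items()}
    return max(scores, key=scores.get), scores

# Synthetic stand-ins: the same reference with decreasing amounts of error.
rng = np.random.default_rng(0)
ref = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 500))
noise = rng.standard_normal(500)
estimates = {10: ref + 0.3 * noise, 20: ref + 0.2 * noise, 30: ref + 0.1 * noise}
best_R, scores = best_num_bases(ref, estimates)
```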

Fig. 4. Spectrograms of (a) human test data, (b) horse test data and (c) mixed data (human and horse). [Axes: time (s) versus frequency (Hz).]

Fig. 5. Spectrograms of reconstructed data: (a) human and (b) horse. [Axes: time (s) versus frequency (Hz).]

Fig. 6. Spectrograms of (a) three humans test data, (b) two horses test data and (c) mixed data (humans and horses). [Axes: time (s) versus frequency (Hz).]

Fig. 7. Spectrograms of reconstructed data: (a) three humans and (b) two horses. [Axes: time (s) versus frequency (Hz).]


The source separation was performed on data obtained from two different scenarios. In the first scenario, the test data were taken from a single human and a single horse walking. In the second scenario, the test data were obtained from 3 humans (2 men and 1 woman) and 2 horses (one big and one small). In both scenarios, we took 10 s of human(s) data and 10 s of horse(s) data for testing.

We took 10 s of data so that different variations in the footstep signatures could be taken into consideration. However, about 2.0 s of data is considered enough to characterize human or horse footstep signatures (Lacombe et al., 2006). As long as we are able to capture the pattern of the human walk and the horse walk, which is characterized by their walking mechanism, we can separate them using the proposed algorithm, and a couple of footsteps are sufficient for this. We discuss the results obtained in both scenarios one by one.

Fig. 8. Correlation between test and reconstructed signals: (a) a human and (b) a horse. [Axes: time (s) versus amplitude (V); traces: Test, Recon.]

Fig. 9. Correlation between test and reconstructed signals: (a) multiple humans and (b) multiple horses. [Axes: time (s) versus amplitude (V); traces: Test, Recon.]

Table 1. SER for human footstep signals.

No. of bases                   10     20     30     50     75     100    200
Scenario I SER (dB)    SNMF   10.32  10.35  10.38  10.40  10.41  10.42  10.45
                       NMF     7.45   7.49   7.51   7.62   7.71   7.75   7.81
Scenario II SER (dB)   SNMF    9.86   9.92   9.95  10.08  10.13  10.16  10.20
                       NMF     6.92   7.13   7.15   7.19   7.24   7.29   7.33

4.2. Scenario 1

In the first scenario, we formed a mixture of a human and ahorse signatures, and called it test data. The spectrograms of 10 sof a human data, 10 s of a horse data and 10 s of the mixed dataare shown in Figs. 4(a), 4(b) and 4(c), respectively. The task aheadis to separate individual signals from the mixture. The basesalready calculated for each source from their respective trainingdata are used in the separation. SNMF is performed by keeping

the bases fixed and updating the weighting matrix by using Eq.(1.2.3). Once the weight matrices were calculated, the separationstep is possible because the bases indices are known for eachsource (Schmidt and Olsson, 2006). The spectrograms of the recov-ered signals are shown in Figs. 5 (a) and 5(b). Now we can comparethe spectrograms of the test signals as shown in Figs. 4(a) and 4(b)with the spectrograms of the recovered signals in Figs. 5(a) and5(b) by visual inspection, and can see how well the reconstructedspectrograms match the test signal spectrograms. It is clear fromFigs. 4(a) and 5 (a) that they have a good match when you lookat the low frequency contents below 50 Hz with high energy. Thehigh frequency contents are representative of noise. The noise con-tents in the recovered signal are less prominent because SNMF has

Fig. 10. SNMF: Human footsteps signals (a) test and (b) reconstructed. [Plots omitted: magnitude (−1 to 1) versus time (0–5 s).]

Fig. 11. NMF: Human footsteps signals (a) test and (b) reconstructed. [Plots omitted: magnitude (−1 to 1) versus time (0–5 s).]


built-in noise removal mechanism. All the high-energy footstep pulses seen in Fig. 4(a) can also be seen in Fig. 5(a). Similarly, the spectral features corresponding to horse footsteps observed in Fig. 4(b) can be seen in Fig. 5(b).
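The fixed-bases separation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Euclidean reconstruction cost with an L1 penalty on the weights, which may differ from the paper's Eq. (1.2.3), and the function name `separate_with_fixed_bases` and its parameters are hypothetical.

```python
import numpy as np

def separate_with_fixed_bases(V, W_a, W_b, sparsity=0.1, n_iter=200, eps=1e-12, seed=0):
    """Split a non-negative mixture spectrogram V (freq x time) into two parts.

    W_a and W_b are pre-trained, non-negative basis matrices (freq x k_a and
    freq x k_b) that stay fixed; only the weight matrix H is updated.
    Assumed cost: 0.5 * ||V - W H||_F^2 + sparsity * sum(H).
    """
    W = np.hstack([W_a, W_b])            # concatenated dictionary of both sources
    k_a = W_a.shape[1]                   # the first k_a rows of H belong to source a
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V                        # constant because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update: H stays non-negative at every iteration.
        H *= WtV / (WtW @ H + sparsity + eps)
    V_a = W_a @ H[:k_a]                  # part explained by the source-a bases
    V_b = W_b @ H[k_a:]                  # part explained by the source-b bases
    return V_a, V_b, H
```

Because the rows of `H` are ordered like the columns of the concatenated dictionary, the known basis indices are all that is needed to assign each part of the reconstruction to its source.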

In order to demonstrate the effectiveness of our algorithm, we also tested it on seismic signals from multiple humans and multiple horses, as discussed in the following.

4.3. Scenario 2

In this scenario, we took 10 s of data from multiple humans and 10 s of data from multiple horses and computed their spectrograms as in scenario 1. These are shown in Figs. 6(a) and 6(b). The test data from each source are then combined to form a mixture, which is shown in Fig. 6(c). SNMF was then applied to the mixture to perform the separation, following the same steps as in scenario 1. The sets of bases obtained from the training data were used, and the weights were obtained using Eq. (1.2.3) while the bases were kept constant. The spectrograms of the recovered signals are shown in Figs. 7(a) and 7(b). The spectrograms of the test signals can now be compared with those of the recovered signals. By visual inspection, we see that the recovered signals' spectrograms match the original test signals' spectrograms very well. Visual inspection alone, however, is probably not enough to measure the performance of an algorithm; therefore, we used several other ways to evaluate the performance of SNMF, as explained in the following section.
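As a rough illustration of how the mixture and its spectrogram might be formed, the sketch below uses a plain NumPy short-time Fourier transform. Everything specific here is an assumption: the sampling rate, frame length, hop size, and the synthetic tone bursts standing in for the recordings are hypothetical, and the sources are assumed to add in the time domain.

```python
import numpy as np

def magnitude_spectrogram(x, frame_len=256, hop=128):
    """Return the magnitude STFT of x as a (freq x time) non-negative array."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 1000                                  # hypothetical sampling rate (Hz)
t = np.arange(10 * fs) / fs                # 10 s of samples
# Synthetic stand-ins for the recordings: short gated tone bursts.
humans = np.sin(2 * np.pi * 20 * t) * (np.mod(t, 0.5) < 0.05)
horses = 0.8 * np.sin(2 * np.pi * 35 * t) * (np.mod(t, 0.3) < 0.05)
mixture = humans + horses                  # assumed: sources add in the time domain

V_humans = magnitude_spectrogram(humans)
V_horses = magnitude_spectrogram(horses)
V_mix = magnitude_spectrogram(mixture)     # non-negative input to the separation
```

Note that the magnitude spectrogram of the mixture is only approximately the sum of the source spectrograms (it is bounded above by their sum), which is the usual approximation behind applying NMF to single-channel mixtures.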

4.4. Evaluation

We employ three different ways to evaluate the performance of the proposed algorithm. Because of the length constraint on the paper, we discuss only the evaluation for the single human and single horse case. The first evaluation method is to listen to the recovered signal wave files and rate the performance. We listened to the recovered signals for both human and horse and found that the human footsteps were separated from the horse footsteps. A small difference between the original and the recovered signals is also slightly visible in their waveforms. In listening to the results, we found that the recovered human footstep signal had better quality, while the horse footstep signal suffered some interference. In the second method, we computed the correlation between the test signals and the recovered signals and found them to be 96.8% and 89.6% correlated (similar) for the human and horse signals, respectively. A portion of the recovered signal and the original test signal are plotted in Figs. 8(a) and 8(b) for the single human and single horse case, and in Figs. 9(a) and 9(b) for multiple humans and multiple horses. These figures highlight the correlation between the test signals and the recovered signals. In the third method, the SERs are computed to rate the performance of our algorithm. The SER is expressed in decibels (dB), and the error is calculated as the least-squares error:

$$\mathrm{SER}(i) = \frac{\sum_{n} \left(x_i(n)\right)^2}{\sum_{n} \left(x_i(n) - y_i(n)\right)^2} \qquad (3.0.5)$$

where $x_i(n)$ is the $i$th input signal and $y_i(n)$ is its separated counterpart. We ran the algorithm several times with different initial values and numbers of bases, so the SER varied. The experimental results are shown in Table 1: the SER increases slightly as the number of bases increases, but at the cost of additional computation.
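The second and third evaluation measures can be sketched in a few lines. This is an illustrative sketch only: the paper does not say which correlation measure was used, so the Pearson coefficient is assumed here, and the decibel conversion `10 * log10(...)` is the conventional one for an energy ratio rather than something stated in Eq. (3.0.5). The function names are hypothetical.

```python
import numpy as np

def correlation_percent(x, y):
    """Pearson correlation between test signal x and recovered signal y, in percent."""
    return 100.0 * np.corrcoef(x, y)[0, 1]

def ser(x, y):
    """Energy of x divided by the energy of the error x - y, as in Eq. (3.0.5)."""
    return np.sum(x ** 2) / np.sum((x - y) ** 2)

def ser_db(x, y):
    """The same ratio expressed in decibels (assumed 10*log10 convention)."""
    return 10.0 * np.log10(ser(x, y))
```

For example, a recovered signal equal to 0.9 times the test signal leaves an error of 0.1 times the test signal, so the ratio is 1/0.01 = 100, that is, 20 dB.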

4.5. SNMF vs. NMF

In the last segment of our discussion of results, the SNMF approach applied to single-channel source separation is compared with conventional NMF. The test signal and the reconstructed signals for SNMF and NMF are shown in Figs. 10 and 11, respectively. The test signal was taken from the single human walking data set. Visual inspection shows that SNMF offers a slight improvement over NMF. The reason is that, because of the sparsity constraint, SNMF selects the bases that best represent the signal, whereas in NMF we must choose the number of bases used for the decomposition ourselves. More noise is visible in Fig. 11(b) than in Fig. 10(b) between 3 and 5 s. We also evaluated the SER using Eq. (3.0.5) and found that SNMF has an improvement of about 3 dB over NMF, as shown in Table 1. Both techniques do a fairly good job in this seismic source separation, but SNMF is more efficient, both computationally and qualitatively.
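The size of the SNMF advantage can be checked directly from the Table 1 values. The short calculation below simply subtracts the NMF row from the SNMF row for each scenario; the variable names are illustrative.

```python
import numpy as np

n_bases = [10, 20, 30, 50, 75, 100, 200]
# SER values in dB, copied from Table 1.
table1_db = {
    "Scenario I": {
        "SNMF": [10.32, 10.35, 10.38, 10.40, 10.41, 10.42, 10.45],
        "NMF": [7.45, 7.49, 7.51, 7.62, 7.71, 7.75, 7.81],
    },
    "Scenario II": {
        "SNMF": [9.86, 9.92, 9.95, 10.08, 10.13, 10.16, 10.20],
        "NMF": [6.92, 7.13, 7.15, 7.19, 7.24, 7.29, 7.33],
    },
}

# Per-column SER gain of SNMF over NMF, in dB.
gains = {
    name: np.array(rows["SNMF"]) - np.array(rows["NMF"])
    for name, rows in table1_db.items()
}
for name, gain in gains.items():
    print(name, dict(zip(n_bases, np.round(gain, 2))), f"mean {gain.mean():.2f} dB")
```

The per-column gains fall between roughly 2.6 and 2.9 dB in both scenarios, which is consistent with the improvement of about 3 dB quoted in the text.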

5. Conclusion

We have successfully applied SNMF to the problem of separating human footsteps from horse footsteps. NMF is a matrix decomposition method for non-negative data, and its basic idea is the linear non-negative decomposition of such data. SNMF learns large overcomplete dictionaries of basis functions, leading to sparser representations of individual seismic signals than, for example, basic NMF. Inspection of the bases reveals that they capture fundamental properties of individual seismic signals. SNMF preserves the non-negativity requirement by adopting the simple yet effective multiplicative update rule, and its result has a definite physical meaning. Moreover, it is a low-rank approximation algorithm that can effectively save storage and computation resources.

Acknowledgement

This research was supported in part by an appointment to the ORAU Postdoctoral Program at the Army Research Laboratory, administered by Oak Ridge Associated Universities. We are thankful to Brian King and Les Atlas of the University of Washington for valuable discussions on NMF and source separation.

References

Bland, R., 2006. Acoustic and seismic signal processing for footstep detection. Master’s thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science.

Damarla, T., Ufford, D., 2007. Personnel detection using ground sensors. Proc. SPIE, 656205.

Delac, K., Grgic, M., Grgic, S., 2004. Independent comparative study of PCA, ICA, and LDA on the FERET data set.

Donoho, D., Stodden, V., 2003. When Does Non-Negative Matrix Factorization Give Correct Decomposition Into Parts? MIT Press, pp. 1–8.

Draper, B.A., Baek, K., Bartlett, M.S., Beveridge, J.R., 2003. Recognizing faces with PCA and ICA. CVIU 91 (1–2), 115–137.

Eggert, J., Korner, E., 2004. Sparse coding and NMF. In: Proc. of 2004 IEEE Internat. Joint Conf. on Neural Networks, vol. 4, July, pp. 2529–2533.

Ellis, D.P.W., Weiss, R.J., 2006. Model-based monaural source separation using a vector-quantized phase-vocoder representation. In: Proc. 2006 IEEE Internat. Conf. on Acoust. Speech Signal Process. ICASSP 2006, vol. 5, May, p. V.

Gampert, R., Succi, G., Clapp, D., Prado, G., 2001. Footstep detection and tracking. Proc. SPIE 4393, 22–29.

Berger, T., Park, H., Dibazar, A., 2007. The application of dynamic synapse neural networks on footstep and vehicle recognition. In: Proc. Internat. Joint Conf. on Neural Networks IJCNN 2007, pp. 1842–1846.

Berger, T., Park, H., Dibazar, A., 2008. Protecting military perimeters from approaching human and vehicle using biologically realistic dynamic synapse neural network. In: IEEE Conf. on Technol. for Homeland Security, May, pp. 73–78.

Berger, T., Park, H., Dibazar, A., 2009. Cadence analysis of temporal gait patterns for seismic discrimination between human and quadruped footsteps. In: IEEE Internat. Conf. on Acoust. Speech Signal Process., pp. 1749–1752.

Houston, K.M., McGaffigan, D.P., 2003. Spectrum analysis techniques for personnel detection using seismic sensors. Proc. SPIE 5090, 162–173.

Hoyer, P.O., 2002. Non-negative sparse coding. In: Proc. 2002 12th IEEE Workshop on Neural Networks for Signal Process., pp. 557–565.

Iyengar, S.G., Varshney, P.K., Damarla, T., 2007. On the detection of footsteps based on acoustic and seismic sensing. In: Conf. Record of the Forty-First Asilomar Conf. on Signals Syst. Comput. ACSSC 2007, November, pp. 2248–2252.

Jang, G., Lee, T., Cardoso, J.F., Oja, E., Amari, S.I., 2003. A maximum likelihood approach to single-channel source separation. J. Machine Learn. Res. 4, 1365–1392.

King, B., Atlas, L., 2010. Single-channel source separation using simplified-training complex matrix factorization. In: IEEE Internat. Conf. on Acoust. Speech Signal Process. (ICASSP), pp. 4206–4209.

Lacombe, J., Peck, L., Anderson, T., Fisk, D., 2006. Seismic detection algorithm and sensor deployment recommendations for perimeter security. Proc. SPIE 6231, 623109.

Lee, D.D., Seung, H.S., 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401 (6755), 788–791.

Lee, D.D., Seung, H.S., 2000. Algorithms for non-negative matrix factorization. NIPS, pp. 556–562.

Pearlmutter, B.A., Olsson, R.K., 2006. Linear program differentiation for single-channel speech separation. Proc. 2006 16th IEEE Signal Process. Soc. Workshop on Mach. Learning for Signal Process., Sept., pp. 421–426.

Sabatier, J., Damarla, T., Ekimov, A., 2010. Personnel detection at a border crossing. In: Proc. Military Sensing Symposium National.

Roweis, S.T., 2003. Factorial models and refiltering for speech separation and denoising. In: EUROSPEECH, pp. 1009–1012.

Roweis, S.T., 2000. One microphone source separation. Advances in Neural Information Processing Systems, 13. MIT Press, pp. 793–799.

Sajda, P., Du, S., Brown, T.R., Stoyanova, R., Shungu, D.C., Mao, X., Parra, L.C., 2004. Nonnegative matrix factorization for rapid recovery of constituent spectra in magnetic resonance chemical shift imaging of the brain. IEEE Trans. Med. Imaging 23 (12), 1453–1465.

Sajda, P., Du, S., Parra, L., 2003. Recovery of constituent spectra using non-negative matrix factorization. Storage and Retrieval for Image and Video Databases.

Schmidt, M.N., Morup, M., 2006. Nonnegative matrix factor 2-d deconvolution for blind single channel source separation. Independent Component Anal., 700–707.

Schmidt, M.N., Olsson, R.K., 2006. Single-channel speech separation using sparse non-negative matrix factorization. In: Internat. Conf. on Spoken Language Process. (INTERSPEECH).

Smaragdis, P., 2004. Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs. Independent Component Anal. Blind Signal Separat., 494–499.

Subramanian, A., Iyengar, S.G., Mehrotra, K.G., Mohan, C.K., Varshney, P.K., Damarla, T., 2009. A data-driven personnel detection scheme for indoor surveillance using seismic sensors. Proc. SPIE 733315, 1–15.
