Wavelet dissimilarities and clustering of economic time series

V. J. Bolós∗ and R. Benítez∗

Abstract: Time series clustering is an important field in economics and finance. Typical applications range from the identification of different economic sectors and areas to the determination of assets with similar risk patterns in portfolio selection. In clustering algorithms, data are grouped according to a predefined measure of distance or dissimilarity. Therefore, the resulting clusters depend heavily on the choice of such measure. However, in time series clustering, dissimilarity measures give a global value for the entire signal, losing the possible temporal evolution. This is an important drawback for the case of non-stationary time series, which is usually the case in economics and finance. In this work, we introduce some dissimilarity measures based on the continuous wavelet transform, in particular on the wavelet coherence and the windowed scalogram difference. These wavelet tools have the advantage of giving us dissimilarity measures for each time and for each scale, allowing us to study different frequency bands while preserving the temporal information. Finally, we illustrate the use of these dissimilarities with some examples of economic time series and analyze different methods for visualizing the clusters and extracting information from them.

Keywords: Clustering, time series, dissimilarity measures, wavelet coherence.

1 Introduction

Clustering is an unsupervised machine learning technique whereby different observational units or data are grouped into subsets (clusters) so that the elements of each cluster have a certain degree of similarity (see [8]). Clusters are established so that the similarity between the elements of the same cluster (intracluster similarity) is greater than that of the elements of different clusters (intercluster similarity).

The main clustering algorithms are k-means clustering (see [9]) and hierarchical clustering. Both algorithms are iterative. In the first, each iteration consists of two phases: assignment and update. In the assignment phase, each element $x^{(i)}$ is assigned to a cluster, defined by the position of its centroid $c_j$, in such a way that the following cost function is minimized:

\[ J(\Pi, c) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} \pi_{ij}\, d\left(x^{(i)}, c_j\right)^2, \]

where $N$ is the total number of observations, $k$ is the number of clusters, $\Pi$ is an assignment matrix whose elements $\pi_{ij} \in \{0, 1\}$ determine whether a point $x^{(i)}$ belongs to the cluster with centroid $c_j$ or not, and $d$ is the distance or dissimilarity measure used. In the update phase, the new position of the centroid of each cluster is defined by the average of the positions of the points assigned to it. The algorithm stops when the positions of the centroids do not change, or change by less than a preset threshold.
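The two phases described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes NumPy, takes the squared Euclidean distance as $d^2$, and picks the initial centroids at random among the observations, none of which is prescribed by the text. The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, tol=1e-9, seed=0):
    """Lloyd-style k-means using the squared Euclidean distance as d^2."""
    rng = np.random.default_rng(seed)
    x = np.asarray(points, dtype=float)
    # Initial centroids: k distinct observations chosen at random
    # (an assumption; the text does not say how they are initialised).
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment phase: each point goes to its nearest centroid.
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update phase: each centroid moves to the mean of its points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # an empty cluster keeps its centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:  # stop when the centroids barely move
            break
    # Final assignment and cost J(Pi, c) for the returned centroids.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    cost = d2.min(axis=1).mean()
    return labels, centroids, cost
```

Replacing the line that computes `d2` with another squared dissimilarity changes the assignment phase only; the mean-based update is tied to the Euclidean case.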

∗ Dpto. Matemáticas para la Economía y la Empresa, Universidad de Valencia, Avda. Tarongers s/n, 46022–Valencia (SPAIN). Email: [email protected], [email protected]

In the hierarchical clustering algorithm the approach is somewhat different. Initially, every observation defines a singleton cluster (a leaf) and, in each iteration, the two closest clusters are merged into a larger cluster. The process continues until all objects are together in one single cluster (the root), giving as a result a tree that is usually represented as a dendrogram. Once the dendrogram is built, clusters are obtained by cutting the dendrogram at a certain height, which determines the distance beyond which two elements no longer belong to the same cluster.
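A minimal sketch of this agglomerative procedure, written in plain Python, is given below. It assumes single linkage (the distance between two clusters is the distance between their closest members), which the text does not specify, and it stops merging at the cutting height instead of building the full tree first; for single linkage both give the same clusters. The name `agglomerative_clusters` is hypothetical.

```python
def agglomerative_clusters(dist, cut_height):
    """Single-linkage agglomerative clustering, cut at `cut_height`.

    `dist` is a symmetric matrix (list of lists) of pairwise dissimilarities.
    Merging stops once the two closest clusters are farther apart than
    `cut_height`.
    """
    clusters = [[i] for i in range(len(dist))]  # leaves: one singleton each

    def linkage(a, b):
        # Single linkage (an assumption): distance between closest members.
        return min(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the two closest clusters.
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cut_height:
            break
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return sorted(sorted(c) for c in clusters)
```

For five points on a line at positions 0, 1, 2, 10 and 11, a cutting height of 1.5 separates the first three from the last two.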

It is important to emphasize that in all clustering algorithms the distance or measure of dissimilarity used is fundamental, since different choices of such distance can lead to totally different results.

This is particularly noticeable in the case of time series clustering, where the observational units are finite sequences with a large number of elements. While in low-dimensional clustering the usual distances (Euclidean, Manhattan, etc.) are the most used, such a choice does not usually work well with time series. For these cases, other measures of dissimilarity are considered, such as the correlation distance (Pearson, Eisen, etc.), dynamic time warping, periodogram dissimilarity and dissimilarity of autocorrelation, among others (see [1] for a recent review on the subject).

A major drawback of these measures is that they provide a global dissimilarity value for the entire time series, without considering the possibility that such dissimilarity may vary over time, and therefore they yield static clusters. In fields working with non-stationary time series this may be a serious problem.

In this work we introduce a new dissimilarity measure based on wavelet analysis, the windowed scalogram difference, and we compare it with another wavelet tool, the wavelet coherence, which is widely used for comparing non-stationary time series. These dissimilarity measures have the advantage of conserving the temporal information, giving as a result dynamic clusters which evolve over time.

2 Wavelet tools for comparing time series

In this section, we introduce two wavelet tools for measuring the similarity of two time series: the wavelet coherence and the windowed scalogram difference. Both tools are useful for determining a dissimilarity between two time series while preserving the time-frequency information. But first, we present a brief introduction to wavelet theory.

A wavelet is a small “wave packet” that grows and decays in a limited time period. It is given by a function $\psi \in L^2(\mathbb{R})$ “centered” at the origin (more or less), with zero average and normalized. A family of daughter wavelets $\psi_{u,s}(t)$ can be obtained by scaling and translating $\psi$:

\[ \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\left(\frac{t-u}{s}\right), \tag{1} \]

where $s$ is a scaling parameter that controls the length of the wavelet, and $u$ is a time location parameter that indicates where the wavelet is centered. Given a signal $f(t)$ in $L^2(\mathbb{R})$, its continuous wavelet transform (CWT) with respect to the wavelet $\psi$ at time $u$ and scale $s$ is given by

\[ W_f(u, s) = \int_{-\infty}^{+\infty} f(t)\, \psi^*_{u,s}(t)\, dt, \tag{2} \]

where $*$ denotes complex conjugation. It represents the frequency components (or details) of $f(t)$ corresponding to the scale $s$ and time location $u$, providing a continuous time-frequency decomposition of $f(t)$.
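As an illustration of (1) and (2), the CWT of a sampled signal can be approximated by a direct Riemann sum. The sketch below assumes NumPy and uses the Mexican hat wavelet with unit $L^2$ norm; the function names `mexican_hat` and `cwt` are hypothetical, and dedicated wavelet libraries use faster FFT-based schemes, so this is not meant as an efficient implementation.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet, normalised to unit L2 norm and zero average."""
    return 2.0 / (np.sqrt(3.0) * np.pi ** 0.25) * (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(signal, scales, dt=1.0, wavelet=mexican_hat):
    """Approximate W_f(u, s) of equation (2) by a Riemann sum.

    Returns an array of shape (len(scales), len(signal)); entry [i, n] is the
    transform at scale scales[i] and time location u = n * dt.
    """
    f = np.asarray(signal, dtype=float)
    t = np.arange(len(f)) * dt
    out = np.empty((len(scales), len(f)), dtype=complex)
    for i, s in enumerate(scales):
        # Daughter wavelets psi_{u,s}(t) of equation (1): row = u, column = t.
        psi = wavelet((t[None, :] - t[:, None]) / s) / np.sqrt(s)
        out[i] = (np.conj(psi) * f[None, :]).sum(axis=1) * dt
    return out
```

Values near the ends of the series are affected by edge effects, because the sum only covers the sampled part of the signal.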

On the other hand, the dyadic version of (1) is given by

\[ \psi_{j,k}(t) = \frac{1}{\sqrt{2^k}}\, \psi\left(\frac{t - 2^k j}{2^k}\right), \tag{3} \]

where $j, k \in \mathbb{Z}$ (note that there is an abuse of notation between (1) and (3), but the context disambiguates it). It is important to construct wavelets so that the family of dyadic wavelets $\{\psi_{j,k}\}_{j,k \in \mathbb{Z}}$ is an orthonormal basis of $L^2(\mathbb{R})$. Thus, any function $f \in L^2(\mathbb{R})$ can be written as

\[ f = \sum_{j,k \in \mathbb{Z}} d_{j,k}\, \psi_{j,k}, \tag{4} \]

where $d_{j,k} = \langle f, \psi_{j,k} \rangle$ is the discrete wavelet transform (DWT) of $f$ at time $2^k j$ and scale $2^k$. In fact, the DWT is the particular dyadic version of the CWT given by (2).

2.1 Wavelet coherence

According to [12], the wavelet coherence (or wavelet squared coherence, WSC) between two time series $f(t)$ and $g(t)$ is defined by

\[ R^2(u, s) = \frac{\left| S\left(s^{-1} W_{fg}(u, s)\right) \right|^2}{S\left(s^{-1} |W_f(u, s)|^2\right)\, S\left(s^{-1} |W_g(u, s)|^2\right)}, \tag{5} \]

where $W_{fg}(u, s) = W_f(u, s)\, W_g^*(u, s)$ is the cross-wavelet spectrum and $S$ is a smoothing operator in both time and frequency. The WSC (5) ranges from 0 (no correlation) to 1 (perfect correlation) and is analogous to the squared correlation coefficient in linear regression. This concept is particularly useful for determining the regions in the time-frequency domain where two time series have a significant co-movement or interdependence.
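Equation (5) can be evaluated on arrays of CWT coefficients as in the sketch below. It assumes NumPy and replaces the smoothing operator $S$ by a plain boxcar average in time and in scale; this kernel and its widths are illustrative assumptions, not the smoothing used in [12]. The names `smooth` and `wavelet_squared_coherence` are hypothetical.

```python
import numpy as np

def smooth(a, time_width=9, scale_width=3):
    """Boxcar smoothing along time (axis 1) and then scale (axis 0).

    A stand-in for the operator S; the kernel shape is an assumption.
    """
    def boxcar(x, width, axis):
        kernel = np.ones(width) / width
        return np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, x)
    return boxcar(boxcar(a, time_width, 1), scale_width, 0)

def wavelet_squared_coherence(wf, wg, scales):
    """Equation (5) from CWT arrays of shape (len(scales), n_times)."""
    inv_s = 1.0 / np.asarray(scales, dtype=float)[:, None]
    cross = wf * np.conj(wg)  # cross-wavelet spectrum W_fg
    num = np.abs(smooth(inv_s * cross)) ** 2
    den = smooth(inv_s * np.abs(wf) ** 2) * smooth(inv_s * np.abs(wg) ** 2)
    return num / den
```

Because the smoothing is an average with non-negative weights, the Cauchy-Schwarz inequality keeps the result between 0 and 1, and the coherence of a series with itself is 1 everywhere.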

2.2 Windowed scalogram difference

The scalogram of a time series $f$ at a given scale $s > 0$ is given by

\[ \mathcal{S}(s) = \left( \int_{-\infty}^{+\infty} |W_f(u, s)|^2\, du \right)^{1/2}. \tag{6} \]

It captures the “energy” of the CWT of the time series $f$ at a particular scale and allows for the identification of the most representative scales, that is, the scales that contribute most to its total energy. So, it is clear that if two time series show a similar pattern, then their scalograms should be very similar. In this regard, it is important to point out certain conditions under which two time series have the same scalogram.

PROPOSITION 1 Let $f \in L^2(\mathbb{R})$ be a time series and $c \in \mathbb{R}$. Then, $-f(t)$, $f(t) + c$ and $f(t + c)$ have the same scalogram as $f(t)$. Moreover, if the wavelet $\psi$ is symmetric or antisymmetric, i.e. $\psi(-t) = \pm\psi(t)$ (e.g. Haar, Mexican Hat, Morlet, etc.), then $f(-t)$ also has the same scalogram.
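The invariances in Proposition 1 can be checked directly from definition (2). The lines below are a short sketch of that check, not the authors' proof; the case $f + c$ is to be read formally, since a nonzero constant is not in $L^2(\mathbb{R})$.

\[
\begin{aligned}
W_{-f}(u, s) &= -W_f(u, s), \\
W_{f+c}(u, s) &= W_f(u, s) + c \int_{-\infty}^{+\infty} \psi^*_{u,s}(t)\, dt = W_f(u, s), \\
W_{f(\cdot + c)}(u, s) &= \int_{-\infty}^{+\infty} f(t + c)\, \psi^*_{u,s}(t)\, dt = W_f(u + c, s), \\
W_{f(-\cdot)}(u, s) &= \pm W_f(-u, s) \quad \text{if } \psi(-t) = \pm\psi(t).
\end{aligned}
\]

The second line uses the zero average of $\psi$, and the last two follow from the changes of variable $t \mapsto t - c$ and $t \mapsto -t$. In every case $|W(u, s)|$ coincides with $|W_f(u, s)|$ up to a shift or a reflection in $u$, which leaves the integral over $u \in \mathbb{R}$ in (6) unchanged.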

It is worth highlighting that most wavelets are “almost” symmetric or antisymmetric (e.g. Daubechies). In this case,

\[ \pm f(\pm t + c_1) + c_2 \tag{7} \]

has approximately the same scalogram as $f(t)$, where $c_1, c_2 \in \mathbb{R}$. So, we will say that (7) follows the same pattern as $f(t)$.

Taking into account the decomposition of a function by means of the DWT (4), it is convenient to use base 2 power scales, and thus

\[ \mathcal{S}(k) = \left( \int_{-\infty}^{+\infty} \left| W_f\left(u, 2^k\right) \right|^2 du \right)^{1/2}, \tag{8} \]

where $k \in \mathbb{R}$ is the binary logarithm of the scale, which we will call log-scale. Again, there is an abuse of notation that will be clarified by the context, this time between (6) and (8).

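Hence, the non-commutative scalogram difference of two time series $f, g$ at log-scale $k$ and log-scale radius $r$ is given by

\[ \mathcal{SD}_r(k) = \left( \int_{k-r}^{k+r} \left( \frac{\mathcal{S}(\kappa) - \mathcal{S}'(\kappa)}{\mathcal{S}(\kappa)} \right)^2 d\kappa \right)^{1/2}, \tag{9} \]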

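where $\mathcal{S}, \mathcal{S}'$ represent the scalograms of $f, g$, respectively. However, it is recommended to redefine (9) in a commutative way:

\[ \mathcal{SD}_r(k) = \frac{1}{2} \left( \int_{k-r}^{k+r} \left( \frac{\mathcal{S}(\kappa)}{\mathcal{S}'(\kappa)} - \frac{\mathcal{S}'(\kappa)}{\mathcal{S}(\kappa)} \right)^2 d\kappa \right)^{1/2}. \tag{10} \]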

Obviously, equations (9) and (10) make sense only when the two series considered are expressed in the same unit of measure. Otherwise, their scalograms have to be normalized in some way before being compared.
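As a hedged illustration, the commutative scalogram difference (10) can be computed from scalograms sampled on a uniform grid of log-scales. The sketch assumes NumPy, approximates the integrals by Riemann and trapezoidal sums, and truncates the integration window at the ends of the grid; the function names and the boundary handling are assumptions made for this example.

```python
import numpy as np

def scalogram(w, dt=1.0):
    """Equations (6)/(8): L2 norm over time of each row of a CWT array."""
    return np.sqrt((np.abs(w) ** 2).sum(axis=1) * dt)

def scalogram_difference(s1, s2, dk, r):
    """Commutative scalogram difference of equation (10).

    s1, s2: scalograms sampled on a uniform log-scale grid with step dk.
    r: log-scale radius. Near the ends of the grid the integration window
    is truncated (an assumption about boundary handling).
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    integrand = (s1 / s2 - s2 / s1) ** 2
    half = int(round(r / dk))  # window half-width in grid points
    out = np.empty_like(integrand)
    for i in range(len(integrand)):
        lo, hi = max(0, i - half), min(len(integrand), i + half + 1)
        seg = integrand[lo:hi]
        # Trapezoidal rule over the log-scale window [k - r, k + r].
        integral = (seg[:-1] + seg[1:]).sum() * dk / 2.0
        out[i] = 0.5 * np.sqrt(integral)
    return out
```

For instance, if one scalogram is twice the other at every log-scale, the integrand is $(2 - 1/2)^2 = 2.25$, and with $r = 0.5$ the difference away from the grid ends is $\frac{1}{2}\sqrt{2.25 \cdot 1} = 0.75$.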

The windowed scalogram of a time series $f$ centered at time $t$ with time radius $\tau$ is given by

\[ \mathcal{WS}_\tau(t, k) = \left( \int_{t-\tau}^{t+\tau} \left| W_f\left(u, 2^k\right) \right|^2 du \right)^{1/2}. \tag{11} \]

Based on (10) and (11), the commutative windowed scalogram difference (WSD) of two time series $f, g$ centered at $(t, k)$ with time radius $\tau$ and log-scale radius $r$ is given by

\[ \mathrm{WSD}_{\tau,r}(t, k) = \frac{1}{2} \left( \int_{k-r}^{k+r} \left( \frac{\mathcal{WS}_\tau(t, \kappa)}{\mathcal{WS}'_\tau(t, \kappa)} - \frac{\mathcal{WS}'_\tau(t, \kappa)}{\mathcal{WS}_\tau(t, \kappa)} \right)^2 d\kappa \right)^{1/2}, \tag{12} \]

where $\mathcal{WS}_\tau, \mathcal{WS}'_\tau$ denote the windowed scalograms of $f, g$, respectively. A non-commutative WSD can be defined analogously from (9), but the commutative version given by (12) is recommended.
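The following sketch evaluates (11) and (12) on arrays of CWT coefficients whose rows correspond to a uniform grid of log-scales and whose columns correspond to sampled times. It assumes NumPy, discretizes the integrals with Riemann and trapezoidal sums, and truncates the windows at the borders of the grid; the function names and these discretization choices are assumptions, not the authors' implementation.

```python
import numpy as np

def windowed_scalogram(w, tau, dt=1.0):
    """Equation (11): for each row (log-scale) of the CWT array w, the L2 norm
    of W_f over the time window [t - tau, t + tau], for every grid time t.
    Windows are truncated at the ends of the series (an assumption)."""
    energy = np.abs(w) ** 2
    half = int(round(tau / dt))
    # Prefix sums along time give every window sum in one pass.
    csum = np.concatenate(
        [np.zeros((energy.shape[0], 1)), np.cumsum(energy, axis=1)], axis=1)
    n = energy.shape[1]
    idx = np.arange(n)
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, n)
    return np.sqrt((csum[:, hi] - csum[:, lo]) * dt)

def wsd(wf, wg, tau, r, dk, dt=1.0):
    """Commutative WSD of equation (12) on the (log-scale, time) grid."""
    ws_f = windowed_scalogram(wf, tau, dt)
    ws_g = windowed_scalogram(wg, tau, dt)
    integrand = (ws_f / ws_g - ws_g / ws_f) ** 2
    half = int(round(r / dk))
    out = np.empty_like(integrand)
    for i in range(integrand.shape[0]):
        lo, hi = max(0, i - half), min(integrand.shape[0], i + half + 1)
        seg = integrand[lo:hi]
        # Trapezoidal rule over the log-scale window [k - r, k + r].
        out[i] = 0.5 * np.sqrt((seg[:-1] + seg[1:]).sum(axis=0) * dk / 2.0)
    return out
```

Since only ratios of windowed scalograms enter (12), multiplying both series by the same constant leaves the result unchanged, whereas rescaling only one of them does not.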

The WSD measures the difference between the windowed scalograms of two time series and it enables us to quantify the level of similarity between two time series for different finite time and scale intervals (see [3]). However, the WSD is defined in relative terms, so we cannot compare the scalogram differences of two distinct pairs of time series that use different units of measure or are not normalized.

Finally, to facilitate comparison with the WSC, for which high values indicate a high degree of similarity, it is worth considering $\log_2\left(\mathrm{WSD}^{-1}\right)$ rather than the WSD. In this way, we have a direct relationship between the value of $\log_2\left(\mathrm{WSD}^{-1}\right)$ and the level of similarity between the patterns of the two time series.

Moreover, we can perform a Monte Carlo simulation, computing the WSDs of a large number of pairs of random time series with the same length as the original signals $f, g$. Next, at each $(t, k)$ we can divide the original WSD by the mean at $(t, k)$ of these WSDs, thus obtaining a modified WSD in which values greater than 1 denote significant differences between the patterns of $f$ and $g$. In this case, negative values of $\log_2\left(\mathrm{WSD}^{-1}\right)$ stand for low similarity and positive values stand for high similarity.
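The Monte Carlo normalization just described can be sketched generically. The function below takes any statistic `stat(f, g)` returning an array on the $(t, k)$ grid, such as a WSD implementation, and divides it by its mean over random pairs. It assumes NumPy and uses Gaussian white noise with the same length and standard deviation as the originals; the noise model and the name `monte_carlo_normalise` are assumptions of this sketch.

```python
import numpy as np

def monte_carlo_normalise(stat, f, g, n_sims=100, seed=0):
    """Divide stat(f, g) by the mean of stat over pairs of random series.

    stat: any function mapping two series to an array on the (t, k) grid,
    for instance a WSD implementation. The surrogates are Gaussian white
    noise with the same length and standard deviation as f and g; that
    choice of noise model is an assumption.
    """
    rng = np.random.default_rng(seed)
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    total = 0.0
    for _ in range(n_sims):
        rf = rng.normal(0.0, f.std(), size=len(f))
        rg = rng.normal(0.0, g.std(), size=len(g))
        total = total + stat(rf, rg)
    mean_random = total / n_sims
    modified = stat(f, g) / mean_random
    # Positive values of log2(1 / modified) indicate high similarity.
    return modified, np.log2(1.0 / modified)
```

A modified value of 0.25, for example, gives $\log_2(1/0.25) = 2$, that is, the two series are four times closer than a typical random pair at that time and log-scale.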

3 Wavelet dissimilarities

The WSD and WSC can be used to define dissimilarity measures, since both tools quantify the degree of similarity between two time series. However, we will have a measure of dissimilarity for each time and each scale, which leads to a dynamic clustering. Thus, the membership of an element in a cluster will depend on the instant of time and the scale considered.

Given two time series $f, g$, the WSD itself can be used as a dissimilarity measure, since it ranges from 0 (high similarity) to $+\infty$ (low similarity). The WSC, in contrast, ranges from 0 (low similarity) to 1 (high similarity), so it has to be transformed first. For example, taking into account that the WSC is in some way a kind of correlation (see [12]), some dissimilarity measures based on the WSC can be defined according to [7, 11]:

\[ d^{\mathrm{WSC}}_1(u, s) = \sqrt{2\left(1 - R^2(u, s)\right)}, \]

or

\[ d^{\mathrm{WSC}}_2(u, s) = \sqrt{\left( \frac{1 - R^2(u, s)}{1 + R^2(u, s)} \right)^{\beta}}, \]

where the parameter $\beta \geq 0$ regulates how fast the dissimilarity decreases as $R^2$ increases.
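Both transformations are straightforward to evaluate pointwise. The sketch below uses only the Python standard library; the function names `d_wsc1` and `d_wsc2` are hypothetical, and the default value of `beta` is an arbitrary choice for illustration.

```python
import math

def d_wsc1(r2):
    """First WSC-based dissimilarity: sqrt(2 * (1 - R^2))."""
    return math.sqrt(2.0 * (1.0 - r2))

def d_wsc2(r2, beta=1.0):
    """Second WSC-based dissimilarity: sqrt(((1 - R^2) / (1 + R^2)) ** beta)."""
    return math.sqrt(((1.0 - r2) / (1.0 + r2)) ** beta)
```

With $R^2 = 0.5$, the second measure equals $\sqrt{1/3} \approx 0.577$ for $\beta = 1$ and $1/3$ for $\beta = 2$, so a larger $\beta$ pushes moderately coherent series closer together.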

4 Examples

In this section we give some examples to illustrate the use of the wavelet tools WSC and WSD with applications to economic time series, and we analyze different methods for visualizing the clusters and extracting information from them.

4.1 European government bond markets

The data for this application consist of yields on 10-year government bonds of ten euro area countries: Germany, Austria, France, Finland, Netherlands, Spain, Portugal, Greece, Italy and Ireland. The sample ranges from January 1999 to April 2013, thus covering the turbulent period which includes the recent global financial and Eurozone debt crises. Weekly data (sampled on Wednesdays) are used.

Figures 1 and 2 show $\log_2\left(\mathrm{WSD}^{-1}\right)$ and the WSC, respectively, of Spain with Germany, and figures 3 and 4 show the same quantities for Finland with Germany. The level of similarity is indicated by color coding, which ranges from black (low similarity) to white (high similarity). The regions encircled by a thick black line represent significant areas at the 5% level. Monte Carlo methods are used to assess the statistical significance. Specifically, the significance level is determined with 1000 pairs of random time series of the same length and with the same variance as the original series. The cone of influence, below which edge effects might distort the results, is designated by a thin black line.

Figure 1: Logarithm of the inverse of the commutative WSD between yields on 10-year government bonds of Spain and Germany. The WSD has been calculated by using a window of time radius 25 (approximately half a year) and log-scale radius 4/12.

Figure 2: WSC between yields on 10-year government bonds of Spain and Germany.

Figure 3: Logarithm of the inverse of the commutative WSD between yields on 10-year government bonds of Finland and Germany. The WSD has been calculated by using a window of time radius 25 (approximately half a year) and log-scale radius 4/12.

Figure 4: WSC between yields on 10-year government bonds of Finland and Germany.

It can be seen that both wavelet tools give similar results about the similarity of the time series. In the case of Spain (and other peripheral countries such as Greece, Italy, Ireland and Portugal) there is an almost perfect bond market integration in the years following the launch of the euro as a result of the removal of exchange rate risk, the nominal convergence of economic fundamentals and harmonization of fiscal and regulatory frameworks within the European Monetary Union (EMU). In contrast, the time interval after the collapse of the U.S. bank Lehman Brothers in September 2008 is characterized by the decoupling of 10-year government bond yields of the peripheral countries relative to those of Germany for virtually all scales.

On the other hand, it is worth noting that the sharp decline in bond market integration since the aggravation of the financial crisis in late 2008, documented above for peripheral countries, is not found for Finland and other European core countries. Specifically, a certain reduction in the level of bond market integration is observed during the hardest stage of the European debt crisis. However, the values of $\log_2\left(\mathrm{WSD}^{-1}\right)$ are not only positive but greater than 1, suggesting a high level of co-movement throughout the sample. Therefore, this evidence seems to indicate that the fragmentation of government bond markets during the recent financial crisis period has primarily affected European peripheral countries.

4.2 More government bond markets

In this example, we consider yields on 10-year government bonds of several countries worldwide: Germany, Austria, France, Finland, Netherlands, Denmark, Norway, Sweden, Spain, Portugal, Italy, Ireland, United Kingdom, Switzerland, USA, Canada, Japan and the EMU. The sample ranges from January 1995 to November 2012. In this case, monthly data (sampled on the 1st of each month) are used.

Figures 5, 6, 7, 8, 9 and 10 show some dendrograms resulting from applying the dissimilarity measure given by the WSD. In these dendrograms we have chosen a cutting height such that the result is formed by 5 clusters. Moreover, we have represented several dendrograms to show that the clustering depends on the scale (6 months for figures 5 to 8, 12 months for figure 9, and 48 months for figure 10) and also on time, obtaining a different dynamic clustering for each scale.

Figure 5: Dendrogram at a scale of 6 months on January 1995.

Figure 6: Dendrogram at a scale of 6 months on March 2001.

Figure 7: Dendrogram at a scale of 6 months on October 2005.

Figure 8: Dendrogram at a scale of 6 months on November 2012.

Figure 9: Dendrogram at a scale of 12 months on January 1995.

Figure 10: Dendrogram at a scale of 48 months on January 1995.

4.3 Clustering using graphs

Another possible representation of the clustering is by means of a graph (see [4]). It is an alternative to the dendrogram and it is especially useful for dynamic clustering. From the dissimilarity matrix, a complete graph is constructed whose edges are weighted according to the corresponding dissimilarity measure. Then, the minimum spanning tree of this complete graph is computed (see [10]), giving as a result a graph without cycles that connects all its vertices with a minimum total weight. Finally, a threshold distance can be set for determining whether two vertices are joined or not, thereby establishing the clusters. The role played by this threshold distance is the same as that of the cutting height in the dendrogram technique.
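The graph construction just described can be sketched in plain Python with Kruskal's algorithm and a union-find structure. The choice of Kruskal's algorithm and the name `mst_clusters` are assumptions of this example; any minimum spanning tree algorithm would serve.

```python
def mst_clusters(dist, threshold):
    """Cluster via the minimum spanning tree of the complete graph on `dist`.

    Returns (mst_edges, clusters). MST edges heavier than `threshold`
    are dropped; the connected components that remain are the clusters.
    """
    n = len(dist)

    def find(parent, i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Kruskal: scan edges by increasing weight, keep those joining two trees.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Keep only MST edges within the threshold and read off the components.
    parent = list(range(n))
    for w, i, j in mst:
        if w <= threshold:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(parent, v), []).append(v)
    return mst, sorted(groups.values())
```

Recomputing the dissimilarity matrix at each time and calling the function with a fixed threshold gives one graph per instant, which is how a dynamic clustering can be followed over time.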

For example, if we use the data of the first example, taking the dissimilarity measure given by the WSD and a scale of 40 weeks, we obtain the graphs given in figures 11 and 12 for November 2000 and May 2008, respectively. The threshold distance is the same in both graphs and has been set from the mean of the distances between some core countries. It can be seen that in the first graph all the European countries considered are in the same cluster, but in the second one, after the financial crisis, the peripheral countries are disconnected, forming their own clusters.

Figure 11: Graph of the clustering of yields on 10-year government bonds of some European countries at a scale of 40 weeks on November 2000.

Figure 12: Graph of the clustering of yields on 10-year government bonds of some European countries at a scale of 40 weeks on May 2008.

References

[1] S. Aghabozorgi et al., Time-series clustering - A decadal review. Information Systems. 53 (2015), 16–38.

[2] N. Basalto et al., Hausdorff clustering of financial time series. Physica A: Statistical Mechanics and its Applications. 379 (2007), 635–644.

[3] V. J. Bolós et al., The windowed scalogram difference: a novel wavelet tool for comparing time series. Preprint submitted to Applied Mathematics and Computation.

[4] R. Diestel, Graph Theory. Electronic Edition. Springer-Verlag. New York, 2000.

[5] P. D’Urso et al., Clustering of financial time series. Physica A: Statistical Mechanics and its Applications. 392 (2013), 2114–2129.

[6] J. G. Dias et al., Clustering financial time series: New insights from an extended hidden Markov model. European Journal of Operational Research. 243 (2015), 852–864.

[7] X. Golay et al., A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine. 40 (2005), 249–260.

[8] T. Hastie et al., Unsupervised Learning, in The elements of statistical learning: data mining, inference and prediction. Springer-Verlag. New York, 2009, 485–585.

[9] S. P. Lloyd, Least squares quantization in PCM. Technical report, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory. 28 (1982), 129–137.

[10] R. N. Mantegna, Hierarchical structure in financial markets. The European Physical Journal B. 11 (1999), 193–197.

[11] P. Montero, J. A. Vilar, TSClust: an R package for time series clustering. Journal of Statistical Software. 62 (2014), 1–43.

[12] C. Torrence, P. J. Webster, Interdecadal changes in the ENSO-monsoon system. Journal of Climate. 12 (1999), 2679–2690.