WORKING PAPER CMVM
COMISSÃO DO MERCADO DE VALORES MOBILIÁRIOS * Nº 03/2013

MODELING AND FORECASTING LIQUIDITY USING PRINCIPAL COMPONENT ANALYSIS AND DYNAMIC FACTOR MODELS

AN ILLIQUIDITY COMPOSITE INDICATOR PROPOSAL


Paulo Pereira da Silva*

CMVM-Portuguese Securities Commission

Rua Laura Alves nº 4

Apartado 14258

1064-003 LISBOA

Email: [email protected]

* The views stated herein are those of the author and not those of the Portuguese Securities Commission.


ABSTRACT

I survey and describe the main liquidity proxies used in the literature, highlighting some of their merits. Some theoretical background and motivation for the use of PCA and DFM in the design of a liquidity composite indicator are provided. I apply PCA and DFM to a set of nine liquidity proxies over a group of four western European equity markets. The emphasis is placed on extracting a latent variable: a liquidity component that captures the co-movement of the proxies. Besides the signal extraction, stress testing for equity market liquidity is illustrated. Finally, I also present some applications regarding the suitability of DFM to model and forecast future liquidity.


1. INTRODUCTION

The microeconomic concept of liquidity is multidimensional. Five dimensions are well documented in the literature: (i) tightness, (ii) immediacy, (iii) depth, (iv) breadth and (v) resilience (see Sarr and Lybek, 2002, for a quick survey).

Tightness refers to low transaction costs. Immediacy reflects the speed with which orders are transmitted to the market and settled. Depth concerns the presence of abundant orders both above and below the price at which the security is trading. Breadth refers to the existence of numerous and large orders with minimal impact on prices. Finally, resilience is associated with the market's ability to correct order imbalances, which tend to move the price away from the intrinsic value of the security. In short, market participants perceive a security as liquid if they can quickly sell large amounts of it without affecting its price.

Liquidity is not directly observable. Since there are several dimensions of liquidity,

there are also numerous different empirical measures. Several proxies are often

used but none captures all the dimensions of the concept. In this regard, Goyenko

et al. (2009) perform a horserace of both monthly and annual liquidity measures to

evaluate their merits. Sarr and Lybek (2002), Lesmond et al. (1999), Hasbrouck

(2004, 2009) and Lesmond (2005) also compare several liquidity proxies based on

monthly and daily data.

In this paper, I propose the use of a composite indicator of liquidity based on a well-known static method, Principal Component Analysis (PCA, henceforth), and on dynamic factor models. One of the main characteristics of these methods is their ability to capture the main features of the data. I use PCA to extract a few key, uncorrelated latent liquidity variables, called the principal components, from a larger set of correlated liquidity proxies. The suitability of this technique depends on the correlation of the proxies: the higher the correlation between the original variables, the better it performs. In effect, a highly correlated set of variables requires only a few principal components to characterize the latent variable(s). PCA takes historical data on movements in the proxies and attempts to define a set of orthogonal components that explain those movements. The PCA methodology is derived from an eigenvalue analysis of a large covariance matrix of several commonly used variables that proxy liquidity. The basic idea is that the main factors represent the common trend of liquidity over the analyzed time span. PCA makes it possible to reduce the number of liquidity proxies to a manageable dimension and to detect their sources of variability.

Notice that proxies with higher correlations are considered more capable of capturing liquidity, given that they convey the same information (Naes et al., 2011). A good liquidity proxy should capture time-series variation in liquidity. PCA and dynamic factor models can therefore be used to assess the liquidity of stock markets and to capture the co-movement of different correlated proxies. In addition, dynamic factor models capture persistence in liquidity and allow one-step-ahead predictions for the liquidity proxies and for latent liquidity.

In the second section, I survey and describe the liquidity proxies used in the study

highlighting some of their merits. I use the majority of the proxies proposed by

Sarr and Lybek (2002), Zhang (2010) and by Goyenko et al. (2005).

In the third section, some theoretical background and motivation for the use of PCA and dynamic factor models in the design of the liquidity composite indicator are provided.

In the fourth section, I apply PCA and dynamic factor models to a set of nine liquidity proxies over a group of four western European equity markets. The emphasis is placed on extracting a latent variable, a liquidity component, from the proxies described in section 2. Besides the signal extraction, stress testing for equity market liquidity is performed. In this section I also present some applications regarding the suitability of dynamic factor models to model and forecast future liquidity.

2. LIQUIDITY PROXIES

Nine liquidity proxies are used in this paper. Three of them are closely related to transaction costs (bid-ask spread, effective bid-ask spread and Roll’s modified measure), four are associated with market impact (Amihud illiquidity indicator, HHL, Zeros and Market-Efficiency coefficient) and the final two are related to breadth and depth (value turnover and turnover ratio).

i) Value turnover: indicator of realized liquidity that is computed as the daily

sum of the value of all the transactions. Benston and Hagerman (1974) and

Stoll (1978) argue that value turnover, volatility and price influence liquidity.


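ii) Turnover ratio: defined as the ratio between the value turnover and the market capitalization of a listed company:

$$ TR_t = \frac{VT_t}{MC_t} $$

where $VT_t$ is the value turnover and $MC_t$ the market capitalization at t.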

iii) Bid-ask spread: measured as the absolute difference between bid and ask prices or as a percentage spread. The latter is more convenient in comparisons of different securities, given that higher-priced securities tend to exhibit higher absolute spreads. The bid-ask spread is a measure of implicit transaction costs: high transaction costs reduce the demand for trades and, thus, the number of potentially active participants in a market. In turn, the reduction in the number of participants due to high transaction costs influences market breadth and resilience. According to Glosten and Milgrom (1985), bid-ask spreads may also reflect the degree of information asymmetry. The absolute bid-ask spread is expressed as:

$$ S_t = P^{A}_t - P^{B}_t $$

where $P^{A}_t$ and $P^{B}_t$ are the ask and bid prices, respectively. The percentage spread is defined as:

$$ S^{\%}_t = \frac{P^{A}_t - P^{B}_t}{(P^{A}_t + P^{B}_t)/2} $$
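As an illustration of why the percentage spread is the more comparable measure, the following minimal sketch computes both versions for two hypothetical quotes. Scaling by the mid-quote is an assumption made here for the example, and the function names and prices are illustrative only.

```python
def absolute_spread(ask: float, bid: float) -> float:
    """Quoted spread in price units: ask minus bid."""
    return ask - bid


def percentage_spread(ask: float, bid: float) -> float:
    """Quoted spread relative to the mid-quote (assumed scaling)."""
    mid = (ask + bid) / 2.0
    return (ask - bid) / mid


# Hypothetical (ask, bid) quotes for a low-priced and a high-priced security.
cheap = (1.02, 1.00)
expensive = (102.0, 100.0)

for ask, bid in (cheap, expensive):
    # Absolute spreads differ by a factor of 100; percentage spreads coincide.
    print(absolute_spread(ask, bid), percentage_spread(ask, bid))
```

The two securities have very different absolute spreads but essentially the same percentage spread, which is the reason the relative version is preferred in cross-security comparisons.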

iv) Effective bid-ask spread is also used to capture transaction costs.

where $P_t$ is the trading price of the security and $M_t$ the prevailing mid-quote when the trade occurs.

v) Roll (1984) proposes an estimator of the effective spread based on the serial covariance of the changes in prices. Suppose that the unobservable fundamental value of a stock is a random walk with the following stochastic behavior:

$$ V_t = V_{t-1} + e_t $$

where $e_t$ is white noise. The last observed trade price on day t is given by

$$ P_t = V_t + \frac{S}{2} Q_t $$

where $S$ is the effective spread and $Q_t$ is a categorical variable that equals 1 if the last trade was buyer initiated and -1 otherwise. Roll (1984) assumes equal probabilities for each of the possible values. In addition, he considers that $Q_t$ is serially uncorrelated and independent of $e_t$ such that:

$$ \Delta P_t = \frac{S}{2} \Delta Q_t + e_t $$

The serial covariance might be written as:

$$ \mathrm{Cov}(\Delta P_t, \Delta P_{t-1}) = -\frac{S^2}{4} $$

The effective bid-ask spread (Roll’s estimator) can be expressed as:

$$ S = 2 \sqrt{-\mathrm{Cov}(\Delta P_t, \Delta P_{t-1})} $$

When the sample covariance is positive, the formula is undefined. Thus, a modified version of Roll’s estimator presented by Goyenko et al. (2009) is defined as

$$ S = \begin{cases} 2 \sqrt{-\mathrm{Cov}(\Delta P_t, \Delta P_{t-1})} & \text{if } \mathrm{Cov}(\Delta P_t, \Delta P_{t-1}) < 0 \\ 0 & \text{otherwise} \end{cases} $$
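The modified Roll measure can be sketched in a few lines. The version below is only an illustration: it assumes a list of daily closing prices, uses the population (divide-by-n) serial covariance of price changes, and the function name and sample prices are hypothetical.

```python
from statistics import mean


def roll_spread(prices):
    """Modified Roll estimator: 2*sqrt(-cov) when the serial covariance of
    price changes is negative, and 0 otherwise."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    current, lagged = changes[1:], changes[:-1]   # (dP_t, dP_{t-1}) pairs
    m_cur, m_lag = mean(current), mean(lagged)
    cov = sum((c - m_cur) * (l - m_lag)
              for c, l in zip(current, lagged)) / len(current)
    return 2.0 * (-cov) ** 0.5 if cov < 0 else 0.0


# Hypothetical series: prices bouncing between two levels (negative serial
# covariance) and a steadily accelerating trend (positive serial covariance).
bouncing = [10.05, 9.95] * 5
trending = [10, 11, 13, 16, 20]

print(roll_spread(bouncing))   # strictly positive spread estimate
print(roll_spread(trending))   # set to zero by the modification
```

The second series shows the case the modification is designed for: the serial covariance is positive, the square root would be undefined, and the estimate is set to zero instead.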

vi) Amihud (2000)¹ suggests the following ratio as an indicator of market impact:

$$ ILLIQ_t = \frac{|r_t|}{VT_t} $$

where $r_t$ is the stock return at t and $VT_t$ is the value turnover in Euro at t. One drawback of this indicator is that it is undefined for zero-volume days. Nonetheless, it is useful to capture the price impact of trades and is widely used as a liquidity proxy. Hasbrouck (2009) argues that, among the daily proxies, the Amihud illiquidity measure is the most strongly correlated with the price impact coefficient based on transactions and quotes. On the other hand, the liquidity effect of asymmetric information is most likely captured in the price impact of a trade (Glosten and Harris, 1988). Acharya and Pedersen (2005), Watanabe and Watanabe (2008), Spiegel and Wang (2005), Avramov et al. (2006) and Kamara et al. (2008) use the Amihud proxy to assess commonality in liquidity among stocks.
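A minimal sketch of a monthly Amihud figure follows. It averages the daily ratio of absolute return to value turnover; skipping zero-turnover days is an assumption made here to sidestep the undefined ratio, not necessarily the treatment used in the paper, and the function name and inputs are illustrative.

```python
def amihud_illiquidity(returns, value_turnover):
    """Average of |return| / value turnover over days with positive turnover.

    Days with zero turnover are skipped (an assumption), since the ratio is
    undefined on those days.
    """
    ratios = [abs(r) / v for r, v in zip(returns, value_turnover) if v > 0]
    if not ratios:
        raise ValueError("no positive-turnover days in the sample")
    return sum(ratios) / len(ratios)


# Hypothetical daily returns and value turnover in euro; the last day has no trades.
daily_returns = [0.01, -0.02, 0.0]
daily_turnover = [1_000_000.0, 2_000_000.0, 0.0]

print(amihud_illiquidity(daily_returns, daily_turnover))
```

A larger value means that a given amount traded moves the price more, that is, the security is less liquid.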

vii) Zeros. Lesmond et al. (1999) compute the proportion of days with zero

returns as a proxy for illiquidity. They present two reasons to support this

indicator: (i) securities with lower liquidity are more likely to have zero volume

days and thus more likely to have zero return days; (ii) stocks with higher

transaction costs have less private information acquisition (since it is more

difficult to overcome higher transaction costs) and thus, even on positive

volume days, they are more likely to have no-information-revelation, zero

return days.
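The Zeros proxy reduces to a simple proportion. The sketch below assumes a list of daily returns for one security over one month; the tolerance argument is a hypothetical convenience for data in which zero returns are stored with rounding noise.

```python
def zeros(returns, tol=0.0):
    """Proportion of days whose absolute return does not exceed `tol`."""
    if not returns:
        raise ValueError("empty return series")
    return sum(1 for r in returns if abs(r) <= tol) / len(returns)


# Hypothetical month with four trading days, two of them with zero return.
print(zeros([0.0, 0.01, 0.0, -0.02]))
```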

1- Acharya and Pedersen (2004) also adopted this indicator.


viii) The Hui-Heubel Liquidity ratio (HHL) attempts to capture the price impact, breadth and resilience dimensions of liquidity. It relates the volume of trades to their impact on prices, and is computed as an average of 5-day periods in a sample, in order to smooth volatility.

$$ HHL = \frac{(P_{\max} - P_{\min}) / P_{\min}}{TR} $$

where $P_{\max}$ and $P_{\min}$ are the highest and lowest prices over the 5-day period and $TR$ is the turnover ratio over the same period.

The Hui-Heubel Liquidity ratio uses the turnover ratio in the denominator, scaling price movements by the speed of rotation of the equity in the markets. The higher the liquidity of an asset, the lower HHL will be.

ix) The Market-Efficiency Coefficient (MEC) was proposed by Hasbrouck and Schwartz (1988) to distinguish short-term from long-term price changes. Price movements are more continuous in liquid markets, even if new information influences equilibrium prices. Consequently, for a given permanent price change, the transitory changes to that price should be minimal in resilient markets.

$$ MEC = \frac{\mathrm{Var}(R_T)}{T \cdot \mathrm{Var}(r_t)} $$

where $\mathrm{Var}(R_T)$ is the variance of returns over the longer period, $\mathrm{Var}(r_t)$ is the variance of returns over the shorter period and T is the number of shorter periods embedded in the longer period.

MEC should be close to one in more resilient markets (though slightly lower than one), in the sense that overreaction and underreaction to new information should be minimal. Prices of assets with high market resilience may exhibit lower volatility (fewer transitory changes) between periods in which the equilibrium price is changing. Excessive short-term volatility or overshooting leads to MEC figures significantly lower than one.
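The variance ratio behind the MEC can be illustrated as follows. This is a sketch under stated assumptions: short-period returns are log returns (so that they add up to long-period returns), long periods are non-overlapping blocks of T short periods, and population variances are used. The function name and the sample series are hypothetical.

```python
from statistics import pvariance


def mec(short_returns, T):
    """Variance of T-period returns divided by T times the variance of
    one-period returns, using non-overlapping blocks of length T."""
    n_blocks = len(short_returns) // T
    if n_blocks < 2:
        raise ValueError("need at least two long periods")
    used = short_returns[: n_blocks * T]
    long_returns = [sum(used[i * T:(i + 1) * T]) for i in range(n_blocks)]
    return pvariance(long_returns) / (T * pvariance(used))


# Hypothetical series: pure overshooting (every move is reversed next period)
# versus persistent moves (each move is repeated once before reversing).
overshooting = [0.01, -0.01] * 10
persistent = [0.01, 0.01, -0.01, -0.01] * 5

print(mec(overshooting, 2))  # far below one: transitory volatility dominates
print(mec(persistent, 2))    # above one: short-term moves are reinforced
```

The first series mimics excessive short-term volatility, which pushes the ratio well below one; the second shows that positively autocorrelated returns push it above one.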

3. EXTRACTING LATENT VARIABLES

A. The Principal Component Approach

Principal component analysis (PCA) is a method for detecting patterns in data and for emphasizing similarities and differences between variables. PCA reduces the dimension of the data, that is, it attempts to reduce the number of variables to analyse without much loss of information. Put differently, it aims to explain the variability of a set of variables through a smaller set of new uncorrelated (orthogonal) latent variables. Thus, one may describe the variation in a set of correlated variables using a smaller set of uncorrelated factors.

Each component is computed so as to account for the maximum possible variation of the initial dataset. The first component is the most relevant: it denotes the linear combination of the original variables that yields the highest sample variance (equal to the largest eigenvalue) among all possible linear combinations with unit-norm weights. If the variables are highly correlated with one another, the first component usually denotes a common trend. Fewer components ease the analyst's task of providing an intuitive meaning to the set of components. The interpretation of the components is usually guided by the level of correlation of each variable with a particular component.

Principal component analysis consists of the spectral decomposition of a covariance matrix or of a correlation matrix. Performing PCA is equivalent to determining the eigenvalues and eigenvectors of the covariance (correlation) matrix. If PCA is calculated using the correlation matrix, the outcome is affected only by the correlations of the variables; if the input to PCA is the covariance matrix, the results depend not only on the correlations of the variables but also on their standard deviations. Indeed, the representation of the principal components based on the covariance (correlation) matrix provides a linear representation for the (standardized) liquidity proxies.

It can be shown that there is no general association between the spectral decomposition of a covariance matrix and the spectral decomposition of the corresponding correlation matrix. Accordingly, there is no general association between the principal components of a covariance matrix and those of its correlation matrix. However, if the variables have similar standard deviations, both methods should yield similar results. PCA is sensitive to the units of measurement, which determine variances and covariances. In our case, it is preferable to work with the correlation matrix because correlation is not affected by the scale of the variables.
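The sensitivity to units can be seen in a small simulation. The sketch below is illustrative only: it assumes NumPy is available and uses two synthetic series driven by the same factor but measured on very different scales (for instance, a ratio versus a value in euro). All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
common = rng.standard_normal(500)

# Two hypothetical proxies with the same information content, different units.
x1 = common + 0.5 * rng.standard_normal(500)
x2 = 1000.0 * (common + 0.5 * rng.standard_normal(500))
X = np.column_stack([x1, x2])


def first_eigvec(M):
    """Eigenvector of the largest eigenvalue of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    v = vecs[:, -1]
    return v * np.sign(v[np.argmax(np.abs(v))])   # fix the arbitrary sign


w_cov = first_eigvec(np.cov(X, rowvar=False))        # dominated by x2
w_corr = first_eigvec(np.corrcoef(X, rowvar=False))  # equal weights

print(w_cov, w_corr)
```

With the covariance matrix, the first component is almost entirely the large-scale variable; with the correlation matrix, both variables receive the same weight.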

One should also note that the PCA is one of the simplest of many dimension

reduction methodologies that transform a set of correlated variables into a set of

uncorrelated variables. The main difference between the PCA and other factor

analysis methods derives from the fact that the former seeks to identify a small

number of factors to explain the total variation of the dataset while the latter place


the emphasis on using a small number of hypothetical random variables to explain

the correlations or covariances in a multivariate dataset.

PCA can be applied to any set of stationary time series, regardless of the level of

correlation of the set of variables. The assumption that the variables are normally

distributed is not required, only that they have finite variances and covariances.

Standard variances and covariances are not robust and are sensitive to outliers.

In order to make PCA insensitive to outliers, robust versions of variances and

covariances are necessary.

PCA can be implemented in three steps:

1. Calculate the covariance or correlation matrix of the original dataset.

2. Derive the eigenvalues and the eigenvectors of that matrix. Next, rank the eigenvalues by their value. The first principal component is associated with the largest eigenvalue; the second principal component is associated with the second largest eigenvalue, and so on.

The first component explains the most variation in the dataset. In very highly correlated datasets, this component captures an almost parallel shift in all variables, and more generally it is labelled the common trend component (in our case it captures the most often experienced type of common movement in all the liquidity proxies). The second eigenvector belongs to the second largest eigenvalue, and therefore the second component explains the second most variation in the dataset.

3. Let X be the time series dataset, V the covariance matrix (correlation matrix) and P the principal components. There is a representation of the data such that:

$$ P = XW $$

where W is a p-by-p matrix whose columns are the eigenvectors of $X^{T}X$ (factor scores matrix).
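The three steps above can be sketched with NumPy. This is a minimal illustration rather than the procedure actually used for the results reported later: the function name, the synthetic data and the choice of sample (ddof = 1) moments are assumptions.

```python
import numpy as np


def pca(X, use_correlation=True):
    """Return eigenvalues (descending), eigenvectors W and components P = ZW,
    where Z is the centred (and optionally standardized) data."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if use_correlation:
        Z = Z / X.std(axis=0, ddof=1)      # standardizing gives the correlation matrix
    V = np.cov(Z, rowvar=False)            # step 1: covariance/correlation matrix
    eigvals, W = np.linalg.eigh(V)         # step 2: eigen-decomposition
    order = np.argsort(eigvals)[::-1]      # rank eigenvalues, largest first
    eigvals, W = eigvals[order], W[:, order]
    P = Z @ W                              # step 3: principal components
    return eigvals, W, P


# Synthetic example: four hypothetical proxies driven by one common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((200, 1))
X = factor @ np.array([[1.0, 0.8, -0.6, 0.3]]) + 0.5 * rng.standard_normal((200, 4))

eigvals, W, P = pca(X)
share = eigvals / eigvals.sum()            # variance explained by each component
print(share)
```

The components in `P` are uncorrelated by construction, and the variance of each column equals the corresponding eigenvalue.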

PCA allows transforming the original data into a system of orthogonal factors. Consider the following PCA representation of k liquidity proxies:

$$ X_i = w_{i1} P_1 + w_{i2} P_2 + \dots + w_{ik} P_k $$

where $X_i$ denotes liquidity proxy i (i = 1, ..., k) and $w_{ij}$ is the factor loading of liquidity proxy i on principal component j. Thus, if the j-th principal component moves by $\Delta P_j$, liquidity proxy i will move by $w_{ij} \Delta P_j$, ceteris paribus. Factor score i is provided by the following expression:

where $\lambda_i$ is the i-th largest eigenvalue and $w_i$ corresponds to the eigenvector associated with eigenvalue i. The correlation/covariance matrix of the principal components is diagonal, given that the factors are uncorrelated. The variance of each principal component is equal to the corresponding eigenvalue.

The total variation of the original time series is provided by the sum of the eigenvalues of the covariance (correlation) matrix. Consequently, one can assess the contribution of each factor by dividing its eigenvalue by the sum of the eigenvalues:

$$ \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j} $$

The capacity of PCA to reduce dimensions, combined with the use of orthogonal

variables for risk factors, makes this technique an extremely attractive option for

Monte Carlo simulation and scenario analysis.

B. Dynamic Factor Models

Dynamic Factor Models (DFM) are flexible models for multivariate time series. DFM aim to combine the cross-sectional analysis provided by Principal Component Analysis with the time series dimension of the data, handled through linear regression modelling (Federici and Mazzitelli, 2010). These models allow for serial and mutual correlation of the idiosyncratic errors. One advantage of factor models is that they may use information from many variables without running into scarce degrees of freedom, a problem frequently faced in regression analyses. Because of their ability to simultaneously and consistently model data sets in which the number of series exceeds the number of time series observations, these models have received considerable attention in the past decade. Breitung and Eickmeier (2005) point to two other reasons to use factor models: idiosyncratic movements, which possibly include measurement errors and local shocks, can be eliminated with this technique, and one does not need to rely on overly tight assumptions, as is sometimes the case in structural models.


Coppi and Zannella (1978) introduced Dynamic Factor Models. They seek to decompose the covariance matrix (V) of a set of time series variables into three distinct covariance matrices:

$$ V = V_I + V_T + V_{IT} $$

where $V_I$ represents the variability of the data structure without taking the time dimension into account (it equals the covariance matrix of the average of the units with respect to time); $V_T$ reflects the variability, due to the time dimension, of the average of the units, regardless of the dynamics of the single units; and $V_{IT}$ measures the variability due to the difference between the dynamics of the overall average of the units, that is the average dynamics, and the dynamics of the single units.

The observed endogenous variables are linear functions of exogenous covariates and unobserved factors, which have a vector autoregressive structure, and thus are persistent over time. In this framework, the unobserved factors can also be a function of exogenous covariates. The error terms in the equations for the dependent variables may be autocorrelated.

Stock and Watson (2010) divide the time-domain estimation of DFM into three generations. The first generation is based on low-dimensional (small N) parametric models estimated in the time domain using Gaussian maximum likelihood estimation and the Kalman filter. This methodology provides optimal estimates of the factors (and optimal forecasts) under the model assumptions and parameters. However, this estimation method requires nonlinear optimization, which may be a serious drawback due to convergence issues. The second generation of estimators involves nonparametric estimation with a large set of variables using cross-sectional averaging methods, in particular principal components and related methods. The principal components estimator of the space spanned by the factors is consistent. Moreover, if N is sufficiently large, the factors are estimated precisely enough to be used as data in later regressions (Stock and Watson, 2010). The third generation uses consistent nonparametric estimates of the factors to estimate the parameters of the state space model used in the first generation, thereby solving the dimensionality problem of first-generation models.

As with principal components, estimating latent factors in this way is sometimes referred to as extracting or estimating an indicator. The principle of a dynamic factor model is that a few latent dynamic factors drive the co-movements of a high-dimensional set of time-series variables, which is also influenced by a vector of mean-zero idiosyncratic disturbances. The error term arises from measurement errors and from special aspects of the individual series. The latent factors follow a time series process, which is commonly taken to be a vector autoregression (VAR) (Stock and Watson, 2010).

For each of the analyzed countries, I estimate the following dynamic factor model:

$$ y_{i,t} = \alpha_i + \beta_i f_t + \varepsilon_{i,t} $$

$$ f_t = \phi f_{t-1} + \eta_t $$

where $y_{i,t}$ is liquidity proxy i at t and $f_t$ denotes a latent variable that represents a common movement in liquidity and follows an AR(1) process. The model is estimated in its state space representation using the stationary Kalman filter. $\phi$ represents persistence in liquidity. Thus, if liquidity is persistent it may also be predictable, which is an additional advantage over the standard PCA method. $\alpha_i$ represents the expected value of proxy i during normal periods and $\beta_i$ is the sensitivity of proxy i to movements in the latent variable. Notice that I use seasonally adjusted variables in the estimation, and consequently there is no need to model seasonality.
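To make the state space idea concrete, the sketch below runs a Kalman filter for a one-factor model with an AR(1) factor, taking the parameters as known. This is an illustration under assumptions, not the estimation used in the paper: the parameter names (`alpha`, `beta`, `phi`, `R`, `q`), their values and the simulated data are hypothetical, and maximum likelihood estimation of the parameters is not shown.

```python
import numpy as np


def kalman_filter_one_factor(y, alpha, beta, phi, R, q=1.0):
    """Filtered factor estimates and one-step-ahead proxy predictions for
    y_t = alpha + beta * f_t + eps_t,  f_t = phi * f_{t-1} + eta_t."""
    y = np.asarray(y, dtype=float)
    T, k = y.shape
    f, p = 0.0, q / (1.0 - phi ** 2)      # stationary mean and variance of the factor
    f_filt = np.empty(T)
    y_pred = np.empty((T, k))
    for t in range(T):
        f_pred = phi * f                  # prediction step for the factor
        p_pred = phi ** 2 * p + q
        y_pred[t] = alpha + beta * f_pred
        S = p_pred * np.outer(beta, beta) + R     # innovation covariance
        K = p_pred * np.linalg.solve(S, beta)     # Kalman gain (S is symmetric)
        f = f_pred + K @ (y[t] - y_pred[t])       # update step
        p = (1.0 - K @ beta) * p_pred
        f_filt[t] = f
    return f_filt, y_pred


# Simulate three hypothetical proxies driven by one persistent factor.
rng = np.random.default_rng(2)
T, phi = 300, 0.8
alpha = np.array([0.0, 1.0, 2.0])
beta = np.array([1.0, 0.7, -0.5])
R = np.diag([0.04, 0.04, 0.04])
f_true = np.zeros(T)
for t in range(1, T):
    f_true[t] = phi * f_true[t - 1] + rng.standard_normal()
y = alpha + np.outer(f_true, beta) + 0.2 * rng.standard_normal((T, 3))

f_filt, y_pred = kalman_filter_one_factor(y, alpha, beta, phi, R)
y_next = alpha + beta * (phi * f_filt[-1])   # one-step-ahead forecast of the proxies
print(np.corrcoef(f_true, f_filt)[0, 1])
```

Because the factor is persistent, the last filtered value carries information about the next period, which is what makes the one-step-ahead forecast `y_next` possible.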

4. APPLICATION TO FOUR WESTERN EUROPEAN COUNTRIES

A. Principal Components Analysis

In this section, I apply the PCA methodology to measure the liquidity of four western European equity markets. As said before, liquidity is a latent variable, that is, it is not directly observed. Nine well documented proxies of liquidity are used to capture the movements of liquidity: bid-ask spread, effective bid-ask spread, Roll’s modified measure, Amihud illiquidity indicator, HHL, Zeros, MEC, turnover and turnover ratio.

The analysis is based on monthly data, given that some of the proxies are only available on a monthly basis (e.g. Zeros, MEC, Roll’s modified measure). In order to calculate the values of the proxies, I collect daily data from Bloomberg, namely last trade prices, bid and ask prices, market capitalization and turnover. The data cover the period between 2000 and 2012 and 2043 securities traded in France, Italy, Spain and Portugal. All the securities (active or inactive) present in the MiFID database at 31-12-2012 are included in the main sample.

In order to obtain country aggregate values for the liquidity proxies, the following

procedure is conducted:

i) First, logs are taken to smooth the path of some of the proxies (Amihud indicator, HHL, turnover, bid-ask spread and effective bid-ask spread);

ii) next, the (average) monthly proxies for each of the individual securities are calculated;

iii) finally, the weighted averages of the monthly liquidity proxies are computed for each market, using the securities' market capitalization as weights (a 20% cap is introduced to reduce the dependency of the aggregate liquidity proxies on a small set of securities).
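The aggregation step can be illustrated as follows. How the 20% cap is applied is not spelled out in the text, so the sketch assumes one common convention: weights above the cap are set to the cap and the excess is redistributed proportionally among the remaining securities until no weight exceeds it. Function names and figures are hypothetical, and market capitalizations are assumed to be strictly positive.

```python
def capped_weights(market_caps, cap=0.20):
    """Market-cap weights with an upper bound, redistributing any excess
    proportionally among the securities that are still below the cap."""
    total = sum(market_caps)
    w = [m / total for m in market_caps]
    if cap * len(w) < 1.0:
        raise ValueError("cap is too tight for this number of securities")
    capped = [False] * len(w)
    while True:
        over = [i for i, wi in enumerate(w) if wi > cap + 1e-12 and not capped[i]]
        if not over:
            return w
        for i in over:
            capped[i] = True
            w[i] = cap
        free = [i for i in range(len(w)) if not capped[i]]
        free_total = sum(w[i] for i in free)
        remaining = 1.0 - cap * sum(capped)   # weight left for uncapped securities
        for i in free:
            w[i] = w[i] * remaining / free_total


def aggregate_proxy(values, market_caps, cap=0.20):
    """Capped market-cap weighted average of a liquidity proxy."""
    weights = capped_weights(market_caps, cap)
    return sum(wi * v for wi, v in zip(weights, values))


# Hypothetical market with one dominant security (50% of capitalization).
caps = [50.0, 20.0, 10.0, 10.0, 5.0, 5.0]
weights = capped_weights(caps)
print(weights)
```

Without the cap, the first security alone would determine half of the market aggregate; with it, no security contributes more than one fifth.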

The first step in PCA consists in the computation and analysis of the correlation

matrix. As expected, the reported results show some similarities in the data: the

correlation matrix shows a high linear statistical association between the (ln)

Amihud Indicator and the (ln) HHL; and between the Turnover Ratio and the

Turnover; MEC exhibits a low correlation with the other variables (Table 1).

In a first stage, the PCA method is applied to the nine variables (Table 2). The KMO

measure and Bartlett's Test of Sphericity suggest that the application of the PCA

method provides good results for Portugal and Italy (KMO higher than 0.7) and

acceptable for Spain and France (KMO higher than 0.5). In the case of Portugal, the

first component accounts for 44.9% of the variance of the data. The (ln) Amihud

Indicator, (ln) HHL, and (ln) Turnover are the variables with a higher percentage of

the variance explained by the extracted component. For Italy, the first component

explains 37.7% of the total variance, whereas for France and Spain that percentage

drops to 32.6% and 29.7%, respectively. In general, the (ln) Amihud Indicator, the

(ln) HHL and effective bid-ask spread are the proxies with higher contribution to the

first principal component.
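The reported shares can be translated into eigenvalues with a short calculation. The worked example below assumes that the PCA is run on the correlation matrix of the nine proxies, in which case the eigenvalues sum to nine; the implied eigenvalues are derived from the percentages above and are not figures taken from Table 2.

```latex
% Correlation-matrix PCA on k = 9 standardized proxies: eigenvalues sum to k.
\begin{align*}
\sum_{j=1}^{9} \lambda_j &= 9,
  & \lambda_1 &= 9 \times \frac{\lambda_1}{\sum_{j=1}^{9} \lambda_j} \\[4pt]
\text{Portugal:} \quad \lambda_1 &\approx 9 \times 0.449 \approx 4.04,
  & \text{Italy:} \quad \lambda_1 &\approx 9 \times 0.377 \approx 3.39 \\
\text{France:} \quad \lambda_1 &\approx 9 \times 0.326 \approx 2.93,
  & \text{Spain:} \quad \lambda_1 &\approx 9 \times 0.297 \approx 2.67
\end{align*}
```

Under this assumption, the first component in Portugal carries as much variance as roughly four of the nine standardized proxies taken individually.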

In a second stage, I reduce the number of variables to six. Roll’s modified measure, Zeros and the Market-Efficiency coefficient seem to have little impact on the co-movement of the proxies according to the factor scores. They are very indirect measures of liquidity that show little correlation with the first principal component, and for that reason they are dropped in further analyses.

Notice that, so far, I have not taken seasonality into consideration. Two different approaches are used to address seasonality. In the first, raw data is used to run the PCA and then seasonality is extracted from the principal component using the additive method of seasonal decomposition. In the second, seasonality is removed from the raw data, again using the additive seasonal decomposition method, before running the PCA. The two approaches yield similar results. Although both are performed, I put more emphasis on the second one in the subsequent analysis.

After dropping Roll’s modified measure, Zeros and the Market-Efficiency coefficient, the performance of PCA increases dramatically for some of the analyzed equity markets (Table 3). In Portugal and Italy the first principal component now represents 65.7% and 46.5% of the variance of the sub dataset, respectively. In Spain and France, the total variance explained increases to 40.2% and 46.5% of the variance of the proxies, respectively. Communality analysis shows that the Amihud indicator, HHL, bid-ask spread or effective bid-ask spread are the proxies that capture a higher percentage of the variation of the first principal component.

Figure 1 displays the evolution of the first component between January 2000 and December 2012 and allows identifying the pattern of the latent variable. For instance, all the analyzed countries exhibit a decline of liquidity after 2008, due to the international financial crisis. PCA indicates that at the end of 2012 the liquidity level was still above the level displayed in 2008 in three countries (the exception is France). In 2010, Spain had already recovered the liquidity level displayed before the crisis. However, the recovery process was reversed at the end of 2011 with the European sovereign debt crisis.

One of the advantages of using the PCA approach is its flexibility to model the correlations between variables. The factor loadings obtained from PCA allow designing a stress test approach, where the impact of aggregate liquidity shocks on the proxies is simulated. This stress test exercise is presented in Figure 2. For example, in Spain the effective bid-ask spread rises from 0.362% in normal times to 0.841% in stress periods. In France and Italy, the HHL and Amihud indicators more than double in high-stress periods, meaning that negative shocks to aggregate liquidity affect price impact measures to a large extent in these countries.

Instead of performing PCA on the levels of the liquidity proxies, one might alternatively consider their changes over time. Table 4 shows the correlation matrix of the first differences of the liquidity proxies. Changes in the Amihud indicator and HHL are highly correlated in the four countries. The same occurs with the (ln) turnover and the turnover ratio (except for Italy). Table 5 presents the total variance explained by the first and second principal components, communalities, factor loadings and factor scores for each proxy. With the exception of Italy, the first two principal components account for more than 60% of the variance of the six liquidity proxies.

These principal components represent different dimensions of liquidity. One way of

assessing the economic interpretation of the principal components is through the

analysis of the factor loadings, which in some sense represent the correlation be-

tween the factors and the liquidity proxies. For instance, in the case of Spain the

first principal component is associated with breath (provided its correlation with the

turnover ratio and log turnover), whereas the second principal component denotes

price impact and transaction costs. In the case of France, the first principal compo-

nent represents price impact, whilst the second denotes breadth. At last, in Italy

and Portugal the first principal component is highly correlated with the price impact

measures.

Using differences instead of levels in PCA also permits simulating the impact of shocks to the latent variables on the liquidity proxies. Figure 3 shows 95% confidence intervals for the liquidity proxies (corrected for seasonality). In the case of Portugal, stress testing indicates a decline in the turnover ratio of 0.37% in the event of a 1.645 standard deviation shock to the liquidity latent variables. In Spain, the turnover ratio is not particularly affected by changes in the latent variable, and in France and Italy the impact is also of minor importance. However, a 1.645 standard deviation liquidity shock to the latent variables may have serious repercussions for price impact measures: the HHL indicator more than triples in Spain and the effective bid-ask spread increases by 30 basis points in France.
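The shock size of 1.645 standard deviations corresponds to the one-sided 95% quantile of a standard normal distribution. The short sketch below checks this, under the assumption (not stated explicitly in this section) that the latent variable is treated as normally distributed. The helper `shocked_change` and its inputs are hypothetical and only illustrate how such a shock maps into the change of a proxy.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided 95% quantile (about 1.645) versus two-sided 95% bound (about 1.960).
z_one_sided_95 = standard_normal.inv_cdf(0.95)
z_two_sided_95 = standard_normal.inv_cdf(0.975)

def shocked_change(loading, change_std, shock_sd):
    """Expected change in a proxy for a factor shock of shock_sd standard deviations."""
    return loading * change_std * shock_sd

# Hypothetical proxy: loading of -0.6 on the factor, changes with std 0.5.
turnover_change = shocked_change(loading=-0.6, change_std=0.5, shock_sd=z_one_sided_95)
```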

B. Dynamic Factor Models

DFM permit modelling the dynamics of the variables of interest and of the unobserved components in a VAR framework. I estimate a DFM for the liquidity proxies of each country, assuming that the latent variable displays first-order autocorrelation and that the behaviour of the proxies is explained by this latent variable and a disturbance term. Moreover, I assume that the dynamics of the proxies are solely described by the dynamics of the unobserved liquidity. This analysis focuses on the bid-ask spread, effective bid-ask spread, Roll’s modified measure, Amihud illiquidity indicator, HHL and turnover ratio. Value turnover is excluded due to convergence issues and possible non-stationarity of the series.
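A one-factor model of this kind can be written in state-space form: each proxy equals a loading times the latent factor plus an idiosyncratic disturbance, and the factor follows a first-order autoregression. The sketch below simulates such a system and recovers the factor with a Kalman filter. It is a simplified illustration under stated assumptions, not the estimation procedure used in the paper: the parameters (`rho`, `loadings`, `noise_vars`, `state_var`) are hypothetical and treated as known, whereas in practice they would be estimated, typically by maximum likelihood.

```python
import math
import random

def kalman_filter_one_factor(observations, loadings, noise_vars, rho, state_var):
    """Filtered estimates of a single AR(1) latent factor.

    Measurement errors are assumed mutually uncorrelated, so the proxies
    observed at each date can be processed one at a time with scalar updates.
    """
    f, p = 0.0, state_var / (1.0 - rho ** 2)  # start from the stationary distribution
    filtered = []
    for row in observations:
        # Prediction step for f_t = rho * f_{t-1} + u_t
        f, p = rho * f, rho ** 2 * p + state_var
        # Update step for each proxy y_it = loading_i * f_t + e_it
        for y, lam, r in zip(row, loadings, noise_vars):
            gain = p * lam / (lam ** 2 * p + r)
            f += gain * (y - lam * f)
            p *= 1.0 - gain * lam
        filtered.append(f)
    return filtered

def forecast_proxies(last_factor, loadings, rho, horizon):
    """h-step-ahead forecasts: the factor decays at rate rho, proxies follow it."""
    factor_forecast = rho ** horizon * last_factor
    return [lam * factor_forecast for lam in loadings]

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical parameters and simulated data (illustrative only).
random.seed(1)
rho, state_var = 0.9, 0.19
loadings = [0.9, 0.8, 0.7]
noise_vars = [0.2, 0.3, 0.4]
true_factor, observations, f = [], [], 0.0
for _ in range(300):
    f = rho * f + random.gauss(0, math.sqrt(state_var))
    true_factor.append(f)
    observations.append([lam * f + random.gauss(0, math.sqrt(r))
                         for lam, r in zip(loadings, noise_vars)])

filtered = kalman_filter_one_factor(observations, loadings, noise_vars, rho, state_var)
fit = correlation(filtered, true_factor)
two_step = forecast_proxies(filtered[-1], loadings, rho, horizon=2)
```

With a persistent factor (a value of `rho` close to one), the forecast decays slowly towards the unconditional mean, which is what makes the proxies predictable from the latent component.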

The estimated models reveal that the unobserved component is very persistent over time. The estimated first-order autoregressive coefficient of the latent variable is statistically different from zero and ranges between 0.86 and 0.94. Moreover, the coefficients associated with the variables ln(1 + Amihud indicator), ln(1 + HHL), ln(1 + bid-ask spread) and ln(1 + effective bid-ask spread) are statistically significant at the 5% significance level for all countries. The liquidity component does not seem to explain the turnover ratio in France, Spain and Italy.

The forecasting ability of the DFM is also tested. To do so, the sample is divided into two subsamples: from January 2000 to December 2010 and from January 2011 to December 2012. I re-estimate the model for the first period and use the second for out-of-sample forecasting. Two measures of forecasting accuracy are computed, namely the mean absolute percentage error (MAPE) and the root mean squared error (RMSE). I also compare the forecasting accuracy of the DFM with that of the historical mean. To that end, an RMSE-based R-squared is computed and the Diebold and Mariano test is run. DFM presents a remarkable forecasting accuracy in the cases of Portugal and Spain, where the Diebold and Mariano t-statistic is significant at the 10% significance level (90% confidence) for all the liquidity proxies, with the exception of the turnover ratio in Spain. In the cases of France and Italy, DFM only appears to provide higher predictive accuracy than the historical average in the forecast of the bid-ask spread and the Amihud illiquidity indicator, respectively. The RMSE-based R-squared also suggests that the forecasting ability of DFM is higher for price impact measures (Amihud indicator and HHL) than for transaction cost measures.
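The accuracy measures used in this comparison can be sketched as follows. The definitions are assumptions made for illustration: the RMSE-based R-squared is taken to be one minus the ratio of the model's mean squared error to that of the benchmark, and the Diebold and Mariano statistic is computed in its simplest form (squared-error loss, one-step-ahead forecasts, no autocovariance correction). The exact variants behind Tables 7 and 8 may differ, and the data below are hypothetical.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse_r_squared(actual, forecast, benchmark):
    """1 - MSE(model) / MSE(benchmark); positive when the model beats the benchmark."""
    return 1.0 - (rmse(actual, forecast) / rmse(actual, benchmark)) ** 2

def diebold_mariano(actual, forecast, benchmark):
    """Simple DM statistic on squared-error loss differentials.
    Negative values indicate that the model has the smaller loss."""
    d = [(a - f) ** 2 - (a - b) ** 2 for a, f, b in zip(actual, forecast, benchmark)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

# Hypothetical out-of-sample values (illustrative only).
actual = [1.0, 2.0, 4.0, 5.0]
model_forecast = [1.1, 1.8, 4.4, 4.5]
historical_mean = [3.0, 3.0, 3.0, 3.0]  # benchmark: constant historical average

model_mape = mape(actual, model_forecast)
model_rmse = rmse(actual, model_forecast)
r_squared = rmse_r_squared(actual, model_forecast, historical_mean)
dm_stat = diebold_mariano(actual, model_forecast, historical_mean)
```

In this example the model's MAPE is 10%, its RMSE-based R-squared is 0.954 and the DM statistic is negative, all pointing to a smaller forecast loss than the historical mean. With a sample this short the statistic is purely illustrative and should not be read as a significance test.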

5. FINAL REMARKS

PCA methodology is widely used to capture unobserved variables through the analysis of observable proxies. In this paper, I apply PCA to capture the evolution of liquidity, which is not directly observed. In doing so, I use a set of nine liquidity proxies. I also show how to simulate the impact of liquidity shocks on proxies of liquidity such as the bid-ask spread, turnover ratio and Amihud indicator. In that sense, PCA can be useful for measuring aggregate liquidity risk and for stress test reporting. In terms of results, the unobserved liquidity variable shows a significant downturn after the Lehman Brothers bankruptcy in the four analysed markets. Even though this event affected the liquidity of all markets, it was particularly severe in Spain and France, but with transitory effects. Both markets recovered to their long-term liquidity level before mid-2010. Italy also exhibits a decline of the liquidity component after the Lehman Brothers bankruptcy but, contrary to Spain and France, that liquidity shock has a more permanent effect.

Furthermore, I present an extension of the PCA methodology, Dynamic Factor Models. DFM reveal that the unobserved component is very persistent over time, and thus predictable. DFM presents a remarkable forecasting accuracy, particularly in Portugal and Spain. Even in the cases of France and Italy, DFM appears to provide higher predictive accuracy than the historical average in forecasting the bid-ask spread and the Amihud illiquidity indicator, respectively. This analysis also shows commonalities across the liquidity components of the four markets. In other words, the liquidity of different European markets tends to co-move.

Comparing the two approaches, PCA has the advantage of being easier to implement and more flexible, whereas DFM computation is slow and convergence is sometimes not achieved. On the other hand, DFM permit modelling the time series structure of the proxies and computing forecasts of the liquidity component and of the proxies.

REFERENCES

Amihud, Y. (2002). “Illiquidity and stock returns: Cross-section and time-series

effects”. Journal of Financial Markets 5, 31-56.

Acharya, V. and L. H. Pedersen (2005). “Asset pricing with liquidity risk”. Journal of

Financial Economics 77, 375-410.

Avramov, D., T. Chordia and A. Goyal (2005). “Liquidity and autocorrelations in

individual stock returns”. Working Paper.

Benston, G. and R. Hagerman (1974). “Determinant of bid-asked spreads in the

over-the-counter market”. Journal of Financial Economics 1(4), 353-364.

Breitung, J. and S. Eickmeier (2009). “Testing for structural breaks in dynamic

factor models”. Deutsche Bundesbank Economic Studies Discussion Paper No.

Coppi, R. and F. Zanella (1978). “L’analisi fattoriale di una serie temporale múltipla

relativa allo stesso insieme di unità statistiche”. Società Italiana di Statistica, XXIX

riunione.

Federici, A. and A. Mazzitelli (2010). “Dynamic factor analysis with Stata”. 2nd

Italian Stata Users Group meeting.

Glosten, L. and L. Harris (1988). “Estimating the components of the bid/ask

spread”. Journal of Financial Economics, 21, 125-142.

Glosten, L. and P.R. Milgrom (1985). “Bid, ask and transaction prices in a specialist

market with heterogeneously informed traders”. Journal of Financial Economics 14,

71-100.

Glosten, L. (1987). “Components of the bid–ask spread and the statistical

properties of transaction prices”. Journal of Finance 42, 1293–1307.

Goyenko, R.Y., C.W. Holden and C.A. Trzcinka (2009). “Do liquidity measures

measure liquidity?”. Journal of Financial Economics 92, 153-181.

Hasbrouck, J. (2004). “Liquidity in the futures pits: inferring market dynamics from

incomplete data”. Journal of Financial and Quantitative Analysis 39, 305–326.

Hasbrouck, J. (2009). “Trading costs and returns for US equities: estimating

effective costs from daily data”. Journal of Finance.

Hasbrouck, J. and R. A. Schwartz (1988). “An assessment of stock exchange and

over-the-counter markets”. Journal of Portfolio Management 14, 10-16.

Kamara, A., X. Lou and R. Sadka (2008). “The divergence of liquidity commonality

in the cross-section of stocks”. Journal of Financial Economics 89, 444–466.

Kyle, A. (1985). “Continuous auctions and insider trading”. Econometrica 53

(6), 1315-1335.

Lesmond, D. A., J. P. Ogden and C. Trzcinka (1999). “A new estimate of

transaction costs”. Review of Financial Studies 12 (5).

Lesmond, D. (2005). “Liquidity of emerging markets”. Journal of Financial

Economics 77, 411–452.

Naes, R., J. A. Skjeltorp and B. A. Ødegaard (2011). “Stock market liquidity and the

business cycle”. The Journal of Finance 66, 139–176.

Roll, R. (1984). “A simple implicit measure of the effective bid–ask spread in an

efficient market”. Journal of Finance 39, 1127–1139.

Sarr, A. and T. Lybek (2002). “Measuring liquidity in financial markets”. IMF

Working Paper No. 02/232.

Spiegel, M. I. and X. Wang (2005). “Cross-sectional Variation in Stock Returns:

Liquidity and Idiosyncratic Risk”. Yale ICF Working Paper No. 05-13.

Stock, J. H. and M. W. Watson (2010). “Dynamic factor models”. Oxford Handbook

of Economic Forecasting.

Stoll, H. (1978). “The pricing of security dealers services: An empirical study of

NASDAQ Stocks”. Journal of Finance 33, 1153 – 1172.

Watanabe, A. and M. Watanabe (2008). “Time-varying liquidity risk and the cross

section of stock returns”. Review of Financial Studies, 21, 2449-2486.

Zhang, H. (2010). “Measuring liquidity in emerging markets”. Working Paper.

TABLES

Table 1

Correlation Matrix

Panel A – Portugal

Panel B – Spain

Panel C – Italy

Panel D – France

Table 2

Total Variance Explained by the First Principal Component

and Communalities Using the 9 Liquidity Proxies

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Table 3

Total Variance Explained by the First Principal Component

and Communalities Using 6 Liquidity Proxies

Portugal: Panel A - Raw Series; Panel B - Seasonally Adjusted Series

Spain: Panel A - Raw Series; Panel B - Seasonally Adjusted Series

Italy: Panel A - Raw Series; Panel B - Seasonally Adjusted Series

France: Panel A - Raw Series; Panel B - Seasonally Adjusted Series

Figure 1

First Principal Component Evolution Between 2000 and 2012

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Figure 2

Liquidity Shocks – Impact of a Liquidity Aggregate Factor Shock

on the Liquidity Proxies

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Table 4

Correlation Matrix of Differentiated Variables

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Table 5

Total Variance Explained by the First and Second Principal Component,

Factor Scores, Factor Loadings and Communalities Using 6 First-Differenced

Liquidity Proxies

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Figure 3

Liquidity Shocks – Impact of a Shock in the Liquidity Aggregate Factor

on the Liquidity Proxies, Corrected for Seasonality

Panel A - Portugal

Panel B - Spain

Panel C - Italy

Panel D - France

Table 6

Dynamic Factor Model Estimation

Figure 4

Latent Liquidity Derived From a Dynamic Factor Model

Table 7

Out-of-Sample Forecasting Accuracy: DFM versus Sample Average

Table 8

Out-of-Sample (One-Step-Ahead) Forecasting Accuracy:

DFM versus Historical Average
