Modelling dependence using skew t copulas: Bayesian inference and applications


JOURNAL OF APPLIED ECONOMETRICS
J. Appl. Econ. 27: 500–522 (2012)
Published online 13 October 2010 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jae.1215

MODELLING DEPENDENCE USING SKEW T COPULAS: BAYESIAN INFERENCE AND APPLICATIONS

MICHAEL S. SMITH,a* QUAN GANb AND ROBERT J. KOHNc

a Melbourne Business School, University of Melbourne, Melbourne, VIC, Australia
b Discipline of Finance, University of Sydney, Sydney, NSW, Australia
c School of Economics, University of New South Wales, Sydney, NSW, Australia

SUMMARY
We construct a copula from the skew t distribution of Sahu et al. (2003). This copula can capture asymmetric and extreme dependence between variables, and is one of the few copulas that can do so and still be used in high dimensions effectively. However, it is difficult to estimate the copula model by maximum likelihood when the multivariate dimension is high, or when some or all of the marginal distributions are discrete-valued, or when the parameters in the marginal distributions and copula are estimated jointly. We therefore propose a Bayesian approach that overcomes all these problems. The computations are undertaken using a Markov chain Monte Carlo simulation method which exploits the conditionally Gaussian representation of the skew t distribution. We employ the approach in two contemporary econometric studies. The first is the modelling of regional spot prices in the Australian electricity market. Here, we observe complex non-Gaussian margins and nonlinear inter-regional dependence. Accurate characterization of this dependence is important for the study of market integration and risk management purposes. The second is the modelling of ordinal exposure measures for 15 major websites. Dependence between websites is important when measuring the impact of multi-site advertising campaigns. In both cases the skew t copula substantially outperforms symmetric elliptical copula alternatives, demonstrating that the skew t copula is a powerful modelling tool when coupled with Bayesian inference. Copyright 2010 John Wiley & Sons, Ltd.

1. INTRODUCTION

Copula modelling is an effective and popular approach for constructing multivariate distributions. It involves two stages: first, the selection of the appropriate distributional form for each margin, which can be continuous or discrete; second, dependence between marginals is specified through a parametric copula function. This is a general and computationally tractable approach. Joe (1997) and Nelsen (2006) review copula models and their properties, while Trivedi and Zimmer (2005) discuss the use of copulas in microeconometric modelling. Copula models are particularly popular in finance and actuarial science; see Cherubini et al. (2004) and McNeil et al. (2005) for introductions to financial applications of copulas and Frees and Valdez (2008) for a recent application to actuarial science.

One of the most promising uses of copulas is for the analysis of high-dimensional data. However, while there are many bivariate copulas, the number of higher-dimensional copulas is limited. In this situation copulas constructed from underlying elliptical distributions are popular, such as the Gaussian (Song, 2000) or t (Demarta and McNeil, 2005) copulas. They scale to higher dimensions because the number of dependence parameters increases only quadratically with dimension. Nevertheless, a major drawback of elliptical copulas is that the form of dependence can be overly restrictive. For example, the Gaussian copula can fail to adequately capture dependence between extreme events; see Longin and Solnik (2001) and Bae et al. (2003) for financial examples.

* Correspondence to: Michael S. Smith, Melbourne Business School, University of Melbourne, 200 Leicester Street, Carlton, VIC 3053, Australia. E-mail: [email protected]

One solution is to use a sequence of bivariate copulas to construct a multivariate copula (Bedford and Cooke, 2002), which has recently been labelled a ‘pair-copula’ (Aas et al., 2009; Min and Czado, 2010). While these are flexible, an ordering of the multivariate vector is required, which is not always obvious, except in specific cases such as for the analysis of longitudinal data (Smith et al., 2010). An alternative is to construct a copula using one of the various skew-elliptical distributions. Demarta and McNeil (2005) were the first to suggest this using a specific asymmetric t distribution. We extend this idea and construct a copula from skew t distributions that are formed from hidden truncation, which are currently the most popular type (Genton, 2004). In particular, we employ the multivariate skew t distribution of Sahu et al. (2003), although the method extends readily to that of Azzalini and Capitanio (2003) as well. We derive the resulting copula density in closed form, but observe that the likelihood is computationally difficult to evaluate as the dimension increases, so that maximum likelihood estimation is difficult. Moreover, maximum likelihood estimation is also difficult when one or more of the margins is discrete, or if estimation of the copula parameters is undertaken jointly with the parameters of the marginal models.

Therefore, we develop a Bayesian approach to compute inference with Markov chain Monte Carlo (MCMC) sampling used to evaluate the posterior distribution. We follow Sahu et al. (2003) and represent the skew t distribution as Gaussian, conditional upon latent variables, which we generate explicitly in the sampling scheme. We observe that the copula function is a mixture of Gaussian distribution functions over these latent variables, not a mixture of Gaussian copula functions as one might initially suspect. Our approach applies to any combination of continuous or discrete-valued marginals, where for the latter additional latent variables are generated as outlined in Danaher and Smith (2010). We carefully derive the conditional posterior distributions of the copula and marginal parameters separately when the margins are continuous or discrete-valued, so that Metropolis–Hastings (MH) steps can be used. The method generalizes the work of Pitt et al. (2006), who consider Bayesian inference for the Gaussian copula. We show how to compute tail dependence measures and Spearman correlations, and how to select from different copula models using Bayesian predictive density cross-validation.

We demonstrate that the skew t copula can capture dependence more accurately than the widely used symmetric t copula with two contemporary econometric studies. The first involves constructing the distribution of daily electricity spot prices in the Australian wholesale market. This market has five regions, and we model each regional price using asymmetric and heteroscedastic marginal regression models that capture the strong signal in the first and second moments of prices that has been observed in recent studies (Weron and Misiorek, 2008; Koopman et al., 2007; Panagiotelis and Smith, 2008). If the five regions are well integrated, regional prices should be highly dependent. Using a skew t copula we measure the inter-regional dependence and find some regions to have prices that are highly dependent, but others that are less so, indicating the existence of major market imperfections. Importantly, because of both the physics of electricity transmission and market dynamics, the dependence between regional prices is also likely to be highly nonlinear. Our analysis confirms this, with the skew t copula identifying strong asymmetry in the tail dependence. The improved fit of the skew t allows for more reliable computation of tail probabilities, which are important for both assessing the level of market integration during high demand periods, and also for improved risk management.

The second study involves multivariate ordinal-valued data. Univariate models for such data are popular in applied econometric analysis (Cameron and Trivedi, 1998), and the use of copulas extends this to multivariate modelling (Cameron et al., 2004; Danaher and Smith, 2010). We model online exposure at the top 15 US websites of 2007 as marginally negative binomial, with a skew t copula to capture inter-website dependence. The focus is on the total exposure count across all 15 websites. This is because of the industry practice of placing banner advertisements on multiple websites, with the objective being the maximization of total exposure across them all. Danaher (2007) shows that capturing dependence between websites is crucial to the accurate modelling of total exposure. Our analysis shows that the skew t copula provides a substantial improvement in the modelling of inter-site dependence, compared to the symmetric t copula. An important result is the more accurate computation of marketing ‘reach’ (the probability of any exposure) across all sites.

The remainder of the paper is organized as follows. Section 2 introduces copulas, the skew t distribution, and the conditionally Gaussian representation. The skew t copula is then defined and its mixture representation derived. Section 3 outlines Bayesian estimation and inference when the margins are continuous and Section 4 contains the Australian electricity market study. Section 5 extends the Bayesian methodology to the case where the margins are discrete, and employs the method to analyse the website exposure data. A discussion concludes, while the Appendix contains some implementation details.

2. SKEW T COPULA

2.1. Copula Functions

The function $C(u_1, \ldots, u_r)$ is called a copula function if it is a distribution function with each of its margins uniformly distributed on [0, 1]. That is, $C(u) = \Pr(U_1 \le u_1, \ldots, U_r \le u_r)$, with each $U_j$, for $j = 1, \ldots, r$, uniformly distributed on [0, 1] and $u = (u_1, \ldots, u_r)$. Sklar (1959) proposed and named the concept of a copula and Joe (1997) and Nelsen (2006) discuss a wide range of choices for $C$ and their properties.

While there are many suggestions for how to choose $C$ in the bivariate case, for higher dimensions the choice is more limited. In this case, parametric copulas are often constructed from a continuous multivariate distribution as follows. Suppose that $X = (X_1, \ldots, X_r)'$ is a random vector with continuous margins, distribution function $F(x; \varphi)$, and strictly monotonic marginal distribution functions $F_j(x_j; \varphi)$, for $j = 1, \ldots, r$, where $x = (x_1, \ldots, x_r)'$ and $\varphi$ is a parameter vector. Then

$$C(u; \varphi) = \Pr(F_1(X_1; \varphi) \le u_1, \ldots, F_r(X_r; \varphi) \le u_r) = F(F_1^{-1}(u_1; \varphi), \ldots, F_r^{-1}(u_r; \varphi); \varphi) \tag{1}$$

defines a copula function. If $X$ has density $f(x; \varphi)$ and marginal densities $f_j(x_j; \varphi)$, then the copula density on $[0, 1]^r$ is

$$c(u; \varphi) = \frac{f(x; \varphi)}{\prod_{j=1}^{r} f_j(x_j; \varphi)} \tag{2}$$

where $x_j = F_j^{-1}(u_j; \varphi)$. The construction is motivated by Sklar’s theorem; see McNeil et al. (2005) for an overview.

When $F$ in equation (1) is an elliptical distribution function, the resulting copula is called an elliptical copula (Fang et al., 2002). An elliptical copula can be understood as a transformation from $X \in \mathbb{R}^r$, which is elliptically distributed, to a dependent random vector $U$ on $[0, 1]^r$. The copula captures dependence via the correlation matrix of the elliptical distribution. In this paper we instead construct a copula by employing a skew t distribution function for $F$. A skew t copula can therefore be understood as a transformation from $X$, which is skew t distributed, to $U$, which also inherits the dependence structure.
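As a minimal illustration of equations (1) and (2), the sketch below builds a bivariate elliptical copula density from a zero-mean bivariate normal distribution with standard normal margins, using SciPy. The function name and the choice of a Gaussian F are assumptions made purely for illustration; the skew t copula follows the same recipe with a different F.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def gaussian_copula_density(u, corr):
    """Copula density c(u) = f(x) / prod_j f_j(x_j), with x_j = F_j^{-1}(u_j).

    F is a zero-mean multivariate normal with correlation matrix `corr`,
    so every margin F_j is standard normal.
    """
    u = np.asarray(u, dtype=float)
    x = norm.ppf(u)                                    # x_j = F_j^{-1}(u_j)
    joint = multivariate_normal(mean=np.zeros(len(u)), cov=corr).pdf(x)
    margins = np.prod(norm.pdf(x))                     # product of marginal densities
    return joint / margins


corr = np.array([[1.0, 0.7], [0.7, 1.0]])
print(gaussian_copula_density([0.5, 0.5], corr))       # dependent case
print(gaussian_copula_density([0.9, 0.1], np.eye(2)))  # independence: density is 1
```

With an identity correlation matrix the joint density factorizes into its margins, so the ratio in equation (2) equals one everywhere, which is a quick sanity check on any implementation.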


2.2. Skew t Distribution and Copula

Elliptical distributions have been generalized to incorporate skewness by a number of authors; see Genton (2004). Here, we focus on the skew t distribution proposed by Sahu et al. (2003), which has the conditionally Gaussian representation outlined below. However, we note here that the method can also be applied to other skew-elliptical distributions with conditionally Gaussian representations. Suppose $X$ and $Q$ are both ($r \times 1$) with a joint multivariate t distribution with zero mode, degrees of freedom $\nu$, and scale matrix $\Lambda$, which we write as

$$\begin{pmatrix} X \\ Q \end{pmatrix} \sim t_{2r}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Lambda = \begin{pmatrix} \Omega + D^2 & D \\ D & I \end{pmatrix},\ \nu \right) \tag{3}$$

Here, $D = \mathrm{diag}(\delta_1, \ldots, \delta_r)$ is a diagonal matrix and $\Omega$ an ($r \times r$) positive definite matrix. Then the skew t distribution for $X$ is defined to be the distribution of $X$, conditional upon all the elements of $Q$ being positive, which we write as $(X \mid Q > 0)$. Sahu et al. (2003) show that the resulting density is

$$f_{St}(x; \Omega, D, \nu) = \frac{2^r}{\det(\Omega + D^2)^{1/2}}\, f_t\left((\Omega + D^2)^{-1/2} x; \nu\right) \Pr(V > 0; x) \tag{4}$$

where $f_t(x; \nu)$ is a $t_r(0, I, \nu)$ density evaluated at $x$, $V \sim t_r\left(D(\Omega + D^2)^{-1}x,\ \frac{S(x) + \nu}{r + \nu}\,(I - D(\Omega + D^2)^{-1}D),\ r + \nu\right)$, and $S(x) = x'(\Omega + D^2)^{-1}x$; see also Panagiotelis and Smith (2010).

In dimension $j$, $\delta_j \in \mathbb{R}$ controls the level of asymmetry, with symmetry occurring when $\delta_j = 0$.

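To evaluate equation (4), it is necessary to compute the term $\Pr(V > 0; x)$ numerically. While there are developments in this area (Genz and Bretz, 2002), this is still difficult in even moderate dimensions. Sahu et al. (2003) give the first and second moments of the skew t distribution, which are functions of $\delta = (\delta_1, \ldots, \delta_r)'$, with

$$E(X) = \mu^* = \xi(\nu)\,\delta, \quad \text{where } \xi(\nu) = \left(\frac{\nu}{\pi}\right)^{1/2} \frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)} \tag{5}$$

$$\mathrm{var}(X) = \Sigma^* = (\Omega + D^2)\frac{\nu}{\nu - 2} - \left[\xi(\nu)^2\, \iota\iota' - (\iota\iota' - I)\frac{2\nu}{\pi(\nu - 2)}\right] \odot \delta\delta' \tag{6}$$

where $\iota = (1, \ldots, 1)'$, ‘$\odot$’ is the Hadamard matrix multiplication operator, and the expression for $\Sigma^*$ is a correction given in Panagiotelis and Smith (2008). When $\nu \to \infty$, the skew t converges to a skew Gaussian with density

$$f_{SG}(x; \Omega, D) = \frac{2^r}{\det(\Omega + D^2)^{1/2}}\, \phi_r\left((\Omega + D^2)^{-1/2} x\right) \Pr(V > 0; x) \tag{7}$$

where $\phi_r$ is the density of an $N_r(0, I)$ distribution and $V \sim N_r\left(D(\Omega + D^2)^{-1}x,\ I - D(\Omega + D^2)^{-1}D\right)$. For the skew Gaussian the mean is $\mu^* = (2/\pi)^{1/2}\delta$ and the variance is $\Sigma^* = \Omega + (1 - 2/\pi)D^2$, which is the limit of equation (6) as $\nu \to \infty$.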
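To make the mean in equation (5) concrete, the following sketch evaluates the factor multiplying the skew parameters, $(\nu/\pi)^{1/2}\,\Gamma((\nu-1)/2)/\Gamma(\nu/2)$, for a few degrees of freedom and compares it with the skew Gaussian limit $(2/\pi)^{1/2}$. The helper name `skew_t_mean_factor` and the example values are hypothetical; log-gamma functions are used only to keep the ratio stable for large degrees of freedom.

```python
import numpy as np
from scipy.special import gammaln


def skew_t_mean_factor(nu):
    """Return (nu/pi)^{1/2} * Gamma((nu - 1)/2) / Gamma(nu/2), valid for nu > 1."""
    log_ratio = gammaln((nu - 1.0) / 2.0) - gammaln(nu / 2.0)
    return np.sqrt(nu / np.pi) * np.exp(log_ratio)


delta = np.array([-0.8, 0.0, 0.8])               # illustrative skew parameters
for nu in (5.0, 50.0, 1e6):
    print(nu, skew_t_mean_factor(nu) * delta)    # E(X) for each degrees of freedom
print(np.sqrt(2.0 / np.pi))                      # skew Gaussian limit of the factor
```

The factor shrinks towards the Gaussian limit as the degrees of freedom grow, so heavier tails push the mean further from zero for a given skew parameter.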

Let $\varphi = \{\Omega, D, \nu\}$ and $\mathrm{diag}(\Omega) = (1, \ldots, 1)$ for the rest of the paper. It can be shown that each margin $j$ is also skew t with density $f_{St}(x_j; 1, \delta_j, \nu)$, which we denote more concisely as $f_{St,j}(x_j; \delta_j, \nu)$. Closure under marginalization is a useful feature for constructing a copula. From equations (1) and (2), the skew t copula function is $C_{St}(u; \varphi) = F_{St}(x; \varphi)$, with density

$$c_{St}(u; \varphi) = \frac{f_{St}(x; \varphi)}{\prod_{j=1}^{r} f_{St,j}(x_j; \delta_j, \nu)} \tag{8}$$


Here, $F_{St}(x; \varphi)$ is the distribution function of the skew t in equation (4) and $x_j = F_{St,j}^{-1}(u_j; \delta_j, \nu)$ is the inverse distribution function of margin $j$ evaluated at $u_j$. The latter is computed numerically using Newton’s method, which provides accurate estimates within three or four iterations.
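The inversion of a margin by Newton's method can be sketched as follows. The margin density is equation (4) with r = 1 and unit scale, in which the probability term reduces to a univariate t distribution function. All function names are hypothetical, and the starting value and the cap on the step size are choices made for this sketch rather than details given in the text.

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln, ndtri, stdtr


def skew_t_margin_pdf(x, delta, nu):
    """Univariate skew t margin: equation (4) with r = 1 and unit scale."""
    omega = np.sqrt(1.0 + delta**2)
    z = x / omega
    log_t = (gammaln((nu + 1.0) / 2.0) - gammaln(nu / 2.0)
             - 0.5 * np.log(nu * np.pi)
             - 0.5 * (nu + 1.0) * np.log1p(z**2 / nu))   # log t_nu density at z
    # Pr(V > 0; x): a univariate t probability with nu + 1 degrees of freedom
    prob = stdtr(nu + 1.0, delta * z * np.sqrt((nu + 1.0) / (nu + z**2)))
    return 2.0 / omega * np.exp(log_t) * prob


def skew_t_margin_cdf(x, delta, nu):
    """Distribution function by univariate numerical integration."""
    return integrate.quad(skew_t_margin_pdf, -np.inf, x, args=(delta, nu))[0]


def skew_t_margin_ppf(u, delta, nu, tol=1e-8, max_iter=50):
    """Invert the distribution function with step-capped Newton iterations."""
    x = ndtri(u)                          # start at the standard normal quantile
    for _ in range(max_iter):
        step = (skew_t_margin_cdf(x, delta, nu) - u) / skew_t_margin_pdf(x, delta, nu)
        step = np.clip(step, -1.0, 1.0)   # safeguard against overshooting in the tails
        x = x - step
        if abs(step) < tol:
            break
    return x


x = skew_t_margin_ppf(0.95, delta=0.8, nu=5.0)
print(x, skew_t_margin_cdf(x, 0.8, 5.0))  # the second value should be close to 0.95
```

Each Newton update divides the current error in the distribution function by the density, so the cost per iteration is one numerical integration plus one density evaluation.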

Note that while the skew t distribution has non-zero mean $\mu^*$, the parameters in the skew t copula are identified by setting the elements along the leading diagonal of $\Omega$ to ones. When the skew parameters $\delta = 0$, so that $D = 0$, the skew t copula is a symmetric t copula. When $\nu \to \infty$ the skew t copula converges to a skew Gaussian copula, with a density function that we label $c_{SG}$. When both $\nu \to \infty$ and $D = 0$ the skew t copula corresponds to the standard Gaussian copula.

To illustrate the dependence structure, Figure 1 plots contours of bivariate distributions obtained from skew t copulas with $\nu = 5$, the off-diagonal element $\Omega_{12} = 0.95$ and $N(0, 1)$ margins. The parameters $\delta$ affect the dependence structure substantially, highlighting two points. First, while $\delta$ controls the level of skewness in $X = (X_1, X_2)$, it determines (jointly with $\Omega$ and $\nu$) the form of dependence between $U = (U_1, U_2)$. Second, the skew t copula accounts for a broader form of dependence than the symmetric t copula depicted in the central panel of Figure 1. Last, we emphasize that skew or heavy tails in a distribution do not result from adopting a skew t copula. These result from the choice of marginal distributions, which can also be skew t or something altogether different.

[Figure 1: nine contour panels, one for each combination of δ1, δ2 ∈ {−0.8, 0, 0.8}; axes y1 and y2, each running from −2 to 2.]

Figure 1. Contours of a bivariate distribution with N(0,1) margins and dependence captured by a skew t copula with $\nu = 5$ degrees of freedom, off-diagonal element of $\Omega$ set to 0.95 and differing values of $\delta$. The central panel corresponds to a symmetric t copula

2.3. Conditionally Gaussian Representation

Estimation and inference for the parameters of the skew t distribution are simplified by employing the augmented distribution in equation (3), conditional upon the constraint $Q > 0$. That is, $f(x, q \mid Q > 0) \propto f(x \mid q) f(q) I(q > 0)$, of which the margin in $x$ is skew t. In this formulation

$$X \mid Q = q \sim t_r\left(Dq,\ \frac{\nu + q'q}{\nu + r}\,\Omega,\ \nu + r\right),$$

the indicator function $I(q > 0) = 1$ if all the elements of $q$ are positive, and zero otherwise, and $f(q)$ is a $t_r(0, I, \nu)$ density.

It is possible to work directly with this conditionally t representation. However, to further simplify the problem, we follow many authors and represent the t as a scale mixture of normals. If $W \sim \mathrm{Gamma}(\nu/2, \nu/2)$, then the joint distribution of $(X, Q, W)$, with the constraint $Q > 0$, has density

$$f(x, q, w \mid Q > 0) \propto f(x \mid q, w) f(q \mid w) I(q > 0) f(w) \tag{9}$$

The margin in $X$ is skew t, with density equal to $f_{St}(x; \varphi) = \int f(x, q, w \mid Q > 0)\, d(q, w)$. In equation (9), $(X \mid Q = q, W = w) \sim N_r\left(Dq, \frac{1}{w}\Omega\right)$ and $(Q \mid W = w) \sim N_r\left(0, \frac{1}{w}I\right)$.

It follows from equation (9) that the skew t copula function has a mixture representation $C_{St}(u; \varphi) \propto \int C^*(u; z, \varphi) f(q \mid w) I(q > 0) f(w)\, d(w, q)$, where

$$C^*(u; z, \varphi) = \Phi_r\left(w^{1/2}\,\Omega^{-1/2}(x - Dq)\right) \tag{10}$$

$z = \{q, w\}$, and $\Phi_r$ is a $N_r(0, I)$ distribution function. Note that, conditional upon $x$, $C_{St}$ is a mixture of Gaussian distribution functions, not a mixture of Gaussian copula functions as one might initially expect. Using this mixture representation at equation (10), an iterate of $U$ with corresponding distribution $C_{St}$ can be generated as follows.

Algorithm 1. (Random iterate generation from $C_{St}$)

Step 1. Generate $W \sim \mathrm{Gamma}(\nu/2, \nu/2)$.
Step 2. Generate $Q \sim N_r\left(0, \frac{1}{W}I\right)$ constrained such that $Q > 0$.
Step 3. Generate $X \sim N_r\left(DQ, \frac{1}{W}\Omega\right)$.
Step 4. Set $U_j = F_{St,j}(X_j; \delta_j, \nu)$ for $j = 1, \ldots, r$.

In Step 4 the marginal distribution function $F_{St,j}(x_j; \delta_j, \nu) = \int_{-\infty}^{x_j} f_{St,j}(x_j'; \delta_j, \nu)\, dx_j'$ can be computed rapidly and accurately using univariate numerical integration, while Step 2 can be undertaken by rejection sampling or another method.
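The four steps of Algorithm 1 can be sketched in Python as below. The function names are hypothetical and the parameter values in the usage lines are illustrative only. For Step 2 the sketch uses the fact that the elements of Q are independent and symmetric about zero given W, so taking absolute values of an unconstrained draw satisfies the constraint without rejection; this is one instance of the "other method" the text allows. The Gamma distribution is taken to have shape and rate both equal to half the degrees of freedom, which NumPy expresses through a scale parameter.

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln, stdtr


def skew_t_margin_pdf(x, delta, nu):
    """Univariate skew t margin: equation (4) with r = 1 and unit scale."""
    omega = np.sqrt(1.0 + delta**2)
    z = x / omega
    log_t = (gammaln((nu + 1.0) / 2.0) - gammaln(nu / 2.0)
             - 0.5 * np.log(nu * np.pi)
             - 0.5 * (nu + 1.0) * np.log1p(z**2 / nu))
    prob = stdtr(nu + 1.0, delta * z * np.sqrt((nu + 1.0) / (nu + z**2)))
    return 2.0 / omega * np.exp(log_t) * prob


def skew_t_margin_cdf(x, delta, nu):
    """Step 4 helper: univariate numerical integration of the margin density."""
    return integrate.quad(skew_t_margin_pdf, -np.inf, x, args=(delta, nu))[0]


def sample_skew_t_copula(n, omega, delta, nu, rng):
    """Draw n iterates of U from the skew t copula following Steps 1 to 4."""
    r = len(delta)
    chol = np.linalg.cholesky(omega)
    u = np.empty((n, r))
    for i in range(n):
        w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu)              # Step 1
        q = np.abs(rng.standard_normal(r)) / np.sqrt(w)            # Step 2
        x = delta * q + chol @ rng.standard_normal(r) / np.sqrt(w)  # Step 3
        u[i] = [skew_t_margin_cdf(x[j], delta[j], nu) for j in range(r)]  # Step 4
    return u


rng = np.random.default_rng(1)
omega = np.array([[1.0, 0.95], [0.95, 1.0]])
u = sample_skew_t_copula(200, omega, np.array([0.8, -0.8]), 5.0, rng)
print(u.mean(axis=0))   # each margin of U should be roughly uniform on [0, 1]
```

Because each margin of U is uniform by construction, checking that the sample means sit near one half is a cheap diagnostic that the margin distribution function matches the sampler.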


3. BAYESIAN ESTIMATION

3.1. Priors

Bayesian estimation requires the specification of prior distributions for the copula parameters $\varphi$. From equation (6), when $D = 0$, $\Omega$ is the correlation matrix of $X$, but when $D \neq 0$, $\Omega$ is no longer the correlation matrix of $X$, but is still a matrix of $r(r - 1)/2$ dependence parameters. We follow Pitt et al. (2006) and adopt a prior on $\Omega$ that is implied by a uniform prior for $G$, with $\Omega^{-1} = FGF$, where $G$ is a correlation matrix and $F$ is a diagonal matrix such that $F_{ii}^2 = (G^{-1})_{ii}$. The reason we use this prior is that Wong et al. (2003) show this is equivalent to a uniform prior on the matrix of partial correlations, and because one-at-a-time sampling of the elements of $G$ is straightforward using accurate normal approximations. Barnard et al. (2000) and Danaher and Smith (2010) suggest alternative priors for $\Omega$, any one of which can be used instead of our prior. We assume prior independence between $\Omega$, $\nu$ and $\delta_1, \ldots, \delta_r$. For $\delta_j$ priors based on uniform skew coefficients (Panagiotelis and Smith, 2010) or reference priors along the lines of Liseo and Loperfido (2006) can be used, although we assume $\delta_j \sim N(0, 5^2)$ here. For $\nu$ we use a uniform prior on [2, 50], which places non-zero weight on both heavy-tailed and near-Gaussian distributions for $X$. Because the marginal models are application specific, so are the priors on the marginal parameters, although we adopt either weak or non-informative priors in our empirical work.

3.2. MCMC Sampling Scheme

We exploit the conditionally Gaussian representation of the skew t, and use MCMC to evaluate the posterior distribution. Let $y_i = (y_{i1}, \ldots, y_{ir})'$ be the $i$th observation of the data, and $H_{ij}(y_{ij}; \theta_j)$ be the marginal distribution function of $y_{ij}$, with parameters $\theta_j$ that arise from some model. For example, they may be the parameters of a marginal heteroscedastic model, or the linear coefficients and scale parameter in a regression model (in the latter case $H_{ij} = H_j$). We propose a sampling scheme that generates from the posterior, augmented with latent variables from the conditionally Gaussian representation in equation (9).

Let $z_i = \{q_i, w_i\}$ be the latent variables $q_i = (q_{i1}, \ldots, q_{ir})$ and $w_i$ for observation $i$, $x_i = (x_{i1}, \ldots, x_{ir})$ be the corresponding observation on $X$, $y = \{y_1, \ldots, y_n\}$, $z = \{z_1, \ldots, z_n\}$ and $x = \{x_1, \ldots, x_n\}$. Then, from equation (9), the density of the latent variables is

$$f(z \mid \varphi, \Theta) \propto \prod_{i=1}^{n} f(q_i \mid w_i, \varphi)\, f(w_i \mid \nu)\, I(q_i > 0) \tag{11}$$

where $q_i \mid w_i, \varphi \sim N\left(0, \frac{1}{w_i}I\right)$ and $w_i \mid \nu \sim \mathrm{Gamma}(\nu/2, \nu/2)$. Let $\{A \setminus B\}$ denote the relative complement of $B$ in $A$, and $\Theta = \{\theta_1, \ldots, \theta_r\}$ the set of all marginal parameters. Then the following sampling scheme is a Gibbs scheme with MH steps. It provides iterates from the augmented posterior $f(\Theta, \varphi, z \mid y)$ when the margins are continuous.

Algorithm 2. MCMC scheme for skew t copula (continuous margins)

Step 1. Set $x_{ij} = F_{St,j}^{-1}(H_{ij}(y_{ij}; \theta_j); \delta_j, \nu)$ for $j = 1, \ldots, r$ and $i = 1, \ldots, n$.
Step 2. Sample one at a time from $f(w_i \mid x, \varphi, \{z \setminus w_i\})$ for all $i$.
Step 3. Sample one at a time from $f(q_{ij} \mid x, \varphi, \{z \setminus q_{ij}\})$ for all $i, j$.
Step 4. Sample $\delta_1, \ldots, \delta_r, \nu$ each one at a time using random walk MH steps.
Step 5. Sample from $f(\Omega \mid x, \{\varphi \setminus \Omega\}, z)$ using the method of Pitt et al. (2006) or alternative.
Step 6. For all $j$, sample from $f(\theta_j \mid y, z, \varphi, \{\Theta \setminus \theta_j\})$ as a block using MH.


For Step 1, each univariate distribution function $F_{St,j}$ is computed using numerical integration, and the inverse $F_{St,j}^{-1}$ using Newton’s method, both of which prove rapid and accurate. Conditional upon $x$, Steps 2 and 3 are straightforward and given in the Appendix. Step 5 is discussed in Pitt et al. (2006) and Wong et al. (2003), while Barnard et al. (2000) and Danaher and Smith (2010) provide sampling methods for their priors for $\Omega$.

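See Robert and Casella (2004, pp. 287–291) for an overview of the random walk MH algorithm. At Step 4 this requires derivation of the conditional posterior of $(\delta_j, \nu)$ as follows. Because $x_{ij} = F_{St,j}^{-1}(H_{ij}(y_{ij}; \theta_j); \delta_j, \nu)$,

$$\frac{dx_{ij}}{dy_{ij}} = \frac{h_{ij}(y_{ij}; \theta_j)}{f_{St,j}(x_{ij}; \delta_j, \nu)}$$

where $h_{ij} = dH_{ij}/dy_{ij}$. Then, by transforming variables,

$$f(y_i \mid \varphi, z_i, \Theta) = f(x_i \mid \varphi, z_i) \prod_{j=1}^{r} \frac{h_{ij}(y_{ij}; \theta_j)}{f_{St,j}(x_{ij}; \delta_j, \nu)} \tag{12}$$

where $x_i \mid \varphi, z_i \sim N\left(Dq_i, \frac{1}{w_i}\Omega\right)$ and $f_{St,j}(x_{ij}; \delta_j, \nu)$ are univariate skew t as outlined in Section 2.2. From equation (12), the conditional posterior can then be derived as

$$f(\delta_j, \nu \mid y, z, \{\varphi \setminus \delta_j, \nu\}, \Theta) \propto \prod_{i=1}^{n} \{f(y_i \mid \varphi, z_i, \Theta)\}\, f(\delta_j, \nu \mid z, \{\varphi \setminus \delta_j, \nu\})$$

$$\propto \prod_{i=1}^{n} \frac{f(x_i \mid \varphi, z_i)}{\prod_{l=1}^{r} f_{St,l}(x_{il}; \delta_l, \nu)} \prod_{i=1}^{n} \{f(q_i \mid w_i, \varphi) f(w_i \mid \nu)\}\, \pi(\nu)\pi(\delta_j) \tag{13}$$

where $\pi(\nu)$ and $\pi(\delta_j)$ are the prior densities for $\nu$ and $\delta_j$.

Step 6 generates each $\theta_j$ as a block using an MH step with a multivariate t proposal. Following Pitt et al. (2006) the mean of the proposal is set equal to the maximum conditional posterior, the scale equal to the negative inverse of the information matrix and 6 degrees of freedom. The mode and information matrix are both obtained using Newton–Raphson iteration. Assuming prior $\pi(\Theta) = \prod_j \pi(\theta_j)$, from equation (12) the conditional posterior is

$$f(\theta_j \mid y, z, \varphi, \{\Theta \setminus \theta_j\}) \propto \prod_{i=1}^{n} \{f(y_i \mid \varphi, z_i, \Theta)\}\, \pi(\theta_j) \propto \prod_{i=1}^{n} \left\{ f(x_i \mid \varphi, z_i) \frac{h_{ij}(y_{ij}; \theta_j)}{f_{St,j}(x_{ij}; \delta_j, \nu)} \right\} \pi(\theta_j)$$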
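Step 4 of the sampling scheme updates each skew parameter and the degrees of freedom with random walk MH steps. A generic scalar update of that kind can be sketched as follows. The function name, step size and number of iterations are hypothetical, and the target in the usage lines is a stand-in (the N(0, 5²) prior density alone) rather than the conditional posterior in equation (13), which would also require the latent variables and the data.

```python
import numpy as np


def random_walk_mh(log_post, start, step, n_iter, rng):
    """Random walk Metropolis-Hastings for a scalar parameter.

    `log_post` returns the log of the target density up to an additive constant.
    """
    draws = np.empty(n_iter)
    current, current_lp = start, log_post(start)
    for k in range(n_iter):
        proposal = current + step * rng.standard_normal()   # symmetric proposal
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, target ratio); otherwise keep the current value
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[k] = current
    return draws


rng = np.random.default_rng(0)
stand_in_log_target = lambda d: -0.5 * (d / 5.0) ** 2       # log N(0, 5^2) kernel
draws = random_walk_mh(stand_in_log_target, 0.0, 10.0, 20000, rng)
print(draws.mean(), draws.std())
```

A target that returns minus infinity outside its support, such as a uniform prior on a bounded interval, is handled automatically because such proposals are always rejected.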

3.3. Posterior Inference

The sampling scheme converges to produce iterates $(\varphi^{[k]}, \Theta^{[k]}, z^{[k]}) \sim f(\varphi, \Theta, z \mid y)$, for $k = 1, \ldots, K$, which are used to estimate the posterior distribution of functionals of interest. We employ the posterior means of the model parameters as point estimates, along with the corresponding posterior probability intervals as measures of uncertainty. Of particular interest is the correlation matrix $\Omega^* = \mathrm{diag}(\Sigma^*)^{-1/2}\, \Sigma^*\, \mathrm{diag}(\Sigma^*)^{-1/2}$ of the skew t distribution from which the copula is formed, where $\Sigma^*$ is given in equation (6). The off-diagonal elements measure pairwise dependence, and we estimate their posterior mean by computing $\Omega^*$ at each sweep and then compute their average.

Another measure of pairwise dependence is Spearman’s rho, which for margins $j$ and $l$ is $\rho_{jl} = 12E(U_jU_l) - 3$, where $U = (U_1, \ldots, U_r)$ has distribution function $C_{St}$. To estimate the expectation, Monte Carlo iterates $\{U^{[k]}, k = 1, \ldots, K\}$ of $U$ can be generated from the fitted copula distribution function $\int C_{St}(U; \varphi) f(\varphi \mid y)\, d\varphi$ using Algorithm 1 at the end of each sweep of Algorithm 2. Using these, $E(U_jU_l \mid y) \approx (1/K)\sum_k U_j^{[k]} U_l^{[k]}$ and the value of $\rho_{jl}$ is estimated from the fitted distribution.

Unlike symmetric elliptical copulas, the skew t copula has different upper and lower tail dependence, which we measure by

$$\lambda_{jl}^{\text{lower}}(\alpha) = \Pr(U_j < \alpha \mid U_l < \alpha), \quad \text{and} \quad \lambda_{jl}^{\text{upper}}(\alpha) = \Pr(U_j > 1 - \alpha \mid U_l > 1 - \alpha)$$
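Given Monte Carlo iterates of a pair of copula variables, both dependence measures can be estimated in a few lines. The sketch below uses the Monte Carlo average for Spearman's rho exactly as defined above, but for the tail measures it uses plain empirical proportions, which is a simpler substitute assumed here for illustration and not the conditional-probability estimator developed in the text. The function names and the simulated inputs are hypothetical.

```python
import numpy as np


def spearman_rho(u_j, u_l):
    """Monte Carlo estimate of rho_jl = 12 E(U_j U_l) - 3."""
    return 12.0 * np.mean(u_j * u_l) - 3.0


def tail_dependence(u_j, u_l, alpha):
    """Empirical lower and upper tail dependence at level alpha."""
    lower = np.mean(u_j[u_l < alpha] < alpha)              # Pr(U_j < a | U_l < a)
    upper = np.mean(u_j[u_l > 1.0 - alpha] > 1.0 - alpha)  # Pr(U_j > 1-a | U_l > 1-a)
    return lower, upper


rng = np.random.default_rng(0)
u1 = rng.uniform(size=100_000)
u2 = rng.uniform(size=100_000)          # independent of u1, for illustration
print(spearman_rho(u1, u2))             # near zero under independence
print(tail_dependence(u1, u2, 0.05))    # both near alpha under independence
```

Under independence the conditional probabilities equal alpha itself, so values well above alpha in one tail but not the other are the signature of asymmetric tail dependence.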

Denoting $U_{\setminus\{j,l\}}$ as the elements of $U$ omitting $\{U_j, U_l\}$, the Bayesian estimate of the lower tail dependence of the fitted distribution is

$$E(\lambda_{jl}^{\text{lower}}(\alpha) \mid y) = \int \Pr(U_j < \alpha \mid U_l < \alpha, U_{\setminus\{j,l\}}, \varphi)\, f(U_{\setminus\{j,l\}}, \varphi \mid y)\, d(U_{\setminus\{j,l\}}, \varphi)$$

$$\approx \frac{1}{K} \sum_{k=1}^{K} \Pr(U_j < \alpha \mid U_l < \alpha, U_{\setminus\{j,l\}}^{[k]}, \varphi^{[k]})$$

where the integral is approximated using the Monte Carlo iterates of $\varphi$ and $U_{\setminus\{j,l\}}$. To evaluate the conditional probability, note that $X_j = F_{St,j}^{-1}(U_j; \delta_j, \nu)$, $j = 1, \ldots, r$, so that

$$\Pr(U_j < \alpha \mid U_l < \alpha, U_{\setminus\{j,l\}}, \varphi) = \Pr(X_j < \alpha_j^* \mid X_l < \alpha_l^*, X_{\setminus\{j,l\}}, \varphi) \tag{14}$$

where $\alpha_j^* = F_{St,j}^{-1}(\alpha; \delta_j, \nu)$ and $X_{\setminus\{j,l\}}$ is $X = (X_1, \ldots, X_r)'$ without the two elements $\{X_j, X_l\}$. Because $X \sim N_r\left(DQ, \frac{1}{W}\Omega\right)$, the conditional probability in equation (14) is that of a univariate normal.

The posterior mean $E(\lambda_{jl}^{\text{upper}}(\alpha) \mid y)$ is estimated similarly. We note here that computing the indices of tail dependence, which correspond to the dependence measures when $\alpha \to 0$, is difficult here because of the mixture representation of the copula function in equation (10).

3.4. Bayesian Cross-Validation

We use a fivefold predictive Bayesian cross-validation (BCV) criterion (Rust and Schmittlein, 1985) to judge the fit of different copula models as follows. The data are partitioned at random into five approximately equal-sized parts, setting the validation sample $y_{V(l)}$ to one part and the training sample $y_{T(l)}$ to the remaining data, for each partition $l = 1, \ldots, 5$. For continuous data, if $f(y_{V(l)} \mid y_{T(l)})$ is the predictive density under model $M$ of the validation sample given the training data, and $\hat{f}(y_{V(l)} \mid y_{T(l)})$ is its approximation, then the fivefold predictive BCV criterion for model $M$ is the geometric mean

$$\mathrm{BCV}(M) = \left( \prod_{l=1}^{5} \hat{f}(y_{V(l)} \mid y_{T(l)}) \right)^{1/5}$$

We repeat this for 50 different randomly generated partitions and compute the geometric average $\overline{\mathrm{BCV}}(M) = \left(\prod_{i=1}^{50} \mathrm{BCV}_i(M)\right)^{1/50}$, where $\mathrm{BCV}_i(M)$ is the value of BCV for the $i$th replicate. Models with higher value for this criterion are preferred to those with a lower value. Using the 50 replicates of the fivefold predictive BCV metric for two models, pairwise tests for the difference in the means are employed to confirm that any differences are significant.
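The bookkeeping behind the criterion can be sketched as below: a random split of the observation indices into five parts, and a geometric mean computed on the log scale, since predictive densities of whole validation samples are typically far too small or large to multiply directly. The function names and the log predictive density values in the usage lines are hypothetical placeholders, not results from the paper.

```python
import numpy as np


def fivefold_partitions(n, rng):
    """Randomly split indices 0..n-1 into five approximately equal-sized parts."""
    return np.array_split(rng.permutation(n), 5)


def geometric_mean_from_logs(log_values):
    """Geometric mean, computed as the exponential of the average log value."""
    return np.exp(np.mean(log_values))


rng = np.random.default_rng(0)
parts = fivefold_partitions(1127, rng)
print([len(p) for p in parts])                      # sizes of the five validation samples

# Placeholder log predictive densities for the five validation samples
log_fhat = np.array([-1.2, -0.9, -1.1, -1.0, -1.3])
print(geometric_mean_from_logs(log_fhat))           # BCV for one random partition
```

Applying the same helper to the logs of the 50 replicate values gives the overall geometric average, and comparing models on the log scale leads to the same ranking.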

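For the skew t copula we compute the predictive density under model $M$ as

$$\begin{aligned}
f(y_V \mid y_T) &= \int f(y_V \mid y_T, z_V, \phi, \Theta)\, f(\phi, \Theta, z_V \mid y_T)\, d(\phi, \Theta, z_V) \\
&= \int \Big\{ \prod_{i \in V} f(y_i \mid z_i, \phi, \Theta) \Big\}\, f(z_V \mid \phi)\, f(\phi, \Theta \mid y_T)\, d(z_V, \phi, \Theta) \\
&\approx \prod_{i \in V} \Big\{ \frac{1}{K} \sum_{k=1}^{K} f(y_i \mid z_i^{[k]}, \phi^{[k]}, \Theta^{[k]}) \Big\} = \hat f(y_V \mid y_T)
\end{aligned}$$

using a Monte Carlo approximation. Here, $(\phi^{[k]}, \Theta^{[k]}) \sim f(\phi, \Theta \mid y_T)$ are obtained using Algorithm 2 based only on the training data $y_T$, $z_V^{[k]} \sim f(z_V \mid \phi^{[k]})$ and $f(y_i \mid z_i, \phi, \Theta)$ is given in equation (12).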
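The Monte Carlo average above can be sketched as follows, assuming the log densities $\log f(y_i \mid z_i^{[k]}, \phi^{[k]}, \Theta^{[k]})$ have already been evaluated and stored with one row per validation observation and one column per iterate. The helper names are hypothetical, and the log-mean-exp step is a standard numerical device rather than something stated in the paper.

```python
import math

def log_mean_exp(values):
    """Stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def log_predictive_density(log_dens):
    """log_dens[i][k] holds the log density of validation point i at iterate k.

    Returns the log of the product over i of the per-observation Monte Carlo averages.
    """
    return sum(log_mean_exp(row) for row in log_dens)

# Two validation points, two Monte Carlo iterates (made-up values).
log_dens = [
    [math.log(0.2), math.log(0.4)],
    [math.log(0.5), math.log(0.5)],
]
print(math.exp(log_predictive_density(log_dens)))
```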

4. ELECTRICITY SPOT PRICE MODELLING

4.1. The Australian National Electricity Market

We fit a copula regression model for daily electricity spot prices in the Australian National Electricity Market (NEM). This is a wholesale market for the sale of all electricity in eastern and southern Australia. It is geographically the largest interconnected power system in the world, covering a distance of over 4000 km. There are five regions in the NEM: New South Wales (NSW), Victoria (VIC), Queensland (QLD), South Australia (SA) and Tasmania (TAS). Because of the difficulty of long-distance transmission, each region has a separate power system and reference spot price. However, the regions are connected by high-voltage direct current (HVDC) interconnectors to allow inter-regional trade in electricity, with the objective of promoting more cost-efficient generation. For example, Victoria has a low cost of generation and is a net exporter to other regions, with total exports in 2009 equivalent to 8% of Victorian consumption. On the other hand, NSW is a major importer of electricity, importing approximately 7% of its electricity in 2009 from the other regions. An overview of the market is given in the annual report of the Australian Energy Regulator (2009).

We model the five regional average daily electricity spot prices, measured in Australian dollars per megawatt hour ($/MW h), between 1 January 2007 and 1 February 2010. Stronger dependence between these prices corresponds to a higher level of regional integration and competition. Figure 2 provides pairwise scatter plots of the n = 1127 observations of the logarithm of prices, along with histograms of the data. Even after a logarithmic transformation the prices are skewed, while the scatter plots show strong positive cross-sectional dependence between prices. This indicates that inter-regional trade goes some way to align prices, but nevertheless prices can still differ substantially between regions. This is because inter-regional transmission can be limited by HVDC interconnector capacity, and because of short-term bidding behaviour by generating utilities.

Throughout, the form of cross-sectional dependence between log-prices appears highly nonlinear. This is particularly apparent between TAS and the other four regions, but is also the case for other pairs of regions including NSW and VIC, which are the largest regions in the NEM. We aim to show that this nonlinear dependence is better captured by the skew t copula than the symmetric copula alternative. Accurate characterization of this dependence is important for a variety of reasons. These include the hedging activities of participating utilities and third-party participants in the NEM. It can also highlight pairs of regions where there is potential for increased inter-regional trade in electricity due to higher levels of price misalignment.

[Figure 2: scatter-plot matrix for NSW, QLD, VIC, TAS and SA; every axis shows log-price on a scale from 2 to 6.]

Figure 2. Pairwise scatter plot of the logarithm of observed average daily electricity prices for the five regions in the Australian National Electricity Market (NEM). Panels on the leading diagonal contain histograms of the log-price data. This figure is available in color online at wileyonlinelibrary.com/journal/jae

4.2. Inter-regional Copula Heteroscedastic Regression Model

Because electricity is a flow commodity, the ability to arbitrage over time is severely limited and the analysis of time-based returns makes little sense. Studies therefore focus on spot prices themselves; for example, see Knittel and Roberts (2005) and Koopman et al. (2007). Moreover, unlike equities, electricity spot prices are well known to have a very strong predictable component. This includes annual and weekly periodicity and day type effects in both the level and volatility of average daily prices; see Nowicka-Zagrajek and Weron (2002), Panagiotelis and Smith (2008), Karakatsani and Bunn (2008) and Weron and Misiorek (2008).

We model the logarithm of prices using an inter-regional copula with heteroscedastic regressions for each margin. For each region $j$ the logarithm of price $y_{ij}$ on day $i$ follows the regression

$$y_{ij} = x_{ij}'\beta_j + \sigma^u_{ij} e_{ij}, \quad \text{where } \log\{(\sigma^u_{ij})^2\} = x_{ij}'\alpha_j$$


Here, $x_{ij}$ are exogenous variables and $\beta_j$ and $\alpha_j$ are margin-specific linear coefficients for the first and second moment of prices, respectively. We assume $\{e_{1j}, \ldots, e_{nj}\}$ to be independently distributed either univariate symmetric or skew t, normalized so that $\mathrm{var}(e_{ij}) = 1$ and $E(e_{ij}) = 0$. We denote the skew parameter and degrees of freedom for margin $j$ as $\delta^u_j$ and $\nu^u_j$. The exogenous variables we consider are time-based and it is through these that serial dependence in both moments is captured. They include an intercept and the seasonal harmonics $\{\sin(2\pi k\,\mathrm{TOY}), \cos(2\pi k\,\mathrm{TOY});\ k = 1, 2, 3, 4\}$, where $0 \le \mathrm{TOY} < 1$ is the time of year. Also included are 17 or 18 day type dummy variables: one for each day of the week and either 10 or 11 for the public holidays in each region. These five heteroscedastic regressions define the marginal distributions $H_{ij}(y_{ij}; \theta_j)$ with parameters $\theta_j = \{\beta_j, \alpha_j, \delta^u_j, \nu^u_j\}$ for the skew t, and $\theta_j = \{\beta_j, \alpha_j, \nu^u_j\}$ for the symmetric t.
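To make the covariate construction concrete, the sketch below builds the intercept and the four pairs of seasonal harmonics for a given time of year, and maps a covariate row to the conditional standard deviation implied by the log-variance regression. The function names are assumptions for illustration, and the day type dummies are omitted for brevity.

```python
import math

def seasonal_harmonics(toy, n_harmonics=4):
    """Intercept plus sin/cos harmonics of the time of year, with 0 <= toy < 1."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        row.append(math.sin(2.0 * math.pi * k * toy))
        row.append(math.cos(2.0 * math.pi * k * toy))
    return row

def conditional_sd(x_row, alpha):
    """Standard deviation implied by log(sigma^2) = x' alpha."""
    log_var = sum(x * a for x, a in zip(x_row, alpha))
    return math.exp(0.5 * log_var)

x = seasonal_harmonics(0.25)   # a quarter of the way through the year
print(len(x))                  # intercept + 8 harmonic terms
```

Modelling the logarithm of the variance keeps the implied standard deviation positive for any value of the coefficients.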

To account for inter-regional dependence in prices we consider the following four copula models for $(e_{i1}, e_{i2}, \ldots, e_{i5})$:

• Model A: symmetric t margins with symmetric t copula;
• Model B: skew t margins with symmetric t copula;
• Model C: skew t margins with skew Gaussian copula; and
• Model D: skew t margins with skew t copula.

These define four multivariate distributions for the vector $y_i = (y_{i1}, \ldots, y_{i5})'$ of log-prices. Inter-regional price dependence is captured by the copula parameters $\phi$, marginal tail behaviour by $\nu^u_j$, and marginal asymmetry in Models B, C and D by $\delta^u_j$. Employing skew t margins in these three models is unrelated to the choice of copula. Model D is the most general and nests the other models. We are interested in the question of whether or not the models with skew-elliptical copulas capture inter-regional price dependence better than those with symmetric copulas.

4.3. Marginal Estimates

The four copula regression models are each fitted using the MCMC algorithm in Section 3, and we first focus on the estimates of the margins. Figure 3 plots the estimated posterior means and 95% probability intervals for prices, along with the observed data. For ease of exposition these are only presented for the two largest (NSW and VIC) and smallest (TAS) regions over the second half of 2009 for Model D. Results over the whole period and other regions were similar. The probability intervals are computed analytically using the inverse marginal distribution functions $H^{-1}_{ij}(\cdot\,; \hat\theta_j)$ evaluated at probabilities 0.025 and 0.975, where $\hat\theta_j$ is the posterior mean estimate of $\theta_j$. The fitted marginal distributions exhibit accurate coverage; for example, over all margins and observations, empirical coverage of the 95% probability intervals for Model D was 95.18%. Using the posterior means as point estimates, we obtain estimates of the heteroscedastic-corrected disturbances $\hat e_{ij}$. We find no evidence for serial dependence, nor for autoregressive heteroscedasticity, in any of these five residual series using lags up to 7 days at a 5% significance level.$^1$ The serial dependence in the first two moments of prices therefore appears well captured by the time-based covariates.

The posterior mean estimates of $\{\delta^u_j, \nu^u_j\}$ are reported at the top of Table I. Even after a logarithmic transformation, all five marginal distributions are asymmetric and heavier tailed than a Gaussian,$^2$ which is consistent with other analyses of electricity price data (Panagiotelis and Smith, 2008; Weron and Misiorek, 2008). Histograms of the probability integral transformed data were close to uniform, again suggesting the marginal models are suitable. The marginal parameters of Models B, C and D are perturbed only slightly by the choice of copula.

$^1$ The test for serial dependence is based on the MLE of the autoregressive coefficients. The test for autoregressive heteroscedasticity is based on the commonly employed Lagrange multiplier test.

$^2$ One way to measure the heaviness of extreme tails is to use the power law exponent. The power tail behaviour of the skew t both of Azzalini and Capitanio (2003) and Sahu et al. (2003) is the same (Fung and Seneta, 2010, eqn 28), and has power law exponent equal to the degrees of freedom $\nu^u_j$.

[Figure 3: three time-series panels, (a) to (c); horizontal axis: Day, from 1 Jul to 8 Dec; vertical axis: Price ($/MW h) on a logarithmic scale.]

Figure 3. Plot of the estimated posterior mean (black line) and 95% posterior probability intervals (thin grey lines) from the estimated marginal distributions of electricity prices using Model D. (a) NSW; (b) VIC; (c) TAS. The estimates are produced for the last 6 months of 2009 on the logarithmic scale for ease of exposition. The observed prices are also plotted as hollow circles. This figure is available in color online at wileyonlinelibrary.com/journal/jae


Table I. Monte Carlo estimates of the posterior means of the parameters, along with 90% posterior probability intervals in parentheses, for the electricity spot price model. The marginal and copula parameters are reported separately, as is the logarithm of the BCV criterion for each model (higher values are preferred). Each margin has a separate univariate degrees-of-freedom parameter $\nu^u_j$, while there is only one degrees-of-freedom parameter $\nu$ for the multivariate skew t copula

| Region | Parameter | Model A | Model B | Model C | Model D |
|---|---|---|---|---|---|
| Marginal parameters | | | | | |
| SA | $\delta^u_1$ | n/a | 0.72 (0.69, 0.78) | 0.79 (0.74, 0.81) | 0.78 (0.76, 0.86) |
| | $\nu^u_1$ | 6.9 (6.1, 12.3) | 9.2 (7.9, 13.1) | 8.7 (7.6, 10.1) | 8.0 (6.5, 9.1) |
| TAS | $\delta^u_2$ | n/a | −0.95 (−1.03, −0.82) | −0.89 (−0.92, −0.85) | −0.92 (−0.94, −1.08) |
| | $\nu^u_2$ | 5.3 (4.1, 7.6) | 6.2 (6.0, 13.7) | 4.5 (3.3, 8.2) | 5.3 (4.5, 6.1) |
| VIC | $\delta^u_3$ | n/a | 1.07 (1.02, 1.19) | 1.09 (1.02, 1.19) | 1.03 (0.87, 1.09) |
| | $\nu^u_3$ | 8.4 (7.0, 12.1) | 10.2 (7.3, 15.5) | 11.2 (8.4, 12.9) | 10.8 (9.4, 12.0) |
| QLD | $\delta^u_4$ | n/a | 1.24 (1.22, 1.29) | 1.34 (1.32, 1.40) | 1.23 (1.17, 1.28) |
| | $\nu^u_4$ | 9.8 (7.9, 11.3) | 10.7 (8.9, 12.4) | 10.9 (8.1, 11.8) | 9.8 (8.9, 10.1) |
| NSW | $\delta^u_5$ | n/a | 0.93 (0.89, 0.97) | 0.98 (0.95, 1.03) | 1.00 (0.89, 1.15) |
| | $\nu^u_5$ | 4.2 (3.5, 9.7) | 7.9 (6.2, 11.8) | 8.3 (7.2, 9.5) | 7.5 (6.4, 9.3) |
| Copula parameters | | | | | |
| SA | $\delta_1$ | n/a | n/a | −1.42 (−1.49, −1.38) | −1.38 (−1.45, −1.29) |
| TAS | $\delta_2$ | n/a | n/a | −1.92 (−1.99, −1.84) | −2.03 (−2.18, −1.94) |
| VIC | $\delta_3$ | n/a | n/a | 1.83 (1.72, 1.89) | 1.75 (1.70, 1.88) |
| QLD | $\delta_4$ | n/a | n/a | 1.40 (1.31, 1.48) | 1.32 (1.29, 1.51) |
| NSW | $\delta_5$ | n/a | n/a | 1.91 (1.78, 1.97) | 1.98 (1.83, 2.10) |
| | $\nu$ | 5.4 (3.3, 11.9) | 4.8 (4.2, 7.8) | n/a | 5.7 (4.1, 8.9) |
| log(BCV) | | −1032 | −1003 | −985 | −941 |

4.4. Copula Estimates

The bottom of Table I provides estimates of the copula parameters $\{\delta_1, \ldots, \delta_5, \nu\}$. The copula of Model D has estimated degrees of freedom $E(\nu \mid y) = 5.7$, suggesting the skew t copula dominates the skew Gaussian of Model C. For both skew copulas the estimate of $\delta$ is located far away from the zero vector, indicating that the nonlinear dependence apparent in Figure 2 is better fit by a skew-elliptical copula than a symmetric elliptical copula. Also reported in the last row of Table I is the BCV criterion for each model, which further suggests that Model D dominates the other three models. This is confirmed as significant by pairwise tests of the fivefold BCV metric, so that the extra flexibility of the skew t copula appears warranted. To highlight the substantial difference between the fitted distributions, Figure 4 plots the contours of the bivariate distribution of log-price for NSW and VIC for all models on 16 February 2009, which is a regular working weekday. The skew-elliptical copulas in Models C and D characterize strong nonlinear dependence in comparison to the symmetric copulas in Models A and B.

[Figure 4: four contour panels titled Model A, Model B, Model C and Model D; axes NSW and VIC, each on a scale from 1 to 6.]

Figure 4. Contour plots of the estimated bivariate density of log-prices in NSW and VIC on 16 February 2009. The four panels contain the four densities arising from the four different models considered in Section 4. This figure is available in color online at wileyonlinelibrary.com/journal/jae

Table II reports the estimates of the elements of the correlation matrix for Model D on the upper triangle, and the Spearman correlations on the lower triangle. Higher values of both correspond to higher levels of dependence. The NSW and QLD regions have the highest dependence, suggesting that they are well integrated in the market and prices aligned. Dependence is lowest between TAS and the other regions, so that prices are often misaligned and price competition imperfect. We note here that the TAS region was linked to the NEM with the opening of the Basslink HVDC interconnector in April 2006. These results suggest that the TAS region remains poorly integrated into the NEM.

Table II. Upper triangle: the estimated posterior means of the elements of the correlation matrix, along with 90% probability intervals in parentheses, for the regional logarithmic price in the NEM. Lower triangle: estimated posterior means of the corresponding Spearman correlations

| | SA | TAS | VIC | QLD | NSW |
|---|---|---|---|---|---|
| SA | | 0.70 (0.65, 0.72) | 0.89 (0.85, 0.92) | 0.72 (0.69, 0.73) | 0.85 (0.83, 0.87) |
| TAS | 0.68 | | 0.77 (0.75, 0.81) | 0.69 (0.66, 0.72) | 0.63 (0.60, 0.65) |
| VIC | 0.84 | 0.72 | | 0.82 (0.80, 0.85) | 0.90 (0.89, 0.93) |
| QLD | 0.70 | 0.66 | 0.80 | | 0.94 (0.91, 0.95) |
| NSW | 0.82 | 0.60 | 0.91 | 0.92 | |

4.5. Inter-regional Dependence of High Prices

Of particular interest to participants in the NEM are high prices, which result from supply and demand imbalances and are called ‘spikes’; see Kanamura and Ohashi (2007) and Smith (2010) for discussions on the formation of intraday prices. Table III provides the estimates of extreme tail dependence $\lambda^{upper}_{jl}(\alpha)$ and $\lambda^{lower}_{jl}(\alpha)$ between prices in regions NSW and VIC for fitted Models A and D and $\alpha = 0.05, 0.01, 0.001$. They are the same for Model A because the copula used is symmetric t, whereas the upper tail dependence is much greater than the lower for Model D.
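For intuition only, the quantile-based dependence measures can also be approximated empirically from simulated copula draws. The paper itself evaluates them through the conditionally Gaussian representation (equation (14)), so the sketch below is a cruder substitute technique and not the authors' estimator. It assumes the lower measure is $\Pr(U_j < \alpha \mid U_l < \alpha)$ and, as a labelled assumption, that the upper measure is $\Pr(U_j > 1 - \alpha \mid U_l > 1 - \alpha)$; the function names and data are hypothetical.

```python
def lower_tail_dependence(pairs, alpha):
    """Empirical Pr(U_j < alpha | U_l < alpha) from (u_j, u_l) pairs."""
    cond = [uj for uj, ul in pairs if ul < alpha]
    if not cond:
        return float("nan")  # no draws fall in the conditioning event
    return sum(uj < alpha for uj in cond) / len(cond)

def upper_tail_dependence(pairs, alpha):
    """Empirical Pr(U_j > 1 - alpha | U_l > 1 - alpha) (assumed definition)."""
    cond = [uj for uj, ul in pairs if ul > 1.0 - alpha]
    if not cond:
        return float("nan")
    return sum(uj > 1.0 - alpha for uj in cond) / len(cond)

# Made-up draws on the unit square, purely for illustration.
pairs = [(0.01, 0.02), (0.6, 0.04), (0.03, 0.5),
         (0.97, 0.99), (0.98, 0.96), (0.5, 0.97)]
print(lower_tail_dependence(pairs, 0.05), upper_tail_dependence(pairs, 0.05))
```

An empirical estimate of this kind becomes unreliable for very small $\alpha$ because few draws fall in the conditioning event.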

Generators in low-price regions would like to export electricity to higher-price ones (once any physical loss due to transmission is taken into account). We can also compute the probability of high prices in one region, given low prices in another. That is, for two regional prices $P_j$ and $P_k$, and for two dollar values $A > B$, $\Pr(P_j > A \mid P_k < B)$ can be computed from the fitted model in the same manner as the tail dependence measures. Table III also reports $\Pr(P_{NSW} > \$75 \mid P_{VIC} < \$55)$ and $\Pr(P_{VIC} > \$75 \mid P_{NSW} < \$55)$, where a price gap of 20 $/MW h is likely to result in profitable export from the low-price region to the high-price region. The probabilities are not equal, and suggest that VIC is more likely to export electricity to NSW during periods of high prices, rather than the other way around. The results differ by choice of copula, further highlighting the importance of copula choice.

Table III. Estimated dependence measures between prices in the NSW and VIC regions for both the fitted symmetric copula (Model A) and the skew t copula (Model D). Upper and lower tail dependence measures are given, along with the probabilities of high prices in one region, conditional upon much lower prices in the other

| Measure | Model A | Model D |
|---|---|---|
| Tail dependence | | |
| $\lambda^{lower}_{NSW,VIC}(0.05)$ | 0.73 | 0.60 |
| $\lambda^{lower}_{NSW,VIC}(0.01)$ | 0.76 | 0.42 |
| $\lambda^{lower}_{NSW,VIC}(0.001)$ | 0.77 | 0.37 |
| $\lambda^{upper}_{NSW,VIC}(0.05)$ | 0.73 | 0.77 |
| $\lambda^{upper}_{NSW,VIC}(0.01)$ | 0.76 | 0.80 |
| $\lambda^{upper}_{NSW,VIC}(0.001)$ | 0.77 | 0.89 |
| Probability | | |
| $\Pr(P_{NSW} > \$75 \mid P_{VIC} < \$55)$ | 0.008 | 0.012 |
| $\Pr(P_{VIC} > \$75 \mid P_{NSW} < \$55)$ | 0.005 | 0.006 |

5. DISCRETE MARGINS

5.1. Extending the Bayesian Method

We outline how to extend the approach to the case where all margins are discrete-valued, although the case where there is a combination of continuous and discrete margins is similar. Following Pitt et al. (2006) we treat $x_{ij}$ as latent variables and generate their values as follows. When $y_{ij}$ is discrete, $H_{ij}$ is many-to-one and

$$H_{ij}(y_{ij}^-; \theta_j) \le F_{St,j}(x_{ij}; \delta_j, \nu) < H_{ij}(y_{ij}; \theta_j) \qquad (15)$$

where $H_{ij}(y_{ij}^-; \theta_j)$ is the left-hand limit of $H_{ij}$ at $y_{ij}$. For ordinal-valued data $H_{ij}(y_{ij}^-; \theta_j) = H_{ij}(y_{ij} - 1; \theta_j)$. Therefore, conditional upon $x_{ij}$, the density $f(y_{ij} \mid x_{ij}, \phi, \Theta) = I(T^L_{ij} \le x_{ij} < T^U_{ij})$, where the limits $T^L_{ij} = F^{-1}_{St,j}(H_{ij}(y_{ij}^-; \theta_j); \delta_j, \nu)$ and $T^U_{ij} = F^{-1}_{St,j}(H_{ij}(y_{ij}; \theta_j); \delta_j, \nu)$. Therefore, if $x_{i \setminus j} = \{x_i \setminus x_{ij}\}$, we generate one at a time from the density

$$f(x_{ij} \mid \{x \setminus x_{ij}\}, z, \phi, \Theta, y) \propto f(y_{ij} \mid x_{ij}, \phi, \Theta)\, f(x_{ij} \mid x_{i \setminus j}, z_i, \phi) \propto I(T^L_{ij} \le x_{ij} < T^U_{ij})\, f(x_{ij} \mid x_{i \setminus j}, z_i, \phi) \qquad (16)$$

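From equation (9), $x_i$ conditional on $z_i$ and $\phi$ is Gaussian with mean $Dq_i$ and covariance scaled by $1/w_i$, so that $x_{ij}$ in equation (16) is $N(\mu_{ij}, \sigma^2_{ij})$ constrained between lower bound $T^L_{ij}$ and upper bound $T^U_{ij}$, with $\mu_{ij}$ and $\sigma^2_{ij}$ the usual conditional mean and variance.

We provide the full sampling scheme in the Appendix, but highlight here the more involved steps. Let $x_{\cdot j} = \{x_{1j}, \ldots, x_{nj}\}$ and $x_{\cdot \setminus j} = \{x \setminus x_{\cdot j}\}$. We generate $z$, the correlation matrix and $\nu$ each conditional upon $x$, but generate $\delta_j, \theta_j$ each conditional upon $x_{\cdot \setminus j}$ to improve the mixing of the Markov chain. To derive the reduced conditional distributions, we note that

$$\begin{aligned}
f(y \mid \phi, \Theta, z, x_{\cdot \setminus j}) &= \int f(y \mid \phi, \Theta, z, x)\, f(x_{\cdot j} \mid x_{\cdot \setminus j}, z, \phi, \Theta)\, dx_{\cdot j} \\
&= \int \prod_{i,k} f(y_{ik} \mid \phi, x_{ik}, \theta_k) \prod_i f(x_{ij} \mid x_{i \setminus j}, z_i, \phi)\, dx_{\cdot j} \\
&= \prod_{i, k \ne j} f(y_{ik} \mid \phi, x_{ik}, \theta_k) \prod_i \left( \Phi(T^U_{ij}; \mu_{ij}, \sigma^2_{ij}) - \Phi(T^L_{ij}; \mu_{ij}, \sigma^2_{ij}) \right)
\end{aligned}$$

where $T^L_{ij}$, $T^U_{ij}$, $\mu_{ij}$ and $\sigma^2_{ij}$ are as given in equation (16) and $\Phi(x; m, s^2)$ is the distribution function of a normal random variable with mean $m$ and variance $s^2$. Also, note that $T^L_{ij}, T^U_{ij}$ are functions of $(\delta_j, \nu, \theta_j)$, and that $\mu_{ij}, \sigma_{ij}$ are functions of $\delta$, $q_i$, $w_i$ and the correlation matrix. Then we generate $\theta_j$ and $\delta_j$ using MH with the same proposal for $\theta_j$ as in Section 3.2 and a random walk proposal for $\delta_j$. The posterior

$$\begin{aligned}
f(\delta_j, \theta_j \mid \{\phi \setminus \delta_j\}, x_{\cdot \setminus j}, \{\Theta \setminus \theta_j\}, z, y) &\propto f(y \mid \phi, \Theta, z, x_{\cdot \setminus j})\, f(x_{\cdot \setminus j} \mid z, \phi)\, f(z \mid \phi)\, \pi(\delta_j)\, \pi(\theta_j) \\
&\propto \prod_i \left( \Phi(T^U_{ij}; \mu_{ij}, \sigma^2_{ij}) - \Phi(T^L_{ij}; \mu_{ij}, \sigma^2_{ij}) \right) \pi(\delta_j)\, \pi(\theta_j)
\end{aligned}$$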
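Two building blocks of the steps above are a draw from a normal distribution constrained to an interval, as needed for $x_{ij}$ in equation (16), and the interval probability $\Phi(T^U; \mu, \sigma^2) - \Phi(T^L; \mu, \sigma^2)$ that enters the reduced conditional. The following is a minimal sketch using the inverse-CDF method from the Python standard library; the function names are assumptions, and the paper does not state which truncated normal sampler it uses.

```python
import random
from statistics import NormalDist

def interval_probability(mu, sigma, lower, upper):
    """Phi(upper; mu, sigma^2) - Phi(lower; mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def draw_truncated_normal(mu, sigma, lower, upper, rng=random):
    """Inverse-CDF draw from N(mu, sigma^2) constrained to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    u = rng.uniform(dist.cdf(lower), dist.cdf(upper))
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf's argument inside (0, 1)
    x = dist.inv_cdf(u)
    return min(max(x, lower), upper)      # guard against rounding at the bounds

rng = random.Random(0)
draws = [draw_truncated_normal(0.0, 1.0, 0.5, 1.5, rng) for _ in range(1000)]
print(min(draws), max(draws))
```

The inverse-CDF approach loses accuracy when the interval lies far in a tail, where a dedicated tail sampler would be a more robust choice.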

When the margins are discrete we define $\mathrm{BCV}(M) = \left[ \prod_l \widehat{\Pr}(Y_V = y_{V(l)} \mid y_{T(l)}) \right]^{1/5}$, which uses the predictive probability mass function under model $M$. As in the continuous case, we can approximate the mass function using the Monte Carlo iterates, with

$$\Pr(Y_V = y_V \mid y_T) \approx \prod_{i \in V} \Big\{ \frac{1}{K} \sum_{k=1}^{K} \Pr(Y_i = y_i \mid z_i^{[k]}, \phi^{[k]}, \Theta^{[k]}) \Big\}, \quad \text{where}$$

$$\Pr(Y = y \mid z, \phi, \Theta) = \sum_{j_1=1}^{2} \cdots \sum_{j_r=1}^{2} (-1)^{(j_1 + \cdots + j_r)}\, C^*(\tilde u_{1 j_1}, \ldots, \tilde u_{r j_r}; z, \phi)$$

$\tilde u_{j1} = H_j(y_j; \theta_j)$, and $\tilde u_{j2} = H_j(y_j^-; \theta_j)$ is the left-hand limit of $H_j$ at $y_j$. Here, $C^*$ is the same Gaussian distribution function as in equation (10).
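The alternating sum over the $2^r$ corner points can be sketched generically as below. The copula function is passed in as an argument, and an independence copula stands in for $C^*$ purely so that the result can be checked by hand; both the function names and this stand-in are assumptions. The sign used here is $(-1)$ raised to the number of lower limits selected, an assumed convention that makes the all-upper corner positive for any $r$ and agrees with the displayed formula when $r$ is even.

```python
from itertools import product
from math import prod

def rectangle_probability(copula_cdf, upper, lower):
    """Probability of the box (lower, upper] under a copula, by inclusion-exclusion."""
    total = 0.0
    for corner in product((0, 1), repeat=len(upper)):
        # corner[j] == 0 selects the upper limit, 1 selects the lower limit
        point = [lower[j] if c else upper[j] for j, c in enumerate(corner)]
        sign = -1.0 if sum(corner) % 2 else 1.0
        total += sign * copula_cdf(point)
    return total

def independence_copula(u):
    """Stand-in copula: C(u) is the product of the u_j."""
    return prod(u)

upper = [0.5, 0.9, 0.7]   # H_j(y_j) for each margin (made-up values)
lower = [0.2, 0.4, 0.1]   # left-hand limits H_j(y_j^-)
print(rectangle_probability(independence_copula, upper, lower))
```

The number of terms doubles with each added margin, which is why evaluating the mass function directly becomes costly in high dimensions.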

5.2. Modelling Website Exposure

Discrete-valued regression and choice models are common tools in contemporary econometric analysis (Cameron and Trivedi, 1998). Much existing analysis is univariate, but a multivariate analysis using copulas has strong potential in many areas; see Cameron et al. (2004) for an example in health economics and Danaher and Smith (2010) for several in marketing. We consider one such marketing example here, where we model ordinal-valued website exposure data. The data were collected by ComScore Networks and made available by subscription via the Wharton Research Data Service. It consists of the recorded internet activity from a randomly selected subsample of 100,000 members of a panel of over two million Internet users from across the USA. We follow Danaher (2007) and measure exposure to a website during an Internet session by the number of pages viewed at each site. To demonstrate the advantages of using the skew t copula, we examine the exposure to the top 15 websites,$^3$ which are listed in Table IV.

The data are collected at the machine level (mostly home PCs and laptops) and we restrict attention here to all machines that had recorded Internet activity on 1 May 2007. This results in a dataset with n = 50,217 observations (each of which corresponds to a machine) of the number of pages viewed at 15 websites (each of which corresponds to a margin). The sample mean and proportion of non-zeros in each margin are reported in the last two columns of Table IV. A popular model for ordinal exposure data, including for website page views, is the negative binomial distribution (NBD); see Danaher (2007). We use this for the marginal models and estimate them jointly with the copula parameters.$^4$ Table IV includes details on the posterior means of the NBD parameters when a skew t copula is used, along with the estimated mean exposure and the probability of any exposure for each website (entitled the ‘reach’). The NBD provides an excellent fit.
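As a small illustration of the marginal model, the sketch below evaluates the negative binomial mass function $f(y) = \binom{r + y - 1}{y} p^r (1 - p)^y$ for a non-integer shape $r$ through log-gamma functions, together with the implied mean $r(1 - p)/p$ and reach $1 - p^r$. The function names and parameter values are illustrative assumptions and are not the estimates in Table IV.

```python
import math

def nbd_log_pmf(y, r, p):
    """Log of C(r + y - 1, y) * p**r * (1 - p)**y, valid for real r > 0."""
    log_binom = math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
    return log_binom + r * math.log(p) + y * math.log1p(-p)

def nbd_mean(r, p):
    """Mean of the NBD under this parameterization."""
    return r * (1.0 - p) / p

def nbd_reach(r, p):
    """Probability of at least one exposure, 1 - f(0)."""
    return 1.0 - p ** r

r, p = 2.0, 0.5   # hypothetical values chosen for easy checking
print(math.exp(nbd_log_pmf(0, r, p)), nbd_mean(r, p), nbd_reach(r, p))
```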

Usually, advertisers are interested in functions of the vector of exposure counts $y = (y_1, \ldots, y_{15})'$, rather than the whole multivariate distribution of $y$. This is because in an advertising campaign banner ads are placed upon multiple websites, not just one. The most common function is total exposure across all websites in an advertising campaign, denoted here for our 15 sites as $TE = \sum_{j=1}^{15} y_j$. Even though the managerial requirement is just a sum of the exposures, the full multivariate model is necessary due to inter-website dependence. Models which capture the multivariate distribution first, then use this distribution to estimate TE, dominate simpler models that directly estimate the distribution of TE (Danaher, 1991). Knowing the distribution of TE also enables the estimation of key advertising metrics, such as total reach, defined as $\Pr(TE \ge 1)$.

$^3$ These were the 15 websites with the most pages viewed during 2007.

$^4$ The NBD has mass function $f(y) = \binom{r + y - 1}{y} p^r (1 - p)^y$, with $r = 15$ here.

Table IV. Top 15 websites ranked by number of pages viewed. Columns 2–5 give posterior estimates for the fitted marginal NBDs, including the posterior means of the NBD parameters and the first moment and probability of an exposure (entitled ‘reach’ in the marketing literature) for each website. Also reported for comparison in the last two columns are the sample means and proportion of non-zeros in the data

| Website/margin | $E(r \mid y)$ | $E(p \mid y)$ | $E(Y_j)$ | $\Pr(Y_j > 0)$ | $\bar y_j$ | $\#(y_j > 0)$ |
|---|---|---|---|---|---|---|
| myspace.com | 0.061 (0.059, 0.065) | 0.002 (0.001, 0.003) | 20.3 | 0.264 | 20.5 | 0.262 |
| yahoo.com | 0.151 (0.148, 0.156) | 0.013 (0.011, 0.015) | 10.0 | 0.472 | 10.3 | 0.474 |
| facebook.com | 0.014 (0.011, 0.015) | 0.002 (0.001, 0.003) | 5.19 | 0.072 | 5.23 | 0.069 |
| msn.com | 0.100 (0.095, 0.102) | 0.023 (0.021, 0.026) | 4.16 | 0.301 | 4.14 | 0.308 |
| google.com | 0.115 (0.113, 0.118) | 0.030 (0.027, 0.032) | 3.85 | 0.340 | 3.83 | 0.337 |
| ebay.com | 0.021 (0.020, 0.024) | 0.010 (0.007, 0.011) | 2.62 | 0.107 | 2.59 | 0.101 |
| aol.com | 0.053 (0.048, 0.054) | 0.025 (0.024, 0.029) | 1.95 | 0.172 | 1.93 | 0.169 |
| craigslist.org | 0.004 (0.003, 0.005) | 0.003 (0.002, 0.004) | 1.43 | 0.022 | 1.42 | 0.023 |
| hotbar.com | 0.017 (0.016, 0.020) | 0.011 (0.009, 0.015) | 1.34 | 0.060 | 1.30 | 0.067 |
| youtube.com | 0.011 (0.008, 0.015) | 0.010 (0.008, 0.013) | 1.32 | 0.075 | 1.29 | 0.073 |
| starware.com | 0.029 (0.026, 0.031) | 0.021 (0.019, 0.025) | 0.89 | 0.090 | 0.94 | 0.093 |
| comcast.net | 0.008 (0.007, 0.010) | 0.010 (0.008, 0.013) | 0.88 | 0.043 | 0.85 | 0.041 |
| zango.com | 0.009 (0.007, 0.012) | 0.016 (0.014, 0.018) | 0.58 | 0.038 | 0.55 | 0.036 |
| go.com | 0.012 (0.011, 0.016) | 0.029 (0.026, 0.032) | 0.46 | 0.049 | 0.48 | 0.047 |
| photobucket.com | 0.007 (0.005, 0.009) | 0.013 (0.011, 0.014) | 0.49 | 0.028 | 0.46 | 0.023 |

Table V shows the distribution of TE over the 15 websites for both the fitted skew t copula and symmetric t copula. These are computed by simulating from each parametric model at the end of each sweep of the sampling scheme to give a Monte Carlo iterate of TE. Also included are the empirically observed relative frequencies in the second column, and model accuracy is assessed by comparing these with the probabilities obtained from the two parametric models. We use two measures popular in the marketing literature (Leckenby and Kishi, 1984; Danaher, 1991). Denote $f_s = \Pr(TE = s)$ and $\hat f_s$ as the observed and estimated exposure distribution probabilities, respectively. The first metric is called the relative error in reach (RER), and the second is the error in exposure probabilities over reach (EPOR), where

$$\mathrm{RER} = \frac{|\hat f_0 - f_0|}{1 - f_0} \quad \text{and} \quad \mathrm{EPOR} = \sum_{s=1}^{15} \frac{|\hat f_s - f_s|}{1 - f_0}$$

We limit the EPOR calculation to 15 exposures as there is little managerial interest in exposures beyond this range. The RER and EPOR measures are reported in Table V and are substantially lower for the skew t copula. This suggests an improved fit, something confirmed by the BCV criterion reported in Table V and pairwise tests between fivefold BCV metric values.
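The two accuracy measures are straightforward to compute once the observed and estimated total exposure probabilities are available as sequences indexed by the exposure count $s$. The sketch below uses hypothetical function names; the EPOR example uses made-up probabilities, while the RER example uses the zero-exposure probabilities from Table V (observed 0.07, symmetric t 0.11, skew t 0.09).

```python
def rer(f_obs, f_est):
    """Relative error in reach: |f_est[0] - f_obs[0]| / (1 - f_obs[0])."""
    return abs(f_est[0] - f_obs[0]) / (1.0 - f_obs[0])

def epor(f_obs, f_est, max_exposures=15):
    """Error in exposure probabilities over reach, summed over s = 1..max_exposures."""
    total = sum(abs(f_est[s] - f_obs[s]) for s in range(1, max_exposures + 1))
    return total / (1.0 - f_obs[0])

# RER from the zero-exposure probabilities reported in Table V.
print(round(rer([0.07], [0.11]), 2))   # symmetric t copula
print(round(rer([0.07], [0.09]), 2))   # skew t copula

# EPOR on a made-up three-point distribution.
print(epor([0.5, 0.3, 0.2], [0.4, 0.35, 0.25], max_exposures=2))
```

The rounded RER values agree with the 0.04 and 0.02 reported in Table V. EPOR cannot be reproduced from Table V alone because the table bins the exposure counts.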

Table V. Total exposure distributions for the website exposure example. These are reported for both the fitted symmetric and skew t copula models, and also for the observed empirical distribution. The marketing metrics RER and EPOR are also reported for the two parametric models, with low values indicating improved alignment with the observed empirical distribution. The BCV criterion is also reported

| Total exposure count | Observed relative frequency | Symmetric t copula | Skew t copula |
|---|---|---|---|
| 0 | 0.07 | 0.11 | 0.09 |
| 1–5 | 0.19 | 0.11 | 0.17 |
| 6–10 | 0.10 | 0.12 | 0.11 |
| 11–20 | 0.13 | 0.12 | 0.13 |
| 21–30 | 0.09 | 0.07 | 0.09 |
| 31–40 | 0.07 | 0.05 | 0.05 |
| 41–50 | 0.05 | 0.07 | 0.06 |
| 51–100 | 0.13 | 0.06 | 0.11 |
| 101–200 | 0.10 | 0.07 | 0.11 |
| 201–300 | 0.03 | 0.07 | 0.04 |
| 301–400 | 0.02 | 0.06 | 0.02 |
| 401–500 | 0.01 | 0.06 | 0.01 |
| >501 | 0.01 | 0.03 | 0.01 |
| RER | | 0.04 | 0.02 |
| EPOR | | 0.25 | 0.11 |
| log(BCV) | | −163,180 | −140,291 |

6. DISCUSSION

Various skew t distributions have been applied to problems in econometrics and elsewhere; for example, see Aas and Haff (2006) and Panagiotelis and Smith (2008). However, their usefulness is restricted in empirical work because there is only a single degree-of-freedom parameter $\nu$. However, when used to form a copula they have great potential. Joe (1997) and Patton (2006) show that flexible bivariate copulas with asymmetric tail dependence are useful in applied analysis. A skew t copula can account for a wider form of nonlinear dependence than a symmetric t copula, but is one of only a few that can be used in practice in higher dimensions. For example, because it allows for asymmetric pairwise tail dependencies, it could be used in a multivariate extension of the bivariate study of contagion in the financial markets by Rodriguez (2007). As far as we are aware, Demarta and McNeil (2005) were the first to suggest constructing a skew t copula, although they employ a different form of skew t distribution. Recently, Sun et al. (2008) used this copula to study dependence in the German equity markets, while Ammann and Suss (2009) used it to account for dependence between changes in volatility and equity returns. Estimation is via either simulated maximum likelihood or the EM algorithm, and they find that this skew t copula is preferable to the t copula.

We instead construct a skew t copula using the distribution of Sahu et al. (2003). Extending the Bayesian method to other conditionally Gaussian constructions, such as that of Azzalini and Capitanio (2003), requires only minor modifications to Steps 2 and 3 of Algorithm 2. This type of skew t distribution is currently the most popular form (see the discussions in Genton, 2004) and different from that employed by Demarta and McNeil (2005). We show that Bayesian estimation is particularly attractive, with alternative approaches to estimation difficult to implement for our proposed copula. This is also true for other skew t copulas when there are discrete margins, or when joint estimation of marginal and copula parameters is required. Moreover, when the dimension is high, the Bayesian covariance selection approach outlined in Pitt et al. (2006) can be used to identify parsimony in the inverse of the correlation matrix, and the Bayesian skew selection approach of Panagiotelis and Smith (2010) to identify zeros in $\delta$.

Our analysis of market integration in the Australian electricity market shows that there is strong nonlinear dependence between regional prices. There is also evidence of nonlinear dependence in foreign exchange (Patton, 2006) and equity (Rodriguez, 2007) returns, so that our approach has immediate potential here, although research on the specification of the dynamics of dependence is ongoing; see Asai and McAleer (2009) for a recent discussion. Last, we also show how to estimate the copula model when the margins are discrete-valued. Multivariate analysis of such data has strong potential in a number of areas, including microeconometrics, transport studies and marketing science. We demonstrate that here in the latter case with a study of website audience exposure, where a more accurate estimate of the dependence structure results in an improved estimate of the distribution of total exposure.

ACKNOWLEDGEMENTS

The work of Michael Smith and Robert Kohn was partially supported by Australian Research Council Discovery grants DP0985505 and DP0988579, respectively. The authors thank ComScore for providing the online panel data for the second empirical study and three referees and editorial team for comments that helped improve the paper. The first author would also like to thank Peter Danaher for many useful discussions regarding the modelling of marketing data.

REFERENCES

Aas K, Haff I. 2006. The generalized hyperbolic skew Student’s t-distribution. Journal of Financial Econometrics 4(2): 275–309.

Aas K, Czado C, Frigessi A, Bakken H. 2009. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics 44(2): 182–198.

Ammann M, Suss S. 2009. Asymmetric dependence patterns in financial time series. European Journal of Finance 15(7): 703–719.

Asai M, McAleer M. 2009. Dynamic conditional correlations for asymmetric processes. Working paper (currently available at Social Science Research Network).

Azzalini A, Capitanio A. 2003. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. Journal of the Royal Statistical Society, Series B 65: 367–389.

Australian Energy Regulator. 2009. State of the Energy Market 2009. Available: www.aer.gov.au [14 February 2010].

Bae K-H, Karolyi G, Stulz R. 2003. A new approach to measuring financial contagion. Review of Financial Studies 16: 717–763.

Barnard J, McCulloch R, Meng X. 2000. Modeling covariance matrices in terms of standard deviations and correlations with application to shrinkage. Statistica Sinica 10: 1281–311.

Bedford T, Cooke R. 2002. Vines: a new graphical model for dependent random variables. Annals of Statistics 30: 1031–1068.

Cameron A, Trivedi P. 1998. Regression Analysis of Count Data. Econometric Society Monographs. Cambridge University Press: Cambridge, UK.

Cameron A, Tong L, Trivedi P, Zimmer D. 2004. Modelling the differences in counted outcomes using bivariate copula models with application to mismeasured counts. Econometrics Journal 7: 566–584.

Cherubini U, Luciano E, Vecchiato W. 2004. Copula Methods in Finance. Wiley: New York.

Danaher P. 1991. A canonical expansion model for multivariate media exposure distributions: a generalization of the duplication of viewing law. Journal of Marketing Research 28(3): 361–367.

Danaher P. 2007. Modeling page views across multiple websites with an application to Internet reach and frequency prediction. Marketing Science 26(3): 422–437.

Danaher P, Smith M. 2010. Modeling multivariate distributions using copulas: applications in marketing (with discussion). Marketing Science (forthcoming).

Demarta S, McNeil A. 2005. The t copula and related copulas. International Statistical Review 73(1): 111–129.

Fang HB, Fang KT, Kotz S. 2002. The meta-elliptical distributions with given marginals. Journal of Multivariate Analysis 82: 1–16.

Frees EW, Valdez EA. 2008. Hierarchical insurance claims modelling. Journal of the American Statistical Association 103(484): 1457–1469.

Fung T, Seneta E. 2010. Tail dependence for two skew t distributions. Statistics and Probability Letters 80: 784–791.


Genton M (ed.). 2004. Skew-Elliptical Distributions and their Applications. Chapman & Hall: London.

Genz A, Bretz F. 2002. Methods for the computation of multivariate t-probabilities. Journal of Computational and Graphical Statistics 11: 950–971.

Joe H. 1997. Multivariate Models and Dependence Concepts. Chapman & Hall: London.

Kanamura T, Ohashi K. 2007. A structural model for electricity prices with spikes: measurement of spike risk and optimal policies for hydropower plant operation. Energy Economics 29: 1010–1032.

Karakatsani N, Bunn D. 2008. Intra-day and regime-switching dynamics in electricity price formation. Energy Economics 30: 1776–1797.

Knittel C, Roberts M. 2005. An empirical examination of restructured electricity prices. Energy Economics 27: 791–817.

Koopman S, Ooms M, Carnero M. 2007. Periodic seasonal Reg-ARFIMA-GARCH models for daily electricity spot prices. Journal of the American Statistical Association 102(477): 16–27.

Leckenby J, Kishi S. 1984. The Dirichlet-multinomial distribution as a magazine exposure model. Journal of Marketing Research 21: 100–106.

Liseo B, Loperfido N. 2006. A note on reference priors for the scalar skew-normal distribution. Journal of Statistical Planning and Inference 136(2): 373–389.

Longin F, Solnik B. 2001. Extreme correlation of international equity markets. Journal of Finance 56(2): 649–676.

McNeil AJ, Frey R, Embrechts P. 2005. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press: Princeton, NJ.

Min A, Czado C. 2010. Bayesian inference for multivariate copulas using pair-copula constructions. Journal of Financial Econometrics 8(4): 511–546.

Nelsen R. 2006. An Introduction to Copulas (2nd edn). Springer: Berlin.

Nowicka-Zagrajek J, Weron R. 2002. Modeling electricity loads in California: ARMA models with hyperbolic noise. Signal Processing 82: 1903–1915.

Panagiotelis A, Smith M. 2008. Bayesian density forecasting of intraday electricity prices using multivariate skew t distributions. International Journal of Forecasting 24: 710–727.

Panagiotelis A, Smith M. 2010. Bayesian skew selection for multivariate models. Computational Statistics and Data Analysis 54: 1824–1839.

Patton AJ. 2006. Modelling asymmetric exchange rate dependence. International Economic Review 47: 527–556.

Pitt M, Chan D, Kohn R. 2006. Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93(3): 537–554.

Robert C, Casella G. 2004. Monte Carlo Statistical Methods (2nd edn). Springer: New York.

Rodriguez J. 2007. Measuring financial contagion: a copula approach. Journal of Empirical Finance 14(3): 401–423.

Rust R, Schmittlein D. 1985. A Bayesian cross-validated method for comparing alternative specifications of quantitative models. Marketing Science 4(1): 20–50.

Sahu SK, Dey DK, Branco MD. 2003. A new class of multivariate skew distributions with applications to Bayesian regression models. Canadian Journal of Statistics 31: 129–150.

Sklar A. 1959. Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8: 229–231.

Smith M. 2010. Bayesian inference for a periodic stochastic volatility model of intraday electricity prices. In Statistical Modelling and Regression Structures: Festschrift in Honour of Ludwig Fahrmeir, Kneib T, Tutz G (eds). Springer: Berlin; 353–376.

Smith M, Min A, Almeida A, Czado C. 2010. Modeling longitudinal data using a pair-copula decomposition of serial dependence. Journal of the American Statistical Association doi:10.1198/jasa.2010.tm09572.

Song P. 2000. Multivariate dispersion models generated from Gaussian copula. Scandinavian Journal of Statistics 27: 305–320.

Sun W, Rachev S, Stoyanov SV, Fabozzi F. 2008. Multivariate skewed Student’s t copula in the analysis of nonlinear and asymmetric dependence in the German equity market. Studies in Nonlinear Dynamics and Econometrics 12(2).

Trivedi P, Zimmer D. 2005. Copula modeling: an introduction for practitioners. Foundations and Trends in Econometrics 1(1): 1–110.

Weron R, Misiorek A. 2008. Forecasting spot electricity prices: a comparison of parametric and semiparametric time series models. International Journal of Forecasting 24(4): 744–763.

Wong F, Carter C, Kohn R. 2003. Efficient estimation of covariance selection models. Biometrika 90(4): 809–830.


APPENDIX

Section A

We first outline Steps 2 and 3 of Algorithm 2. Step 2 draws $w_i \sim \mathrm{Gamma}(a, b)$, with $a = (\nu + 2r)/2$ and

$$b = \tfrac{1}{2}\left[(x_i - Dq_i)'\Sigma^{-1}(x_i - Dq_i) + q_i'q_i + \nu\right].$$

In Step 3, $q_i \mid x, y, \theta, \phi \sim N(P_i^{-1}a_i, P_i^{-1})$ constrained so that $q_i > 0$, where $P_i = I + w_i D\Sigma^{-1}D$ and $a_i = w_i D\Sigma^{-1}x_i$. We sample $q_{ij}$ one at a time, conditional upon the other elements of $q_i$, from univariate constrained normals.
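The two draws described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the function names `draw_w` and `draw_q` and the argument `Sigma_inv` are hypothetical, and the second Gamma parameter $b$ is assumed to be a rate. The element-wise update uses the standard result that, for a normal with precision matrix $P_i$ and $P_i \mu = a_i$, element $j$ given the others has variance $1/P_{i,jj}$ and mean $(a_{ij} - \sum_{k \neq j} P_{i,jk} q_{ik})/P_{i,jj}$.

```python
import numpy as np
from scipy.stats import truncnorm


def draw_w(x_i, q_i, D, Sigma_inv, nu, rng):
    """Step 2: draw w_i ~ Gamma(a, b); b is ASSUMED to be a rate parameter."""
    r = x_i.shape[0]
    resid = x_i - D @ q_i
    a = (nu + 2 * r) / 2.0
    b = 0.5 * (resid @ Sigma_inv @ resid + q_i @ q_i + nu)
    return rng.gamma(shape=a, scale=1.0 / b)  # NumPy uses scale = 1 / rate


def draw_q(x_i, q_i, w_i, D, Sigma_inv, rng):
    """Step 3: update q_i one element at a time from normals constrained to (0, inf)."""
    r = x_i.shape[0]
    P = np.eye(r) + w_i * D @ Sigma_inv @ D   # P_i as written in the text
    a_vec = w_i * D @ Sigma_inv @ x_i         # a_i as written in the text
    q = q_i.astype(float).copy()
    for j in range(r):
        # Conditional of q_j given the other elements under N(P^{-1} a, P^{-1})
        others = P[j] @ q - P[j, j] * q[j]
        mean = (a_vec[j] - others) / P[j, j]
        sd = 1.0 / np.sqrt(P[j, j])
        lower = (0.0 - mean) / sd             # standardised lower bound at zero
        q[j] = truncnorm.rvs(lower, np.inf, loc=mean, scale=sd, random_state=rng)
    return q
```

Updating the elements of $q_i$ one at a time avoids sampling from a multivariate truncated normal directly, at the cost of slower mixing when the elements are strongly correlated.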

Section B

We now outline the sampling scheme used in Section 5.

Algorithm 3. MCMC scheme for skew t copula (discrete margins)

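For $j = 1, \ldots, r$:

Step 1. Sample from $f(\delta_j \mid \{\phi \setminus \delta_j\}, x_{\cdot \setminus j}, \theta, z, y)$ using MH.

Step 2. Sample from $f(\theta_j \mid \phi, x_{\cdot \setminus j}, \{\theta \setminus \theta_j\}, z, y)$ using MH.

Step 3. Sample from $f(x_j \mid \phi, x_{\cdot \setminus j}, \theta, z, y) = \prod_{i=1}^{n} f(x_{ij} \mid \phi, \{x \setminus x_{ij}\}, \theta, z, y)$ for all $i$.

Step 4. Sample one at a time from $f(w_i \mid x, \phi, \theta, \{z \setminus w_i\})$ for all $i$.

Step 5. Sample one at a time from $f(q_{ij} \mid x, \phi, \theta, \{z \setminus q_{ij}\})$ for all $i, j$.

Step 6. Sample from $f(\Sigma \mid x, \{\phi \setminus \Sigma\}, \theta, z)$ using the method of Pitt et al. (2006) or alternative.

Step 7. Sample from $f(\nu \mid x, \{\phi \setminus \nu\}, \theta, z, y)$ using random walk MH.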

Steps 1, 2 and 3 are outlined in Section 5, while Steps 4, 5 and 6 are the same as Steps 2, 3 and 5 of Algorithm 2, respectively. To implement the MH scheme in Step 7 the posterior is

$$
\begin{aligned}
f(\nu \mid x, \{\phi \setminus \nu\}, \theta, z, y) &\propto f(y \mid x, \phi, \theta)\, f(x \mid \phi, \theta, z)\, f(z \mid \phi, \theta)\, \pi(\nu) \\
&\propto \prod_{i,j} f(y_{ij} \mid x_{ij}, \phi, \theta_j) \prod_{i=1}^{n} f(z_i \mid \phi, \theta)\, \pi(\nu) \\
&\propto \prod_{i,j} I(T^{L}_{ij} \le x_{ij} < T^{U}_{ij}) \prod_{i=1}^{n} f(w_i \mid \nu)\, \pi(\nu),
\end{aligned}
$$

where $f(w_i \mid \nu)$ is a Gamma density and the first product is over all $i, j$. Algorithm 3 features sampling from reduced conditionals in Steps 1 and 2, so that the order of Steps 1–3 is important. However, because Algorithm 2 is an MH-within-Gibbs scheme, the order in which parameters and latents are sampled can change.
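A random walk MH update of the kind used in Step 7 might look like the sketch below. Several pieces are assumptions made only for illustration and are not taken from the paper: the mixing density is taken to be $w_i \mid \nu \sim \mathrm{Gamma}(\nu/2, \nu/2)$ (shape and rate), the prior $\pi(\nu)$ is taken to be uniform on a hypothetical interval `(nu_min, nu_max)`, and `in_bounds` is a hypothetical user-supplied callable that returns `True` when every latent $x_{ij}$ lies in $[T^{L}_{ij}, T^{U}_{ij})$ for the thresholds implied by a candidate $\nu$. The current value of $\nu$ is assumed to have positive posterior density.

```python
import numpy as np
from scipy.stats import gamma


def log_target(nu, w, in_bounds, nu_min=2.0, nu_max=50.0):
    """Log of the Step 7 target up to an additive constant (illustrative assumptions)."""
    if not (nu_min < nu < nu_max):   # ASSUMED uniform prior on (nu_min, nu_max)
        return -np.inf
    if not in_bounds(nu):            # product of indicators I(T^L <= x < T^U)
        return -np.inf
    # ASSUMED mixing density: w_i | nu ~ Gamma(shape = nu/2, rate = nu/2)
    return gamma.logpdf(w, a=nu / 2.0, scale=2.0 / nu).sum()


def rw_mh_step(nu, w, in_bounds, step, rng):
    """One random walk MH update of nu; returns (new value, accepted flag)."""
    prop = nu + step * rng.standard_normal()
    lp_prop = log_target(prop, w, in_bounds)
    if not np.isfinite(lp_prop):     # proposal has zero posterior density
        return nu, False
    log_ratio = lp_prop - log_target(nu, w, in_bounds)
    if np.log(rng.uniform()) < log_ratio:
        return prop, True
    return nu, False
```

Because the proposal is symmetric, the acceptance ratio involves only the target density. The step size `step` would normally be tuned so that a moderate fraction of proposals is accepted.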

Section C

Last, we provide some computational details. We run the sampling scheme for a burn-in period of 100,000 iterates, after which a further 100,000 iterates are collected to form the Monte Carlo sample. The scheme took 342 s and 481 s to generate 1000 iterates for the electricity and Internet examples, respectively, coded in Matlab and running in serial on a standard 3 GHz Intel chip.
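The timings above imply approximate total run times, which the short calculation below works out. It assumes, purely for illustration, that the time per iterate is the same during burn-in and sampling; the function name `total_hours` is hypothetical.

```python
def total_hours(sec_per_1000, burn_in=100_000, kept=100_000):
    """Total run time in hours, ASSUMING a constant cost per iterate."""
    return (burn_in + kept) / 1000 * sec_per_1000 / 3600


print(total_hours(342))  # electricity example
print(total_hours(481))  # Internet example
```

Under that assumption the full runs take roughly 19 hours for the electricity example and a little under 27 hours for the Internet example.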
