Bayesian statistics and MCMC methods for portfolio selection


Facoltà di Ingegneria dell’Informazione, Informatica e Statistica

Corso di Laurea Magistrale in Scienze Statistiche e Decisionali

Candidate

Jacopo Primavera

ID number 1219046

Thesis Advisor

Prof. Pierpaolo Brutti

Academic Year 2012/2013

Thesis not yet defended

Bayesian statistics and MCMC methods for portfolio selection

Master thesis. Sapienza – University of Rome

© 2013 Jacopo Primavera. All rights reserved

This thesis has been typeset by LaTeX and the Sapthesis class.

Author’s email: [email protected]


Contents

1 Introduction

2 Portfolio selection
2.1 Mean-variance portfolio
2.1.1 Diversification and hedging
2.1.2 Two-assets case
2.1.3 Mean-variance model
2.2 The importance of estimation error
2.3 Bayesian inference for portfolio selection
2.3.1 Bayesian theory review
2.4 Allocation as decision
2.4.1 Choice under uncertainty
2.4.2 Maximum expected utility allocation
2.4.3 Bayesian allocation decision
2.4.4 Bayesian paradigm justification

3 Non-normal financial markets
3.1 Skewness and portfolio selection
3.2 Skewed-elliptical models
3.2.1 Skewed-normal model
3.3 Simulation-based inference
3.3.1 Gibbs sampler
3.3.2 Sampling from the Bayesian skewed-normal model

4 Hedge fund portfolio application
4.1 Data description
4.1.1 Univariate statistics
4.1.2 Multivariate statistics
4.2 Model implementation
4.3 Portfolio weights
4.4 Out-of-sample performance

5 Conclusions

Appendices

A MCMC diagnostics
A.1 Gelman-Rubin diagnostic
A.2 Geweke diagnostic

List of Figures

2.1 Two-assets mean-variance
2.2 Efficient frontier
2.3 Mean-variance allocation instability
2.4 Non-informative Bayesian allocation
2.5 Sample and conjugate Bayesian frontiers

4.1 Hedge Fund Rate-of-Return time series
4.2 Univariate graphical summaries
4.3 QQ-plot univariate
4.4 QQ-plot multivariate
4.5 Bivariate normal level curves
4.6 Traceplots MCMC posterior mean vector m
4.7 Traceplots MCMC posterior variance V
4.8 Kernel MCMC densities for m
4.9 Kernel MCMC densities for V
4.10 Allocation plot 1
4.11 Allocation plot 2
4.12 Out-of-sample MV allocations
4.13 Out-of-sample MV-skewed allocations
4.14 Out-of-sample analysis
4.15 Out-of-sample analysis (one plot)

A.1 Gelman plot for m
A.2 Gelman plot for V
A.3 Geweke plot for m
A.4 Geweke plot for V

List of Tables

2.1 True parameters for non-informative analysis

4.1 Univariate descriptive statistics
4.2 Shapiro-Wilk normality test
4.3 Multivariate Shapiro test
4.4 WinBugs summary statistics

A.1 Gelman-Rubin diagnostic for m
A.2 Gelman-Rubin diagnostic for V
A.3 Geweke diagnostic for m
A.4 Geweke diagnostic for V

Chapter 1

Introduction

The attempt to rationalize an investment strategy has a long history, but it is only since the 1950s and the pioneering work of Markowitz that the study has made the necessary quantum leap and begun to be properly framed as a statistical decision problem. The striking intuitiveness of the Markowitz model has greatly influenced the entire financial industry, and the mean-variance (MV) paradigm still represents a milestone in modern finance theory (Meucci, 2005). Although it retains great importance for academics and remains a guideline for practitioners, its actual use in the financial industry is by now rare (see McNeil (2005)). This thesis addresses two fundamental drawbacks of the classical mean-variance portfolio that have led the financial industry to distrust it, and shows how to overcome them through the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods. The shortcomings of the classical mean-variance portfolio have even prompted some authors to fall back on purely qualitative selection strategies, such as the naive equally weighted portfolio, which allocates equal parts of the initial budget to each financial opportunity (see De Miguel (2007) for a discussion of the performance of the equally weighted portfolio relative to the mean-variance allocation).

The limitations of the Markowitz portfolio fall into two major categories: (1) instability of the allocation results and (2) empirically violated assumptions. The instability of the mean-variance optimization results from the highly non-linear relations between the estimated inputs and the allocation results. The allocations turn out to be extremely sensitive to the estimates plugged into the optimizer, and they can change drastically even for minor changes in the initial inputs. Accounting for estimation error in portfolio selection is therefore fundamental, since the optimization step may exacerbate an error committed in the estimation step. The classical implementation of the mean-variance optimizer, however, involves a naive plug-in of the sample market estimates and thus accounts only for the variability in the data (i.e. financial risk). Considerable effort has been devoted to this issue with the goal of improving on the performance of classical models. A prominent role in this vast literature is played by Bayesian statistics, which can be a viable alternative to traditional methods of implementation, naturally providing more robust (i.e. more stable) results as well as solutions that are more consistent from a statistical decision point of view.

The assumptions underlying the mean-variance paradigm, in turn, are too simple to capture the proven complexity of financial systems. In the mean-variance portfolio, prices are assumed to follow a Brownian motion, which is the standard model of finance theory and whose increments (i.e. financial returns) are normally distributed; rather than following this simple model, however, return distributions more closely resemble a non-linear, chaotic system (see Mandelbrot (2006)). For instance, they have begun to be studied by a new interdisciplinary research field called econophysics, which attempts to use concepts from statistical physics to describe the behavior of financial markets (see Mantegna R. (2000) for an introduction to econophysics). These findings are not surprising, given that financial markets are the result of the interactions of millions of users worldwide, who act according to the most disparate criteria of rationality. It is thus evident that the standard Gaussian model is not suitable to explain the actual movements of the markets, and from this point of view more detailed models should be used. Numerous solutions to this issue have been proposed, with different levels of complexity; here a simple extension of the normal model is investigated, which explicitly allows for asymmetry in the financial return series. In particular, the model considered is an elliptical skewed-normal in the class of skewed distributions developed by Sahu (2003).

Issues (1) and (2) have renewed the interest in portfolio choice problems and constitute the main motivation of this study. To jointly account for estimation error and non-normal markets, a Bayesian framework is adapted to the skewed-normal model. The ensuing model is analytically intractable, as is often the case when departing from standard assumptions, and it needs to be studied by means of a stochastic simulation method, such as an MCMC method.

The thesis is structured as follows. The first part presents the mean-variance portfolio developed by Markowitz, highlighting the main drawbacks of its classical implementation. An introduction to Bayesian methods and their applications to the mean-variance portfolio follows. The second part presents the elliptical-skewed models and the skewed-normal distribution, followed by a presentation of the MCMC methods needed to deal with a Bayesian version of the skewed-normal model, with particular focus on the Gibbs sampler. The third part proposes an application to hedge fund industry data to empirically compare the classical mean-variance strategy with the one obtained by fitting the Bayesian skewed-normal model to the data and accounting for skewness in the investor’s utility function.


Chapter 2

Portfolio selection

In a classical portfolio choice framework the investor has access to N risky financial opportunities such as stocks, bonds, currencies, mutual funds and hedge funds. Denote as $\mathbf{r}_t$ the rates of return (or returns) at time t for this N-dimensional financial market. For a given initial wealth the investor wants to select a profitable combination of these financial opportunities. The investor’s market is assumed to be open only at time T, when the allocation decision is made, and at time T + τ, when the investment horizon expires. In a classical single-period framework we can set τ = 1 without loss of generality. This simple setting implies that hedging activities are not allowed and no trading can take place between times T and T + 1. Let $W_0$ be the initial wealth at the disposal of the investor and $\alpha_k$ the number of units (shares in the case of equities, contracts in the case of futures, etc.) of the k-th asset that the investor decides to hold in his portfolio. The ensuing portfolio for the investor is the vector

$$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N) \tag{2.1}$$

A portfolio with a non-null allocation vector $\boldsymbol{\alpha}$ can equivalently be represented by the fractions $w_k$ (k = 1, ..., N) of initial wealth allocated to each asset,

$$\mathbf{w} = (w_1, w_2, \dots, w_N) \tag{2.2}$$

The main concern of the investor is the return $r^P_{T+1}$ on the portfolio, which is defined as the linear combination

$$r^P_{T+1} \equiv \mathbf{w}'\mathbf{r}_{T+1} = \sum_{k=1}^{N} w_k\, r_{k,T+1} \tag{2.3}$$

and in particular the investor will be interested in his final wealth, given by:

$$W = W_0\,(1 + \mathbf{w}'\mathbf{r}_{T+1}) \tag{2.4}$$
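As a small numerical illustration of 2.1 to 2.4, the sketch below converts units held into wealth fractions and computes the final wealth. The prices, units and realised returns are hypothetical values chosen only for the example, and the mapping from units to weights through the prices at time T (with the whole budget invested) is our assumption, since prices are not introduced explicitly in the text.

```python
import numpy as np

# Hypothetical inputs (not from the thesis): prices at time T and units held
prices_T = np.array([20.0, 50.0, 10.0])
alpha = np.array([10.0, 4.0, 60.0])        # units of each asset, as in 2.1

W0 = alpha @ prices_T                      # assumed: the whole budget is invested
w = alpha * prices_T / W0                  # wealth fractions, as in 2.2

r_next = np.array([0.04, -0.02, 0.01])     # assumed realised returns between T and T+1
r_portfolio = w @ r_next                   # portfolio return, as in 2.3
W = W0 * (1.0 + r_portfolio)               # final wealth, as in 2.4

# Cross-check: final wealth computed asset by asset from end-of-period prices
W_direct = alpha @ (prices_T * (1.0 + r_next))

print("weights:", w)
print("portfolio return:", r_portfolio)
print("final wealth:", W, "direct:", W_direct)
```

The two ways of computing the final wealth agree because the portfolio return is linear in the asset returns.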

Now consider an investor who starts observing the N-dimensional market at time t = 1. The information collected up to time T is denoted as $\boldsymbol{\Phi}_T$ and contains the N time series of length T, for a total of N × T single observations. The standard model in finance theory implies that each of the N time series is:

• independent, i.e. each price change occurs independently of the previous one;

• stationary, i.e. the process generating price changes, whatever it may be, stays the same over time;

• normally distributed, i.e. price changes follow the proportions of the bell curve: most changes are small, extremely few are large, with predictable and rapidly declining frequency.

Besides, the joint distribution of the returns is multivariate normal and, like the marginal distributions, has constant parameters over time. In summary, the returns $\mathbf{r}_t$ are distributed as a multivariate normal with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ for every t = 1, ..., T:

$$\mathbf{r}_t \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad t = 1, \dots, T \tag{2.5}$$

By the linearity properties of the normal distribution, it follows from 2.5 that

$$r^P_t \sim N(\mathbf{w}'\boldsymbol{\mu},\ \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}), \qquad t = 1, \dots, T \tag{2.6}$$

A normal distribution has the characteristic property of being completely specified by its first two moments. It follows that an investor with an objective based on 2.3 would discriminate among the potential satisfactions ensuing from an investment only by means of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.
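The step from 2.5 to 2.6 can be checked by simulation. The following is a minimal sketch, assuming hypothetical values for the mean vector, the covariance matrix and the weights (none of them taken from the thesis): it compares the theoretical mean and variance of the portfolio return with their Monte Carlo counterparts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market parameters and weights (illustrative values only)
mu = np.array([0.05, 0.03, 0.08])
Sigma = np.array([[0.040, 0.005, 0.012],
                  [0.005, 0.010, 0.004],
                  [0.012, 0.004, 0.090]])
w = np.array([0.5, 0.3, 0.2])

# Theoretical moments of the portfolio return under 2.6
mean_p = w @ mu
var_p = w @ Sigma @ w

# Monte Carlo check: draw returns from 2.5 and form the linear combination 2.3
returns = rng.multivariate_normal(mu, Sigma, size=200_000)
port = returns @ w

print("mean:     theory", mean_p, " simulated", port.mean())
print("variance: theory", var_p, " simulated", port.var(ddof=1))
```

The simulated moments approach the theoretical ones as the number of draws grows.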

2.1 Mean-variance portfolio

Loosely speaking, the mean-variance portfolio asserts that an investor likes return and dislikes risk, and identifies these two components with the mean and the variance of the portfolio return distribution 2.6. Given this brief preamble, the mean-variance portfolio developed by Markowitz (1952) may not seem particularly striking. In fact, before the 1950s the desirability of an investment was mainly equated with its return component, neglecting the risk counterpart as a significant driver of investment decisions (see McNeil (2005) for a discussion).

2.1.1 Diversification and hedging

The main idea underlying the mean-variance optimization (MVO) is that, while the return of an investment is given by the sum of its parts, the risk of the investment can be less than the sum of its parts. This is due to the additivity of the expected value operator and the subadditivity of the standard deviation operator. The mean-variance optimization takes advantage of this and, in doing so, explicitly formalizes the fundamental financial concepts of diversification and hedging. To better see these fundamental aspects of portfolio selection theory, denote as $\mu_P$ the expected portfolio return, $\sigma_P^2$ the variance of the portfolio return, $\sigma_j^2$ the variance of the return on the j-th asset and $\sigma_{jk}$ the covariance between the returns of the j-th and k-th assets:

$$
\begin{aligned}
\mu_P &\equiv \sum_j w_j \mu_j, && j = 1, \dots, N \\
\sigma_j^2 &\equiv (\boldsymbol{\Sigma})_{jj}, && j = 1, \dots, N \\
\sigma_{jk} &\equiv (\boldsymbol{\Sigma})_{jk}, && j \neq k \\
\sigma_P^2 &\equiv \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} = \sum_k w_k^2 \sigma_k^2 + \sum_k \sum_{j \neq k} w_k w_j \sigma_{jk}
\end{aligned}
\tag{2.7}
$$

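Moreover, denote as $\rho_{jk}$ the Pearson correlation coefficient between the returns of the j-th and k-th assets,

$$\rho_{jk} \equiv \frac{\sigma_{jk}}{\sigma_j \sigma_k}, \qquad j, k = 1, \dots, N \tag{2.8}$$

so that the expression of portfolio variance (last equation in 2.7) can be reformulated as

$$\sigma_P^2 \equiv \sum_j w_j^2 \sigma_j^2 + \sum_j \sum_{k \neq j} w_k w_j \rho_{jk} \sigma_k \sigma_j \tag{2.9}$$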

Equation 2.9 has an expressive geometric interpretation, since it can be viewed as a reformulation of Carnot’s theorem, i.e. the law of cosines (see Castellani G. (2005)), which expresses the length of one side of a triangle in terms of the lengths of the other two and the magnitude of the (external) angle between them. From a mean-variance portfolio point of view, the theorem highlights the relation between the sum of all the elements of the covariance matrix (i.e. portfolio variance in the general case of non-null correlations between the assets) and the sum of its diagonal elements (i.e. portfolio variance in the particular case of perfectly uncorrelated assets). In particular, it is interesting to see how different values of $\rho_{jk}$ lead the mean-variance investor to adopt diversification strategies (i.e. all long positions) or hedging strategies (i.e. long-short positions).
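The equivalence between the quadratic form in 2.7 and the correlation-based expression of portfolio variance, together with the triangle analogy, can be verified numerically. The sketch below uses hypothetical standard deviations, correlations and weights; the geometric part builds two plane vectors whose lengths are the weighted standard deviations of two positions and whose angle has cosine equal to their correlation, which is our reading of the "external angle" in the text.

```python
import numpy as np

# Hypothetical inputs (illustrative values only)
sigma = np.array([0.20, 0.10, 0.30])            # standard deviations
rho = np.array([[1.0, 0.3, 0.2],
                [0.3, 1.0, -0.1],
                [0.2, -0.1, 1.0]])              # correlation matrix
w = np.array([0.5, 0.3, 0.2])

# Covariance matrix from 2.8: sigma_jk = rho_jk * sigma_j * sigma_k
Sigma = rho * np.outer(sigma, sigma)

# Portfolio variance as a quadratic form (last line of 2.7)
var_quadratic = w @ Sigma @ w

# Portfolio variance as diagonal terms plus correlation-weighted cross terms
N = len(w)
diag_part = np.sum(w**2 * sigma**2)
cross_part = sum(w[k] * w[j] * rho[j, k] * sigma[k] * sigma[j]
                 for j in range(N) for k in range(N) if k != j)
var_decomposed = diag_part + cross_part

# Triangle analogy for two positions: sides a and b, angle theta with cos(theta) = rho
a, b, rho12 = w[0] * sigma[0], w[1] * sigma[1], rho[0, 1]
theta = np.arccos(rho12)
u = np.array([a, 0.0])
v = b * np.array([np.cos(theta), np.sin(theta)])
side_squared = np.sum((u + v) ** 2)             # squared length of the third side
two_asset_var = a**2 + b**2 + 2 * a * b * rho12

print(var_quadratic, var_decomposed)
print(side_squared, two_asset_var)
```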

2.1.2 Two-assets case

Consider a market composed of only two assets $(a_1, a_2)$ with expected returns and variances denoted respectively as $(\mu_1, \mu_2)$ and $(\sigma_1^2, \sigma_2^2)$. For a non-trivial example let us suppose that

$$\mu_1 > \mu_2 > 0 \quad \text{and} \quad \sigma_1^2 > \sigma_2^2$$

In this scenario there is no stochastic dominance (always assuming normal distributions for the assets’ returns) and the mean-variance analysis will provide the desired insights. Assuming a budget constraint (i.e. the investor invests all the wealth at his disposal), the portfolio $\mathbf{w} = (w_1, w_2)$ is such that $w_1 + w_2 = 1$, and we can set $w = w_1$ and $w_2 = 1 - w$ without loss of generality. The expected portfolio return and the portfolio variance can be written as:

$$
\begin{aligned}
\mu_P &= \mu_2 + (\mu_1 - \mu_2)\,w \\
\sigma_P^2 &= w^2 \sigma_1^2 + (1 - w)^2 \sigma_2^2 + 2w(1 - w)\rho_{12}\sigma_1\sigma_2
\end{aligned}
\tag{2.10}
$$

which highlights the linear dependence of $\mu_P$ on the share w invested in $a_1$ and the non-linearity of $\sigma_P^2$. The plots in Figure 2.1 show portfolio expected return and variance as functions of the allocation weight assigned to the first asset. The first plot shows the linear relation between portfolio return and allocation weight. Indeed, an investor wishing to maximize portfolio return, whatever the ensuing variance, would allocate all his wealth to the asset yielding the maximum expected return. Portfolio variance, in turn, traces a parabola that opens upward, with axis parallel to the ordinate axis. An investor wishing to minimize portfolio variance, whatever the ensuing expected return, would hold the portfolio lying at the vertex of the parabola. Due to the non-linearity of the variance, this point is not necessarily the portfolio concentrating all the initial budget on the minimum-variance asset. As shown by the expression identifying the vertex, it depends on the correlation coefficient:

$$w^* = \frac{\sigma_2^2 - \rho_{12}\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2} \tag{2.11}$$

Letting the correlation coefficient vary, the minimum-variance investor will be prompted either to diversify or to hedge. In particular, from 2.11 it follows that $w^* \ge 0 \Leftrightarrow \rho_{12} \le \sigma_2/\sigma_1$.
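The behaviour of the vertex 2.11 as the correlation varies can be sketched as follows. The standard deviations are hypothetical and chosen so that the threshold $\sigma_2/\sigma_1$ equals 0.5; the function names are ours. A brute-force search over a grid of weights is used as a cross-check of the closed-form vertex.

```python
import numpy as np

def min_variance_weight(sigma1, sigma2, rho):
    """Weight on asset 1 at the vertex of the variance parabola (2.11)."""
    num = sigma2**2 - rho * sigma1 * sigma2
    den = sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2
    return num / den

def portfolio_variance(w, sigma1, sigma2, rho):
    """Two-asset portfolio variance (second line of 2.10)."""
    return (w**2 * sigma1**2 + (1.0 - w)**2 * sigma2**2
            + 2.0 * w * (1.0 - w) * rho * sigma1 * sigma2)

# Hypothetical values with sigma1 > sigma2, so that sigma2 / sigma1 = 0.5
sigma1, sigma2 = 0.30, 0.15
w_grid = np.linspace(-1.0, 2.0, 3001)

for rho in (-0.5, 0.0, 0.5, 0.8):
    w_star = min_variance_weight(sigma1, sigma2, rho)
    # Brute-force minimiser on the grid, to compare with the closed form
    w_grid_min = w_grid[np.argmin(portfolio_variance(w_grid, sigma1, sigma2, rho))]
    regime = "long-only (diversify)" if w_star >= 0 else "short asset 1 (hedge)"
    print(f"rho={rho:+.1f}  w*={w_star:+.4f}  grid={w_grid_min:+.4f}  {regime}")
```

For correlations above the threshold the minimum-variance weight on the riskier asset turns negative, which is the hedging regime described in the text.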

[Figure 2.1: two panels plotting portfolio expected return (left) and portfolio return variance (right) against w.]

Figure 2.1. Two-assets portfolio expected return (left) and portfolio return variance with $\rho \le \sigma_2/\sigma_1$ (right) as functions of w. Red points indicate $\mu_2$ and $\sigma_2^2$; blue points indicate $\mu_1$ and $\sigma_1^2$; the violet point indicates the minimum variance portfolio.

Portfolios characterized by values of w less than zero or greater than one represent situations of short-selling (1). This means that the investor tries to reduce the overall risk of his investment by holding strongly positively correlated assets with opposite signs (i.e. hedging), in an attempt to (statistically) offset simultaneous unfavorable events. Portfolios characterized by values of w between zero and one represent long positions, meaning that the investor holds positive quantities of all the assets composing the portfolio (i.e. diversification). It follows that the mean-variance investor is induced to diversify as long as there is no strong positive correlation between the assets, while he prefers to hedge as this positive correlation becomes stronger.

(1) The selling of a security that the seller does not own, or any sale that is completed by the delivery of a security borrowed by the seller. Short sellers assume that they will be able to buy the stock at a lower amount than the price at which they sold short (www.investopedia.com).

2.1.3 Mean-variance model

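The mean-variance portfolio is formalized by means of a two-step procedure. First, the investor draws the efficient portfolios in the plane $(\mu_P, \sigma_P^2)$, conditioning on a target value $\mu$ for the portfolio return:

$$\mathbf{w}^* = \underset{\mathbf{w}:\ \mathbf{w}'\boldsymbol{\mu} = \mu,\ \mathbf{w}'\mathbf{1} = 1,\ \mathbf{w} \ge 0}{\operatorname{argmin}}\ \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \tag{2.12}$$

When short selling is allowed, the constraint $\mathbf{w} \ge 0$ in 2.12 can be removed, yielding the following problem, which has an explicit solution:

$$
\begin{aligned}
\mathbf{w}^* &= \underset{\mathbf{w}:\ \mathbf{w}'\boldsymbol{\mu} = \mu,\ \mathbf{w}'\mathbf{1} = 1}{\operatorname{argmin}}\ \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w} \\
&= \left[ B\boldsymbol{\Sigma}^{-1}\mathbf{1} - A\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} + \mu\left( C\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu} - A\boldsymbol{\Sigma}^{-1}\mathbf{1} \right) \right] / D,
\end{aligned}
\tag{2.13}
$$

where $A = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\mathbf{1} = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $B = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $C = \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}$, $D = BC - A^2$.

The optimal portfolios in 2.13 are Pareto optima, in the sense that they are attainable portfolios whose expected return cannot be improved without worsening the risk, and vice versa. For different values of $\mu$ they form a continuum of portfolios usually referred to as the efficient frontier (see Figure 2.2).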

Figure 2.2. Simulated Markowitz efficient frontier with N=8 assets.

Each point on the frontier gives on the abscissa the minimum portfolio risk for a given portfolio expected return on the ordinate. The points inside the frontier represent the eight single-asset portfolios from which the frontier is built. The fact that these points are on the right side of the graph testifies to the effect of diversification: the single-asset portfolios can be mixed to create portfolios with the same or lower risk and the same or higher expected return. The figure also shows the crucial trade-off between risk and return. Starting from the most diversified portfolio (i.e. the minimum-risk portfolio) at the leftmost point of the figure, higher expected returns can be gained only at the cost of greater risk. Once the efficient frontier has been drawn, the investor can move on to the second step, that is, picking the portfolio on the frontier which best suits his preferences (in terms of risk aversion). When the short-selling constraint $\mathbf{w} \ge 0$ is added to 2.13, the optimization no longer has an analytical solution, but the efficient frontier can still be computed efficiently by numerical methods as long as the constraints remain affine.
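The explicit solution 2.13 translates directly into a few lines of linear algebra. The sketch below assumes a hypothetical three-asset market (the figure in the text uses eight assets) and traces a handful of frontier points; the function name `mv_weights` is ours. Linear systems are solved instead of inverting the covariance matrix explicitly, which is a numerical convenience rather than part of the original formulation.

```python
import numpy as np

def mv_weights(mu, Sigma, target):
    """Closed-form minimum-variance weights for a target expected return (2.13),
    with short selling allowed."""
    ones = np.ones_like(mu)
    Sinv_mu = np.linalg.solve(Sigma, mu)       # Sigma^{-1} mu
    Sinv_1 = np.linalg.solve(Sigma, ones)      # Sigma^{-1} 1
    A = mu @ Sinv_1
    B = mu @ Sinv_mu
    C = ones @ Sinv_1
    D = B * C - A**2
    return (B * Sinv_1 - A * Sinv_mu + target * (C * Sinv_mu - A * Sinv_1)) / D

# Hypothetical market parameters (illustrative values only)
mu = np.array([0.05, 0.03, 0.08])
Sigma = np.array([[0.040, 0.005, 0.012],
                  [0.005, 0.010, 0.004],
                  [0.012, 0.004, 0.090]])

# A few points of the frontier: (standard deviation, expected return)
targets = np.linspace(0.03, 0.10, 8)
frontier = np.array([[np.sqrt(wt @ Sigma @ wt), wt @ mu]
                     for wt in (mv_weights(mu, Sigma, m) for m in targets)])

w_star = mv_weights(mu, Sigma, 0.06)
print("weights:", w_star)
print("budget:", w_star.sum(), " expected return:", w_star @ mu)
print(frontier)
```

Both constraints of the problem are satisfied by construction: the weights sum to one and the expected return equals the target.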

2.2 The importance of estimation error

Markowitz’s theory assumes known $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. That is, in the mean-variance optimization the process of selecting a portfolio is posed as a problem whose first- and second-order market characteristics are given, and one has only to choose the allocation vector according to the predetermined optimality criteria. Since in practice $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are unknown, a commonly used approach is to estimate them from historical data, under the assumption that returns are i.i.d. Under the standard model, the maximum likelihood estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the sample mean $\hat{\boldsymbol{\mu}}$ and the sample covariance matrix $\hat{\boldsymbol{\Sigma}}$. Asymptotic results guarantee that, under the normal model, these estimators converge to the true market parameters as the sample size grows to infinity. Although asymptotic results can be useful to characterize the uncertainty of sample estimates, from a practitioner’s point of view the real issue is obviously the finite-sample performance. Moreover, the reliability of these results deteriorates drastically with the number of assets held in the portfolio. Empirical tests have found that replacing $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ by their sample counterparts $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ may yield poor results, and a major line of research in the literature is to find alternative estimators that produce better portfolios when they are plugged into the optimizers (Chen, 2011).

The way the Markowitz portfolio is actually implemented is a crucial issue, since mean-variance portfolios are sensitive to small changes in the input data, and errors committed in the estimation step may be amplified in the allocation results. This is well documented in the literature. Chopra (1993) shows that even for minor changes in the estimates of expected returns or risks, MVO can produce very different results. Best (1991) analyzes the sensitivity of optimal portfolios to changes in expected return estimates. Other related literature includes Jobson and Korkie (1981) and Broadie (1993), among others. Michaud (1989) shows the error-maximizing behavior of the mean-variance optimizer, which exacerbates estimation error by tending to overweight overestimated assets and underweight underestimated ones. Moreover, it is widely accepted that most of the estimation risk in optimal portfolios comes from errors in the estimates of expected returns, rather than in the estimates of portfolio variance. To illustrate graphically the sensitivity of the mean-variance optimization, for Figure 2.3 we randomly generated 100 Monte Carlo time series of normally distributed returns and numerically computed the ensuing efficient frontiers, replacing the market parameters by their sample analogues. These estimated frontiers are compared to the true efficient frontier, computed using the true covariance matrix and expected return vector.

[Figure 2.3: expected return (%) plotted against standard deviation (%) for the sample frontiers and the true frontier.]

Figure 2.3. Simulated experiment with T = 60 N-dimensional normal observations: 100 MC efficient frontiers (blue lines), true frontier (red line)
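A rough sketch of this kind of experiment is given below. It is not the code used to produce Figure 2.3: the true parameters are hypothetical, the market has only three assets, and the plug-in portfolios are computed with the closed-form solution 2.13 (short selling allowed) for a single target return rather than by tracing whole frontiers numerically. Note also that `np.cov` divides by T − 1 by default, whereas the maximum likelihood estimate divides by T.

```python
import numpy as np

rng = np.random.default_rng(1)

def mv_weights(mu, Sigma, target):
    """Closed-form weights of 2.13 (short selling allowed)."""
    ones = np.ones_like(mu)
    Sinv_mu = np.linalg.solve(Sigma, mu)
    Sinv_1 = np.linalg.solve(Sigma, ones)
    A, B, C = mu @ Sinv_1, mu @ Sinv_mu, ones @ Sinv_1
    D = B * C - A**2
    return (B * Sinv_1 - A * Sinv_mu + target * (C * Sinv_mu - A * Sinv_1)) / D

# Hypothetical "true" market (illustrative values only)
mu_true = np.array([0.05, 0.03, 0.08])
Sigma_true = np.array([[0.040, 0.005, 0.012],
                       [0.005, 0.010, 0.004],
                       [0.012, 0.004, 0.090]])
T, n_sim, target = 60, 100, 0.06

w_true = mv_weights(mu_true, Sigma_true, target)
w_hat = np.empty((n_sim, mu_true.size))
for s in range(n_sim):
    sample = rng.multivariate_normal(mu_true, Sigma_true, size=T)
    mu_hat = sample.mean(axis=0)                 # sample mean
    Sigma_hat = np.cov(sample, rowvar=False)     # sample covariance (divides by T - 1)
    w_hat[s] = mv_weights(mu_hat, Sigma_hat, target)

# Expected return and risk that each plug-in portfolio attains under the true parameters
true_mean = w_hat @ mu_true
true_sd = np.sqrt(np.einsum("si,ij,sj->s", w_hat, Sigma_true, w_hat))

print("true weights:          ", w_true)
print("median plug-in weights:", np.median(w_hat, axis=0))
print("weight range per asset:", w_hat.min(axis=0), w_hat.max(axis=0))
print("attained mean, range:  ", true_mean.min(), true_mean.max())
print("attained sd, range:    ", true_sd.min(), true_sd.max())
```

Every plug-in portfolio satisfies the budget constraint, but the weights and the return actually attained under the true parameters vary widely from one simulated sample to the next, which is the instability discussed above.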

In summary, when an investor runs the mean-variance optimization using the sample counterparts of the unknown mean and variance, he is actually selecting portfolios lying on the estimated frontier (the only one available to him), which is likely to be far from the true efficient frontier. Moreover, the distance between the estimated and the true frontier increases more than linearly with the estimation error. Given the sensitivity of the optimal mean-variance allocation to its inputs, it is evident why estimation error has played such an important role in the related literature. Starting with Markowitz himself, all authors dealing with portfolio selection seem to agree on the need for market estimators with better overall performance than the classical estimators. In this sense, Bayesian statistics is widely accepted as a convenient tool that adds flexibility to the inferential machinery and allows estimators to be calibrated to the specific, real-life application.

2.3 Bayesian inference for portfolio selection

The Bayesian framework naturally accounts for the intrinsic uncertainty affecting any estimation process. Indeed, within a Bayesian approach the investor eliminates the dependence of the optimization problem on the unknown parameters by replacing the true (and unknown) values with a probability distribution, rather than a point estimate. This probability law depends only on the data the investor observes and on the personal ex-ante (prior) beliefs the investor may have held about the unknown parameters before analyzing the experiment. The portfolio weights derived within this paradigm are optimal with respect to this parameter distribution but sub-optimal with respect to the true parameter values. However, this sub-optimality is not relevant, since the truth is never revealed anyway. To the extent that this parameter distribution incorporates all of the available information (as opposed to just a point estimate), this approach is to many the most appealing. This kind of inferential framework belongs to the family of Bayesian methods. The term "Bayesian" comes from Thomas Bayes and his pioneering work, published in 1763, titled "An Essay towards solving a Problem in the Doctrine of Chances".

2.3.1 Bayesian theory review

Recalling that the purpose of an inferential analysis is to retrieve causes (formalized by the parameters of statistical models) from effects (summarized by the observations) (Robert, 2007), the Bayesian scheme implicitly assumes that the most conclusive answer to this search has to be a probability law (Piccinato, 2009). Denote the parameter of interest as θ and the observations as $\mathbf{y}$. The output of Bayesian inference has to be a probability distribution over the possible values that θ can assume. Before observing $\mathbf{y}$, a prior distribution π on θ is assigned. This distribution describes the uncertainty around the true value of θ. The prior distribution can be derived from personal beliefs and/or expert opinions. Once the data have been observed, one updates the prior distribution through Bayes’ theorem, obtaining a posterior distribution for θ. In this context, Bayes’ theorem allows one to combine the information on θ contained in the prior distribution with that provided by the experiment. From a Bayesian point of view, θ is a random variable with prior distribution π(θ), θ ∈ Θ, and the data $\mathbf{y}$ coming from the experiment act as a state variable on which to condition the final distribution of θ. Bayes’ theorem provides the posterior distribution:

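$$\pi(\theta \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \theta)\,\pi(\theta)}{m(\mathbf{y})} \tag{2.14}$$

where $m(\mathbf{y}) = \int f(\mathbf{y} \mid \theta)\,\pi(\theta)\,d\theta$ is the marginal density of $\mathbf{y}$. Since the derivation of a posterior distribution is generally carried out through proportionality relations, Bayes’ theorem is often reported in the form

$$\pi(\theta \mid \mathbf{y}) \propto L(\theta \mid \mathbf{y})\,\pi(\theta) \tag{2.15}$$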

where $L(\theta \mid \mathbf{y})$ is the likelihood function.

Bayesian inference is particularly interesting in predictive problems, where the statistician tries to draw conclusions about the realization of a future experiment $\mathbf{y}_{T+1}$ governed by the same laws regulating the past (observed) data $\mathbf{y} = (\mathbf{y}_T, \mathbf{y}_{T-1}, \dots)$. The "parameter" of interest in this case is the future realization $\mathbf{y}_{T+1}$. Adopting the Bayesian reasoning, we have a natural candidate to describe the initial uncertainty about this quantity, namely the marginal distribution $m(\mathbf{y}_{T+1})$, which appears in the denominator of Bayes’ theorem 2.14 and formalizes the information available on $\mathbf{y}_{T+1}$ before observing the data:

$$m(\mathbf{y}_{T+1}) = \int f(\mathbf{y}_{T+1} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta})\,d\boldsymbol{\theta} \tag{2.16}$$

Once the experiment has been carried out, $m(\mathbf{y}_{T+1})$ is updated with the new information contained in the data. This is done by substituting the posterior for the prior distribution of $\boldsymbol{\theta}$ in equation 2.16:

$$m(\mathbf{y}_{T+1} \mid \mathbf{y}) = \int f(\mathbf{y}_{T+1} \mid \boldsymbol{\theta})\,\pi(\boldsymbol{\theta} \mid \mathbf{y})\,d\boldsymbol{\theta} \tag{2.17}$$

Consistently with the Bayesian paradigm, the final predictive distribution reflects the uncertainty around the true value of the future experiment, which is treated as a parameter, rather than being a naive projection of the sample distribution conditioned on what has been learned about the parameters up to the time when the inference takes place.
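The update from prior to posterior (2.14) and from posterior to predictive (2.17) can be made concrete in the simplest conjugate setting. The sketch below assumes, purely for illustration, univariate normal observations with known variance and a normal prior on the mean; all numerical values are hypothetical. In this case the posterior and the predictive are both normal with known parameters, and the integral 2.17 can also be approximated by simulation: draw θ from the posterior, then draw a future observation given θ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed model: y_t ~ N(theta, sigma2) with sigma2 known; prior theta ~ N(m0, v0)
sigma2 = 0.04
m0, v0 = 0.0, 0.01
theta_true = 0.05                      # used only to simulate the data
y = rng.normal(theta_true, np.sqrt(sigma2), size=60)
T = y.size

# Posterior of theta (2.14): normal, precisions add up
v_post = 1.0 / (1.0 / v0 + T / sigma2)
m_post = v_post * (m0 / v0 + y.sum() / sigma2)

# Posterior predictive of y_{T+1} (2.17): normal with inflated variance
pred_mean = m_post
pred_var = sigma2 + v_post

# Monte Carlo version of the integral 2.17
theta_draws = rng.normal(m_post, np.sqrt(v_post), size=200_000)
y_next = rng.normal(theta_draws, np.sqrt(sigma2))

print("sample mean:", y.mean(), " posterior mean:", m_post, " posterior var:", v_post)
print("predictive mean/var (closed form):", pred_mean, pred_var)
print("predictive mean/var (simulated):  ", y_next.mean(), y_next.var())
```

The predictive variance exceeds the sampling variance by the posterior variance of the parameter, which is how parameter uncertainty enters the forecast instead of being ignored as in a plug-in approach.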

2.4 Allocation as decision

Reformulating portfolio selection in terms of the theory of choice under uncertainty allows us to generalize the mean-variance rule and to take advantage of an axiomatic framework to rigorously categorize the investor’s preferences. After this generalization, Bayesian inference will turn out to be the most coherent strategy to deal with portfolio selection. Before formally introducing the Bayesian allocation, we briefly review the fundamentals of statistical decision theory.

2.4.1 Choice under uncertainty

The foundations of a decision-theoretic approach to statistical problems are laid out in classic texts such as Wald (1950), Savage (1954) and Ferguson (1967), among others. In its simplest formulation, a generic decision problem describes the following idealized situation: at a given time a person has to choose an element δ from a set ∆. The choice of δ ∈ ∆ determines a loss $L_\delta(\omega)$, which evaluates the penalty associated with decision δ and the realization ω of a random variable Ω called the state of nature. The state of nature formalizes the uncertainty around the phenomenon affecting the consequence of the decision. In the subjective formalization of decision theory this quantity is considered a random variable. A probability law P is then defined over $(\Omega, \mathcal{A}_\Omega)$, where $\mathcal{A}_\Omega$ is an appropriate σ-algebra of subsets of Ω. It is worth noting that the "consequences" arising from the pair (δ, ω) are generic quantities. They can be anything affecting the decision-maker and are not necessarily expressed on a numerical scale. I will denote this "primitive" concept of consequence as $\gamma = C_\delta(\omega)$. The loss function $L_\delta(\omega)$ is therefore to be understood as a map $L : \Gamma \to \mathbb{R}^1$ that takes the consequences to a numerical scale and allows for a formal comparison between two generic consequences. When it is possible to compare any two elements of a set, the set is said to be endowed with a total preordering. A preordering on a set X is a binary relation R enjoying the following properties:


1 xRx for every x ∈ X (Reflectivity)

2 xRx′ and x′Rx′′ ⇒ xRx′′ (Transitivity)

If, in addition to (1) and (2) it satisfies the property:

3 xRx′ and x′Rx⇒ x = x′ (Antisymmetry)

then it is called an ordering. Assume that Γ carries a total preordering denoted by ⪰ ("weak preference") and that the space of all loss functions is denoted by L. Then the relation ≤ ("less than or equal to") on L is said to induce the relation ⪰ on Γ:

$$\gamma' \succeq \gamma'' \iff L_{\gamma'} \le L_{\gamma''} \tag{2.18}$$

The relation ⪰ expresses a generic concept of preference, but it takes on a precise interpretation in terms of losses. That is, the loss function formalizes the qualitative concept of preference over consequences. The left-hand side of expression 2.18 can be read as "γ′ is at least as preferable as γ′′", or "γ′ is weakly preferable to γ′′". The decision-maker wishes to choose an element δ∗ such that the corresponding consequence is the most preferable or, equivalently, such that the corresponding loss function L_δ∗ is minimal in L_δ (the set of all possible loss functions indexed by δ). Once the loss function has been defined, the search for a minimum with respect to δ is not a trivial problem, since a decision does not uniquely determine the corresponding loss, which is also affected by the state of nature. To tackle the uncertainty about the state of nature the decision-maker needs to specify a functional K : L_δ → ℝ, which reduces the loss, viewed as a function of ω, to a single number. Finally, the decision-maker can determine the optimal decision by solving the following minimization problem:

$$\delta^{*} \equiv \operatorname*{argmin}_{\delta \in \Delta} K(L_\delta) \tag{2.19}$$

The functional K is called the optimal criterion. As we have seen, the classical framework of decision theory assumes the existence of a loss function through which the generic (possibly non-numeric) consequences are quantified. As pointed out in Robert (2007), the actual determination of the loss function is often awkward in practice. This is especially true when ∆ or Ω is a set with an infinite number of elements. In general, quantifying the consequence of each decision is challenging since it involves the human perception of some "loss" or "gain", which is typically non-linear and not intuitive. Another important issue in the framework of decision theory is the determination of the optimal criterion.
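The minimization in 2.19 can be made concrete with a toy problem. The following is only a sketch: the decisions, states, losses and probabilities are hypothetical, and the two functionals K (expected loss and worst-case loss) are chosen purely to show that the optimal decision depends on the criterion adopted.

```python
# Toy decision problem (all names and numbers are hypothetical).
# loss[decision][state] plays the role of L_delta(omega).
loss = {
    "hold_cash": {"boom": 2.0, "bust": 0.0},
    "buy_bonds": {"boom": 1.0, "bust": 1.0},
    "buy_stocks": {"boom": 0.0, "bust": 3.0},
}
# Subjective probability law P over the states of nature.
prob = {"boom": 0.7, "bust": 0.3}


def expected_loss(losses, prob):
    """Criterion K: probability-weighted average loss."""
    return sum(prob[state] * losses[state] for state in prob)


def worst_case_loss(losses, prob):
    """Alternative criterion K: largest loss over all states (minimax)."""
    return max(losses.values())


def optimal_decision(loss, criterion, prob):
    """Return the decision minimizing K(L_delta), as in equation 2.19."""
    return min(loss, key=lambda decision: criterion(loss[decision], prob))


bayes_choice = optimal_decision(loss, expected_loss, prob)
minimax_choice = optimal_decision(loss, worst_case_loss, prob)
print(bayes_choice, minimax_choice)
```

With these numbers the expected-loss criterion selects the stock position, whereas the worst-case criterion selects the bond position, so the choice of K is itself part of the decision problem.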

Utility theory A solution for determining both the loss function and the optimal criterion is given by the utility theory developed by Von Neumann and Morgenstern (1944). Utility is defined as the opposite of loss, so the theory can be stated equivalently in terms of either measure. Utility is by far the more widely adopted in economics, where the "consequences" are usually measured as "rewards" of an action rather than as "losses". As pointed out in Piccinato (2009), the solution to the problem of specifying a utility function (or, equivalently, a loss function) can be summarized as follows. Adopting the subjective framework of decision theory, every δ ∈ ∆ corresponds to a particular probability distribution on Γ, denoted by Q_δ; in this context the Q_δ are usually called lotteries, to highlight the fact that the final realization is uncertain. Then, assuming a "coherent" set of preferences over the lotteries, a utility function is determined (uniquely up to a positive affine transformation), which implies the use of the expected value as the optimal criterion. The approach followed by Von Neumann and Morgenstern (1944) is, to some extent, to consider the problem of determining a utility function from an indirect point of view. That is, they first formalize a set of assumptions concerning the utility function which appeal to intuition and can describe a large spectrum of scenarios with reasonable approximation; second, they accept these "rational" assumptions as axioms and derive the ensuing conclusions about the corresponding utility function. The main result of utility theory is the expected utility principle, which states that when the ordering of the preferences satisfies precise requirements of rationality, this ordering can always be represented by means of the expected value operator. Under certain circumstances² the requirements can be grouped into the following three categories: (1) algebraic, (2) Archimedean and (3) substitution. Requirement (1) comprises the properties (1)-(3) stated in Subsection 2.4.1, namely reflexivity, transitivity and antisymmetry. They represent natural coherence criteria: an element must be indifferent with respect to itself (reflexivity); in a chain of three elements, the relation between the two extremes can be derived from their relations with the common middle element (transitivity); finally, there can never be undecidability (antisymmetry). Requirement (2) reads,

4 if x ≻ x′ ≻ x′′ then ∃ α, β ∈ (0, 1) such that: xαx′′ ≻ x′ ≻ xβx′′

where xαx′′ denotes the mixture that yields x with probability α and x′′ with probability 1 − α. Intuitively, if x′′ is less preferable than x′ and x′ is less preferable than x, then x′′ can never be so undesirable relative to x′ that no mixture of x′′ with x is preferable to x′. Finally, requirement (3) reads,

5 if x ≻ x′, then for every x′′ ∈ X and every α ∈ (0, 1]: xαx′′ ≻ x′αx′′

Intuitively, if element x is preferable to x′, the preference must remain valid after "substituting" a part of both x and x′ with any other element x′′. While these requirements must necessarily be satisfied in order to apply the expected utility principle, numerous other characteristics are usually required with the aim of providing the best description of the actual decision-maker’s preference scale (see Castellani G. (2005) for a review of utility function classes).
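The Archimedean and substitution requirements can be checked numerically for a preference ordering that is represented by expected utility. The sketch below is illustrative only: the three consequences, their utilities and the lotteries are hypothetical, and the function names are not taken from any library.

```python
def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {consequence: probability}."""
    return sum(p * utility[c] for c, p in lottery.items())


def mix(lottery_a, lottery_b, alpha):
    """Compound lottery: lottery_a with probability alpha, else lottery_b."""
    outcomes = set(lottery_a) | set(lottery_b)
    return {
        c: alpha * lottery_a.get(c, 0.0) + (1 - alpha) * lottery_b.get(c, 0.0)
        for c in outcomes
    }


# Hypothetical utilities of three consequences.
utility = {"loss": 0.0, "flat": 0.6, "gain": 1.0}

x = {"flat": 1.0}                # sure intermediate outcome
x1 = {"loss": 0.5, "gain": 0.5}  # fair gamble
x2 = {"loss": 1.0}               # sure loss


def eu(lottery):
    return expected_utility(lottery, utility)


# Ranking x > x1 > x2 by expected utility.
ranking_holds = eu(x) > eu(x1) > eu(x2)

# Archimedean requirement: mixtures of x and x2 fall on either side of x1.
archimedean_holds = eu(mix(x, x2, 0.9)) > eu(x1) > eu(mix(x, x2, 0.5))

# Substitution requirement: mixing both x and x1 with x2 keeps the ranking.
alphas = [0.1, 0.25, 0.5, 0.75, 1.0]
substitution_holds = all(eu(mix(x, x2, a)) > eu(mix(x1, x2, a)) for a in alphas)
print(ranking_holds, archimedean_holds, substitution_holds)
```

Because expected utility is linear in the mixing weight, both requirements hold automatically here; the converse direction, from the axioms to the representation, is the content of the expected utility principle.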

2.4.2 Maximum expected utility allocation

Given this theoretical framework, it is possible to restate the problem of asset allocation as a decision problem. The action that the investor has to take is formalized by the vector w ∈ ℝ^N and the "consequence" of the action is the final wealth W. Denoting by U(·) the utility function describing the preferences of the investor, the problem of selecting the optimal allocation reduces to the following expected utility maximization:

$$\mathbf{w}^{*} = \operatorname*{argmax}_{\mathbf{w}} \mathrm{E}[U(W)] \tag{2.20}$$

² When the set of opportunities is composed of random variables with a finite number of determinations.

Given the constraints imposed by the rationality axioms of the expected utility principle, there are infinitely many ways to describe the preferences of an investor. A vast literature has focused on deriving utility functions by imposing specific desirable properties, such as those concerning the absolute risk aversion:

$$r(x) = -\frac{U^{(2)}(x)}{U^{(1)}(x)} \tag{2.21}$$

where U^(k)(·) denotes the k-th derivative. Moreover, under fairly mild regularity conditions (i.e. infinite differentiability) a utility function U(W) can be expanded in an infinite Taylor series around a wealth level W0:

$$U(W) = \sum_{k=0}^{\infty} \left[ \frac{U^{(k)}(W_0)}{k!} (W - W_0)^k \right] \tag{2.22}$$

This result is important since it links the mean-variance criterion to the maximization of expected utility. Indeed, assume that W is the end-of-period return on an investment, W0 is the expected return, and the investor has quadratic utility:

$$U(W) = W - \frac{a}{2} W^2 \tag{2.23}$$

Then the derivatives of order higher than two vanish, and from 2.22 it follows that the investor is concerned only with the expected return and the variance of the return, as is the case in the mean-variance paradigm.
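The claim can be verified on a small example. Taking expectations in 2.23 gives E[U(W)] = E[W] − (a/2)(Var[W] + E[W]²), so two return distributions with the same mean and variance must yield the same expected utility, whatever their skewness. The sketch below uses two hypothetical discrete distributions and a hypothetical value of a.

```python
import math


def expected_quadratic_utility(outcomes, probs, a):
    """E[U(W)] for U(W) = W - (a / 2) * W**2 over a discrete distribution."""
    return sum(p * (w - 0.5 * a * w**2) for w, p in zip(outcomes, probs))


def mean_and_variance(outcomes, probs):
    """First two moments of a discrete distribution."""
    mean = sum(p * w for w, p in zip(outcomes, probs))
    var = sum(p * (w - mean) ** 2 for w, p in zip(outcomes, probs))
    return mean, var


a = 0.5  # hypothetical curvature parameter of the quadratic utility

# Two hypothetical return distributions with mean 0 and variance 1:
# a symmetric one and a right-skewed one.
symmetric = ([-1.0, 1.0], [0.5, 0.5])
skewed = ([-0.5, 2.0], [0.8, 0.2])

results = []
for outcomes, probs in (symmetric, skewed):
    mean, var = mean_and_variance(outcomes, probs)
    direct = expected_quadratic_utility(outcomes, probs, a)
    # Closed form implied by 2.23: E[U] = E[W] - (a / 2) * (Var[W] + E[W]^2).
    closed_form = mean - 0.5 * a * (var + mean**2)
    results.append((mean, var, direct, closed_form))
    print(mean, var, direct, closed_form)

same_utility = math.isclose(results[0][2], results[1][2], abs_tol=1e-12)
```

The skewed distribution is not penalized or rewarded relative to the symmetric one, which is precisely why quadratic utility cannot describe an investor with a preference for skewness.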

2.4.3 Bayesian allocation decision

The expected utility in 2.20 is taken with respect to the probability distribution of the source of uncertainty in the portfolio choice problem, namely the next-period returns of the assets composing the portfolio. Then 2.20 can be rewritten as follows:

$$\mathbf{w}^{*} = \operatorname*{argmax}_{\mathbf{w}} \int_{\mathbf{R}_{T+1}} U(r^{P}_{T+1})\, f(\mathbf{r}_{T+1} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\mathbf{r}_{T+1} \tag{2.24}$$

Formally, since the distribution in 2.24 necessarily depends on the unknown market parameters (µ, Σ), the investor needs to integrate out this dependence to actually compute the expected utility. As we have seen, in the classical implementation the investor naively uses estimates of the parameters in place of the true parameter values, i.e. he ignores estimation risk. The importance of estimation risk in portfolio optimization has already been discussed, but here we gain another insight into the drawbacks of classical asset allocation. Indeed, by ignoring estimation risk the investor also fails to account for the resulting change in his probabilistic knowledge about next-period returns, whose distribution differs from the conditional sampling distribution. That is, the uncertainty concerning the true values of the market parameters should spread out the probabilities and thicken the tails of the distribution used to predict next-period asset returns. In other words, an investor who treats market parameters as unknown but fixed quantities fails to account for both parameter uncertainty and the ensuing model uncertainty. Both issues are naturally taken into account in a Bayesian formulation of the same problem. As we have seen, in the Bayesian framework the unknown parameters are random variables whose possible outcomes are described by the posterior probability density function π(θ|y). Now consider the allocation problem 2.24. In order to smooth the sensitivity of the allocation function to the parameters, it is quite natural to consider the weighted average of the objective of the optimization 2.20 over all possible outcomes of the market parameters:

$$\mathbf{w} \equiv \operatorname*{argmax}_{\mathbf{w}} \int \mathrm{E}[U(r^{P}_{T+1})]\, \pi((\boldsymbol{\mu}, \boldsymbol{\Sigma}) \mid \mathbf{y})\, d(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \tag{2.25}$$

Consider now the posterior predictive distribution 2.17, which is defined in terms of the posterior distribution of the parameters. It describes the statistical features of the market while taking into account that the value of the parameters is not known with certainty, i.e. accounting for estimation risk. This thickened predictive distribution may represent structural uncertainty about bad events. Using the definition of the posterior predictive density in the average allocation 2.25 and exchanging the order of integration, it is immediate to check that the average allocation can be written as follows:

$$\mathbf{w}^{*}_{B} = \operatorname*{argmax}_{\mathbf{w}} \int_{\mathbf{R}_{T+1}} U(r^{P}_{T+1})\, f(\mathbf{r}_{T+1} \mid \boldsymbol{\Phi}_{T})\, d\mathbf{r}_{T+1} \tag{2.26}$$

where Φ_T represents all the information available up to time T. Expression 2.26 is called the Bayesian allocation decision. It can be argued that a rational decision-maker chooses an action by maximizing expected utility, the expectation being taken with respect to the posterior predictive distribution of future returns. Because of computational issues, and because the moments of the predictive distribution are approximated by the moments of the posterior distribution, predictive returns are often ignored and utility is frequently stated in terms of the model parameters. Since an effective determination of the predictive distribution can be obtained only by conditioning on the observations, Bayesian inference turns out to be the most effective methodology in this context. Polson (2000) highlights the difference between the posterior variance and the posterior predictive variance. Kan and Zhou (2007) provide a thorough discussion of the difference between the plug-in and the Bayes estimators of the optimal decision under parameter-based utility. Their discussion highlights the difference between a proper Bayes rule, defined as the decision that maximizes expected utility, and a rule that plugs the Bayes estimate of the weights or of the parameters into the sampling distribution.
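The way in which integrating over the parameters spreads out the predictive distribution can be illustrated by simulation. The sketch below assumes, purely for illustration, a univariate normal return with known standard deviation and a normal posterior for the mean; the numerical values are hypothetical. In this toy model the predictive variance equals the sampling variance plus the posterior variance of the mean.

```python
import random


def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Hypothetical setting: returns are N(mu, sigma^2) with sigma known, and the
# posterior of mu given the data is N(post_mean, post_sd^2).
sigma = 0.20
post_mean, post_sd = 0.05, 0.10

rng = random.Random(0)
n_draws = 100_000

# Plug-in draws: treat the posterior mean as if it were the true parameter.
plug_in = [rng.gauss(post_mean, sigma) for _ in range(n_draws)]

# Predictive draws: first draw the parameter from its posterior, then the
# return given that parameter; this integrates the parameter out by simulation.
predictive = [
    rng.gauss(rng.gauss(post_mean, post_sd), sigma) for _ in range(n_draws)
]

var_plug_in = variance(plug_in)
var_predictive = variance(predictive)
print(var_plug_in, var_predictive, sigma**2 + post_sd**2)
```

The plug-in investor works with the narrower distribution and therefore understates the risk that enters the expected utility in 2.26.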


Table 2.1. True normal parameters for the simulated data used in the non-informative analysis allocation example

asset   A       B       C       D
mean    2.88    6.50    10.88   16.00
sd      10.00   20.00   30.00   40.00

[Figure 2.4: allocation plot. Vertical axis: Value, from 0.0 to 1.0. Horizontal axis: ticks from 100 down to 24. Legend: Asset A, Asset B, Asset C, Asset D.]

Figure 2.4. Bayesian non-informative mean-variance allocation for varying sample size. The rule allocates more wealth to the asset with the lowest variance as fewer observations become available.

Non-informative analysis In principle, the initial probabilities should synthesize the information actually available on the unknown parameter, but prior information about the model is often too vague or unreliable, and a fully subjective derivation of the prior distribution is then impossible. In general there cannot be any rule providing a predetermined mathematical form for the prior distribution. Consequently, the prior is usually determined case by case, focusing essentially on the content of the specific problem. With the goal of developing a Bayesian inferential tool independent of the definition of the prior law, a major strand of the Bayesian literature has focused on the construction of non-informative (or reference, default) priors (see for example Jeffreys (1961) and Bernardo (2005)). While no proposal for a non-informative distribution has ever led to general consensus (Piccinato, 2009), there are some relevant results in this direction (see Robert (2007) for a review).

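For T observations of an N-dimensional normal random vector with mean µ and covariance matrix Σ (so that p = N below),

$$\mathbf{Y}_t \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad t = 1, \ldots, T \tag{2.27}$$

the probability density function reads,

$$f(\mathbf{Y}_t \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}^{-1}) = \frac{|\boldsymbol{\Sigma}^{-1}|^{1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} (\mathbf{Y}_t - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{Y}_t - \boldsymbol{\mu}) \right\} \tag{2.28}$$

for all t = 1, ..., T + 1. The typical non-informative distribution for this model reads,

$$\pi(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \propto |\boldsymbol{\Sigma}|^{-\frac{N+1}{2}} \tag{2.29}$$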

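and the ensuing posterior predictive distribution for r_{T+1} is a multivariate Student t with T − N degrees of freedom and first two moments

$$\mathrm{E}(\mathbf{r}_{T+1} \mid \boldsymbol{\Phi}_T) = \hat{\boldsymbol{\mu}}, \qquad \mathrm{V}(\mathbf{r}_{T+1} \mid \boldsymbol{\Phi}_T) = \frac{T+1}{T-N-2}\, \hat{\boldsymbol{\Sigma}} \tag{2.30}$$

where (µ̂, Σ̂) are the sample estimators defined as:

$$\hat{\boldsymbol{\mu}} = \frac{1}{T} \sum_{t=1}^{T} \mathbf{Y}_t, \qquad \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1} \sum_{t=1}^{T} (\mathbf{Y}_t - \bar{\mathbf{Y}})(\mathbf{Y}_t - \bar{\mathbf{Y}})' \tag{2.31}$$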

In this case, the classical and the Bayesian portfolios are not very different from each other, but the gap increases as the sample gets smaller (see Table 2.1 and Figure 2.4), reflecting the smooth process of immunization from risk in response to the increased ignorance about the market-generating mechanism. In this respect non-informative analysis is interesting since it isolates the effect of the Bayesian framework on portfolio selection: the two frameworks differ only at the conceptual level of treating the parameters as random variables, and are otherwise comparable in terms of the information fed into the process (see Kan and Zhou (2003), Klein and Bawa (1976) and Zellner (1971) for non-informative Bayesian allocation).
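The mechanism behind this gap can be sketched under simplifying assumptions that are not those used to produce Figure 2.4: the assets are taken to be uncorrelated, the parameters of Table 2.1 are used as if they were the sample estimates, the investor maximizes w′m − (γ/2) w′Vw without constraints (so that w = V⁻¹m/γ), and any wealth not placed in the risky assets is held in a riskless asset. The risk-aversion coefficient γ is hypothetical. The only Bayesian ingredient is the inflation factor of the predictive covariance in 2.30.

```python
def predictive_inflation(T, N):
    """Factor (T + 1) / (T - N - 2) multiplying the sample covariance in 2.30."""
    if T <= N + 2:
        raise ValueError("the predictive covariance requires T > N + 2")
    return (T + 1) / (T - N - 2)


def mean_variance_weights(means, variances, gamma, scale=1.0):
    """Unconstrained weights m_i / (gamma * scale * s_i^2), uncorrelated assets."""
    return [m / (gamma * scale * v) for m, v in zip(means, variances)]


# Parameters of Table 2.1, used here as if they were the sample estimates.
means = [2.88, 6.50, 10.88, 16.00]
variances = [sd**2 for sd in (10.0, 20.0, 30.0, 40.0)]
gamma = 0.1  # hypothetical risk-aversion coefficient
N = len(means)

classical = mean_variance_weights(means, variances, gamma)
risky_share = {}
for T in (100, 60, 24):
    scale = predictive_inflation(T, N)
    bayes = mean_variance_weights(means, variances, gamma, scale)
    # Total weight in risky assets; the rest is held in the riskless asset.
    risky_share[T] = sum(bayes)
    print(T, round(scale, 3), [round(w, 3) for w in bayes])
```

In this simplified setting the Bayesian weights are the classical ones divided by the inflation factor, so the exposure to risky assets shrinks as T decreases and coincides with the classical exposure only in the limit of a large sample.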

Conjugate analysis Raiffa and Schlaifer (1961) were among the first to use conjugate models for portfolio selection. In a conjugate analysis, the prior distribution is chosen as an element of a specific (i.e. conjugate) parametric class. A class of prior probability distributions is said to be conjugate to a sampling model if the ensuing posterior distribution belongs to the same class as the prior. This transformation is in general easier to interpret since it affects only the parameters of the distribution, which are "updated" by the sample evidence. A conjugate model fitting the mean-variance framework is the Normal-inverse-Wishart (NIW) model. Denoting by (µ0, Σ0) the prior knowledge of the market parameters and following Meucci (2006), the distribution of (µ, Σ) is a Normal-inverse-Wishart centered on (µ0, Σ0) with uncertainty parameters (T0, ν0). Explicitly, this means that the density of µ|Σ⁻¹ is normal:

$$\pi(\boldsymbol{\mu} \mid \boldsymbol{\Sigma}^{-1}) = \frac{|T_0 \boldsymbol{\Sigma}^{-1}|^{1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_0)' T_0 \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_0) \right\} \tag{2.32}$$

The density is centered on our prior belief:

$$\mathrm{E}(\boldsymbol{\mu} \mid \boldsymbol{\Sigma}^{-1}) = \boldsymbol{\mu}_0$$

and the parameter T0 > 0 indicates the confidence level in the prior expected value: the larger T0, the more confident we are that the true model parameter is close to our input µ0.

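As for the covariance matrix Σ, we assume that Σ⁻¹ is distributed as a Wishart:

$$\pi(\boldsymbol{\Sigma}^{-1}) = \frac{\left|\tfrac{1}{2}\nu_0 \boldsymbol{\Sigma}_0\right|^{\nu_0/2} |\boldsymbol{\Sigma}^{-1}|^{\frac{\nu_0 - p - 1}{2}} \exp\left\{ -\tfrac{1}{2} \operatorname{Tr}(\nu_0 \boldsymbol{\Sigma}_0 \boldsymbol{\Sigma}^{-1}) \right\}}{\Gamma_p\left(\tfrac{1}{2}\nu_0\right)} \tag{2.33}$$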

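This density is also centered on our prior:

$$\mathrm{E}(\boldsymbol{\Sigma}^{-1}) = \boldsymbol{\Sigma}_0^{-1},$$

and the parameter ν0 > p − 1 indicates the confidence level in the prior covariance: the larger ν0, the more confident we are that the true model parameter Σ is close to our input Σ0. The NIW prior distribution can be written as follows:

$$\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N_p\left(\boldsymbol{\mu}_0, \frac{\boldsymbol{\Sigma}}{T_0}\right), \qquad \boldsymbol{\Sigma}^{-1} \sim W_p\left(\nu_0, \frac{\boldsymbol{\Sigma}_0^{-1}}{\nu_0}\right) \tag{2.34}$$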

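The posteriors for the parameters µ|Σ and Σ⁻¹ belong to the same class as the priors, but with updated hyper-parameters:

$$\begin{aligned}
T_1 &= T_0 + T \\
\boldsymbol{\mu}_1 &= \frac{T_0 \boldsymbol{\mu}_0 + T \bar{\mathbf{x}}}{T_0 + T} \\
\nu_1 &= \nu_0 + T \\
\boldsymbol{\Sigma}_1 &= \frac{1}{\nu_1}\left[ \nu_0 \boldsymbol{\Sigma}_0 + T \mathbf{S} + \frac{T T_0}{T + T_0} (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^{T} \right]
\end{aligned} \tag{2.35}$$

where x̄ denotes the sample mean and S the sample covariance matrix (with divisor T).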

The posterior mean µ1 shows how an investor may include in his decision model additional views about the process generating the market, obtaining posterior distributions which blend this new information with that coming from the sample data, smoothly weighting the two according to the relative confidence in these views.
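The update of the hyper-parameters can be sketched in a few lines. The function below is an illustration, not the implementation used in this thesis: its name and arguments are hypothetical, the data are made up, S is computed with divisor T, and the scale update is divided by ν1 so that Σ1 stays on the same scale as Σ0 under the parameterization of 2.34 (an assumption following the convention of Meucci).

```python
def niw_update(mu0, sigma0, t0, nu0, sample):
    """Update NIW hyper-parameters (mu0, sigma0, t0, nu0) with a data sample.

    `sample` is a list of observations, each a list of N returns.
    """
    T, n = len(sample), len(mu0)
    xbar = [sum(obs[i] for obs in sample) / T for i in range(n)]
    # Sample covariance with divisor T (maximum-likelihood version).
    S = [
        [sum((obs[i] - xbar[i]) * (obs[j] - xbar[j]) for obs in sample) / T
         for j in range(n)]
        for i in range(n)
    ]
    t1 = t0 + T
    nu1 = nu0 + T
    # Posterior mean: confidence-weighted average of prior view and sample mean.
    mu1 = [(t0 * mu0[i] + T * xbar[i]) / t1 for i in range(n)]
    shrink = T * t0 / (T + t0)
    # Division by nu1 is an assumption (see the text above): it keeps sigma1
    # on the same scale as sigma0.
    sigma1 = [
        [(nu0 * sigma0[i][j] + T * S[i][j]
          + shrink * (xbar[i] - mu0[i]) * (xbar[j] - mu0[j])) / nu1
         for j in range(n)]
        for i in range(n)
    ]
    return mu1, sigma1, t1, nu1


# Hypothetical prior views and a tiny two-asset sample (returns in %).
mu0 = [1.0, 2.0]
sigma0 = [[4.0, 0.0], [0.0, 9.0]]
sample = [[2.0, 1.0], [4.0, 5.0], [3.0, 3.0], [3.0, 3.0]]

weak = niw_update(mu0, sigma0, t0=1, nu0=3, sample=sample)
strong = niw_update(mu0, sigma0, t0=100, nu0=300, sample=sample)
print(weak[0], strong[0])
```

With a weak prior (small T0 and ν0) the posterior mean moves most of the way towards the sample mean, while with a strong prior it stays close to µ0, which is the blending behaviour described above.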

While the recourse to conjugate analysis cannot always be justified from a conceptual point of view, in practice it constitutes an effective tool to account for estimation risk and to obtain flexible allocations that reflect the investor’s confidence in his information about the market. Moreover, in real-life applications the investor usually needs to find optimal portfolios comprising many assets with insufficient information. Interesting research has been proposed to cope with this practical issue. The goal here is to achieve a working solution with the scarce information available. For example, Frost and Savarino (1986) assume a conjugate prior distribution on the market parameters which imposes identical means, variances and patterned covariances on all assets’ return distributions, leading to a drastic reduction in the number of parameters to estimate³. These authors showed that such a prior improves out-of-sample performance. In Figure 2.5 the Bayesian estimated frontiers ensuing from the model of Frost and Savarino are compared with those ensuing from the use of sample estimators, highlighting the great improvement in the stability of the mean-variance allocations.

[Figure 2.5: two panels titled "sample frontiers" and "bayes frontiers". Vertical axis: Expected return (%), from 0 to 25. Horizontal axis: ticks at 0, 20 and 40.]

Figure 2.5. Simulated experiment with T = 60 N-dimensional normal observations: 100 Monte Carlo sample (right) and Bayesian (left) estimated frontiers (blue lines), true efficient frontier (red lines). The Bayesian estimated frontiers are the result of the conjugate patterned prior of Frost and Savarino.

Following this direction, there have been numerous proposals for prior distributions imposing a specific, homogeneous pattern on the parameters. These approaches aim to impose a predefined structure which contains the minimum information needed to yield a relevant reduction in the complexity of the problem (i.e. the number of parameters to estimate) and to achieve more stable results, with a modest, controlled bias. This is the well-known trade-off between the efficiency and the unbiasedness of an estimator, which in the Bayesian context can be calibrated to suit specific, real-life applications.

³ The number of parameters to estimate in a mean-variance context increases more than linearly with the number of assets, since a covariance matrix has n(n+1)/2 unique parameters.


2.4.4 Bayesian paradigm justification

As suggested in Robert (2007), the impact of Bayes’ theorem rests on the novel move that puts experimental data and parameters on the same conceptual level, since both are considered random entities. This is indeed revolutionary and has led to a clean fracture in statistical research between those taking advantage of the "rediscovered" framework and those who reject it to some extent. The latter often appeal to the apparent lack of objectivity of the Bayesian paradigm, caused by the introduction of a "subjective" distribution into the inferential process. Undoubtedly, the definition of the prior is one of the most critical points of Bayesian statistics. As discussed previously, the information contained in the sample data is usually not sufficient to achieve workable stability of the results, and the imposition of an a priori structure generates a flexible framework potentially yielding better results in terms of the overall performance of an estimator. This point is relevant in financial decisions and is well summarized by a passage in Weitzman (2007):

"[...] The peso problem is defined as the financial equilibrium of a small-sample situation having a remote chance of a disastrous out-of-sample happening. In a peso problem, possible future realization of low-probability bad events that are not included in the too-small sample are taken into account by real-world investors who conjecture on the true data-generating process. Naturally, these rare out-of-sample disaster possibilities are not calibrated since there is no historical data to really average them. A Bayesian translation of a peso problem is that there are not enough data to build a reliable posterior distribution based solely upon sample frequencies (i.e. a posterior that is independent of imposed priors). In a Bayesian-learning equilibrium where hidden structural parameters are evolving stochastically, it turns out that asset prices always depend critically upon subjective prior beliefs and there are never enough data on frequencies of rare tail events for asset prices to depend only upon the empirical distribution of past observations [...]"

Having a too-small sample is the norm rather than the exception in finance, and the performance of frequentist estimators relies strictly on long-run (i.e. asymptotic) results. While a frequentist investor may point to the increasing availability of financial data, it is usually neglected that such long time series hardly satisfy the required stationarity conditions and that they should be processed either by fitting a more complex time-varying model (e.g. GARCH models) or by considering the sample as a collection of smaller datasets with different probabilistic properties. To reinforce the arguments in favor of a Bayesian treatment of inferential problems, it should be recalled that, following the statistical decision framework presented in Subsection 2.4.1, frequentist optimal criteria naturally lead to the adoption of Bayesian decisions. That is, in a decision-theoretic framework every "good" decision turns out to be a Bayesian decision, even when there is not sufficient prior knowledge to formalize ex ante a fully subjective distribution and an objective ad hoc one is used merely as a technical tool. Moreover, the criticisms levelled at Bayesian reasoning usually focus on inevitable deficiencies of any inferential procedure, which are common to every statistical toolkit rather than being peculiar to the Bayesian one. For instance, the criticisms which find inadmissible the introduction of external elements, alongside the experiment, into the inferential process seem somewhat futile, since the experiment alone is never sufficient to reach actual, quantitative conclusions about anything without a formal and "subjective" definition of the overall inferential mechanism (i.e. the selection of the sampling model) (Piccinato, 2009).


Chapter 3

Non-normal financial markets

To reconcile expected utility maximization with mean-variance optimization, it is usually assumed that financial markets follow a multivariate Gaussian distribution. This standard model for financial returns is an adequate approximation for a large class of financial phenomena, such as regional stock indexes, mutual funds at long investment horizons, etc. The mean-variance framework remains a good approximation for elliptical markets, that is, for a class of probability laws generalizing the multivariate normal case. An elliptical distribution retains the important feature of being completely characterized by a location and a scale parameter, although these need not correspond to the mean vector and covariance matrix of the distribution, as they do in the multivariate normal case. In these cases the mean-variance and utility optimizations are compatible, whatever the shape of the utility function, since the infinite-dimensional space of moments is reduced to a two-dimensional manifold parametrized by expected value and variance. Nevertheless, the assumption that a market is elliptical is very strong. For instance, in high-frequency, derivative or hedge fund markets, the elliptical assumption cannot be accepted. Besides, in skewed markets the investor’s preferences are no longer well described by a quadratic utility, and in general the investor would care about higher-order moments of the distribution of the portfolio return, such as skewness. For example, in non-elliptical markets, for a given variance, an investor may further trade expected return for positive skewness, as when buying a lottery ticket. Classical mean-variance portfolio optimization does not provide solutions in the presence of skewness in the market.

In a recent study, Xiong (2011) points out that most asset class returns are not normally distributed, but the typical Markowitz Mean-Variance Optimization (MVO) framework that has dominated the asset allocation process for more than 50 years relies on only the first two moments of the return distribution. Equally important, considerable evidence shows that investor preferences go beyond mean and variance to higher moments: skewness and kurtosis. Investors are particularly concerned about significant losses, that is, downside risk, which is a function of skewness and kurtosis. Empirically, almost all asset classes and portfolios have returns that are not normally distributed. Many assets’ return distributions are asymmetrical. In other words, the distribution is skewed to the left (or occasionally the right) of the mean (expected) value. In addition, most asset return distributions are more leptokurtic, or fatter tailed, than normal distributions. The normal distribution assigns what most people would characterize as meaninglessly small probabilities to extreme events that empirically seem to occur approximately 10 times more often than the normal distribution predicts. Many statistical models have been put forth to account for fat tails. Well-known examples are the Lévy stable hypothesis (Mandelbrot, 1963), the Student’s t-distribution, and mixtures of Gaussian distributions, among others. Summarizing, the strong empirical evidence against the normality of returns suggests that the assumption of elliptically distributed asset returns is empirically violated and that mean-variance analysis needs to be extended. In particular, the focus here is on the asymmetry of the return distribution, which has an impact on the portfolio selection task for investors that have a preference for skewness. Several researchers have proposed extensions of traditional mean-variance theory in order to include higher moments in the portfolio optimization task (see Athayde (2004) for example).

3.1 Skewness and portfolio selection

Although evidence of skewness and other higher moments in financial data is abundant, it is common for skewness to be ignored entirely in practice. Typically skewness is ignored both in the sampling models and in the assumed utility functions, although it can be argued that it adds a dimension to the mean-variance framework developed by Markowitz. Consider the two-asset example proposed in the previous chapter. For each of these two-asset portfolios the mean and standard deviation behave as we would expect: the linear combination of the means equals the mean of the linear combination, and the linear combination of the standard deviations is greater than or equal to the standard deviation of the linear combination (i.e. the mean is additive and the standard deviation is sub-additive). Skewness is a different matter, as the skewness of the linear combination can be above or below the linear combination of the skewnesses. This suggests that an investor who is interested in skewness must consider an "extended efficient frontier" which includes the additional dimension of skewness. Indeed, empirically there is strong evidence that skewness matters in portfolio selection (Harvey, 2010). One effective method to measure the consequence of including higher moments in the asset allocation decision, relative to the classical mean-variance criterion, is to approximate the expected utility by a Taylor expansion as in 2.22. In particular, as far as skewness is concerned, it is possible to consider an investor with cubic utility, which reduces the infinite Taylor expansion around the expected wealth W̄ to a third-order approximation,

$$U(W) = U(\bar{W}) + \sum_{k=1}^{3} \left[ \frac{U^{(k)}(\bar{W})}{k!} (W - \bar{W})^k \right] \tag{3.1}$$

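which is easy to integrate and to compare with its second-order counterpart. Applying the expectation operator to both sides of 3.1 and assuming that the investor’s "wealth" is completely determined by the end-of-period rate of return r^P_{T+1}, we obtain the following expression, in which the first-order term vanishes because the first central moment is zero:

$$\mathrm{E}[U(r^{P}_{T+1})] = U(\mathrm{E}(r^{P}_{T+1})) + \frac{U^{(2)}(\mathrm{E}(r^{P}_{T+1}))}{2} \mu^{(2)} + \frac{U^{(3)}(\mathrm{E}(r^{P}_{T+1}))}{6} \mu^{(3)} \tag{3.2}$$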


where µ^(i) is the i-th central moment of r^P_{T+1}. According to the prescriptions of a Bayesian allocation decision, the expected value in 3.2 is taken with respect to the posterior predictive distribution of r_{T+1}. Denoting by m_p, V_p and S_p the predictive expected value, covariance matrix and third-order moment tensor (arranged as an N × N² matrix) of the next-period returns of the assets composing the portfolio, the first three moments of r^P_{T+1} can be written as functions of m_p, V_p, S_p and the allocation vector w, and the expected utility ensuing from a third-order Taylor approximation is given by:

$$\mathrm{E}(U(r^{P}_{T+1}) \mid \boldsymbol{\Phi}_T) = \mathbf{w}' \mathbf{m}_p + \frac{U^{(2)}(r^{P}_{T+1})}{2!}\, \mathbf{w}' \mathbf{V}_p \mathbf{w} + \frac{U^{(3)}(r^{P}_{T+1})}{3!}\, \mathbf{w}' \mathbf{S}_p (\mathbf{w} \otimes \mathbf{w}) \tag{3.3}$$

where ⊗ denotes the Kronecker product. Thus the expected utility is related to the investor’s preferences (or aversions) towards the second and third (central) moments of the predictive distribution, whose contributions are weighted directly by the derivatives of the utility function. Scott and Horvath (1980) have shown that, under the assumptions of positive marginal utility, decreasing absolute risk aversion at all wealth levels and strict consistency of moment preference, one has:

$$U^{(k)}(W) > 0 \;\; \forall W \text{ if } k \text{ is odd}, \qquad U^{(k)}(W) < 0 \;\; \forall W \text{ if } k \text{ is even}$$

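Therefore 3.3 can be conveniently rewritten in terms of a risk aversion coefficient (γ) and a preference-for-skewness coefficient (λ), so that the ensuing optimization problem reads:

$$\max_{\mathbf{w}'\mathbf{1} = 1,\; \mathbf{w} > 0} \; \mathbf{w}' \mathbf{m}_p - \frac{\gamma}{2}\, \mathbf{w}' \mathbf{V}_p \mathbf{w} + \frac{\lambda}{6}\, \mathbf{w}' \mathbf{S}_p (\mathbf{w} \otimes \mathbf{w}) \tag{3.4}$$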
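The objective in 3.4 can be evaluated directly once the three predictive moments are available. The sketch below is only illustrative: it replaces the predictive distribution with four equally likely hypothetical return scenarios for two assets, uses hypothetical values of γ and λ, and solves the two-asset problem by a crude grid search over the interior of the simplex rather than by a proper optimizer.

```python
def moments(scenarios):
    """Mean vector, covariance matrix and N x N^2 third-moment matrix
    of equally likely return scenarios."""
    k, n = len(scenarios), len(scenarios[0])
    m = [sum(s[i] for s in scenarios) / k for i in range(n)]
    dev = [[s[i] - m[i] for i in range(n)] for s in scenarios]
    V = [[sum(d[i] * d[j] for d in dev) / k for j in range(n)] for i in range(n)]
    # Row i, column j * n + l holds E[(r_i - m_i)(r_j - m_j)(r_l - m_l)].
    S = [
        [sum(d[i] * d[j] * d[l] for d in dev) / k
         for j in range(n) for l in range(n)]
        for i in range(n)
    ]
    return m, V, S


def objective(w, m, V, S, gamma, lam):
    """Value of the objective in 3.4 and the three portfolio moments."""
    n = len(w)
    kron = [w[j] * w[l] for j in range(n) for l in range(n)]  # w (x) w
    mean = sum(w[i] * m[i] for i in range(n))
    variance = sum(w[i] * V[i][j] * w[j] for i in range(n) for j in range(n))
    third = sum(w[i] * S[i][c] * kron[c] for i in range(n) for c in range(n * n))
    value = mean - gamma / 2 * variance + lam / 6 * third
    return value, (mean, variance, third)


# Hypothetical equally likely scenarios for the returns of two assets.
scenarios = [(0.10, 0.02), (-0.05, 0.01), (0.02, -0.03), (0.01, 0.08)]
m, V, S = moments(scenarios)
gamma, lam = 4.0, 10.0  # hypothetical preference coefficients

grid = [i / 100 for i in range(1, 100)]  # interior weights, so that w > 0
best_a = max(grid, key=lambda a: objective([a, 1 - a], m, V, S, gamma, lam)[0])
best_value, best_moments = objective([best_a, 1 - best_a], m, V, S, gamma, lam)
print(best_a, best_value)
```

The quantity w′S_p(w ⊗ w) computed in this way coincides with the third central moment of the portfolio return, which is what makes the N × N² arrangement of the tensor convenient.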

3.2 Skewed-elliptical models

Skewness is an indicator of the asymmetry of a distribution. In a financial returndistribution it is a fundamental feature since it can indicate a non-negligible likelihoodfor extreme events such as economic downturns (or upturns). The existence ofskewed distributions in the financial markets makes unreasonable to rely solely onsymmetrical laws which do not capture potential skewness revealed by the data. Aspointed out in Sahu (2003) the class of multivariate elliptical skewed distributionsis convenient to model financial markets, which are known to be non-normal andaffected by turbulence in both the left and right side. Azzalini (1996) was amongthe first to study extensions of the normal distribution incorporating skewness.Since then there has been a vast literature studying skewed models and proposingeffective solutions to their definition and implementation (see Ferreira (2007) forrecent developments). One effective class of skewed distributions has been developedin Sahu (2003). The distributions within this class have the fundamental advantageto be practically implementable, since they are defined with a convenient hierarchical


structure. The definition of the elements within this class is based on the distributionof a generic multivariate symmetric elliptical random vector. Suppose ΩΩΩ is a d× dpositive definite matrix, θθθ is a vector in Rd. The random vector XXX is ellipticallydistributed with location parameter θθθ and scale parameter ΩΩΩ:

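\[
\mathbf{X} \sim El(\boldsymbol{\theta}, \boldsymbol{\Omega}; g^{(d)}) \tag{3.5}
\]

if its probability density function reads

\[
f(\mathbf{x} \mid \boldsymbol{\theta}, \boldsymbol{\Omega}; g^{(d)}) = |\boldsymbol{\Omega}|^{-1/2}\, g^{(d)}\!\left[ (\mathbf{x} - \boldsymbol{\theta})^{T} \boldsymbol{\Omega}^{-1} (\mathbf{x} - \boldsymbol{\theta}) \right], \quad \mathbf{x} \in \mathbb{R}^d \tag{3.6}
\]

where $g^{(d)}(u)$ is a map $g^{(d)} : \mathbb{R}^+ \to \mathbb{R}^+$ defined by

\[
g^{(d)}(u) = \frac{\Gamma(d/2)}{\pi^{d/2}} \, \frac{g(u; d)}{\int_0^\infty r^{d/2 - 1} g(r; d)\, dr} \tag{3.7}
\]

and where $g(u; d)$ is a function $g : \mathbb{R}^+ \to \mathbb{R}^+$ such that the integral in the denominator of 3.7 exists. Then a skewed random vector $\mathbf{Y}$ within the class of Sahu is defined as:

\[
\mathbf{Y} = \mathbf{D}\mathbf{Z} + \boldsymbol{\varepsilon} \tag{3.8}
\]

where

\[
\mathbf{Z} \sim El_d(\mathbf{0}, \mathbf{I}; g^{(d)}), \qquad \boldsymbol{\varepsilon} \sim El_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}; g^{(d)}) \tag{3.9}
\]

$\mathbf{D}$ is a $d \times d$ (diagonal) matrix controlling the skewness of the distribution, and $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ are the canonical location and scale parameters of the underlying symmetric elliptical distribution. Marginalizing $\mathbf{Y}$ with respect to $\mathbf{Z}\, I(\mathbf{Z} > \mathbf{0})$ we obtain the desired generic multivariate skewed elliptical distribution:

\[
\mathbf{Y} \sim SE_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \mathbf{D}; g^{(d)}) \tag{3.10}
\]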

3.2.1 Skewed-normal model

To obtain the skewed normal distribution it is sufficient to set

\[
g^{(d)}(u) = (2\pi)^{-d/2} \exp(-u/2) \tag{3.11}
\]

in 3.9, which amounts to assuming a multivariate normal distribution for both the error term $\boldsymbol{\varepsilon}$ and the latent variable $\mathbf{Z}$ in 3.8. Marginalizing with respect to the distribution of $\mathbf{Z}$ truncated to the positive real axis, we have (see Sahu (2003) for a proof),

\[
\mathbf{Y} \sim SN(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \mathbf{D}) \tag{3.12}
\]

In the original derivation of Sahu et al. (2003) the matrix $\mathbf{D}$ is diagonal (for a generalization to a full-rank matrix see Harvey (2010)). Here we retain the original definition, with a diagonal skewness matrix denoted by $\mathbf{D}(\boldsymbol{\delta})$. A $d$-dimensional random vector $\mathbf{Y}$ follows a $d$-variate skewed-normal distribution if its pdf is given by,

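\[
\begin{aligned}
f(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \mathbf{D}(\boldsymbol{\delta})) = {} & 2^d\, |\boldsymbol{\Sigma} + \mathbf{D}(\boldsymbol{\delta})^2|^{-1/2}\, \phi_d\!\left[ (\boldsymbol{\Sigma} + \mathbf{D}(\boldsymbol{\delta})^2)^{-1/2} (\mathbf{y} - \boldsymbol{\mu}) \right] \times \\
& \Phi_d\!\left[ \left( \mathbf{I} - \mathbf{D}(\boldsymbol{\delta})' (\boldsymbol{\Sigma} + \mathbf{D}(\boldsymbol{\delta})^2)^{-1} \mathbf{D}(\boldsymbol{\delta}) \right)^{-1/2} \mathbf{D}(\boldsymbol{\delta})' (\boldsymbol{\Sigma} + \mathbf{D}(\boldsymbol{\delta})^2)^{-1} (\mathbf{y} - \boldsymbol{\mu}) \right]
\end{aligned}
\tag{3.13}
\]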

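where $\phi_d$ is the multivariate normal density function with mean zero and identity covariance, and $\Phi_d$ is the multivariate normal cumulative distribution function, also with mean zero and identity covariance. The mean and the covariance matrix are given by,

\[
E(\mathbf{Y}) = \boldsymbol{\mu} + \sqrt{\frac{2}{\pi}}\, \boldsymbol{\delta}, \qquad C(\mathbf{Y}) = \boldsymbol{\Sigma} + \left( 1 - \frac{2}{\pi} \right) \mathbf{D}(\boldsymbol{\delta})^2 \tag{3.14}
\]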

Note that when $\boldsymbol{\delta} = \mathbf{0}$ the SN distribution reduces to the usual normal distribution. Concerning the third-order moment of the SN distribution, when $\mathbf{D}$ is diagonal it can be represented by a $d \times d^2$ tensor matrix with non-zero entries only at the $(iii)$-th coordinates, given by,

\[
s^{(3)}_{iii}(\mathbf{Y}) = -\frac{\sqrt{2}\, \delta_i^3\, (\pi - 4)}{\sqrt{\pi^3}} \tag{3.15}
\]
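The moment formulas 3.14 and 3.15 can be checked by simulation in the univariate case ($d = 1$), using the representation 3.8 with a half-normal latent variable. The sketch below is only an illustration under stated assumptions: the parameter values, the seed, the number of draws and the function name are arbitrary choices, not quantities from the thesis.

```python
import math
import random


def sample_skew_normal(mu, sigma, delta, rng):
    """One draw of Y = delta * |Z| + eps, with Z ~ N(0, 1) and eps ~ N(mu, sigma^2)."""
    z = abs(rng.gauss(0.0, 1.0))   # latent variable truncated to the positive axis
    eps = rng.gauss(mu, sigma)     # symmetric (normal) error term
    return delta * z + eps


rng = random.Random(0)                 # fixed seed, for reproducibility only
mu, sigma, delta = 1.0, 1.0, -2.0      # illustrative values; delta < 0 gives left skew
draws = [sample_skew_normal(mu, sigma, delta, rng) for _ in range(200000)]

n = len(draws)
mean = sum(draws) / n
var = sum((y - mean) ** 2 for y in draws) / n
third = sum((y - mean) ** 3 for y in draws) / n   # third central moment

# Theoretical values from 3.14 and 3.15 with d = 1
theo_mean = mu + math.sqrt(2.0 / math.pi) * delta
theo_var = sigma ** 2 + (1.0 - 2.0 / math.pi) * delta ** 2
theo_third = -math.sqrt(2.0) * delta ** 3 * (math.pi - 4.0) / math.sqrt(math.pi ** 3)
```

The empirical mean, variance and third central moment should agree with the theoretical values up to Monte Carlo error, and the sign of the third moment follows the sign of $\delta$.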

The SN model has the desirable property that marginal distributions of subsets of skew normal variables are skew normal (see Sahu (2003) for a proof). Unlike the multivariate normal density, linear combinations of variables from a multivariate skewed normal density are not skew normal. While the skew normal is similar in concept to a mixture of normal random variables, it is fundamentally different. A mixture takes on the value of one of the underlying distributions with some probability, and a mixture of normal random variables results in a Lévy stable distribution. The skew normal is not a mixture of normal distributions: it is the sum of two normal random variables, one of which is truncated, and it results in a distribution that is not Lévy stable. Though it is not stable, the skew normal has several attractive properties. Because it is the sum of two distributions, it can accommodate heavy tails, for instance, and the marginal distribution of any subset of assets is also skew normal. This is important in the portfolio selection setting because it ensures consistency in selecting optimal portfolio weights. For example, with short selling not allowed, if the optimal portfolio weights for a set of assets are such that the weight of one asset is zero, then removing that asset from the selection process and re-optimizing will not change the portfolio weights of the remaining assets.

3.3 Simulation-based inference

The resort to simulation-based algorithms is dictated by the complexity of the posterior distributions ensuing from the assumption of a skewed-normal model for the sample data. The main idea behind simulation-based algorithms may be summarized by the brief historical note reported by Andrieu (2003):


While convalescing from an illness in 1946, Stan Ulam was playing solitaire. It, then, occurred to him to try to compute the chances that a particular solitaire laid out with 52 cards would come out successfully. After attempting exhaustive combinatorial calculations, he decided to go for the more practical approach of laying out several solitaires at random and then observing and counting the number of successful plays. This idea of selecting a statistical sample to approximate a hard combinatorial problem by a much simpler problem is at the heart of modern Monte Carlo simulation.

Formally, the idea of Monte Carlo simulation is to draw an IID set of samples $\{x^{(i)}\}_{i=1}^{N}$ from a target density $\pi(x)$ defined on a high-dimensional space $\mathcal{X}$ (e.g. the support of the random variable with pdf $\pi(\cdot)$). These $N$ samples can be used to approximate the target density with the following empirical point-mass function

\[
\pi_N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}}(x) \tag{3.16}
\]

where $\delta_{x^{(i)}}(x)$ denotes the Dirac delta mass located at $x^{(i)}$.

Simulation-based approaches turn out to be preferable to deterministic numerical techniques when the researcher needs to study the details of a likelihood surface or posterior distribution, or needs to estimate several features of these functions simultaneously. The fundamental ingredient of Monte Carlo simulation is the ability to draw uniform pseudo-random values, a feature by now implemented in most software packages. Starting from this basic random simulation it is possible to obtain draws from the most common probability distributions, since these can be represented as deterministic transformations of uniform random variables. Nevertheless, there are many distributions from which it is difficult, or even impossible, to simulate directly starting from uniform random deviates. Moreover, in some cases we are not even able to represent the distribution in a usable form, such as a transformation or a mixture. In such settings it is possible to turn to another class of simulation techniques, which extends MC methods. This class is named MCMC (Markov Chain Monte Carlo) and adopts the following strategy: it generates samples $x^{(i)}$ while exploring the state space $\mathcal{X}$ using a Markov chain mechanism. This mechanism is constructed so that the chain spends more time in the most important regions. In particular, it is constructed so that the samples $x^{(i)}$ mimic samples drawn from the target distribution $\pi(x)$.
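Both ingredients described above, the transformation of uniform deviates and the approximation of expectations through the empirical measure 3.16, can be illustrated with a short sketch. The target (an exponential law), the rate, the seed and the function names are assumptions chosen for illustration only.

```python
import math
import random


def exponential_by_inversion(rate, rng):
    """Draw from Exp(rate) by applying the inverse CDF to a uniform deviate."""
    u = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - u) / rate    # 1 - u lies in (0, 1], so the log is finite


def mc_expectation(h, sampler, n, rng):
    """Monte Carlo estimate of E[h(X)]: the average of h over n IID draws."""
    return sum(h(sampler(rng)) for _ in range(n)) / n


rng = random.Random(42)   # fixed seed, for reproducibility only
rate = 2.0                # illustrative rate; the true mean is 1 / rate


def draw(r):
    return exponential_by_inversion(rate, r)


est_mean = mc_expectation(lambda x: x, draw, 100000, rng)
est_tail = mc_expectation(lambda x: 1.0 if x > 1.0 else 0.0, draw, 100000, rng)
true_tail = math.exp(-rate * 1.0)   # P(X > 1) for the exponential law
```

The same set of draws can be reused to estimate several features of the target at once (here a mean and a tail probability), which is the practical advantage mentioned in the text.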

3.3.1 Gibbs sampler

Markov Chain Monte Carlo (MCMC) is most often used in cases where the construction of an IID sample of points from a target distribution $\pi$ is impossible. An MCMC algorithm aims to generate values from an arbitrary target distribution, starting from the trajectory of a Markov chain whose stationary distribution is the target itself. The main idea underlying MCMC is to choose an easy-to-handle proposal distribution, simulate from it, and accept or reject the proposed value depending on how likely it is that the value has been generated from the target distribution. This scheme is very general and its most popular implementation is known as the Metropolis-Hastings algorithm (MH). MH imposes minimal regularity conditions on both the target and the proposal distribution, which makes it a universal algorithm applicable to a multitude of scenarios. Almost every MCMC algorithm can be thought of as a special case of MH. Nevertheless, more specialized methods may be preferable when their features better suit the problem under examination. This is the case of the Gibbs sampler, which is particularly convenient for inference in multivariate Bayesian hierarchical models. Suppose we want to draw simulations of a multivariate random vector:

\[
\mathbf{X} = (X_1, \dots, X_p) \tag{3.17}
\]

where the $X_i$'s are either uni- or multidimensional. Suppose, moreover, that we are not able to simulate from its joint probability density function $f$, while we can simulate from the full conditionals $f_i(\cdot)$ $(i = 1, \dots, p)$ of $\mathbf{X}$, defined as the probability density functions of the single components of the random vector conditional on all the remaining elements:

\[
X_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_p \sim f_i(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_p), \quad i = 1, \dots, p \tag{3.18}
\]

Following Robert (2004) the associated Gibbs sampler is given by:

• Given $\mathbf{x}^{(t)} = (x_1^{(t)}, \dots, x_p^{(t)})$ generate

1. $X_1^{(t+1)} \sim f_1(x_1 \mid x_2^{(t)}, \dots, x_p^{(t)})$;

2. $X_2^{(t+1)} \sim f_2(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \dots, x_p^{(t)})$;

...

p. $X_p^{(t+1)} \sim f_p(x_p \mid x_1^{(t+1)}, \dots, x_{p-1}^{(t+1)})$.

The transition from $\mathbf{X}^{(t)}$ to $\mathbf{X}^{(t+1)}$ described by this algorithm builds a Markov chain whose stationary probability distribution $\pi$ exists by construction, with $\pi = f$. Under fairly general conditions the chains produced by this algorithm are ergodic; the use of the chain is therefore fundamentally identical to the use of an IID sample from $f$, in the sense that the empirical average converges to the actual expected value:

\[
\frac{1}{T} \sum_{t=1}^{T} h(X^{(t)}) \approx E[h(X)] \tag{3.19}
\]

The actual convergence of the chain to this target distribution is guaranteed by specific properties of the chain, such as irreducibility, recurrence and aperiodicity, which can be checked empirically using the R package CODA.
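The sweep over full conditionals and the ergodic average 3.19 can be made concrete on a toy target. The sketch below assumes a standard bivariate normal with correlation $\rho$, whose full conditionals are $X_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $X_2$; the target, the starting point, the chain length and the burn-in are illustrative choices and have nothing to do with the skewed-normal model of the thesis.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, rng, x_init=(0.0, 0.0)):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x1, x2 = x_init
    cond_sd = math.sqrt(1.0 - rho * rho)   # standard deviation of each full conditional
    chain = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, cond_sd)  # step 1: uses the current value of x2
        x2 = rng.gauss(rho * x1, cond_sd)  # step 2: uses the freshly updated x1
        chain.append((x1, x2))
    return chain


rng = random.Random(1)                      # fixed seed, for reproducibility only
chain = gibbs_bivariate_normal(0.8, 60000, rng, x_init=(10.0, -10.0))
kept = chain[10000:]                        # drop an initial transient (burn-in)

# Ergodic averages in the sense of 3.19
mean_x1 = sum(x1 for x1, _ in kept) / len(kept)
cross_moment = sum(x1 * x2 for x1, x2 in kept) / len(kept)   # estimates E[X1 X2] = rho
```

Even though the chain is started far from the bulk of the target, the averages computed after the burn-in settle near the true values 0 and $\rho$.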

3.3.2 Sampling from the Bayesian skewed-normal model

Since the parameters in 3.13 are not known in practice, they should be treated as random variables in order to account for the uncertainty about their true values in the optimization process. One possible solution is to build a conjugate Bayesian model with low confidence in the prior parameters (unless subjective information is available). Let $(\boldsymbol{\beta}, \boldsymbol{\Sigma})$ be the parameters of interest, where $\boldsymbol{\beta}' = (\boldsymbol{\mu}', \mathrm{vec}(\mathbf{D})')$ and $\mathrm{vec}(\cdot)$ forms a vector by stacking the columns of a matrix. In a non-informative setting the conjugate model may read:

\[
\boldsymbol{\beta} \sim N_{d(d+1)}(\mathbf{0}, 100\, \mathbf{I}_{d(d+1)}), \qquad \boldsymbol{\Sigma} \sim IW(d, d\, \mathbf{I}_d) \tag{3.20}
\]

The ensuing posterior distributions $[\boldsymbol{\beta} \mid \mathbf{y}]$ and $[\boldsymbol{\Sigma} \mid \mathbf{y}]$ are not analytically tractable, but they can be approximated by means of stochastic simulation methods. In a Gibbs sampling framework, all that is needed to obtain multivariate draws from this random vector is the ability to draw from the full conditionals $[\boldsymbol{\beta} \mid \mathbf{y}, \boldsymbol{\Sigma}]$ and $[\boldsymbol{\Sigma} \mid \mathbf{y}, \boldsymbol{\beta}]$. Combining the posterior draws it is possible to obtain an estimate of the mean, the covariance matrix and the $(iii)$-th entries of the tensor matrix using formulas 3.14 and 3.15. Since in a Bayesian framework the expected utility is taken with respect to the posterior predictive distribution, what is actually needed to run the optimization are the predictive moments $m_p$, $V_p$ and $S_p$, which can be written in terms of the posterior means of the parameters as shown in Harvey (2010):

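\[
\begin{aligned}
m_p &= \bar{m} \\
V_p &= \bar{V} + \mathbb{V}(m \mid \mathbf{y}) \\
S_p &= \bar{S} + 3\, E(V \otimes m \mid \mathbf{y}) - 3\, E(V \mid \mathbf{y}) \otimes m_p - E[(m - m_p) \otimes (m - m_p) \mid \mathbf{y}]
\end{aligned}
\tag{3.21}
\]

where $\bar{m}$, $\bar{V}$ and $\bar{S}$ denote the posterior means of the corresponding moments.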

Due to the non-standard form of the likelihood of the SN model, the posterior distributions cannot be simulated directly with a standard random generator, and one would need to approximate them with an embedded MCMC step within the main iteration cycle (see footnote 1). Nevertheless, the full conditionals become known probability laws (owing to conjugacy) once they are conditioned on the auxiliary random variable $\mathbf{Z}$, which is the reason why the distributions within the class of Sahu (2003) are easy to implement. The Gibbs sampler algorithm presented above can be generalized by a "demarginalisation" (Robert, 2004) construction. This practice is also referred to as data augmentation, and it represents the fundamental idea behind the construction of the skewed-elliptical class. Given two probability density functions $f$ and $g$ such that

\[
\int_{\mathcal{Z}} g(x, z)\, dz = f(x) \tag{3.22}
\]

the density $g$ is called a completion of $f$, and it can be chosen in such a way that the full conditionals of $g$ are easy to simulate from. Following this scheme it is straightforward to obtain draws from the marginal posteriors of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$.
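As a worked illustration of 3.22, consider the univariate skewed-normal case ($d = 1$, scale $\sigma^2$, skewness $\delta$), which is a special case assumed here for simplicity. The joint density of the observation and the latent variable is a completion of the density 3.13, and the full conditional of the latent variable is a truncated normal, a standard law that is easy to simulate from:

```latex
% Completion g(y, z) of the univariate skewed-normal density f(y)
g(y, z) = 2\,\phi(z)\,\frac{1}{\sigma}\,
          \phi\!\left(\frac{y - \mu - \delta z}{\sigma}\right) I(z > 0)

% Integrating out the latent variable recovers 3.13 with d = 1
\int_0^\infty g(y, z)\, dz
  = \frac{2}{\sqrt{\sigma^2 + \delta^2}}\,
    \phi\!\left(\frac{y - \mu}{\sqrt{\sigma^2 + \delta^2}}\right)
    \Phi\!\left(\frac{\delta\,(y - \mu)}{\sigma\sqrt{\sigma^2 + \delta^2}}\right)
  = f(y \mid \mu, \sigma^2, \delta)

% Full conditional of the latent variable: a normal law truncated to z > 0
Z \mid y \;\sim\; N\!\left(\frac{\delta\,(y - \mu)}{\sigma^2 + \delta^2},\;
                         \frac{\sigma^2}{\sigma^2 + \delta^2}\right) I(z > 0)
```

The last line follows from completing the square in $z$ in the exponent of $g(y, z)$; conditionally on $z$, the observation $y$ is simply normal with mean $\mu + \delta z$, which is what makes conjugate updates available.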

1 These methodologies are referred to as hybrid Gibbs samplers.


Chapter 4

Hedge fund portfolio application

This example addresses the new perspectives opened by simulation-based methods for solving portfolio optimization problems that adequately account for estimation risk and for the non-normal distribution of the reference market. The dataset was downloaded from the HFRI website and is composed of HFRI hedge fund indices. The methodology used to construct the HFRI Hedge Fund Indices is based on defined and predetermined rules and objective criteria to select and rebalance components so as to maximize representation of the hedge fund universe (see footnote 1).

4.1 Data description

Hedge funds are private investment vehicles whose manager is free to operate in a variety of markets, using investment strategies that are not restricted in terms of short exposure or leverage. A traditional mutual fund can be characterised as operating in equity and/or bond markets, following a buy-and-hold strategy and using no leverage. Hedge funds offer more variety than traditional mutual funds, and the hedge fund universe is therefore usually segmented into styles. A hedge fund's return-generating process is strictly linked to both the location and the style or strategy followed by the manager. Owing to the strategies used, hedge fund returns typically exhibit strong deviations from the normal distribution. Here we consider four indices provided by HFRI, which represent different investment strategies. The dataset contains monthly performance observations from August 2001 to August 2011.

HFRX Equity Hedge Index. This index is a proxy for the performance of hedge fund managers who maintain positions, both long and short, primarily in equity and equity derivative securities. EH managers would typically maintain at least 50% exposure to, and may in some cases be entirely invested in, equities, both long and short.

1 Refer to https://www.hedgefundresearch.com/ for more details on the data used in this study.


HFRX Event-Driven Index. This index is a proxy for the performance of hedge funds whose managers maintain positions in companies currently or prospectively involved in corporate transactions of a wide variety, including but not limited to mergers, restructurings, financial distress, tender offers, shareholder buybacks, debt exchanges, security issuance or other capital structure adjustments. Security types can range from most senior in the capital structure to most junior or subordinated, and frequently involve additional derivative securities. Event Driven exposure includes a combination of sensitivities to equity markets, credit markets and idiosyncratic, company-specific developments.

HFRI Relative Value Index. This index is a proxy for the performance of hedge funds whose managers maintain positions in which the investment thesis is predicated on the realization of a valuation discrepancy in the relationship between multiple securities. Managers employ a variety of fundamental and quantitative techniques to establish investment theses, and security types range broadly across equity, fixed income, derivative or other security types.

HFRI Macro Index. This index is a proxy for the performance of hedge funds whose managers trade a broad range of strategies in which the investment process is predicated on movements in underlying economic variables and the impact these have on equity, fixed income, hard currency and commodity markets. Managers employ a variety of techniques, both discretionary and systematic analysis, combinations of top-down and bottom-up theses, quantitative and fundamental approaches, and long- and short-term holding periods.

[Figure: four time-series panels (EqH, RelVal, EvDriven, Macro); horizontal axis: Time.]

Figure 4.1. Hedge Fund Strategies Rates-of-Return from August-2001 to August-2011


4.1.1 Univariate statistics

We first investigate the statistical properties of the marginal distributions of the four time series of hedge fund performance data. Table 4.1 reports the mean, standard deviation and skewness of each variable. Except for the Macro strategy, they all exhibit asymmetry in the left tail, that is, there is an "abnormal" concentration of very poor performances relative to exceptional ones. In other words, the first three distributions are left-skewed, which means that an extremely low performance is far more likely than an extremely high one. Since the dataset covers the turbulent period of the global financial recession, these results are not surprising. The Macro strategy, on the other hand, exhibits a slightly right-skewed distribution, which is the result of a good response to the crisis, during which the index did not fall dramatically. Asymmetries are rather common in financial markets and especially in the hedge fund environment, given that the strategies are usually built to outperform a benchmark index (rather than mimic it, as is often the case for a mutual fund, for example). This is done by biasing the allocations according to theses that try to uncover hidden value, which inevitably leads to strong departures from neutral positions.

Table 4.1. Mean, standard deviation and skewness for monthly rate-of-return time series from August 2001 to August 2011

            EqH     RelVal  EvDriven  Macro
mean        0.44    0.54    0.61      0.65
sd          2.44    1.38    2.02      1.53
skewness    -0.97   -2.85   -1.25     0.22
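For reference, the skewness statistic can be computed from a return series as the standardized third central moment. The thesis does not state which estimator was used, so the moment-based version below (without small-sample correction) is an assumption, and the toy series are made-up numbers, not the HFRI data.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2^(3/2), with m_k the k-th central sample moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy series: one large positive outlier gives right skew, its mirror image left skew
right_skewed = [1.0, 2.0, 3.0, 4.0, 10.0]
left_skewed = [-v for v in right_skewed]

skew_right = sample_skewness(right_skewed)
skew_left = sample_skewness(left_skewed)
skew_sym = sample_skewness([-1.0, 0.0, 1.0])   # symmetric data: zero skewness
```

A negative value, as found for the first three indices, indicates that large negative deviations from the mean dominate large positive ones.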

In Figure 4.2 the histograms clearly show these features, together with the poor fit provided by the normal laws in the tails of the empirical distributions.

To test for normality, Table 4.2 reports the Shapiro-Wilk statistics and the corresponding p-values for the null hypothesis that a sample comes from a normally distributed population. Given the p-values close to zero for the first three distributions, we can confidently reject the assumption of normality for them, while it cannot be rejected for the Macro index. The last distribution is indeed the one that most resembles a bell curve, and its (positive) asymmetry is not pronounced enough for the test to reject normality. Nevertheless, considered jointly, the non-normality is rather pronounced in this dataset.

Table 4.2. Shapiro-Wilk normality test

            Statistic   p-value
EqH         0.9479      0.000142
RelVal      0.7614      9.656 × 10^-13
EvDriven    0.9261      5.140 × 10^-6
Macro       0.9238      0.8712


[Figure: four histogram panels (EqH, RelVal, EvDriven, Macro); vertical axis: Density.]

Figure 4.2. Univariate histograms, kernel densities (blue solid line) and fitted normal densities (red dashed lines)

The QQ-plots corroborate the results obtained with the normality tests.

4.1.2 Multivariate statistics

To assess the plausibility of the assumptions underlying the mean-variance framework it is necessary to go beyond the univariate statistics and check how the samples co-move. In Figure 4.4 multivariate normality is checked graphically using a QQ-plot of the Mahalanobis distance $d^2 = (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$, which under multivariate normality has an asymptotic chi-square distribution. In addition, a multivariate version of the Shapiro test is provided in Table 4.3. Both the graphical check and the hypothesis test lead to a clear rejection of the multivariate normal distribution.

Table 4.3. Multivariate Shapiro test

Statistic   p-value
0.7319      1.424 × 10^-13

To further investigate the multivariate characteristics of the sample, Figure 4.5 reports the fitted bivariate normal level curves against the actual data points. As we can see, the bivariate normal level curves fail to include many points, which lie in the corners of the graphs. In particular the scatterplots including the Macro

[Figure: four QQ-plot panels (EqH, RelVal, EvDriven, Macro); horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

Figure 4.3. QQ-plots for testing univariate normality

index contain points in the top-right corner, meaning that positive extreme events for this index are correlated with exceptional performance of the remaining indices. The graphs not including the Macro index exhibit the opposite behavior, that is, left-tail dependence. This can be explained by the existence of co-skewness and/or tails much fatter than those of the normal approximation. The graphical evidence therefore suggests considering more complex models, accounting for co-skewness and/or kurtosis in the sample data. These models will not be considered here.

4.2 Model implementation

The empirical evidence of a non-normal market is formalized by fitting the skewed-normal model 3.12-3.13 to the data. In order to account for estimation risk and to base the final optimal decision coherently on the (posterior) predictive distribution of the investor's objective, we choose a Bayesian model within the non-informative framework 3.20. To obtain the posterior distributions of the parameters of interest we implement the model using the BUGS software. We ran a total of 150000 iterations, discarding the first fifty thousand (burn-in = 50000) and keeping one iteration out of five (thin = 5), for a total of 20000 final draws per chain from the posterior distributions. Two chains are run in parallel in order to check for convergence. Figures 4.6-4.7 report the traceplots for the pairs of parallel chains obtained for the posterior means $[\mathbf{m} \mid \mathbf{y}]$ and variances $[\mathbf{V} \mid \mathbf{y}]$ of the model. The chains are obtained by combining the chains of $\boldsymbol{\mu}$ and $\mathbf{D}$ according to the formulas for the


[Figure: QQ-plot; horizontal axis: Theoretical quantiles of the chi-square distribution with ν = 4; vertical axis: Sample quantiles.]

Figure 4.4. QQ-plot for the Mahalanobis distances in the sample data

moments of the skewed-normal distribution given in 3.14. In both figures the parallel chains are nearly identical, indicating that they "forgot" their initial values (i.e. the chains suggest recurrence). Moreover, the chains exhibit good mixing, indicating that they switch easily from one state to another, possibly visiting all the important regions of the parameter space (i.e. the chains suggest irreducibility). These graphical diagnostics are important to assess the convergence of the MCMC algorithm and to justify the use of the simulated draws as an IID sample from the posterior distribution of interest. Along with the graphical diagnostics there exist more quantitative tools to assess the convergence of the algorithm. These statistical tests are provided in Appendix A, while the summaries of the MCMC posteriors are listed in Table 4.4.
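The bookkeeping behind burn-in and thinning can be sketched as follows. This is a generic illustration, assuming that the burn-in is discarded first and that thinning is then applied to the remaining iterations; the helper name is hypothetical and the two lists of iteration indices merely stand in for actual BUGS output.

```python
def retained_draws(chain, burn_in, thin):
    """Drop the first `burn_in` iterations, then keep one iteration out of `thin`."""
    return chain[burn_in::thin]


n_iter, burn_in, thin = 150000, 50000, 5

# Placeholder chains: iteration indices stand in for the sampled values
chain_a = list(range(n_iter))
chain_b = list(range(n_iter))

kept_a = retained_draws(chain_a, burn_in, thin)
kept_b = retained_draws(chain_b, burn_in, thin)
pooled = kept_a + kept_b   # aggregate of the two parallel chains
```

With these settings each chain contributes 20000 retained draws, and pooling the two parallel chains gives 40000 draws, which matches the sample size N = 40000 shown in the kernel density plots of Figures 4.8 and 4.9.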

4.3 Portfolio weights

Since we are primarily concerned with measuring the effect of including asymmetry in the asset allocation, we assume a third-order expected utility as presented in 3.4. The predictive moments needed to compute this expected utility are retrieved from the (MCMC) posterior means using formulas 3.14-3.15 and 3.21. The expected utility is then maximized using numerical methods implemented in the software R (see footnote 2). Figures 4.10a-4.10b and 4.11a-4.11b report the barplots of the maximum expected utility allocation for an investor with a fixed risk aversion coefficient and a varying preference for skewness, for different values of the risk aversion coefficient. The left-most bar corresponds to the allocation with zero preference

2 The package used to run the non-linear constrained optimization is named alabama.


[Figure: six scatter-plot panels, one for each pair of indices: EqH vs RelVal, EqH vs EvDriven, EqH vs Macro, RelVal vs EvDriven, RelVal vs Macro, EvDriven vs Macro.]

Figure 4.5. Fitted bivariate normal level curves (up to the 99-th percentile) against scatter-plot of actual data points for all six pairs of variables

for skewness and therefore coincides with the mean-variance allocation. As the investor becomes more inclined to trade off risk for skewness, the allocation departs from the mean-variance one, favoring the asset with the highest skewness (the Macro index in this case). This behavior is clearer when the coefficient of preference for skewness is large relative to the risk aversion coefficient.

4.4 Out-of-sample performance

To test empirically the strategy that explicitly includes the third-order predictive moment in the allocation decision, we ran an out-of-sample performance analysis over 60 months, using a rolling estimation window of 60 monthly observations (i.e. 5 years). For each period we derive the minimum-variance portfolio and the best variance-skewness portfolio with identical coefficients for risk aversion and skewness preference ($\gamma = \lambda = 10$). This means that we are neglecting the contribution of the expected value here. The two investors are therefore assumed to be indifferent to the expected return of the investment, while they are fully concerned about its risk, identified with the variance for the MV investor and with a combination of variance and skewness for the skewness-sensitive investor. It is worth


[Figure: four traceplot panels; horizontal axis: Iterations.]

Figure 4.6. Traceplots for two parallel MCMC simulations from the posterior distribution of the mean vector $\mathbf{m} = \boldsymbol{\mu} + \sqrt{2/\pi}\, \boldsymbol{\delta}$: $m_1$ (top-left), $m_2$ (top-right), $m_3$ (bottom-left), $m_4$ (bottom-right).

noting that we are considering agents with an investment horizon of one month, which is a reasonable holding period in an asset allocation problem. Figure 4.14 reports the sixty realized returns obtained by the two investors and the development of the compounded return, assuming an initial budget of 1€. The graphs are positioned side by side and drawn on the same scale, so it is easy to check that the mean-variance investor incurs much lower realized returns during the five years of trading, which do not seem to be compensated by much greater realized returns relative to the skewness-sensitive investor. In terms of compounded returns the picture is much clearer: the mean-variance strategy can be very unstable, with very high profits followed by dramatic losses, while accounting for skewness turns out in this case to be a much calmer strategy, immune to big downturns of the economy, at least relative to the Markowitz strategy. In Figure 4.15 we plot the compounded returns at a different scale, and in a single graph, in order to show how the MV strategy starts to lose earlier and how the MVS strategy follows to some extent the behavior of the MV strategy, but with a much higher degree of immunization from extreme left-tail events.
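For readers who wish to reproduce this kind of plot, a compounded return path can be obtained from a series of realized monthly returns as sketched below. The thesis does not state its compounding formula, so the multiplicative convention used here (returns expressed in percent, wealth multiplied by one plus the return each month) is an assumption, and the three returns are made-up numbers.

```python
def compounded_wealth(returns_pct, initial=1.0):
    """Wealth path obtained by reinvesting the whole budget every period."""
    wealth = initial
    path = []
    for r in returns_pct:
        wealth *= 1.0 + r / 100.0   # r is a percentage return for the period
        path.append(wealth)
    return path


# Hypothetical realized returns (percent per month), starting from a budget of 1
path = compounded_wealth([10.0, -10.0, 5.0])
```

Under this convention a gain of 10% followed by a loss of 10% leaves the investor below the initial budget, which is one reason why large negative realized returns weigh so heavily on the compounded performance.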


[Figure: four traceplot panels; horizontal axis: Iterations.]

Figure 4.7. Traceplots for two parallel MCMC simulations from the posterior distribution of the diagonal components of the covariance matrix $\boldsymbol{\Sigma} + (1 - 2/\pi)\, \mathbf{D}(\boldsymbol{\delta})^2$: $V_{11}$ (top-left), $V_{22}$ (top-right), $V_{33}$ (bottom-left), $V_{44}$ (bottom-right).

[Figure: four kernel density panels, each based on N = 40000 draws; vertical axis: Density.]

Figure 4.8. Kernel densities for the aggregated MCMC chain for $\mathbf{m}$: $m_1$ (top-left), $m_2$ (top-right), $m_3$ (bottom-left), $m_4$ (bottom-right).


[Figure: four kernel density panels, each based on N = 40000 draws; vertical axis: Density.]

Figure 4.9. Kernel densities for the aggregated chain for the diagonal components of $\mathbf{V}$: $V_{11}$ (top-left), $V_{22}$ (top-right), $V_{33}$ (bottom-left), $V_{44}$ (bottom-right).

[Figure: two stacked barplot panels (a) and (b), panel (a) titled "Allocation with γ = 5"; legend: EqH, RelVal, EvDriven, Macro; vertical axis: Value; horizontal axis: λ from 0 to 30.]

Figure 4.10. Asset allocation for increasing values of preference for skewness λ (abscissa) and fixed aversion to risk γ


Table 4.4. WinBUGS summary statistics for the parameters of interest

            Mean      SD       Naive SE   Time-series SE
δ1          -0.41     0.63     0.01       0.08
δ2          -0.96     0.14     0.00       0.00
δ3          -0.81     0.24     0.00       0.02
δ4          1.68      0.60     0.01       0.06
Σ11         5.83      0.79     0.01       0.01
Σ12         2.45      0.37     0.00       0.01
Σ13         4.62      0.63     0.01       0.01
Σ14         1.90      0.41     0.00       0.01
Σ21         2.45      0.37     0.00       0.01
Σ22         1.28      0.24     0.00       0.00
Σ23         2.07      0.31     0.00       0.00
Σ24         0.61      0.20     0.00       0.00
Σ31         4.62      0.63     0.01       0.01
Σ32         2.07      0.31     0.00       0.00
Σ33         3.85      0.54     0.01       0.01
Σ34         1.35      0.33     0.00       0.00
Σ41         1.90      0.41     0.00       0.01
Σ42         0.61      0.20     0.00       0.00
Σ43         1.35      0.33     0.00       0.00
Σ44         1.41      0.39     0.00       0.01
µ1          0.76      0.53     0.01       0.06
µ2          1.28      0.15     0.00       0.01
µ3          1.25      0.26     0.00       0.02
µ4          -0.70     0.49     0.00       0.05
deviance    1692.82   107.85   1.08       4.51


[Figure: two stacked barplot panels (a) and (b); vertical axis: Value; horizontal axis: λ from 0 to 30.]

Figure 4.11. Asset allocation for increasing values of preference for skewness λ (abscissa) and fixed aversion to risk γ


[Figure: two stacked barplot panels, (a) "MV allocation 2006/08 - 2009/02" and (b) "MV allocation 2009/03 - 2011/08"; vertical axis: Value; horizontal axis: month.]

Figure 4.12. Minimum-variance asset allocation over a period of five years using a rolling estimation window of W = 60 months: from August 2006 to February 2009 (left), from March 2009 to August 2011 (right)

[Figure: two stacked barplot panels, (a) "MV-skewed allocation 2006/08 - 2009/02" and (b) "MV-skewed allocation 2009/03 - 2011/08"; vertical axis: Value; horizontal axis: month.]

Figure 4.13. Minimum-variance-skewness asset allocation with γ = λ = 10 over a period of five years using a rolling estimation window of W = 60 months: from August 2006 to February 2009 (left), from March 2009 to August 2011 (right)


[Figure: four panels; horizontal axis: Index; vertical axes: Realized return (top row) and Compounded return (bottom row).]

Figure 4.14. Out-of-sample analysis: realized returns for the MV (top-left) and the MVS (top-right) strategies, and the compounded returns for the MV (bottom-left) and MVS (bottom-right) strategies, over the period 08/2006 - 08/2011


[Figure: single panel; horizontal axis: Index; vertical axis: Compounded return.]

Figure 4.15. Compounded returns traceplot at lower scale: MVS profit and loss (dark red solid line), MV profit and loss (green points)


Chapter 5

Conclusions

The analysis provided in this study contributes empirical evidence on the need to account for higher-order moments in asset allocation decisions. This is done through an in-sample and an out-of-sample analysis of a real hedge fund dataset. The computational issues arising when departing from the standard normal model are overcome using a simulation-based algorithm that provides valid approximations to otherwise intractable inferential problems. The thesis also argues in favor of the Bayesian paradigm as a cornerstone of the portfolio optimization process, because of the coherent framework it provides for handling statistical decision problems and because it generalizes the classical/frequentist inferential procedures, which is vital for building operative tools to solve complex problems and for including private information in the optimization process. Some simplifications have been made in this thesis. First, our information is restricted to past returns. That is, investors make decisions based on past returns and do not use other conditioning information such as macro-economic variables. Second, the portfolio choice problem examined is a static one. There is a growing literature that considers the more challenging dynamic asset allocation problem, which allows portfolio weights to change with investment horizon, labor income and other economic variables. Third, the potential presence of co-skewness and kurtosis in the sample data, and among the drivers of the investor's preferences, is not taken into account. Much progress can therefore still be made in future research.


Appendices


Appendix A

MCMC diagnostics

A.1 Gelman-Rubin diagnostic

Gelman and Rubin's (1992) approach to monitoring convergence is based on detecting when the Markov chains have forgotten their starting points, by comparing several sequences drawn from different starting points and checking that they are indistinguishable. Overlaid traceplots give qualitative information, while this test is a more quantitative approach. Approximate convergence is diagnosed when the variance between the different sequences is no larger than the variance within each individual sequence. In the limit both variances approach the true variance of the distribution, but from opposite directions. One can therefore monitor the convergence of the Markov chain by estimating the factor by which the conservative estimate of the distribution's scale might be reduced: that is, the ratio between the estimated upper and lower bounds for the standard deviation of the random variable, which is called the estimated potential scale reduction or shrink factor. As the simulation converges, this quantity declines to 1, meaning that the parallel Markov chains are essentially overlapping. If the shrink factor is high, one should proceed with further simulations. The Gelman and Rubin diagnostics calculated by the R package CODA are the 50% and 97.5% quantiles of the sampling distribution for the shrink factor. These quantiles are estimated from the second half of each chain only. Tables A.1-A.2 and Figures A.1-A.2 report the results of the Gelman-Rubin diagnostic for two parallel chains of the mean vector m and the diagonal elements of the covariance matrix V.
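The comparison of between-chain and within-chain variance can be sketched as follows. This is a minimal illustration in Python rather than the CODA routine used for the thesis results: the function name `shrink_factor` and the simulated chains are assumptions made for the example, and the sketch computes only the basic point estimate, without the degrees-of-freedom correction or the 97.5% quantile that CODA reports.

```python
import random
import statistics


def shrink_factor(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    Only the second half of each chain is used.
    """
    if len(chains) < 2 or len({len(c) for c in chains}) != 1:
        raise ValueError("need at least two chains of equal length")
    halves = [c[len(c) // 2:] for c in chains]
    n = len(halves[0])
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in halves])
    # B / n: sample variance of the chain means (between-chain component).
    b_over_n = statistics.variance([statistics.fmean(c) for c in halves])
    # Pooled variance estimate; it overestimates while the chains disagree.
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5


rng = random.Random(0)
# Hypothetical data: two chains targeting the same N(0, 1) distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
# Hypothetical data: two chains stuck around different values (means 0 and 3).
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 3.0)]

print(round(shrink_factor(mixed), 3))  # close to 1
print(round(shrink_factor(stuck), 3))  # well above 1
```

When the chains overlap, the between-chain term is negligible and the ratio stays near 1. When they sit in different regions, the pooled estimate is inflated by the spread of the chain means and the ratio grows.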

A.2 Geweke diagnostic

Geweke (1992) proposes a convergence diagnostic based on standard time-series methods. The chain is divided into two parts containing the first 10% and the last 50% of the iterations. If the whole chain is stationary, the means of the values early and late in the sequence should be similar. The convergence diagnostic Z is the difference between the two means divided by the asymptotic standard error of their difference. As n→∞, the sampling distribution of Z converges to N(0, 1) if the chain has converged. Hence values of Z that fall in the extreme tails of N(0, 1) indicate that the chain has not yet converged. Tables A.3-A.4 and Figures A.3-A.4 report the Geweke diagnostic results for the aggregated chains of the mean vector m and the diagonal elements of the covariance matrix V.
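The Z statistic described above can be sketched as follows. This is an illustrative Python version, not the CODA implementation: the function names are assumptions, and the asymptotic standard error is approximated here with a batch-means estimator in place of the spectral density estimate at frequency zero that time-series implementations normally use.

```python
import math
import random
import statistics


def var_of_mean(x, n_batches=20):
    """Batch-means estimate of Var(mean(x)) for an autocorrelated sequence."""
    size = len(x) // n_batches
    if size < 2:
        raise ValueError("segment too short for the requested number of batches")
    batch_means = [
        statistics.fmean(x[i * size:(i + 1) * size]) for i in range(n_batches)
    ]
    return statistics.variance(batch_means) / n_batches


def geweke_z(chain, first=0.1, last=0.5):
    """Difference of early and late means over its estimated standard error."""
    n = len(chain)
    early = chain[: int(first * n)]
    late = chain[n - int(last * n):]
    std_err = math.sqrt(var_of_mean(early) + var_of_mean(late))
    return (statistics.fmean(early) - statistics.fmean(late)) / std_err


rng = random.Random(0)
# Hypothetical data: a stationary AR(1) chain started at its stationary mean.
stationary = [0.0]
for _ in range(4999):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0.0, 1.0))
# The same chain with a decaying transient added at the start.
transient = [x + 5.0 * math.exp(-t / 200) for t, x in enumerate(stationary)]

print(round(geweke_z(stationary), 2))
print(round(geweke_z(transient), 2))  # large in absolute value
```

The transient shifts the mean of the first 10% of the draws away from the mean of the last 50%, which pushes Z into the tails of N(0, 1).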

Table A.1. Gelman-Rubin convergence diagnostic for m: shrink factor point estimate (left column), upper bound of the confidence interval (right column)

        Point est.   Upper C.I.
m1      1.00         1.01
m2      1.00         1.00
m3      1.00         1.01
m4      1.00         1.00

[Figure A.1 plot: four panels (m1, m2, m3, m4); x-axis: last iteration in chain; y-axis: shrink factor; lines: median and 97.5% quantile.]

Figure A.1. Gelman plot for m: median and 97.5th quantile of the shrink factor sampling distribution against the number of iterations


Table A.2. Gelman-Rubin diagnostic for V: shrink factor point estimate (left column), upper bound of the confidence interval (right column)

        Point est.   Upper C.I.
v1      1.00         1.00
v2      1.00         1.00
v3      1.00         1.00
v4      1.00         1.01

Table A.3. Geweke diagnostic for m: average Z-scores

m1       m2       m3       m4
0.3163   0.1699   0.1470   -0.1334

[Figure A.2 plot: four panels (v1, v2, v3, v4); y-axis: shrink factor; lines: median and 97.5% quantile.]

Figure A.2. Gelman plot for V: median and 97.5th quantile of the shrink factor sampling distribution against the number of iterations

Table A.4. Geweke diagnostic test for V

V1       V2       V3       V4
-1.893   -2.911   -1.233   1.014
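One way to read the Z-scores in Tables A.3 and A.4 against the N(0, 1) reference distribution is to compute two-sided tail probabilities. The sketch below is in Python and uses the values reported in the two tables. The 5% two-sided level is an assumption made for illustration and is not stated in the text, and the values in Table A.3 are averages, so their tail probabilities are only indicative.

```python
from statistics import NormalDist

# Z-scores as reported in Tables A.3 and A.4.
z_scores = {
    "m1": 0.3163, "m2": 0.1699, "m3": 0.1470, "m4": -0.1334,
    "V1": -1.893, "V2": -2.911, "V3": -1.233, "V4": 1.014,
}

ALPHA = 0.05  # assumed two-sided level, not taken from the text
std_normal = NormalDist()
critical = std_normal.inv_cdf(1 - ALPHA / 2)

# Two-sided tail probability of each Z-score under N(0, 1).
p_values = {k: 2 * (1 - std_normal.cdf(abs(z))) for k, z in z_scores.items()}
# Parameters whose Z-score lies beyond the critical value.
flagged = [k for k, z in z_scores.items() if abs(z) > critical]

for name, p in p_values.items():
    print(f"{name}: z = {z_scores[name]:+.4f}, two-sided p = {p:.4f}")
print("beyond the critical value:", flagged)
```

Under this assumed level, only V2 exceeds the critical value in absolute terms. A single exceedance among eight scores is a signal to inspect that chain further, not proof of non-convergence.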


[Figure A.3 plot: four panels (m1, m2, m3, m4); x-axis: first iteration in segment; y-axis: Z-score.]

Figure A.3. Geweke plot for m: Z-scores against the number of iterations discarded from the beginning of the chain

[Figure A.4 plot: four panels (v1, v2, v3, v4); y-axis: Z-score.]

Figure A.4. Geweke Z-scores plot for V: Z-scores against the number of iterations discarded from the beginning of the chain


Bibliography

Athayde, F. (2004). Finding the maximum skewness portfolio - a general solution to three moments portfolio choice. Journal of Economic Dynamics and Control.

Azzalini, D. V. (1996). The multivariate skewed-normal distribution. Biometrika.

Bernardo, J. (2005). Reference analysis. Handbook of Statistics.

Best, M.J., G. R. (1991). On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. Review of Financial Studies.

Castellani G., De Felice M., M. F. (2005). Manuale di finanza. Il Mulino.

Chen, L. X. (2011). Mean-variance portfolio optimization when means and covariances are unknown. The Annals of Applied Statistics.

Chopra, V.K., Z. W. (1993). The effect of errors in means, variances, and covariances on optimal choice. The Journal of Portfolio Management.

De Miguel, V., G. L. U. R. (2007). Optimal versus naive diversification: how inefficient is the 1/n portfolio strategy? The Review of Financial Studies.

Ferguson (1967). Mathematical statistics. A decision theoretic approach. Academic Press.

Ferreira, S. (2007). A new class of skewed multivariate distributions with applications to regression analysis. Statistica Sinica.

Harvey, C. (2010). Portfolio selection with higher moments. Quantitative Finance.

Jeffreys (1961). Theory of probability. Clarendon Press.

Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business.

Mandelbrot, B.; Hudson, R. (2006). The (mis)behavior of markets. A fractal view of financial turbulence. Basic Books.

Mantegna R., S. H. (2000). An introduction to econophysics: correlation and complexity in finance. Biddies Ltd, Guildford & King’s Lynn.

Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7:77–91.


McNeil, A. J., F. R. E. P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.

Meucci, A. (2005). Risk and Asset Allocation. Springer-Verlag.

Michaud (1989). The Markowitz optimization enigma: Is "optimized" optimal?

Piccinato, L. (2009). Metodi per le Decisioni Statistiche. Springer-Verlag.

Polson, T. (2000). Bayesian portfolio selection: an empirical analysis of the S&P 500 index 1970-1996. Journal of Business & Economic Statistics.

Robert, C. P., C. G. (2004). Monte Carlo Statistical Methods. Springer.

Robert, C. P. (2007). The Bayesian Choice. Springer.

Sahu, D. (2003). A new class of multivariate skew distributions with applications to Bayesian regression models.

Savage, L. J. (1954). The Foundations of Statistics. John Wiley & Sons.

Von Neumann, J., M. O. (1944). Theory of games and economic behavior. Princeton University Press.

Wald (1950). Statistical decision functions. Wiley.

Weitzman (2007). Subjective expectations and asset-return puzzles.

Xiong, J.X., I. T. (2011). The impact of skewness and fat tails on the asset allocation decision. Financial Analysts Journal.
