An Algorithm for Computing Risk Parity Weights - F. Spinu

download An Algorithm for Computing Risk Parity Weights - F. Spinu

of 6

Transcript of An Algorithm for Computing Risk Parity Weights - F. Spinu

  • 7/29/2019 An Algorithm for Computing Risk Parity Weights - F. Spinu

    1/6

    AN ALGORITHM FOR COMPUTING RISK PARITY WEIGHTS

    FLORIN SPINU

    Abstract. In this paper we introduce an algorithm for computing the solutionto the risk budget allocation problem. The algorithm is based on Newtons

    method for self-concordant functions.

    1. Introduction

    For N

    2 a positive integer, let RN+ =

    {x = (x1, . . . , xN)

    T

    RN, xi > 0

    }the

    positive cone, and SN+ the set of symmetric, positive definite NN matrices. Forx, y RN, xy := (x1y1, . . . , xNyN)T, and diag(x) is the diagonal NN matrix withxi on the diagonal. For x RN+ , x1 := (1/x1, . . . , 1/xN) RN+ . We will also makeuse of the norms x2 := (

    i x

    2i )

    1/2, x := maxi |xi|, and xC := (xTCx)1/2.For a given C SN+ and b RN+ , we consider the following equation

    (1) Cx = bx1, x RN+ .Theorem 1.1. The equation (1) has a unique solution x RN+ .

    As a consequence, we have the following corollary (cf. [BR12]):

    Corollary 1.2. The vector x/

    i xi is the unique solution to the system

    (2)

    (Cx)ixi

    xTCx =

    biNj=1 bj , 1 i N ,

    satisfying

    (3) xi > 0,Ni=1

    xi = 1 .

    1.1. Financial interpretation. Given a set of N financial assets with covariancematrix C, a long-only, fully invested portfolio in the N assets is determined by a setof allocations x = (x1, . . . , xN)

    T satisfying (3). The risk of the portfolio is usuallydefined as the standard deviation of the portfolio returns

    (4) P(x) :=

    xTCx .

    As a homogeneous function of degree one, P(x) satisfies Eulers identity

    (5)Ni=1

    1

    Pxi

    Pxi

    (x) = 1 .

    The ith summation term is sometimes considered to be the contribution of asset ito the risk of the total portfolio. Since

    (6)1

    Pxi

    Pxi

    (x) =(Cx)ixi

    xTCx,

    1

  • 7/29/2019 An Algorithm for Computing Risk Parity Weights - F. Spinu

    2/6

    2 FLORIN SPINU

    equation (2) matches the risk contribution of asset i with the predetermined weightbi/

    j bj . For this reason, the solution to (2) is referred to as the risk budget

    allocation in [BR12]. When the weights bi are equal, the term risk parity wasintroduced earlier in [Q05]. The reader is referred to these papers for an in-depthdiscussion of the properties of portfolios constructed using these weights.

    2. Proof of Theorem 1.1

    The proof, as well as the choice of the function below, is similar to the one in[BR12]. Let

    (7) F(x) :=1

    2xTCx

    Ni=1

    bi log(xi) .

    The gradient and Hessian of F are given by, respectively,

    (8)

    F(x) = Cx

    bx1,

    2F(x) = C+ diag(bx2) .

    In particular, a solution to (1) is a critical point of F. Since F is strictly convex(2F(x) C), it has at most one critical point (this proves the uniqueness part ofthe Theorem). To prove that F has at least one critical point, it suffices to showthat F has a a point of global minimum x RN+ . This is a consequence of the factthat F(x) + as x RN+ (see Lemma 4.1 in the Appendix).

    2.1. Proof of Corollary 1.2. If x satisfies (1), then x

    i x

    isatisfies (2) and (3).

    Conversely, if x satisfies (2), 1x satisfies (1), with = (xTCx)1/2

    (

    i bi)1/2 . This implies

    that 1x = x, the unique solution to (1). Furthermore, the constraint (3) impliesthat = 1

    i x

    i, which determines x uniquely.

    3. Newtons algorithm

    3.1. Simplifying assumptions. Let D the diagonal ofC, and R = D1/2CD1/2

    the associated correlation matrix: R is positive definite with 1 on the diagonal. Ifx is the solution to (1), y := D1/2x is the unique solution to the equation

    (9) Ry = by1

    Therefore, we may assume from now on that C is a correlation matrix:

    (10) C SN+ , Cii = 1 .In particular, |Cij | 1, i, j, by the Cauchy-Schwartz inequality.

    The next observation is that if x satisfies (1) with weights bi, t1/2x satisfiesthe same equation with bi replaced by tbi. Therefore we may rescale bi so that

    (11) min1iNbi = 1 .

    In what follows, we refer to [N98, Chap.4] for the notion of self-concordant func-tions. The set of self-concordant functions is closed under addition and multiplica-tion by a scalar greater than one, and it counts among its members all the quadraticfunctions, as well as the functions log(xi). We conclude thatProposition 3.1. Under the assumptions (10) and (11), F(x) is self-concordant.

  • 7/29/2019 An Algorithm for Computing Risk Parity Weights - F. Spinu

    3/6

    AN ALGORITHM FOR COMPUTI NG RISK PARITY WEIGHTS 3

    3.2. Convex optimization. We remarked in Section 2 that the solution to (1) isthe global minimum of the convex function F, defined at (7)

    (12) x = argminxRN+ F(x) .

    Let x := (2F(x))1F(x) the iteration step in Newtons algorithm. If x0 isclose enough to x, the sequence

    (13) xk+1 = xk + xk, k 0,converges quadratically to x. A difficulty arises when an appropriate choice ofthe initial point x0 is not available. Since F is self-concordant, this issue canbe dealt with as in [N98, Chap 4.]. Let F(x) := (x)TF(x). As long ask > := 3

    5

    2 , the above step is replaced with the damped iteration

    (14) xk+1 = xk +1

    1 + kxk, k := F(xk) .

    This step is guided by the key inequality [N98, Thm 4.10]

    (15) F(xk+1) F(xk) (k),where (t) := t log(1 + t). This ensures a decrease in the objective function ofat least () per iteration. In particular, less than

    F(x0)F(x)()

    damped iterations

    are required to arrive at k < , at which point the quadratic phase (13) sets in.Moreover, we can significantly reduce the number of damped iterations by exploitingthe explicit form of F. The following theorem is proved in the Appendix.

    Theorem 3.2. For x RN+ , let := max1iN |(x)i|xi . With h = 11+ , we have(16) F(x + hx) F(x) (2F(x)/2) () .

    Consequently, we may replace the iteration (14) with

    (17) xk+1 = xk +1

    1 + kxk, k := xk/xk .

    This step is greater than (14), since k < k. On the other hand, the decrease inthe objective function is as good as (15) since (2k/

    2k)(k) (k), by Lemma

    4.3 in the Appendix. Let u1 := (1, . . . , 1)T and S :=

    i bi. We conclude with

    Theorem 3.3. Letx0 :=S

    uT1Cu1

    u1, := 0.95 3

    52 , and Tol > 0 a termina-

    tion threshold. Consider the following algorithm: for k 0, Compute

    uk := F(xk) = Cxk bx1kHk :=

    2F(xk) = C+ diag(bx

    2k )

    xk := H1k ukk := xk/xk

    (Damped Phase) While k > , do xk+1 = xk + 11+k xk . (Quadratic Phase) While k > Tol, do xk+1 = xk + xk .The number of iterations is less than 9.4 Slog(CN) in the damped phase, and(log2 log2(1/Tol) + 2.6) in the quadratic phase, withC the condition number ofC.The terminal value xend of the algorithm satisfiesxend xC Tol1Tol .

  • 7/29/2019 An Algorithm for Computing Risk Parity Weights - F. Spinu

    4/6

    4 FLORIN SPINU

    Proof. The bound on the number of iterations of the quadratic phase is standard,and can be derived from [N98, Thm 4.1.12]. To bound the number of damped

    iterations, we start from the upper bound

    F(x0)

    F(x)

    () . First, we remark thatxT0 Cx0 = (x

    )TCx = S. Let 1 and N be the smallest and the largest eigenvaluesof C. Since uT1 Cu1 Nu122 = N N, F(x0) = 12 S Slog( S

    1/2

    (uT1Cu1)1/2

    ) 12 S12

    Slog( SNN). On the other hand, S = (x)TCx 1x22. This implies xi

    S/1, i, hence F(x) 12 S 12 Slog( S1 ). Taking the difference, we obtainF(x0)F(x) 12 Slog(CN), where C = N1 . Finally 12() = 9.385 < 9.4. Thelast term satisfies xk xC xk xxk Tol1Tol (cf. [N98, Thm 4.1.11]). 3.3. Practical Implementation. a) For N = 50 and T ol = 106, the theoreticalupper bound for the number of quadratic iterations is 8, while the stated upperbound for the number of damped iterations can be quite large (depending on S).We find in practice however that the total number of iterations is significantly lower

    than that: for each of the 107

    trials with random weights b (uniform distribution)and random covariance matrix C (Wishart distribution), the algorithm terminatedafter less than 16 iterations. b) We have also tested the algorithm on the risk parityproblem in dimension N = 1400. That is, bi = 1/N and T ol = 10

    6. We foundthat the number of iterations required was less than 6 for each of 2105 trials withrandom covariance matrix C.

    4. Appendix

    Lemma 4.1. F(x) tends to + as x approaches the boundary ofRN+ :(18) lim

    xRN+

    F(x) = + .

    Proof. Let 1 the lowest eigenvalue of C. Since xTCx 1i x2i , we have

    (19) F(x) Ni=1

    (1

    21x

    2i bi log(xi)) =:

    Ni=1

    gi(xi) ,

    with gi(t) =12 1t

    2 bi log(t). The function gi is convex when t > 0, has a globalminimum at ti =

    bi/1, and satisfies

    (20) limt+

    gi(t) = limt0+

    gi(t) = + .Therefore for each i, F(x) gi(xi)+

    j=i gj(t

    j). Let B > 0 an arbitrary threshold.

    By (20) there exist i > i > 0 (depending on B) such that

    (21) xi / [i, i] F(x) B .Therefore F(x) B outside the box

    Ni=1[i, i], which proves the Lemma.

    Lemma 4.2. (x) > x22(x+1) , when x > 0.

    Proof. Let f(x) := (x) x22(x+1) . Since f(x) = x2

    2(x+1)2 > 0, f is increasing.

    Therefore f(x) > f(0) = 0, when x > 0.

    Lemma 4.3. The function (x)x2 is decreasing on (0,).Proof. ddx(

    (x)x2 ) =

    2x3 [

    x2

    2(x+1) (x)] < 0, by Lemma 4.2.

  • 7/29/2019 An Algorithm for Computing Risk Parity Weights - F. Spinu

    5/6

    AN ALGORITHM FOR COMPUTI NG RISK PARITY WEIGHTS 5

    4.1. Proof of Theorem 3.2. We follow the argument in [N98, Chap.4] and adaptit to our situation. From there, we borrow the notations

    (22) ux := (uT2F(x)u)1/2, F(x) := xx .The starting point is the second order Taylor expansion, valid when x, x + u RN+

    (23) F(x + u) F(x) uTF(x) =1

    0

    10

    t uT2F(x + tu)u ddt .

    Consider the case u := hx. To ensure x + hx RN+ , we need 0 < h < 1 , where

    (24) := max1iN

    |(x)i|xi

    .

    With := F(x), uTF(x) = h2 and (23) is re-written as

    (25) F(x + hx) F(x) + 2h =

    1

    0

    1

    0

    h2tx2x+htx ddt .

    We note that 2 = (x)TC(x) +

    i bix2ix2i

    and

    x2x+hx = (x)TC(x) +Ni=1

    bix2i

    x2i

    1

    (1 + h xixi )2

    .

    Since |1 + h xixi | 1 h, it follows that

    x2x+hx 2

    (1 h)2 , 0 < h