
SCHOOL OF MATHEMATICS & STATISTICS

STAT3364 – APPLIED PROBABILITY IN COMMERCE & FINANCE

Chapter 3. Random Interest Rates

3.1. Random rates. In this chapter time is discrete. So as before, let C_n denote the cash level at time n, but now assume that this is a random variable, and hence that the single-period accumulation factor A_n = C_n/C_{n−1} is random. The horizon-N cash level (i.e., the FV of C_0) is

C_N = C_0 \prod_{n=1}^N A_n.

The simplest possible assumption is that the A_n's are independent and identically distributed (iid) and positive with probability one. In the case that the C_n's represent values of stock, there is a basic principle of limited liability whereby holders of stock are not responsible for debts incurred if the issuing organization fails. This implies that prices/values can never be negative, and hence they should never be modelled using two-sided distributions such as the normal. Frequently I work with log-returns X_n = \log A_n, i.e., A_n = \exp(X_n), and so I must assume that A_n > 0 to avoid having X_n = −∞ with positive probability.

In terms of log-returns the cash level is

C_N = C_0 \exp\left( \sum_{n=1}^N X_n \right), \quad (N ≥ 0) \qquad (1)

where I define \sum_{n=1}^{0}(·) = 0. Denote the sum in (1) by W_N, a sum of iid random variables. You know quite a lot about the long-term behaviour of such sums, e.g., that their distribution is approximately normal if N is large (the central limit theorem). What does this prior knowledge imply about C_N? I begin with some simpler properties.¹
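The product form in (1) is easy to simulate. A minimal Python sketch (not part of the original notes), assuming iid normal log-returns with illustrative parameters µ_l and σ_l:

```python
import math
import random

def simulate_cash_level(c0, n_periods, mu_l=0.01, sigma_l=0.1, seed=1):
    """Simulate C_N = C_0 * prod A_n with iid lognormal factors A_n = exp(X_n)."""
    rng = random.Random(seed)
    c = c0
    for _ in range(n_periods):
        x = rng.gauss(mu_l, sigma_l)  # log-return X_n ~ N(mu_l, sigma_l^2)
        c *= math.exp(x)              # A_n = exp(X_n) > 0, so the cash level stays positive
    return c
```

Because each A_n > 0, every simulated path stays strictly positive, in line with the limited-liability principle above.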

3.2. Moments. Recall that the expectation of a product of independent random variables equals the product of the individual expectations. So if a := E(A_n) then

E[C_N/C_0] = E\left[ \prod_{n=1}^N A_n \right] = \prod_{n=1}^N E(A_n) = a^N. \qquad (2a)

So if C_0 is independent of the A_n's and c_0 = E(C_0), then E(C_N) = c_0 a^N. It follows that

\lim_{N→∞} E(C_N) =
\begin{cases}
∞ & \text{if } a > 1, \\
c_0 & \text{if } a = 1, \\
0 & \text{if } a < 1.
\end{cases}

Indeed E(C_N) behaves exactly like C_N when rates are constant and r = a − 1. Consequently it seems that a − 1 should be a good measure of average returns. It typically is the case in financial situations that a ≈ 1, in which case \log a ≈ a − 1. So there are two measures of average return (r and \log a) which are nearly equal, and each is in common use.

¹A useful reference for discrete-time random interest rates is M.A. Bean (2001) Probability: The Science of Uncertainty, §§2.4, 8.2, 8.4 & 8.5.


Next, suppose that a_2 := E(A_n^2) < ∞. Since C_N^2 = C_0^2 \prod_{n=1}^N A_n^2, taking the expectation yields

E(C_N^2) = E(C_0^2) \prod_{n=1}^N E(A_n^2) = E(C_0^2)\, a_2^N,

from which you obtain the variance formula

Var(C_N) = E(C_0^2)\, a_2^N − c_0^2 a^{2N}. \qquad (2b)

Taking C_0 to be non-random and using the fact that Var(C_N) > 0, you infer that a_2 > a^2, and hence the subtracted term on the right-hand side becomes negligible compared with the first term if N is large. In particular, Var(C_N) grows quickly with N if a_2 > 1, i.e., the distribution of C_N rapidly spreads out as the horizon increases, making forecasts very unreliable.
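Formulae (2a) and (2b) are easy to check by Monte Carlo. A small Python sketch (not from the notes), using an illustrative two-point factor with a = 1 and a_2 = 1.01:

```python
import random

def moments_mc(a_dist, n_periods, n_paths, c0=1.0, seed=0):
    """Monte Carlo estimates of E(C_N) and Var(C_N) for C_N = c0 * prod A_n."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        c = c0
        for _ in range(n_periods):
            c *= a_dist(rng)
        samples.append(c)
    mean = sum(samples) / n_paths
    var = sum((s - mean) ** 2 for s in samples) / n_paths
    return mean, var

# Two-point factor: A = 1.1 or 0.9 with equal probability, so a = 1 and a_2 = 1.01.
two_point = lambda rng: 1.1 if rng.random() < 0.5 else 0.9
mean, var = moments_mc(two_point, n_periods=10, n_paths=100_000)
# Theory: E(C_10) = 1 and Var(C_10) = 1.01**10 - 1, roughly 0.105.
```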

In the next two sections I introduce two models which are widely used by practitioners.

3.3. The binomial tree model. This is widely used in the theory and practice of asset pricing. Let S_n denote the value of some stock on ‘day’ n, and assume that from day to day the value increases by a factor u > 1 with probability p, or decreases by a factor d < 1 with probability q = 1 − p.² Day-to-day transitions are mutually independent, and they can be represented in a tree diagram, whence the name of the model. Assume the initial value S_0 is non-random, and observe that the possible values at time n are

S_0 u^n > S_0 u^{n−1} d > \cdots > S_0 u d^{n−1} > S_0 d^n.

The model seems quite crude, but it's claimed in some financial literature to represent reality quite well if the ‘days’ are sufficiently short intervals of time. There are two mathematical representations of the binomial model.

Representation 1. This gives easy access to formulae for the moments of A_n. Define independent Bernoulli-like random variables

A_n = \begin{cases} u & \text{with probability } p, \\ d & \text{with probability } q. \end{cases}

Thus A_n is the growth factor for the interval (n − 1, n). It is clear that S_N = S_0 \prod_{n=1}^N A_n. Since a = E(A_n) = pu + qd and a_2 = E(A_n^2) = pu^2 + qd^2, it follows from (2) (with S_n replacing C_n) that

E(S_N) = S_0 (pu + qd)^N \quad \& \quad Var(S_N) = S_0^2 \left[ (pu^2 + qd^2)^N − (pu + qd)^{2N} \right].

For example, taking u = 1.1 and d = 0.9 gives the values of these moments shown in Table 1.

  p      a      E(S_N)   a_2    a^2     s.d.(S_N)
  0.2    0.94   → 0      0.89   0.884   → 0
  0.48   0.996  → 0      1.002  0.992   → ∞
  0.5    1      ≡ 1      1.01   1       → ∞
  0.7    1.04   → ∞      1.09   1.082   → ∞

Table 1: Values of E(S_N) and s.d.(S_N) are the limits as the horizon N → ∞.

These tabulated values and the moment formulae show that if the standard deviation s.d.(S_N) becomes small in the long run, then so does E(S_N). On the other hand, the expectation may become small while at the same time s.d.(S_N) becomes large, thus increasing uncertainty. This occurs in the case p = 0.48.
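The entries of Table 1 follow directly from the two moment formulae; a small Python helper (not from the notes) reproduces them:

```python
def binomial_tree_moments(p, u, d, n, s0=1.0):
    """E(S_N) and s.d.(S_N) for the binomial tree, from the formulae above."""
    q = 1 - p
    a = p * u + q * d             # a   = E(A_n)
    a2 = p * u ** 2 + q * d ** 2  # a_2 = E(A_n^2)
    mean = s0 * a ** n
    sd = (s0 ** 2 * (a2 ** n - a ** (2 * n))) ** 0.5
    return mean, sd
```

For instance, with p = 0.48, u = 1.1, d = 0.9 the mean shrinks towards 0 while the standard deviation grows without bound, exactly the uncertain case singled out above.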

²Strictly speaking, the basic mathematical assumption is that d < u. For example, it could be that the price never decreases, instead increasing by a factor u with probability p, or by d with probability q. More complicated versions of the binomial model allow for other possibilities, such as increasing, decreasing, or not changing.


Representation 2. This representation of the binomial tree is more convenient for calculating probabilities such as P(S_N > S_0 x). Let U_N count the number of ‘up days’ during the holding period, i.e., the number of subscripts n such that A_n = u. It should be obvious that U_N ∼ Bin(N, p). There are N − U_N ‘down days’, whence

S_N = S_0 u^{U_N} d^{N − U_N}. \qquad (3)

It's best to write probability statements about S_N in terms of U_N and then refer to tables or a package. For example,

P(S_N > S_0 x) = P\left( U_N > \frac{\log x − N\log d}{\log(u/d)} \right), \qquad (4)

where I have taken the (natural) logarithm of both sides of the left-hand side inequality and used (3) to obtain the right-hand side. Taking u = 1.1 and d = 0.9 (as above), x = 1.1 and N = 10, computation shows that (\log x − N\log d)/\log(u/d) = 5.725. Since the inequality in (4) is strict and U_N takes integer values, you must round up, i.e., the desired probability is P(U_N ≥ 6) = 1 − P(U_N ≤ 5). If p = 0.5 then P(U_N ≤ 5) = 0.6230, so the probability that the stock value increases by at least 10% over 10 days is 0.377.
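The probability calculation above can be scripted directly from (3) and (4); a minimal Python sketch (not from the notes), computing the exact binomial tail with math.comb:

```python
import math

def prob_exceeds(p, u, d, n, x):
    """Exact P(S_N > S_0 x) for the binomial tree, via (3)-(4)."""
    threshold = (math.log(x) - n * math.log(d)) / math.log(u / d)
    k_min = math.floor(threshold) + 1  # smallest integer strictly above the threshold
    q = 1 - p
    return sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(k_min, n + 1))
```

With p = 0.5, u = 1.1, d = 0.9, x = 1.1 and N = 10 this returns the 0.377 quoted above.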

3.4. The lognormal model. I begin by defining the lognormal law. Recall that if X ∼ N(µ, σ^2), the normal law with mean µ and variance σ^2, then you can write X = µ + σZ where Z ∼ N(0, 1), the standard normal law.

Definition 3.4.1. A random variable A has the lognormal law with parameters µ and σ, written A ∼ LN(µ, σ), if

A = e^X = e^{µ + σZ}. \qquad (5)

Clearly A > 0, and since X = \log A, the distribution function of A is

F_A(x) = P(A ≤ x) = P(X ≤ \log x) = Φ\left( \frac{\log x − µ}{σ} \right).

The corresponding density function is found using the chain rule: for x > 0,

f_A(x) = \frac{1}{σx}\, φ\left( \frac{\log x − µ}{σ} \right) = \frac{1}{x\sqrt{2πσ^2}} \exp\left[ −\frac{(\log x − µ)^2}{2σ^2} \right];

and defining f_A(0) = 0 ensures continuity at the origin.

The median m_e of A satisfies F_A(m_e) = \frac{1}{2}, giving

m_e = e^µ.

A mode is any value of x at which f_A(x) has a local maximum. Look for such values by solving f′_A(x) = 0. Using logarithmic differentiation to solve this equation shows that there is exactly one solution, and hence that it must be a global maximum. The unique solution is the mode,

m_o = e^{µ − σ^2} < m_e.

To find the moments of A you need to know that the moment generating function of Z is E(e^{θZ}) = e^{θ^2/2} (−∞ < θ < ∞). The representation X = µ + σZ implies that the moment function of A is

E(A^θ) = E\left[ e^{µθ} × e^{σθZ} \right] = e^{µθ} E\left[ e^{(σθ)Z} \right] = e^{µθ + σ^2θ^2/2}.


Letting θ = 1 and θ = 2 gives the first and second moments,

a := E(A) = e^{µ + σ^2/2} \quad \& \quad a_2 := E(A^2) = e^{2µ + 2σ^2}. \qquad (6a)

These imply that

s := s.d.(A) = e^{µ + σ^2}\sqrt{1 − e^{−σ^2}}. \qquad (6b)

Note that a > m_e, so you have the general order relation m_o < m_e < a for all possible parameter values. Probability calculations for A are carried out in terms of the standard normal law by using the above expression for F_A(x).
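These summary quantities are straightforward to code; a small Python helper (not from the notes) collecting the mean, standard deviation, median and mode of LN(µ, σ):

```python
import math

def lognormal_summary(mu, sigma):
    """Mean, s.d., median and mode of A ~ LN(mu, sigma), from (6a)-(6b)."""
    a = math.exp(mu + sigma ** 2 / 2)                                     # mean
    s = math.exp(mu + sigma ** 2) * math.sqrt(1 - math.exp(-sigma ** 2))  # s.d.
    median = math.exp(mu)
    mode = math.exp(mu - sigma ** 2)
    return a, s, median, mode
```

For any σ > 0 the returned values respect the order relation mode < median < mean.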

The symmetry of the standard normal density can be expressed by saying that Z and −Z have the same (standard normal) law, and we write this relation as Z L= −Z. The equality denotes equality in law, i.e., having the same distribution. It does not say that Z and −Z are equal as random variables (i.e., the same function on the sample space). This symmetry is reflected by the lognormal law; (5) yields

A^{−1} = e^{−µ − σZ} L= e^{−µ + σZ},

i.e., if A ∼ LN(µ, σ) then A^{−1} ∼ LN(−µ, σ).

It follows easily from the definition that the lognormal type is preserved under power transformations, and also by multiplication of independent lognormal random variables. The following lemma records these fundamental properties of the lognormal law.

Lemma 3.4.1. Let A ∼ LN(µ, σ). (i) If θ is a real number then A^θ ∼ LN(µθ, σ|θ|). (ii) If B ∼ LN(µ′, σ′) is independent of A, then AB ∼ LN(µ + µ′, \sqrt{σ^2 + σ′^2}).

Practitioners frequently assume that log-returns X_n = \log A_n have a normal law, X_n ∼ N(µ_l, σ_l^2). The subscript l signals log-returns. The parameter σ_l is called the volatility. Keep in mind that the parameters µ_l and σ_l are the mean and volatility of log-returns, and not of the accumulation factors A_n = e^{X_n}. It follows from Definition 3.4.1 above that these have a lognormal law, A_n ∼ LN(µ_l, σ_l). Substituting (6a) into (2a) (and remembering the small change in notation) shows that the N-horizon expected cash level is³

E(C_N) = c_0 \left[ e^{µ_l + σ_l^2/2} \right]^N.

Take note of the fact that this depends on the volatility as well as on the single-period mean log-return µ_l. This fact is often misunderstood. In particular, if the volatility is large enough you can have a > 1 even though µ_l < 0, in which case (as you will later see) the actual cash level dwindles to zero as N increases, even though its expected value increases.

The lognormal law arises as a large-N approximation to the binomial tree model in the following way.

Observe first that, after taking logarithms and simplifying a little, (3) can be expressed as

S_N/S_0 = \exp\left[ N\log d + (\log(u/d))\, U_N \right].

The normal approximation to the binomial law asserts that probabilities calculated from the Bin(N, p) law can often be approximated well by computing these probabilities using the N(Np, Npq) law. I will express this approximation for the number of up days U_N in random-variable terms by writing U_N ≅ Np + \sqrt{Npq}\, Z, where q = 1 − p and Z ∼ N(0, 1). The notation V ≅ W means that the random variable V has ‘approximately the same probability law’ as W. It does not mean the random variables are approximately equal, i.e., as functions defined on the same sample space their values need not be nearly equal everywhere on the sample space.

³Applied to stock prices, this formula is E(S_N) = S_0[\exp(µ_l + \tfrac{1}{2}σ_l^2)]^N. Cash levels C_N and stock prices S_N can be used interchangeably in most of what follows.

Substituting this normal approximation into the above expression for S_N/S_0 and simplifying yields

S_N/S_0 ≅ \exp\left[ N(p\log u + q\log d) + (\log(u/d))\sqrt{Npq}\, Z \right].

Reference to Definition 3.4.1 shows that the right-hand side is lognormal, i.e.

S_N/S_0 ≈ LN\left( N(p\log u + q\log d),\; (\log(u/d))\sqrt{Npq} \right),

meaning that the random variable on the left-hand side has a probability law which is approximately the lognormal law on the right-hand side. Thus probabilities for the binomial model can be evaluated using this approximation in conjunction with the continuity correction. In practice it's best to execute the calculation as in (4), replacing the right-hand side with the normal approximation including the continuity correction. The typical case is that (\log x − N\log d)/\log(u/d) is not integer valued. Let I(x) denote its integer part. Then the right-hand side of (4) is P(U_N > I(x)) = P(U_N ≥ I(x) + 1), and the normal approximation for this, with the continuity correction, evaluates the normal tail at I(x) + ½.
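The continuity-corrected calculation can be sketched in Python (not from the notes), with the standard normal tail computed from math.erfc:

```python
import math

def normal_tail(z):
    """P(Z > z) for standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_exceeds_normal(p, u, d, n, x):
    """Normal approximation, with continuity correction, to P(S_N > S_0 x)."""
    q = 1 - p
    threshold = (math.log(x) - n * math.log(d)) / math.log(u / d)
    i_x = math.floor(threshold)                     # integer part I(x)
    z = (i_x + 0.5 - n * p) / math.sqrt(n * p * q)  # evaluate the tail at I(x) + 1/2
    return normal_tail(z)
```

For the example above (p = 0.5, u = 1.1, d = 0.9, x = 1.1, N = 10) the approximation gives about 0.376, close to the exact value 0.377.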

3.5. Laws of large numbers. The expression (2a) for E(C_N) suggests that C_N tracks E(C_N) as N increases. This turns out to be false, and I now develop the results needed to investigate this matter. It's clear from (1) that we need a deeper understanding of the behaviour of the partial sums W_N = \sum_{n=1}^N X_n.

I begin by emphasizing that probability theory usually doesn't make sure statements about random outcomes.

Example 3.5.1. Choose a number at random from the unit interval [0, 1]. The probability of choosing a particular number, 1/3 say, is zero. It follows from the additivity axiom that the probability that the randomly chosen number belongs to any specific countable subset is zero. In particular, P(a randomly chosen number is rational) = 0. Expressing this in contrapositive fashion, we say that almost surely a randomly chosen number is irrational, meaning that this event has probability equal to unity. This doesn't mean that a rational number cannot be chosen, but only that there is a zero probability of this occurring. A consequence is that saying an event almost surely can't occur does not imply that its occurrence is impossible.

Suppose now that X_1, X_2, . . . are iid random variables. The weak law of large numbers (WLLN) asserts that the average \bar{W}_N := N^{−1}\sum_{n=1}^N X_n converges in probability to µ_l = E(X_n), provided this is finite. This means that the probability mass of the distribution of \bar{W}_N concentrates around µ_l as N → ∞; more precisely, \lim_{N→∞} P(|\bar{W}_N − µ_l| ≤ ε) = 1 for any small ε > 0. In the case that the variance σ_l^2 = Var(X_n) < ∞, you may have seen the simple proof using Chebyshev's inequality:

P(|\bar{W}_N − µ_l| > ε) ≤ Var(\bar{W}_N)/ε^2 = σ_l^2/(Nε^2) → 0 \text{ as } N → ∞.

The WLLN is suitable for most needs of statistical inference, but it doesn't say much about individual sample paths of the averages.
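A single simulated path of the averages illustrates what a pathwise statement adds; a minimal Python sketch (not from the notes), with illustrative values µ_l = 0.05 and σ_l = 0.3:

```python
import random

def running_average_path(mu_l=0.05, sigma_l=0.3, n_max=100_000, seed=7):
    """One sample path of the averages (X_1 + ... + X_N)/N for iid normal X_n."""
    rng = random.Random(seed)
    total, path = 0.0, []
    for n in range(1, n_max + 1):
        total += rng.gauss(mu_l, sigma_l)
        path.append(total / n)
    return path
```

The tail of the path settles down onto µ_l, which is the pathwise behaviour the strong law below makes precise.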

In what follows you need to recall that the X_n denote functions defined on some sample space Ω. Denote elementary events (i.e., members of Ω) by ω, and define the set

Λ = \{ ω ∈ Ω : \lim_{N→∞} \bar{W}_N \text{ exists and equals } µ_l \}.


General theory shows that Λ is an event, i.e., it can be assigned a probability. Another result (Kolmogorov's zero-one law) asserts that either P(Λ) = 1 or P(Λ) = 0. In the former case we can say that the sequence of averages \{\bar{W}_N : N ≥ 1\} converges almost surely (denoted a.s.) to µ_l, and we write \bar{W}_N a.s.→ µ_l. We call this outcome a strong law of large numbers (SLLN). It implies the WLLN. The complete statement of this fundamental SLLN follows.

Kolmogorov's SLLN (1928). If µ_l = E(X_n) is finite, then \bar{W}_N a.s.→ µ_l. Conversely, if there is a random variable M such that P(|M| < ∞) = 1 and \bar{W}_N a.s.→ M, then µ_l is finite.

There is no quick proof of this best possible result. The necessary background and Cantelli's proof are described in §3.10. That proof assumes the stronger condition that the moment of order four, E(X_n^4), is finite, and hence it is much simpler than any proof of the best possible result.

3.6. Cash growth with random rates. Referring to (2a) above, I have suggested

a = E(A_1) = [E(C_N/C_0)]^{1/N}

as a possible measure of the average growth rate of the investment of C_0 over N periods. Another approach starts from the spot rate, defined at (2.14). In terms of our present notation, if

\bar{A}_N := \left[ \prod_{n=1}^N A_n \right]^{1/N} = \prod_{n=1}^N A_n^{1/N}

is the geometric mean of the growth factors, then the (random) spot rate for the horizon N is \bar{A}_N − 1. In addition, recalling that \bar{W}_N = N^{−1}\sum_{n=1}^N \log A_n,

(C_N/C_0)^{1/N} = \bar{A}_N = e^{\bar{W}_N}.

Applying the SLLN shows that the right-hand side a.s.→ e^{µ_l}. This limit defines the geometric expectation of the accumulation factors A_n,

g_∞ := E_g(A_n) = e^{E[\log A_n]} = e^{µ_l}.

Hence

(C_N/C_0)^{1/N} a.s.→ g_∞.

This result gives a true measure of the long-term behaviour of C_N in the sense that

C_N ≈ C_0 g_∞^N, \qquad (10)

showing that C_N does not track its expectation c_0 a^N. The relation between a and g_∞ is exposed by the following general result, called the arithmetic–geometric mean inequality.

The AGM inequality. E_g(A_n) ≤ E(A_n), and equality holds only if P(A_n = a) = 1, i.e., if A_n is a constant-valued random variable.

For truly random rates the AGM inequality asserts that g_∞ < a, and hence the SLLN implies that

C_N/E(C_N) ≈ [g_∞/a]^N → 0,

showing that E(C_N) grossly over-estimates the long-term behaviour of C_N; in all circumstances C_N becomes arbitrarily small with respect to a^N.
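The gap between a^N and g_∞^N shows up immediately in simulation; a Python sketch (not from the notes) for the binomial tree with p = ½, u = 1.1, d = 0.9, where a = 1 but g_∞ = \sqrt{0.99} < 1:

```python
import math
import random

def growth_comparison(p=0.5, u=1.1, d=0.9, n=200, paths=2000, seed=3):
    """Compare E(C_N) = a^N with the trend g_inf^N and a simulated median of C_N/C_0."""
    rng = random.Random(seed)
    a = p * u + (1 - p) * d        # arithmetic mean of the factors
    g_inf = u ** p * d ** (1 - p)  # geometric expectation
    finals = sorted(
        math.prod(u if rng.random() < p else d for _ in range(n))
        for _ in range(paths)
    )
    return a ** n, g_inf ** n, finals[paths // 2]  # mean, predicted trend, sample median
```

The sample median hugs g_∞^N ≈ 0.37 even though the expectation stays at 1, illustrating how E(C_N) over-states long-run growth.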


Example 3.6.1. The binomial tree. I have shown in §3.3 that a = pu + qd. Since µ_l = E(\log A_1) = p\log u + q\log d = \log(u^p d^q), the geometric expectation is

E_g(A_n) = g_∞ = u^p d^q.

It is not obvious that u^p d^q < a = pu + qd, but the AGM inequality says that it is. To obtain some idea of the differences between a and g_∞, I choose the parameter values u = 1.1, d = 0.9 and list values for selected values of p in Table 2. Note that in each case there is little difference between a and g_∞. However, there

  p      a      g_∞     a^10    g_∞^10   a^30     g_∞^30
  0.2    0.94   0.9367  0.539   0.521    0.156    0.141
  0.48   0.996  0.9910  0.961   0.914    0.887    0.762
  0.5    1      0.9950  1       0.951    1        0.860
  0.7    1.04   1.0357  1.480   1.421    3.243    2.867
  0.9    1.08   1.0781  2.159   2.122    10.063   9.557

Table 2: Derived quantities for the binomial model for various ‘up day’ probabilities.

is a small interval (½, p′) such that for p in this interval we have g_∞ < 1 < a. Consequently, using a as a measure of growth suggests that the investment will grow in the long run, whereas the value of g_∞ implies that in fact it will decrease. This difference is illustrated for N = 10 and N = 30 in the last two panels of Table 2. If p = ½ then using the mean return a predicts that the horizon-N mean accumulation also is unity, whereas the geometric mean implies that the real accumulation trends downwards. If p = 0.7 then accumulation trends upwards, but mean accumulation over 30 periods (3.243) looks more optimistic than expected from the geometric moment (2.867). These issues are important because investment prospectuses sometimes report estimates of a and not of g_∞.
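Each row of Table 2 is a one-line computation; a Python helper (not from the notes):

```python
def table2_row(p, u=1.1, d=0.9):
    """One row of Table 2: a, g_inf, and their 10- and 30-period powers."""
    q = 1 - p
    a = p * u + q * d
    g = u ** p * d ** q
    return a, g, a ** 10, g ** 10, a ** 30, g ** 30
```

Running it for p = 0.7 reproduces the row (1.04, 1.0357, 1.480, 1.421, 3.243, 2.867), with g_∞ < a as the AGM inequality requires.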

The geometric expectation has meaning only for long horizons. Trends for shorter horizons could be based on a, the expected short rate, or on the expected spot rate

g_N := E(\bar{A}_N) = \left[ E(A_1^{1/N}) \right]^N.

Obviously g_1 = a, and the so-called Liapunov inequality asserts that g_N < a if N ≥ 2. For the binomial tree, g_N = (pu^{1/N} + qd^{1/N})^N.

The quantity g_N fits naturally with the geometric expectation g_∞ because \lim_{N→∞} g_N = g_∞. This follows because if N is large then

A^{1/N} = \exp(N^{−1}\log A) ≈ 1 + N^{−1}\log A,

and hence

g_N ≈ [1 + N^{−1}E(\log A)]^N → \exp(E(\log A)) = g_∞.

Some practice exercises ask you to confirm this in specific cases. Thus g_N interpolates between the end values g_1 = a and g_∞, and hence g_N^N = [E(\bar{A}_N)]^N can be taken as a reasonable measure of anticipated horizon-N growth.
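The convergence g_N → g_∞ is easy to check numerically for the binomial tree; a minimal Python sketch (not from the notes):

```python
def spot_rate_gn(p, u, d, n):
    """Expected spot-rate quantity g_N = (p u^(1/N) + q d^(1/N))^N for the binomial tree."""
    q = 1 - p
    return (p * u ** (1 / n) + q * d ** (1 / n)) ** n
```

As N grows, g_N decreases from g_1 = a towards the geometric expectation g_∞ = u^p d^q, in line with the Liapunov inequality.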

The quantities gN and g∞ are often quoted as annualized rates ρN := gN − 1 and ρ∞ := g∞ − 1,expressed as percentages.


Example 3.6.2. The lognormal model. The generic accumulation factor A = e^{µ_l + σ_l Z} has the LN(µ_l, σ_l) law and its geometric mean is g_∞ = E_g(A) = e^{E(µ_l + σ_l Z)} = e^{µ_l} = m_e. Hence the long-term trend of accumulation is e^{Nµ_l}, and not a^N = e^{N(µ_l + σ_l^2/2)}. You can see that the volatility contributes an inflationary factor e^{σ_l^2/2} > 1 which may give a misleading impression of good investment prospects when in fact the real trend is not so large. Indeed, if µ_l + σ_l^2/2 > 0 and µ_l < 0, the real trend is downwards although the mean trend is increasing.

For shorter term investment you should calculate

g_N = \left[ E\left( e^{µ_l/N + (σ_l/N)Z} \right) \right]^N = e^{µ_l + σ_l^2/2N} = g_∞ e^{σ_l^2/2N}. \qquad (11)

It's obvious that g_N → g_∞ = e^{µ_l} in this case. Moreover, according to this measure the horizon-N mean accumulation is g_N^N = e^{µ_l N + σ_l^2/2}, a value lying between g_∞^N and a^N.

3.7. Parameter estimation. The parameter values in binomial and lognormal models must be estimated from data. A typical situation is that a time series of stock prices \{S_n : n = 0, . . . , N\} is published, and successive divisions yield the corresponding values of observed one-period accumulation factors, a_n = S_n/S_{n−1} for n = 1, . . . , N. The sample mean \bar{a} of these factors usually is quoted as the average return \bar{r} := \bar{a} − 1, expressed as a percentage. Similarly the accumulation variance s^2 = Var(A) is estimated by the corresponding sample variance \hat{s}^2, and the sample standard deviation also is quoted as a percentage.

How do we use these to estimate the parameters of a return distribution? One approach for the binomial tree is covered by a practice exercise, and further consideration will be given in subsequent chapters. Here I consider the LN(µ_l, σ_l) law.

Let λ = 1 + s^2/a^2 = 1 + s^2/(1 + r)^2. Using (6a), it follows that λ = a_2/a^2 = e^{σ_l^2}, i.e., σ_l^2 = \log λ. Estimation by the method of moments exploits relations between moments and other parameters by substituting estimates of moments to obtain estimates of these other parameters. Thus substituting \bar{r} = \bar{a} − 1 and \hat{s} into the above formula for λ yields the estimator

\hat{σ}_l^2 := \log\hat{λ} = \log\left[ 1 + \frac{\hat{s}^2}{(1 + \bar{r})^2} \right]. \qquad (12a)

Substitution into the first member of (6a) yields the estimator

\hat{µ}_l = \log(1 + \bar{r}) − \tfrac{1}{2}\hat{σ}_l^2. \qquad (12b)

These expressions can be substituted into the expressions for g_∞ and g_N to yield the estimators

\hat{g}_∞ = e^{\hat{µ}_l} = \frac{(1 + \bar{r})^2}{\sqrt{(1 + \bar{r})^2 + \hat{s}^2}} = \frac{\bar{a}^2}{\sqrt{\bar{a}^2 + \hat{s}^2}}, \qquad (12c)

and from (11),

\hat{g}_N = \hat{g}_∞ e^{\hat{σ}_l^2/2N} = \frac{[(1 + \bar{r})^2]^{1 − 1/2N}}{[(1 + \bar{r})^2 + \hat{s}^2]^{\frac{1}{2}(1 − 1/N)}}. \qquad (12d)

Method of moments estimators usually don't have optimal properties, but they are consistent and asymptotically normal. This method of estimation is much used in econometric practice.
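The estimators (12a)–(12c) are a few lines of Python; this sketch (not from the notes) reproduces the first row of Table 3 below from r = 12.98% and s = 20.17%:

```python
import math

def lognormal_mom(r_bar, s_hat):
    """Method-of-moments estimates (12a)-(12c) from the average return and s.d."""
    a_bar = 1 + r_bar
    sigma2_l = math.log(1 + s_hat ** 2 / a_bar ** 2)  # (12a)
    mu_l = math.log(a_bar) - sigma2_l / 2             # (12b)
    g_inf = math.exp(mu_l)                            # (12c)
    return sigma2_l, mu_l, g_inf

# S&P 500 data: r = 12.98%, s = 20.17% gives rho_inf = g_inf - 1 of about 11.22%.
sigma2_l, mu_l, g_inf = lognormal_mom(0.1298, 0.2017)
```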

The data in Table 3 come from an annual compilation published by Ibbotson & Associates reporting composite indices as follows:

1. Standard & Poors 500 Stock Composite Index;


2. Small-company stocks (NYSE);

3. Long-term high-grade corporate bonds (Salomon Bros.);

4. Long-term (20 year) U.S. Government bonds.

Each of these series is complete for 75 years, 1926 – 2000. The period of time covered includes the Great Depression, post-war booms and more recent booms and busts.

  Index   r       s       ρ75 (empirical)   ρ75 (lognormal)   ρ∞ (lognormal)
  1       12.98   20.17   11.05             11.245            11.221
  2       17.3    33.4    12.4              12.874            12.816
  3       6.0     8.7     5.7               5.650             5.645
  4       5.7     9.4     5.3               5.29              5.284

Table 3: The left-hand panel shows parameters estimated directly from the index series, as quoted in the Ibbotson compilation. The right-hand panel shows continuously compounded spot rates estimated for the lognormal law using (11) and (12). Here ρ_N = g_N − 1, expressed as a percentage.

This turbulence is manifested by the high values of s for the stock indices; they are very volatile. In addition, they show a high historical growth rate. On the other hand, the bond indices are much more staid. The AGM inequality implies that ρ_75/r < 1. However, this ratio is rather smaller for the stock indices than for the bond indices. This illustrates well the influence of volatility. Note that the model-based estimates of the mean spot rate g_75 are reasonably close to the empirical estimates. The relative differences are smaller for the less volatile index series. This turns out to be the case for other model distributions, some of which have very different qualitative characteristics from the lognormal. We can conclude that estimates of mean spot rates predicted by fitted models are fairly insensitive to the form of model being fitted. This occurs because mean returns are close to unity and the corresponding standard deviations are small. Consequently, although there may be theoretical reasons for preferring one class of models over another, the numerical outcomes from estimation and prediction generally show little difference. For more on this see the

Reference. La Grandville, O., Pakes, A.G., & Tricot, C. (2002) Random rates of growth: Introducingthe expo-normal distribution. Applied Stochastic Models in Business & Industry 18, 23–51.

3.8. Fluctuations. Let's assume that the generic growth factor A ∼ LN(µ_l, σ_l) and look more closely at the behaviour of the cash level C_N as N increases. This case turns out to give insight into what happens in general. The definition of the lognormal law allows us to write A_n = e^{µ_l + σ_l Z_n} where the Z_n's are independent and standard normal. Since Z_1 + · · · + Z_N ∼ N(0, N), we see that C_N/C_0 ∼ LN(µ_l N, σ_l\sqrt{N}).

In fact the representation of the A_n gives the exact expression

C_N/C_0 = e^{µ_l N + σ_l\sum_{n=1}^N Z_n}.

The exponent on the right-hand side has a linear trend µ_l N which increases or decreases with N according to the sign of µ_l. Superimposed on this trend is a normally distributed fluctuation oscillating with an amplitude which grows a little faster than \sqrt{N} (in fact, asymptotically in proportion to \sqrt{N\log\log N}).

These fluctuations are what give life to financial trading, sometimes producing rapid local rises and falls in value around a slowly changing linear trend. (Slowly changing because µ_l ≈ 0.) Note that if µ_l ≠ 0 then the linear trend will dominate in the long run. So if µ_l < 0 then C_N a.s.→ 0, even if the volatility is large enough to guarantee E(C_N) = e^{µ_l N + \frac{1}{2}σ_l^2 N} → ∞. This is a paradox which has been noted in other contexts, especially in connection with population growth.

The fact that this behaviour for lognormal returns holds in general is a consequence of the central limit theorem (CLT). At this point we need a formal definition of a concept intrinsic to the approximation of distributions by others, e.g., approximation of Bin(N, p) by a normal or Poisson law when N is large.

Definition 3.8.1. Convergence in law. Suppose V, V_1, V_2, . . . are random variables with corresponding distribution functions F(x), F_1(x), F_2(x), . . .. We say that the sequence V_1, V_2, . . . converges in law to V, written V_N L→ V, if

\lim_{N→∞} F_N(x) = F(x) \text{ for each number } x \text{ where } F(x) \text{ is continuous.} \qquad (CL)

(A common alternative term is ‘converges in distribution’.)

Pay attention to the content of this definition. It actually says nothing about convergence of the V_N's; it is an assertion about the behaviour of their DFs. If the limit DF F(x) has a jump at x, then the definition makes no demand on what happens there: the numbers F_N(x) may or may not converge.

Suppose for example that V_N ∼ Bin(N, p). If p depends on N, p = p_N, and Np_N → λ > 0, then the Poisson approximation to the binomial actually asserts that V_N L→ V where V ∼ Poi(λ). Here the limit DF, that of the Poi(λ) law, has jumps at x = 0, 1, . . . and is continuous for all other x, and hence (CL) holds for these x. (Convergence holds for all x in this particular case.)

On the other hand, for the normal approximation to the binomial, the V_N don't converge in law; instead their standardized versions V′_N = (V_N − Np)/\sqrt{Npq} L→ Z, where Z ∼ N(0, 1), i.e., \lim_{N→∞} P(V′_N ≤ x) = Φ(x) for all real x. Here the limit DF is the standard normal DF, which is everywhere continuous.

This last result is a special case of the standard CLT whose final form seems to be due to P. Lévy (in the 1920's), although CLTs have a history going back to 1850. In fact, the normal approximation of the binomial law is a central limit result, and it goes back to the 1720/30's (De Moivre–Laplace theorem).

Theorem 3.8.1. Lévy's CLT. If X_1, X_2, . . . are iid with finite mean µ_l and variance σ_l^2, and W_N = \sum_{j=1}^N X_j, then

\frac{W_N − µ_l N}{σ_l\sqrt{N}} \; L→ \; Z ∼ N(0, 1). \qquad (13)

Another way of writing (13) is \sqrt{N}(\bar{W}_N − µ_l) L→ σ_l Z. Note that the CLT makes no assumption about whether the X_n have a density or not; and if they do have a density, it makes no assertion as to whether the density functions of the left-hand side of (13) converge or not. It turns out that no statement can be made without imposing further conditions.

We need just one more preliminary result.

Theorem 3.8.2. The continuous mapping theorem. Let \{V_N\} be a sequence of random variables such that V_N L→ V, and let g(·) be a continuous function. Then g(V_N) L→ g(V).

Referring now to (1), we observe that

\left[ (C_N/C_0)^{1/N} e^{−µ_l} \right]^{\sqrt{N}} = e^{\sqrt{N}(\bar{W}_N − µ_l)} \; L→ \; e^{σ_l Z} ∼ LN(0, σ_l),


where we have used the continuous mapping theorem with g(x) = e^x. So if N is large enough (N ≥ 20?) for the CLT to give a good normal approximation, then

C_N/C_0 ≅ LN(µ_l N, σ_l\sqrt{N}),

as asserted above.

If the mean log-return µ_l = 0, if 0 < ε ≪ 1 is fixed, and if N is so large that (\log ε)/(σ_l\sqrt{N}) ≈ 0, then

P(C_N/C_0 ≤ ε) ≈ P(e^{σ_l\sqrt{N} Z} ≤ ε) = Φ\left( \frac{\log ε}{σ_l\sqrt{N}} \right) ≈ \frac{1}{2}.

Similarly P(C_N/C_0 > ε^{−1}) ≈ ½. Thus with large probability the N-horizon accumulation C_N/C_0 is either very large or very small; i.e., as N → ∞ the cash level C_N oscillates between very small and increasingly large values, and it makes rapid transitions between small and large values.
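A simulated path makes the oscillation vivid; a Python sketch (not from the notes) of C_N/C_0 for a driftless lognormal model with an illustrative volatility:

```python
import math
import random

def lognormal_path(mu_l, sigma_l, n_periods, seed=11):
    """One path of C_N/C_0 = exp(mu_l * N + sigma_l * (Z_1 + ... + Z_N))."""
    rng = random.Random(seed)
    w = 0.0
    path = []
    for n in range(1, n_periods + 1):
        w += rng.gauss(0.0, 1.0)
        path.append(math.exp(mu_l * n + sigma_l * w))
    return path
```

With µ_l = 0 a single path typically ranges over several orders of magnitude; with µ_l < 0 it drifts to zero even when E(C_N) grows.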

3.9. The Kelly system for optimal investment. Suppose that you have an amount c = C_0 at time n = 0 to invest and that per-period returns are iid random variables X_1, X_2, . . ., i.e., one unit invested at time n grows to A_{n+1} = 1 + X_{n+1} ≥ 0 at time n + 1. The A_n should be non-negative, so I assume that P(X_n ≥ −1) = 1.

The question is how to devise an investment strategy which, in some sense, maximizes your return. Possible criteria could be: (i) to maximize the probability of attaining a prescribed monetary goal, e.g., 1000 × c; or (ii) to maximize your long-run expected cash level.

The above model embraces casino betting games. For example, betting on a colour in roulette is modelled by assuming that P(X_n = ±1) = 1, specifically that p := P(A_n = 2) and q = 1 − p = P(A_n = 0). Usually p ≤ ½, in which case there is a very general theorem which asserts that if you must play, then your optimal strategy is to bet all you have, or whatever is required to reach your goal, i.e., bold play is optimal! The intuitive idea is that this strategy minimizes the number of rounds you need to play, thus minimizing the chance of losing to the casino.

The Kelly system seeks to build in protection against sudden-death total loss by specifying a proportion φ ∈ [0, 1] of your cash to invest each time. Specifically, if you have C_n at time n, then you invest φC_n and keep back (1 − φ)C_n. Thus

    C_{n+1} = (1 − φ)C_n + φC_n(1 + X_{n+1}) = C_n(1 + φX_{n+1}),  (n = 0, 1, . . .).

Iteration yields

    C_N = c ∏_{n=1}^N (1 + φX_n). (k1)

This has the same sort of product form which defines C_N in §1. If φ = 0, then you never invest. If φ = 1, then you invest boldly, and your fortune vanishes forever at the first time n that X_n = −1. Let q = P(X_n = −1) and assume that q < 1 (since otherwise investments always have a zero return). If you invest boldly, then

    P(C_n > 0) = P(∩_{i=1}^n {X_i > −1}) = (P(X_1 > −1))^n = (1 − q)^n.

If q > 0, then ∑_{n≥1} P(C_n > 0) = (1 − q)/q < ∞. It follows from the first Borel-Cantelli lemma (see §3.11) that P(C_n > 0 i.o.) = 0, i.e., eventual ruin is certain. This shows that the long-term investor should choose a proportion φ < 1.
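A quick numerical check of the ruin calculation (a sketch, not part of the notes; q = 1/2 is an illustrative choice):

```python
import random

def bold_play_survival(q, n_rounds, trials, seed=1):
    """Estimate P(C_n > 0) for a bold player when each round is a
    total loss (A = 0) with probability q, independently."""
    rng = random.Random(seed)
    alive = sum(
        all(rng.random() >= q for _ in range(n_rounds))
        for _ in range(trials)
    )
    return alive / trials

q = 0.5
est = bold_play_survival(q, n_rounds=5, trials=20_000)
exact = (1 - q) ** 5          # P(C_5 > 0) = (1-q)^5 = 0.03125
bc_sum = (1 - q) / q          # sum_{n>=1} (1-q)^n is finite, so ruin is certain
```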


One version of Criterion (ii) is choosing φ to maximize

    E(C_n) = c[E(1 + φX_1)]^n = c(1 + φµ)^n,

where µ = E(X_1). So, no matter what the value of n, this quantity is maximized by maximizing φµ. If µ ≤ 0, the maximum occurs at φ = 0. In other words, if the investment (or gambling game) is not favourable, then don't invest.

If µ > 0, then the investment is favourable, and E(C_n) is maximized by choosing φ = 1. But we have seen that this policy results in eventual ruin for the long-term investor (or persistent gambler).

The Kelly criterion is to choose φ to maximize the geometric mean E_g(C_n). The original rationale for this policy was based on information theory applied to transmission rates along a noisy communication channel.4 But you know now that the long-term behaviour of products like (k1) is controlled by

    E_g(1 + φX_1) = exp[E(log(1 + φX_1))],

and not by E(1 + φX_1). So the Kelly procedure is equivalent to maximizing

    g(φ) = E(log(1 + φX_1)).

Observe that g(0) = 0 and that, when q > 0,

    g(1) = E(log(1 + X_1)) = q × log(1 − 1) + E[log(1 + X_1); X_1 > −1] = −∞.

Next,

    g′(φ) = E[X_1/(1 + φX_1)],

so g′(0) = E(X_1) = µ. Finally,

    g″(φ) = −E[(X_1/(1 + φX_1))²] < 0.

Hence g(φ) is concave on the interval [0, 1]. If µ ≤ 0 then g decreases from zero to −∞ on [0, 1], and hence its value is maximized at φ = 0. So the Kelly strategy says don't invest (or gamble).

If µ > 0, then g has a unique maximum at a value φ_c ∈ (0, 1). In addition, there is a number φ_z ∈ (φ_c, 1) such that g(φ_z) = 0. Hence g takes positive values if 0 < φ < φ_z, so choosing any investment fraction in this interval ensures that C_N →_a.s. ∞. Choosing φ = φ_c gives the fastest long-term growth rate. Choosing φ in (φ_z, 1) ensures a long-term diminishing fortune.

It can be shown that choosing φ = φ_c beats any other strategy in the sense that C_N(K)/C_N(A) →_a.s. ∞, where the numerator is the cash level at time N using the optimal Kelly strategy, and the denominator is the cash level using an alternative strategy. In addition, the Kelly strategy, compared to any alternative, minimizes the expected time to reach any pre-assigned goal. Unfortunately, it usually is not easy to determine φ_c and φ_z.

4 Kelly envisaged a dedicated channel linking a horse racing course to an off-course betting shop. The idea was that someone at the race course would use the dedicated channel to call through results to a mate at the betting shop, and this knowledge would arrive before the same results reached the betting shop operators via telephone. Thus the mate could place bets on known race outcomes. But noise contamination in the dedicated line might distort the reported outcomes, so he should not pursue a bold betting strategy. Kelly proved that his strategy is best in several respects.


Example 3.9.1. For the above gambling scenario, assume that p > 1/2. Clearly

    g(φ) = q log(1 − φ) + p log(1 + φ)  and  g′(φ) = (p − q − φ)/(1 − φ²).

Hence φ_c = p − q = 2p − 1. Computing φ_z requires numerical solution of a non-linear equation. However, it turns out that the optimal geometric expected growth factor is

    E_g(1 + φ_c X_1) = e^{g(φ_c)} = 2 p^p q^q.

Optimal growth factors equal 1.005, 1.020 and 1.445 if p = 0.55, 0.6 and 0.9, respectively. Thus the growth rate may be optimal, but it is very small unless the game is very favourable.
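These numbers are easy to reproduce, and φ_z can be found by bisection as the example suggests. A sketch (not part of the notes):

```python
import math

def g(phi, p):
    """Expected log-growth per round: g(phi) = q log(1 - phi) + p log(1 + phi)."""
    q = 1 - p
    return q * math.log(1 - phi) + p * math.log(1 + phi)

def phi_z(p, tol=1e-12):
    """Zero of g in (phi_c, 1): g(phi_c) > 0 and g(phi) -> -inf as phi -> 1,
    so bisection applies."""
    lo, hi = 2 * p - 1, 1 - 1e-13
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid, p) > 0 else (lo, mid)
    return (lo + hi) / 2

# Optimal growth factor e^{g(phi_c)} = 2 p^p q^q for each p in the example.
growth = {p: math.exp(g(2 * p - 1, p)) for p in (0.55, 0.6, 0.9)}
```

For p = 0.6 this gives the growth factor 1.020 from the example, and the bisection locates φ_z a little below 0.4.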

If P(X_1 = −1) = 0, then P(C_n > 0 for all n) = 1. However it will still be the case that C_N →_a.s. 0 if µ < 0. If µ > 0, the main features of the Kelly strategy subsist. The principal change is that it is possible (even likely) that g(1) > −∞. If its value is negative, then we still have a unique maximizing proportion in (0, 1), as above. But it can be that g(1) > 0, in which case g increases throughout the interval, and maybe for values of φ > 1. If so, the investor should play even more boldly than the bold-play policy; he should borrow to invest more than he owns in order to obtain the greatest possible geometric return.

One final comment. Several economists, especially Paul Samuelson, have argued strongly against the Kelly system. Their point seems to be that, even admitting that it is optimal in the long term, there is a positive probability that C_n can attain values small enough to effectively wipe out the investor before realizing the long-term rewards. This leads them to argue against criteria involving long-term outcomes. For more on the subject see the following and the references listed therein.

W. Poundstone (2005). Fortune's Formula: The Untold Story of the Scientific Betting System that Beat the Casinos and Wall Street. (This is in the Science Library; a very entertaining read.)
L.M. Rotando & E.O. Thorp (1992). The Kelly criterion and the stock market. The American Mathematical Monthly 99, 922–931. (Available on-line through the library catalogue.)

3.10. Optimal consumption. This is a more complex example of the consequences of the SLLN. Suppose an individual (rational economic wo/man) owns capital C_{n−1} at time n − 1 and uses K_{n−1} ≤ C_{n−1} to buy consumables. The remaining C_{n−1} − K_{n−1} is invested, resulting in capital at time n,

    C_n = (C_{n−1} − K_{n−1})A_n,  (n = 1, 2, . . .),

where the A_n are iid growth factors. Economists suppose that individuals make decisions on the basis of computing values of a utility function U(κ) which is concave, increasing, and satisfies U′(0+) = ∞. The idea is that consumption κ yields an amount of 'pleasure' U(κ), and that increasing consumption yields a decreasing incremental gain in utility (i.e., benefit or pleasure).

Now let 0 < β < 1 be a discount factor, denote the consumption sequence by K⃗ = (K_0, K_1, . . .), and the discounted (or PV) total utility by

    U(K⃗) = ∑_{n≥0} U(K_n)β^n.

The question facing the individual is how to choose a consumption sequence which maximises the expected discounted utility E[U(K⃗)], and then: how does the optimal capital sequence C_n behave in the long run?


A common choice of utility function is

    U(κ) = κ^{1−α}/(1 − α),  0 < α < 1.

It turns out that the optimal policy is to consume in proportion to available capital, i.e., choose

    K_n = λC_n

for some 0 < λ < 1. In addition, if E(A_n^{1−α}) < 1/β then

    λ = 1 − [βE(A_n^{1−α})]^{1/α}.

Also,

    C_n = (1 − λ)A_n C_{n−1}.

This has the same form as the cash-level sequences examined above, but with (1 − λ)A_n replacing A_n. Using results based on the SLLN, it follows that the long-term behaviour of capital is

    C_n →_a.s. 0    if E(log A_n) + log(1 − λ) < 0,
    C_n →_a.s. ∞    if E(log A_n) + log(1 − λ) > 0,
    C_n oscillates  if E(log A_n) + log(1 − λ) = 0.

This result can be refined by using the CLT.
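To make this concrete, here is a minimal numerical sketch (not part of the notes). The two-point distribution for A_n and all parameter values are illustrative assumptions, chosen so that the moment condition E(A_n^{1−α}) < 1/β holds.

```python
import math

# Illustrative assumptions: A_n = u or d with probability 1/2 each,
# CRRA utility U(k) = k^(1-alpha)/(1-alpha), discount factor beta.
u, d, alpha, beta = 1.3, 0.9, 0.5, 0.95

EA = 0.5 * u ** (1 - alpha) + 0.5 * d ** (1 - alpha)  # E(A_n^{1-alpha})
assert EA < 1 / beta                                   # condition in the notes

lam = 1 - (beta * EA) ** (1 / alpha)                   # optimal consumption fraction

# Long-run drift of log C_n under the optimal policy: E(log A_n) + log(1 - lam).
drift = 0.5 * (math.log(u) + math.log(d)) + math.log(1 - lam)
# drift > 0 for these parameters, so C_n -> infinity a.s.
```

Here λ ≈ 0.016, so the individual consumes a small fixed fraction of capital each period and the remaining capital grows without bound.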

Reference. D. Levhari & ?. Srinivasan (1968). Optimal savings under uncertainty. Rev. Econ. Studies 35, 153–163.

3.11. More on the SLLN. This material is ancillary and not directly examinable. However, you should read down to and including Example 3.11.1.

Let ε > 0 and define events D_N(ε) = {|W̄_N − µ_l| > ε}. Thus D_N(ε) occurs if the average log-return deviates by more than ε from its expected value. As mentioned above, the WLLN is equivalent to the statement that lim_{N→∞} P(D_N(ε)) = 0 for each ε > 0.

The SLLN would fail if, for some ε > 0, there was a positive probability that the events D_n(ε) occurred for infinitely many values of the subscript n. I will write this possible outcome as P(D_n(ε) i.o.) > 0. So proving the SLLN requires only that P(D_n(ε) i.o.) = 0 for all ε > 0. Finding a condition which ensures this requires an exact definition of the notion of 'occurring i.o.'.

So let B_1, B_2, . . . be a countable collection of events in a sample space S, and let A_n = ∪_{j≥n} B_j. Thus A_n is the event that at least one B_j occurs for some j ≥ n; i.e., if the elementary event ω ∈ A_n then ω ∈ B_j for some j ≥ n. Now let A = ∩_{n≥1} A_n; i.e., for each n there exists j ≥ n such that B_j occurs. In other words, infinitely many of the B's occur. We say that A is the event that "the B_n occur infinitely often" and we express the probability of this as

    P(B_n i.o.) := P(A) = P(∩_{n≥1} ∪_{j≥n} B_j).

You obtain a simple estimate of the left-hand side by observing that for any n ≥ 1,

    P(B_n i.o.) ≤ P(∪_{j≥n} B_j) ≤ ∑_{j≥n} P(B_j), (7)

where the second inequality comes from Boole's inequality. If the infinite sum is finite for some n, then this is true for all n, and hence the right-hand side tends to zero as n → ∞. Since the left-hand side doesn't depend on n (despite the notation), its value must be zero. Thus we have the following handy result.

First Borel-Cantelli (B-C) lemma. If ∑_{n≥1} P(B_n) < ∞ then P(B_n i.o.) = 0.

Remark: If the B_n's are independent then P(B_n i.o.) = 0 or 1, and the latter occurs if ∑_{n≥1} P(B_n) = ∞. This assertion is the second B-C lemma, and it's harder to prove.


Example 3.11.1. Heavily biased coins. Suppose you play 'one-up' using an infinite collection of biased coins, tossing each coin once only. You win each time a head shows up. Successive outcomes are independent, but this doesn't matter here. Assume that the nth coin shows heads with probability 1/n². As this probability gets smaller the longer you toss, you expect that heads will occur more rarely as the game progresses, but you can always look forward to a head sometime further on. Right? Wrong!!

Let B_n denote the event of heads on the nth toss, so P(B_n) = 1/n². Since ∑_{n≥1} P(B_n) < ∞ (the sum equals π²/6), the first B-C lemma says that P(B_n i.o.) = 0, i.e., almost surely you score only finitely many heads during the infinitely long game. This is true whether tosses are independent or dependent, so no betting strategy will alter this outcome. This seems to be a paradox because at any point in the game there is a positive probability of heads on the next toss. Thus the coins act collectively by 'turning off' their heads production, even if outcomes are independent!
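A simulation makes the 'turning off' vivid (a sketch, not part of the notes):

```python
import random

def total_heads(n_tosses, seed=3):
    """Toss coin n once for n = 1..n_tosses, with P(heads on toss n) = 1/n^2;
    return the total number of heads seen."""
    rng = random.Random(seed)
    return sum(rng.random() < 1.0 / n ** 2 for n in range(1, n_tosses + 1))

heads = total_heads(1_000_000)
# The expected total is sum_{n>=1} 1/n^2 = pi^2/6 ~ 1.64 (toss 1 is a
# certain head), so even a million tosses produce only a handful of heads.
```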

Example 3.11.2. Random perpetuity. Refer to Example 2.6.2. Assume a perpetuity pays random iid amounts Y_n > 0 at times n = 1, 2, . . . with a constant discount factor δ = (1 + r)^{−1} < 1, i.e., γ_n = (1 + r)^n. Its present value is

    V_∞ = ∑_{n≥1} Y_n δ^n. (8)

The right-hand side is a random power series, and you need conditions which ensure that it is finite. Can the series diverge with some positive probability? Does convergence depend on the value of δ? The answer is that P(V_∞ < ∞) either is unity or it is zero (Kolmogorov's zero-one law, again), that either is possible, and that whichever occurs is independent of the value of δ. The following result gives one half of a fundamental result about random power series. The notation x⁺ means the positive part of x, defined to be x if x > 0, and zero otherwise.

Theorem 3.11.1. If E(log⁺ Y_1) < ∞ then P(V_∞ < ∞) = 1.

Remarks. There is a converse statement asserting that if the series converges for some δ < 1 then the above log-moment must be finite. This moment condition is very mild. Suppose, for example, that Y_1 has a Pareto law; its density function is

    f(y) = αy^{−α−1} if y ≥ 1, and 0 if y < 1,

where α > 0 is a parameter. This is a probability law having a very long tail, meaning that P(Y_1 > y) can decay very slowly as y increases, so slowly that E(Y_1) = ∞ if α ≤ 1. Integration by parts shows the log-moment is

    E(log⁺ Y_1) = α ∫_1^∞ (log y) y^{−α−1} dy = ∫_1^∞ y^{−α−1} dy = α^{−1} < ∞.
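The log-moment computation can be checked by simulation (a sketch, not part of the notes): if Y has the Pareto law above then Y = U^{−1/α} for uniform U, so log Y = −(1/α) log U is exponential with mean 1/α.

```python
import math
import random

def mean_log_pareto(alpha, trials=200_000, seed=4):
    """Monte Carlo estimate of E(log Y_1) for the Pareto law with
    P(Y_1 > y) = y^(-alpha), y >= 1, via inverse transform Y = U^(-1/alpha)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += -math.log(rng.random()) / alpha  # log Y ~ Exponential(alpha)
    return total / trials

est = mean_log_pareto(alpha=0.5)
# E(Y_1) is infinite when alpha <= 1, yet E(log Y_1) = 1/alpha = 2 is finite,
# so Theorem 3.11.1 still applies.
```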

Proof. Let ε > 0 and suppose for now that

    P(Y_n > e^{nε} i.o.) = 0. (9)

This means that there is an integer-valued random variable n′ such that a.s. Y_n ≤ e^{nε} if n ≥ n′. Choose ε so small that b := δe^ε < 1. So if n ≥ n′ then a.s. Y_n δ^n ≤ e^{nε} δ^n = b^n, and since ∑_{n≥0} b^n = (1 − b)^{−1} < ∞, we conclude that V_∞ < ∞, as asserted.

We complete the proof by using the B-C lemma to prove (9):

    ∑_{n≥1} P(Y_n ≥ e^{nε}) = ∑_{n≥1} P(Y_1 ≥ e^{nε}) = ∑_{n≥1} P(log Y_1 ≥ nε)
                            ≤ P(log Y_1 ≥ ε) + ∫_1^∞ P(log Y_1 ≥ vε) dv.

The last inequality is a consequence of the integral test for convergence of a series. To see that the integral is finite (and hence that (9) holds), substitute u = vε: the integral equals ε^{−1} ∫_ε^∞ P(log⁺ Y_1 > u) du ≤ ε^{−1} E(log⁺ Y_1) < ∞, where we use the fact that if the random variable V ≥ 0 then E(V) = ∫_0^∞ P(V > v) dv; take V = log⁺ Y_1. #

Returning to the events D_n(ε), it follows from the first B-C lemma that P(D_n(ε) i.o.) = 0 if

    ∑_{n=1}^∞ P(D_n(ε)) < ∞.

If this condition holds for all ε > 0, then W̄_N is said to converge completely to µ_l. Thus complete convergence implies almost sure convergence (and the converse does not hold in general). The following proof requires one more item of fact. Markov's inequality asserts that if V is a positive-valued random variable with finite expectation µ_V, then for any δ > 0, P(V > δ) ≤ µ_V/δ. This is proved in the same (simpler) way as you prove Chebyshev's inequality. In fact this latter inequality is just Markov's inequality with V = (X − E(X))² (and δ replaced by δ²).


Cantelli's LLN. If E(|X_n|⁴) < ∞, then W̄_n converges completely to µ_l.

Proof. The idea is very simple, just a small variation on Chebyshev's inequality. Observe that

    P(D_n(ε)) = P(|W_n − nµ_l| > nε) = P((W_n − nµ_l)⁴ > n⁴ε⁴) ≤ E[(W_n − nµ_l)⁴]/(n⁴ε⁴).

The inequality results from an application of Markov's inequality. The main task in the proof is computing the fourth-order moment in the numerator. Let Y_n = X_n − µ_l. Then E(Y_n) = 0, Var(Y_n) = σ_l², and m₄ = E(Y_n⁴) < ∞ follows from the fourth-moment assumption of the theorem. Finally, let T_n = W_n − nµ_l = Y_1 + · · · + Y_n.

Compute T_n⁴ as follows:

    T_n⁴ = [Y_1² + Y_2² + · · · + Y_n² + 2Y_1Y_2 + · · · + 2Y_1Y_n + 2Y_2Y_3 + · · · + 2Y_2Y_n + · · · + 2Y_{n−1}Y_n]²
         = Y_1⁴ + · · · + Y_n⁴ + 6Y_1²Y_2² + · · · + 6Y_1²Y_n² + 6Y_2²Y_3² + · · · + 6Y_2²Y_n² + · · · + 6Y_{n−1}²Y_n² + ∑ Y_hY_iY_jY_k,

where at least one subscript in each term of the final sum differs from all the other subscripts, and the corresponding Y-factor is independent of the other factors. Since E(Y_n) ≡ 0, the expectation of the final sum is zero. There are n fourth-power terms, and ½n(n − 1) terms of the form 6Y_i²Y_j² (i ≠ j), each with expectation 6σ_l² × σ_l² = 6σ_l⁴. Hence

    E(T_n⁴) = m₄n + 6σ_l⁴ × ½n(n − 1) ≤ const. n².

It follows that

    ∑_{n≥1} P(D_n(ε)) ≤ ∑_{n≥1} const. n²/(n⁴ε⁴) = (const./ε⁴) ∑_{n≥1} n^{−2} < ∞.

This establishes the asserted complete convergence, and in particular, that W̄_n →_a.s. µ_l.
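The key moment bound E(T_n⁴) = m₄n + 3n(n − 1)σ_l⁴ can be verified exactly for small n by enumeration. A sketch (not part of the notes) using Y_i = ±1, for which m₄ = 1 and σ_l⁴ = 1:

```python
from itertools import product

def exact_fourth_moment(n):
    """E(T_n^4) for T_n = Y_1 + ... + Y_n with iid Y_i = +/-1 (prob 1/2 each),
    by exact enumeration over all 2^n sign sequences."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2 ** n

# With m4 = 1 and sigma^4 = 1, the formula predicts E(T_n^4) = n + 3n(n - 1).
checks = [exact_fourth_moment(n) == n + 3 * n * (n - 1) for n in range(1, 9)]
```

For instance n = 3 gives E(T_3⁴) = 21 = 3 + 3 · 3 · 2, matching the formula term by term.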
