Statistical Theory of the Continuous Double Auction · Statistical theory of the continuous double...

Statistical Theory of theContinuous Double AuctionEric SmithJ. Doyne FarmerLászló GillemotSupriya Krishnamurthy

SFI WORKING PAPER: 2002-10-057

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent theviews of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our externalfaculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, orfunded by an SFI grant.©NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensuretimely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rightstherein are maintained by the author(s). It is understood that all persons copying this information willadhere to the terms and constraints invoked by each author's copyright. These works may be reposted onlywith the explicit permission of the copyright holder.www.santafe.edu

SANTA FE INSTITUTE

Statistical theory of the continuous double auction

Eric Smith∗,1 J. Doyne Farmer†,1 Laszlo Gillemot,1 and Supriya Krishnamurthy1

1Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe NM 87501(Dated: October 20, 2002)

Most modern financial markets use a continuous double auction mechanism to store and matchorders and facilitate trading. In this paper we develop a microscopic dynamical statistical model forthe continuous double auction under the assumption of IID random order flow, and analyze it usingsimulation, dimensional analysis, and theoretical tools based on mean field approximations. Themodel makes testable predictions for basic properties of markets, such as price volatility, the depthof stored supply and demand vs. price, the bid-ask spread, the price impact function, and the timeand probability of filling orders. These predictions are based on properties of order flow and thelimit order book, such as share volume of market and limit orders, cancellations, typical order size,and tick size. Because these quantities can all be measured directly there are no free parameters.We show that the order size, which can be cast as a nondimensional granularity parameter, is inmost cases a more significant determinant of market behavior than tick size. We also provide anexplanation for the observed highly concave nature of the price impact function. On a broaderlevel, this work suggests how stochastic models based on zero-intelligence agents may be useful toprobe the structure of market institutions. Like the model of perfect rationality, a stochastic-zerointelligence model can be used to make strong predictions based on a compact set of assumptions,even if these assumptions are not fully believable.

Contents

I. Introduction 2A. Motivation 2B. Background: The continuous double auction 3C. The model 3D. Summary of prior work 5

II. Overview of predictions of the model 5A. Dimensional analysis 5B. Varying the granularity parameter ε 8

1. Depth profile 92. Liquidity for market orders: The price

impact function 103. Spread 114. Volatility and price diffusion 125. Liquidity for limit orders: Probability and

time to fill. 13C. Varying tick size dp/pc 13

III. Theoretical analysis 14A. Summary of analytic methods 14B. Characterizing limit-order books: dual

coordinates 15C. Frames and marginals 16D. Factorization tests 17E. Comments on renormalized diffusion 18F. Master equations and mean-field

approximations 191. A number density master equation 192. Solution by generating functional 20

∗Corresponding author: [email protected]†McKinsey Professor

3. Screening of the market-order rate 20

4. Verifying the conservation laws 21

5. Self-consistent parametrization 21

6. Accounting for correlations 22

7. Generalizing the shift-induced source terms 22

G. A mean-field theory of order separationintervals: The Independent IntervalApproximation 24

1. Asymptotes and conservation rules 25

2. Direct simulation in interval coordinates 26

IV. Concluding remarks 28

A. Ongoing work on empirical validation 28

B. Future Enhancements 28

C. Comparison to standard models based onvaluation and information arrival 29

A. Relationship of Price impact to cumulative

depth 29

1. Moment expansion 30

2. Quantiles 31

B. Supporting calculations in density

coordinates 32

1. Generating functional at general bin width 32

a. Recovering the continuum limit for prices 32

2. Cataloging correlations 33

a. Getting the intercept right 34

b. Fokker-Planck expanding correlations 34

Acknowledgments 35

References 35

2

I. INTRODUCTION

This section provides background and motivation, adescription of the model, and some historical contextfor work in this area. Section II gives an overview ofthe phenomenology of the model, explaining how dimen-sional analysis applies in this context, and presenting asummary of numerical results. Section III develops ananalytic treatment of model, explaining some of the nu-merical findings of Section II. We conclude in Section IVwith a discussion of how the model may be enhanced tobring it closer to real-life markets, and some commentscomparing the approach taken here to standard modelsbased on information arrival and valuation.

A. Motivation

In this paper we analyze the continuous double auctiontrading mechanism under the assumption of random or-der flow, developing a model introduced in [1]. This anal-ysis produces quantitative predictions about the mostbasic properties of markets, such as volatility, depth ofstored supply and demand, the bid-ask spread, the priceimpact, and probability and time to fill. These predic-tions are based on the rate at which orders flow into themarket, and other parameters of the market, such as or-der size and tick size. The predictions are falsifiable withno free parameters. This extends the original randomwalk model of Bachelier [2] by providing a basis for thediffusion rate of prices. The model also provides a possi-ble explanation for the highly concave nature of the priceimpact function. Even though some of the assumptionsof the model are too simple to be literally true, the modelprovides a foundation onto which more realistic assump-tions may easily be added.

The model demonstrates the importance of financialinstitutions in setting prices, and how solving a necessaryeconomic function such as providing liquidity can haveunanticipated side-effects. In a world of imperfect ra-tionality and imperfect information, the task of demandstorage necessarily causes persistence. Under perfect ra-tionality all traders would instantly update their orderswith the arrival of each piece of new information, butthis is clearly not true for real markets. The limit orderbook, which is the queue used for storing unexecuted or-ders, has long memory when there are persistent orders.It can be regarded as a device for storing supply and de-mand, somewhat like a capacitor is a device for storingcharge. We show that even under completely random IIDorder flow, the price process displays anomalous diffusionand interesting temporal structure. The converse is alsointeresting: For prices to be effectively random, incom-ing order flow must be non-random, in just the right wayto compensate for the persistence. (See the remarks inSection IV C.)

This work is also of interest from a fundamental pointof view because it suggests an alternative approach to

doing economics. The assumption of perfect rational-ity has been popular in economics because it provides aparsimonious model that makes strong predictions. Inthe spirit of Gode and Sunder [3], we show that theopposite extreme of zero intelligence random behaviorprovides another reference model that also makes verystrong predictions. Like perfect rationality, zero intelli-gence is an extreme simplification that is obviously notliterally true. But as we show here, it provides a use-ful tool for probing the behavior of financial institutions.The resulting model may easily be extended by introduc-ing simple boundedly rational behaviors. We also differfrom standard treatments in that we do not attempt tounderstand the properties of prices from fundamental as-sumptions about utility. Rather, we split the problem intwo. We attempt to understand how prices depend onorder flow rates, leaving the problem of what determinesthese order flow rates for the future.

One of our main results concerns the average priceimpact function. The liquidity for executing a marketorder can be characterized by a price impact function∆p = φ(ω, τ, t). ∆p is the shift in the logarithm of theprice at time t + τ caused by a market order of size ωplaced at time t. Understanding price impact is impor-tant for practical reasons such as minimizing transactioncosts, and also because it is closely related to an excessdemand function1, providing a natural starting point fortheories of statistical or dynamical properties of markets[4, 5]. A naive argument predicts that the price impactφ(ω) should increase at least linearly. This argumentgoes as follows: Fractional price changes should not de-pend on the scale of price. Suppose buying a single shareraises the price by a factor k > 1. If k is constant, buyingω shares in succession should raise it by kω. Thus, if buy-ing ω shares all at once affects the price at least as muchas buying them one at a time, the ratio of prices beforeand after impact should increase at least exponentially.Taking logarithms implies that the price impact as wehave defined it above should increase at least linearly.2

In contrast, from empirical studies φ(ω) for buy ordersappears to be concave [6–11]. Lillo et al. have shownfor that for stocks in the NYSE the concave behavior ofthe price impact is quite consistent across different stocks[11]. Our model produces concave price impact functionsthat are in qualitative agreement with these results.

Our work also demonstrates the value of physics tech-niques for economic problems. Our analysis makes exten-sive use of dimensional analysis, the solution of a master

1 In financial models it is common to define an excess demandfunction as demand minus supply; when the context is clear themodifier “excess” is dropped, so that demand refers to both sup-ply and demand.

2 This has practical implications. It is common practice to breakup orders in order to reduce losses due to market impact. Witha sufficiently concave market impact function, in contrast, it ischeaper to execute an order all at once.

3

equation through a generating functional, and a meanfield approach that is commonly used to analyze non-equilibrium reaction-diffusion systems and evaporation-deposition problems.

B. Background: The continuous double auction

Most modern financial markets operate continuously.The mismatch between buyers and sellers that typicallyexists at any given instant is solved via an order-basedmarket with two basic kinds of orders. Impatient traderssubmit market orders, which are requests to buy or sella given number of shares immediately at the best avail-able price. More patient traders submit limit orders, orquotes which also state a limit price, corresponding tothe worst allowable price for the transaction. (Note thatthe word “quote” can be used either to refer to the limitprice or to the limit order itself.) Limit orders often failto result in an immediate transaction, and are stored ina queue called the limit order book. Buy limit ordersare called bids, and sell limit orders are called offers orasks. We use the logarithmic price a(t) to denote the po-sition of the best (lowest) offer and b(t) for the positionthe best (highest) bid. These are also called the insidequotes. There is typically a non-zero price gap betweenthem, called the spread s(t) = a(t) − b(t). Prices arenot continuous, but rather have discrete quanta calledticks. Throughout this paper, all prices will be expressedas logarithms, and to avoid endless repetition, the wordprice will mean the logarithm of the price. The minimuminterval that prices change on is the tick size dp (also de-fined on a logarithmic scale; note this is not true for realmarkets). Note that dp is not necessarily infinitesimal.

As market orders arrive they are matched against limitorders of the opposite sign in order of first price andthen arrival time, as shown in Fig. 1. Because orders areplaced for varying numbers of shares, matching is notnecessarily one-to-one. For example, suppose the bestoffer is for 200 shares at $60 and the the next best is for300 shares at $60.25; a buy market order for 250 sharesbuys 200 shares at $60 and 50 shares at $60.25, movingthe best offer a(t) from $60 to $60.25. A high densityof limit orders per price results in high liquidity for mar-ket orders, i.e., it decreases the price movement when amarket order is placed. Let n(p, t) be the stored densityof limit order volume at price p, which we will call thedepth profile of the limit order book at any given timet. The total stored limit order volume at price level pis n(p, t)dp. For unit order size the shift in the best aska(t) produced by a buy market order is given by solvingthe equation

ω =

p′∑

p=a(t)

n(p, t)dp (1)

for p′. The shift in the best ask p′ − a(t), where is theinstantaneous price impact for buy market orders. A

buy market orders

sell market orders

log price

shar

es

bidsoffers

spread

FIG. 1: A schematic illustration of the continuous double auc-tion mechanism and our model of it. Limit orders are storedin the limit order book. We adopt the arbitrary conventionthat buy orders are negative and sell orders are positive. Asa market order arrives, it has transactions with limit ordersof the opposite sign, in order of price (first) and time of ar-rival (second). The best quotes at prices a(t) or b(t) movewhenever an incoming market order has sufficient size to fullydeplete the stored volume at a(t) or b(t). Our model assumesthat market order arrival, limit order arrival, and limit ordercancellation follow a Poisson process. New offers (sell limitorders) can be placed at any price greater than the best bid,and are shown here as “raining down” on the price axis. Sim-ilarly, new bids (buy limit orders) can be placed at any priceless than the best offer. Bids and offers that fall inside thespread become the new best bids and offers. All prices in thismodel are logarithmic.

similar statement applies for sell market orders, wherethe price impact can be defined in terms of the shift inthe best bid. (Alternatively, it is also possible to definethe price impact in terms of the change in the midpointprice).

We will refer to a buy limit order whose limit priceis greater than the best ask, or a sell limit order whoselimit price is less than the best bid, as a crossing limitorder or marketable limit order. Such limit orders resultin immediate transactions, with at least part of the orderimmediately executed.

C. The model

This model introduced in reference [1], is designed tobe as analytically tractable as possible while capturingkey features of the continuous double auction. All theorder flows are modeled as Poisson processes. We as-sume that market orders arrive in chunks of σ shares, ata rate of µ shares per unit time. The market order maybe a ‘buy’ order or a ‘sell’ order with equal probability.(Thus the rate at which buy orders or sell orders arriveindividually is µ/2.) Limit orders arrive in chunks of σshares as well, at a rate α shares per unit price and perunit time for buy orders and also for sell orders. Offers

4

are placed with uniform probability at integer multiplesof a tick size dp in the range of price b(t) < p < ∞, andsimilarly for bids on −∞ < p < a(t). When a marketorder arrives it causes a transaction; under the assump-tion of constant order size, a buy market order removesan offer at price a(t), and if it was the last offer at thatprice, moves the best ask up to the next occupied pricetick. Similarly, a sell market order removes a bid at priceb(t), and if it is the last bid at that price, moves the bestbid down to the next occupied price tick. In addition,limit orders may also be removed spontaneously by be-ing canceled or by expiring, even without a transactionhaving taken place. We model this by letting them beremoved randomly with constant probability δ per unittime.

While the assumption of limit order placement overan infinite interval is clearly unrealistic, it provides atractable boundary condition for modeling the behav-ior of the limit order book near the midpoint pricem(t) = (a(t)+b(t))/2, which is the region of interest sinceit is where transactions occur. Limit orders far from themidpoint are usually canceled before they are executed(we demonstrate this later in Fig. 5), and so far fromthe midpoint, limit order arrival and cancellation have asteady state behavior characterized by a simple Poissondistribution. Although under the limit order placementprocess the total number of orders placed per unit timeis infinite, the order placement per unit price interval isbounded and thus the assumption of an infinite intervalcreates no problems. Indeed, it guarantees that there arealways an infinite number of limit orders of both signsstored in the book, so that the bid and ask are alwayswell-defined and the book never empties. (Under otherassumptions about limit order placement this is not nec-essarily true, as we later demonstrate in Fig. 30.) Weare also considering versions of the model involving morerealistic order placement functions; see the discussion inSection IV B.

In this model, to keep things simple, we are using theconceptual simplification of effective market orders andeffective limit orders. When a crossing limit order isplaced part of it may be executed immediately. The effectof this part on the price is indistinguishable from that ofa market order of the same size. Similarly, given thatthis market order has been placed, the remaining part isequivalent to a non-crossing limit order of the same size.Thus a crossing limit order can be modeled as an effec-tive market order followed by an effective (non-crossing)limit order.3 Working in terms of effective market andlimit orders affects data analysis: The effective marketorder arrival rate µ combines both pure market ordersand the immediately executed components of crossing

3 In assigning independently random distributions for the twoevents, our model neglects the correlation between market andlimit order arrival induced by crossing limit orders.

limit orders, and similarly the limit order arrival rate αcorresponds only to the components of limit orders thatare not executed immediately. This is consistent withthe boundary conditions for the order placement process,since an offer with p ≤ b(t) or a bid with p ≥ a(t) wouldresult in an immediate transaction, and thus would be ef-fectively the same as a market order. Defining the orderplacement process with these boundary conditions real-istically allows limit orders to be placed anywhere insidethe spread.

Another simplification of this model is the use of loga-rithmic prices, both for the order placement process andfor the tick size dp. This has the important advantagethat it ensures that prices are always positive. In realmarkets price ticks are linear, and the use of logarithmicprice ticks is an approximation that makes both the cal-culations and the simulation more convenient. We findthat the limit dp → 0, where tick size is irrelevant, isa good approximation for many purposes. We find thattick size is less important than other parameters of theproblem, which provides some justification for the ap-proximation of logarithmic price ticks.

Assuming a constant probability for cancellation isclearly ad hoc, but in simulations we find that otherassumptions with well-defined timescales, such as con-stant duration time, give similar results. For our analyticmodel we use a constant order size σ. In simulations wealso use variable order size, e.g. half-normal distributionswith standard deviation

√

π/2σ, which ensures that themean value remains σ. As long as these distributionshave thin tails, the differences do not qualitatively af-fect most of the results reported here, except in a triv-ial way. As discussed in Section IV B, decay processeswithout well-defined characteristic times and size distri-butions with power law tails give qualitatively differentresults and will be treated elsewhere.

Even though this model is simply defined, the timeevolution is not trivial. One can think of the dynamicsas being composed of three parts: (1) the buy marketorder/sell limit order interaction, which determines thebest ask; (2) the sell market order/buy limit order in-teraction, which determines the best bid; and (3) therandom cancellation process. Processes (1) and (2) de-termine each others’ boundary conditions. That is, pro-cess (1) determines the best ask, which sets the bound-ary condition for limit order placement in process (2),and process (2) determines the best bid, which deter-mines the boundary conditions for limit order placementin process (1). Thus processes (1) and (2) are stronglycoupled. It is this coupling that causes the bid and askto remain close to each other, and guarantees that thespread s(t) = a(t)− b(t) is a stationary random variable,even though the bid and ask are not. It is the coupling ofthese processes through their boundary conditions thatprovides the nonlinear feedback that makes the price pro-cess complex.

5

D. Summary of prior work

There are two independent lines of prior work, one inthe financial economics literature, and the other in thephysics literature. The models in the economics litera-ture are directed toward empirical analysis, and treat theorder process as static. In contrast, the models in thephysics literature are conceptual toy models, but theyallow the order process to react to changes in prices, andare thus fully dynamic. Our model bridges this gap. Thisis explained in more detail below.

The first model of this type that we are aware of wasdue to Mendelson [12], who modeled random order place-ment with periodic clearing. This was developed alongdifferent directions by Cohen et al. [13], who used tech-niques from queuing theory, but assumed only one pricelevel and addressed the issue of time priority at that level(motivated by the existence of a specialist who effectivelypinned prices to make them stationary). Domowitz andWang [14] and Bollerslev et al. [15] further developedthis to allow more general order placement processes thatdepend on prices, but without solving the full dynami-cal problem. This allows them to get a stationary solu-tion for prices. In contrast, in our model the prices thatemerge make a random walk, and so are much more re-alistic. In order to get a solution for the depth of theorder book we have to go into price coordinates that co-move with the random walk. Dealing with the feedbackbetween order placement and prices makes the problemmuch more difficult, but it is key for getting reasonableresults.

The models in the physics literature incorporate pricedynamics, but have tended to be conceptual toy modelsdesigned to understand the anomalous diffusion proper-ties of prices. This line of work begins with a paper byBak et al. [16] which was developed by Eliezer and Kogan[17] and by Tang [18]. They assume that limit orders areplaced at a fixed distance from the midpoint, and thatthe limit prices of these orders are then randomly shuf-fled until they result in transactions. It is the randomshuffling that causes price diffusion. This assumption,which we feel is unrealistic, was made to take advantageof the analogy to a standard reaction-diffusion model inthe physics literature. Maslov [19] introduced an alter-ative model that was solved analytically in the mean-fieldlimit by Slanina [20]. Each order is randomly chosen tobe either a buy or a sell, and either a limit order or a mar-ket order. If a limit order, it is randomly placed within afixed distance of the current price. This again gives rise toanomalous price diffusion. A model allowing limit orderswith Poisson order cancellation was proposed by Challetand Stinchcombe [21]. Iori and Chiarella [22] have nu-merically studied a model including fundamentalists andtechnical traders.

The model studied in this paper was introduced byDaniels et al. [1]. This adds to the literature by intro-ducing a model that treats the feedback between orderplacement and price movement, while having enough re-

alism so that the parameters can be tested against realdata. The prior models in the physics literature havetended to focus primarily on the anomalous diffusion ofprices. While interesting and important for refining riskcalculations, this is a second-order effect. In contrast,we focus on the first order effects of primary interest tomarket participants, such as the bid-ask spread, volatil-ity, depth profile, price impact, and the probability andtime to fill an order. We demonstrate how dimensionalanalysis becomes a useful tool in an economic setting,and develop mean field theories in a context that is morechallenging than that of the toy models of previous work.

Subsequent to reference [1], Bouchaud et al. [23]demonstrated that, under the assumption that prices exe-cute a random walk, by introducing an additional free pa-rameter they can derive a simple equation for the depthprofile. In this paper we show how to do this from firstprinciples without introducing a free parameter.

II. OVERVIEW OF PREDICTIONS OF THE

MODEL

In this section we give an overview of the phenomenol-ogy of the model. Because this model has five parame-ters, understanding all their effects would generally be acomplicated problem in and of itself. This task is greatlysimplified by the use of dimensional analysis, which re-duces the number of independent parameters from fiveto two. Thus, before we can even review the results, weneed to first explain how dimensional analysis applies inthis setting. One of the surprising aspects of this modelis that one can derive several powerful results using thesimple technique of dimensional analysis alone.

Unless otherwise mentioned the results presented inthis section are based on simulations. These results arecompared to theoretical predictions in Section III.

A. Dimensional analysis

Because dimensional analysis is not commonly usedin economics we first present a brief review. For moredetails see Bridgman [24].

Dimensional analysis is a technique that is commonlyused in physics and engineering to reduce the numberof independent degrees of freedom by taking advantageof the constraints imposed by dimensionality. For suf-ficiently constrained problems it can be used to guessthe answer to a problem without doing a full analysis.The idea is to write down all the factors that a givenphenomenon can depend on, and then find the combi-nation that has the correct dimensions. For example,consider the problem of the period of a pendulum: Theperiod T has dimensions of time. Obvious candidatesthat it might depend on are the mass of the bobm (whichhas units of mass), the length l (which has units of dis-tance), and the acceleration of gravity g (which has units

6

Parameter Description Dimensions

α limit order rate shares/(price time)µ market order rate shares/timeδ order cancellation rate 1/timedp tick size priceσ characteristic order size shares

TABLE I: The five parameters that characterize this model.α, µ, and δ are order flow rates, and dp and σ are discretenessparameters.

of distance/time2). There is only one way to combinethese to produce something with dimensions of time, i.e.T ∼

√

l/g. This determines the correct formula for theperiod of a pendulum up to a constant. Note that itmakes it clear that the period does not depend on themass, a result that is not obvious a priori. We werelucky in this problem because there were three param-eters and three dimensions, with a unique combinationof the parameters having the right dimensions; in generaldimensional analysis can only be used to reduce the num-ber of free parameters through the constraints imposedby their dimensions.

For this problem the three fundamental dimensions inthe model are shares, price, and time. Note that by price,we mean the logarithm of price; as long as we are consis-tent, this does not create problems with the dimensionalanalysis. There are five parameters: three rate constantsand two discreteness parameters. The order flow ratesare µ, the market order arrival rate, with dimensions ofshares per time; α, the limit order arrival rate per unitprice, with dimensions of shares per price per time; and δ,the rate of limit order decays, with dimensions of 1/time.These play a role similar to rate constants in physicalproblems. The two discreteness parameters are the pricetick size dp, with dimensions of price, and the order sizeσ, with dimensions of shares. This is summarized in ta-ble I.

Dimensional analysis can be used to reduce the num-ber of relevant parameters. Because there are five pa-rameters and three dimensions (price, shares, time), andbecause in this case the dimensionality of the parametersis sufficiently rich, the dimensional relationships reducethe degrees of freedom, so that all the properties of thelimit-order book can be described by functions of two pa-rameters. It is useful to construct these two parametersso that they are nondimensional.

We perform the dimensional reduction of the modelby guessing that the effect of the order flow rates is pri-mary to that of the discreteness parameters. This leadsus to construct nondimensional units based on the orderflow parameters alone, and take nondimensionalized ver-sions of the discreteness parameters as the independentparameters whose effects remain to be understood. Aswe will see, this is justified by the fact that many of theproperties of the model depend only weakly on the dis-creteness parameters. We can thus understand much ofthe richness of the phenomenology of the model through

Parameter Description Expression

Nc characteristic number of shares µ/2δpc characteristic price interval µ/2αtc characteristic time 1/δdp/pc nondimensional tick size 2αdp/µε nondimensional order size 2δσ/µ

TABLE II: Important characteristic scales and nondimen-sional quantities. We summarize the characteristic share size,price and times defined by the order flow rates, as well asthe two nondimensional scale parameters dp/pc and ε thatcharacterize the effect of finite tick size and order size. Di-mensional analysis makes it clear that all the properties of thelimit order book can be characterized in terms of functions ofthese two parameters.

dimensional analysis alone.There are three order flow rates and three fundamen-

tal dimensions. If we temporarily ignore the discretenessparameters, there are unique combinations of the orderflow rates with units of shares, price, and time. Thesedefine a characteristic number of shares Nc = µ/2δ, acharacteristic price interval pc = µ/2α, and a character-istic timescale tc = 1/δ. This is summarized in table II.The factors of two occur because we have defined themarket order rate for either a buy or a sell order to beµ/2. We can thus express everything in the model innondimensional terms by dividing by Nc, pc, or tc as ap-propriate, e.g. to measure shares in nondimensional unitsN = N/Nc, or to measure price in nondimensional unitsp = p/pc.

The value of using nondimensional units is illustratedin Fig. 2. Fig. 2(a) shows the average depth profile forthree different values of µ and δ with the other parame-ters held fixed. When we plot these results in dimensionalunits the results look quite different. However, when weplot them in terms of nondimensional units, as shown inFig. 2(b), the results are indistinguishable. As explainedbelow, because we have kept the nondimensional ordersize fixed, the collapse is perfect. Thus, the problem ofunderstanding the behavior of this model is reduced tostudying the effect of tick size and order size.

To understand the effect of tick size and order size it isuseful to do so in nondimensional terms. The nondimen-sional scale parameter based on tick size is constructed bydividing by the characteristic price, i.e. dp/pc = 2αdp/µ.The theoretical analysis and the simulations show thatthere is a sensible continuum limit as the tick size dp→ 0,in the sense that there is non-zero price diffusion and afinite spread. Furthermore, the dependence on tick sizeis weak, and for many purposes the limit dp→ 0 approx-imates the case of finite tick size fairly well. As we willsee, working in this limit is essential for getting tractableanalytic results.

A nondimensional scale parameter based on order sizeis constructed by dividing the typical order size (whichis measured in shares) by the characteristic number ofshares Nc, i.e. ε ≡ σ/Nc = 2δσ/µ. ε characterizesthe “chunkiness” of the orders stored in the limit order

7

3 2 1 0 1 2 3-600

-400

-200

0

200

400

600

n

p

a)

4 3 2 1 0 1 2 3 41.5

1

0.5

0

0.5

1

1.5

n δ

/ α

p / pC

b)

FIG. 2: The usefulness of nondimensional units. (a) We showthe average depth profile for three different parameter sets.The parameters α = 0.5, σ = 1, and dp = 0 are held con-stant, while δ and µ are varied. The line types are: (dotted)δ = 0.001, µ = 0.2; (dashed) δ = 0.002, µ = 0.4 and (solid)δ = 0.004, µ = 0.8. (b) is the same, but plotted in nondimen-sional units. The horizontal axis has units of price, and so hasnondimensional units p = p/pc = 2αp/µ. The vertical axishas units of n shares/price, and so has nondimensional unitsn = npc/Nc = nδ/α. Because we have chosen the parametersto keep the nondimensional order size ε constant, the collapseis perfect. Varying the tick size has little effect on the resultsother than making them discrete.

book. As we will see, ε is an important determinant ofliquidity, and it is a particularly important determinantof volatility. In the continuum limit ε → 0 there is noprice diffusion. This is because price diffusion can occuronly if there is a finite probability for price levels out-side the spread to be empty, thus allowing the best bidor ask to make a persistent shift. If we let ε → 0 whilethe average depth is held fixed the number of individualorders becomes infinite, and the probability that spon-taneous decays or market orders can create gaps outsidethe spread becomes zero. This is verified in simulations.Thus the limit ε → 0 is always a poor approximation toa real market. ε is a more important parameter than thetick size dp/pc. In the mean field analysis in Section III,

Quantity Dimensions Scaling relation

Asymptotic depth shares/price d ∼ α/δSpread price s ∼ µ/αSlope of depth profile shares/price2 λ ∼ α2/µδ = d/sPrice diffusion rate price2/time D0 ∼ µ2δ/α2

TABLE III: Estimates from dimensional analysis for the scal-ing of a few market properties based on order flow rates alone.α is the limit order density rate, µ is the market order rate,and δ is the spontaneous limit order removal rate. These es-timates are constructed by taking the combinations of thesethree rates that have the proper units. They neglect the de-pendence on on the order granularity ε and the nondimen-sional tick size dp/pc. More accurate relations from simula-tion and theory are given in table IV.

we let dp/pc → 0, reducing the number of independentparameters from two to one, and in many cases find thatthis is a good approximation.

The order size σ can be thought of as the order gran-ularity. Just as the properties of a beach with fine sandare quite different from that of one populated by fist-sizedboulders, a market with many small orders behaves quitedifferently from one with a few large orders. Nc providesthe scale against which the order size is measured, andε characterizes the granularity in relative terms. Alter-natively, 1/ε can be thought of as the annihilation ratefrom market orders expressed in units of the size of spon-taneous decays. Note that in nondimensional units thenumber of shares can also be written N = N/Nc = Nε/σ.

The construction of the nondimensional granularityparameter illustrates the importance of including a spon-taneous decay process in this model. If δ = 0 (which im-plies ε = 0) there is no spontaneous decay of orders, anddepending on the relative values of µ and α, genericallyeither the depth of orders will accumulate without boundor the spread will become infinite. As long as δ > 0, incontrast, this is not a problem.

For some purposes the effects of varying tick size andorder size are fairly small, and we can derive approxi-mate formulas using dimensional analysis based only onthe order flow rates. For example, in table III we givedimensional scaling formulas for the average spread, themarket order liquidity (as measured by the average slopeof the depth profile near the midpoint), the volatility, andthe asymptotic depth (defined below). Because these es-timates neglect the effects of discreteness, they are onlyapproximations of the true behavior of the model, whichdo a better job of explaining some properties than oth-ers. Our numerical and analytical results show that somequantities also depend on the granularity parameter εand to a weaker extent on the tick size dp/pc. Nonethe-less, the dimensional estimates based on order flow aloneprovide a good starting point for understanding marketbehavior. A comparison to more precise formulas derivedfrom theory and simulations is given in table IV.

An approximate formula for the mean spread can bederived by noting that it has dimensions of price , and theunique combination of order flow rates with these dimen-

8

Quantity Scaling relation Figure

Asymptotic depth d = α/δ 3Spread s = (µ/α)f(ε, dp/pc) 10, 24Slope of depth profile λ = (α2/µδ)g(ε, dp/pc) 3, 20 - 21Price diffusion (τ → 0) D0 = (µ2δ/α2)ε−0.5 11, 14(c)Price diffusion (τ → ∞) D∞ = (µ2δ/α2)ε0.5 11, 14(c)

TABLE IV: The dependence of market properties on modelparameters based on simulation and theory, with the relevantfigure numbers. These formulas include corrections for or-der granularity ε and finite tick size dp/pc. The formula forasymptotic depth from dimensional analysis in table III is ex-act with zero tick size. The expression for the mean spread ismodified by a function of ε and dp/pc, though the dependenceon them is fairly weak. For the liquidity λ, corresponding tothe slope of the depth profile near the origin, the dimensionalestimate must be modified because the depth profile is nolonger linear (mainly depending on ε) and so the slope de-pends on price. The formulas for the volatility are empiricalestimates from simulations. The dimensional estimate for thevolatility from Table III is modified by a factor of ε−0.5 forthe early time price diffusion rate and a factor of ε0.5 for thelate time price diffusion rate.

sions is µ/α. While the dimensions indicate the scaling ofthe spread, they cannot determine multiplicative factorsof order unity. A more intuitive argument can be madeby noting that inside the spread removal due to cancella-tion is dominated by removal due to market orders. Thusthe total limit order placement rate inside the spread, foreither buy or sell limit orders αs, must equal the orderremoval rate µ/2, which implies that spread is s = µ/2α.As we will see later, this argument can be generalized andmade more precise within our mean-field analysis whichthen also predicts the observed dependence on the gran-ularity parameter ε. However this dependence is ratherweak and only causes a variation of roughly a factor oftwo for ε < 1 (see Figs. 10 and 24), and the factor of 1/2derived above is a good first approximation. Note thatthis prediction of the mean spread is just the character-istic price pc.

It is also easy to derive the mean asymptotic depth,which is the density of shares far away from the mid-point. The asymptotic depth is an artificial construct ofour assumption of order placement over an infinite inter-val; it should be regarded as providing a simple boundarycondition so that we can study the behavior near the mid-point price. The mean asymptotic depth has dimensionsof shares/price , and is therefore given by α/δ. Further-more, because removal by market orders is insignificantin this regime, it is determined by the balance betweenorder placement and decay, and far from the midpointthe depth at any given price is Poisson distributed. Thisresult is exact.

The average slope of the depth profile near the mid-point is an important determinant of liquidity, since itaffects the expected price response when a market or-der arrives. The slope has dimensions of shares/price2,which implies that in terms of the order flow rates it

scales roughly as α2/µδ. This is also the ratio of theasymptotic depth to the spread. As we will see later,this is a good approximation when ε ∼ 0.01, but forsmaller values of ε the depth profile is not linear near themidpoint, and this approximation fails.

The last two entries in table IV are empirical estimatesfor the price diffusion rate D, which is proportional tothe square of the volatility. That is, for normal diffusion,starting from a point at t = 0, the variance v after timet is v = Dt. The volatility at any given timescale t isthe square root of the variance at timescale t. The esti-mate for the diffusion rate based on dimensional analysisin terms of the order flow rates alone is µ2δ/α2. How-ever, simulations show that short time diffusion is muchfaster than long time diffusion, due to negative autocor-relations in the price process, as shown in Fig. 11. Theinitial and the asymptotic diffusion rates appear to obeythe scaling relationships given in table IV. Though ourmean-field theory is not able to predict this functionalform, the fact that early and late time diffusion rates aredifferent can be understood within the framework of ouranalysis, as described in Sec. III E. Anomalous diffusionof this type implies negative autocorrelations in midpointprices. Note that we use the term “anomalous diffusion”to imply that the diffusion rate is different on short andlong timescales. We do not use this term in the sense thatit is normally used in the physics literature, i.e. that thelong-time diffusion is proportional to tγ with γ 6= 1 (forlong times γ = 1 in our case).

B. Varying the granularity parameter ε

We first investigate the effect of varying the order gran-ularity ε in the limit dp → 0. As we will see, the granu-larity has an important effect on most of the properties ofthe model, and particularly on depth, price impact, andprice diffusion. The behavior can be divided into threeregimes, roughly as follows:

• Large ε, i.e. ε >∼ 0.1. This corresponds to alarge accumulation of orders at the best bid andask, nearly linear market impact, and roughly equalshort and long time price diffusion rates. This is theregime where the mean-field approximation used inthe theoretical analysis works best.

• Medium ε i.e. ε ∼ 0.01. In this range the accu-mulation of orders at the best bid and ask is smalland near the midpoint price the depth profile in-creases nearly linearly with price. As a result, as acrude approximation the price impact increases asroughly the square root of order size.

• Small ε i.e. ε <∼ 0.001. The accumulation of ordersat the best bid and ask is very small, and near themidpoint the depth profile is a convex function ofprice. The price impact is very concave. The short

9

time price diffusion rate is much greater than thelong time price diffusion rate.

Since the results for bids are symmetric with those foroffers about p = 0, for convenience we only show theresults for offers, i.e. buy market orders and sell limitorders. In this sub-section prices are measured relativeto the midpoint, and simulations are in the continuumlimit where the tick size dp → 0. The results in thissection are from numerical simulations. Also, bear inmind that far from the midpoint the predictions of thismodel are not valid due to the unrealistic assumptionof an order placement process with an infinite domain.Thus the results are potentially relevant to real marketsonly when the price p is at most a few times as large asthe characteristic price pc.

1. Depth profile

The mean depth profile, i.e. the average number ofshares per price interval, and the mean cumulative depthprofile are shown in Fig. 3, and the standard deviation ofthe cumulative profile is shown in Fig. 4. Since the depthprofile has units of shares/price, nondimensional units ofdepth profile are n = npc/Nc = nδ/α. The cumulativedepth profile at any given time t is defined as

N(p, t) =

p∑

p=0

n(p, t)dp. (2)

This has units of shares and so in nondimensional termsis N(p) = N(p)/Nc = 2δN(p)/µ = N(p)ε/σ.

In the high ε regime the annihilation rate due to mar-ket orders is low (relative to δσ), and there is a significantaccumulation of orders at the best ask, so that the av-erage depth is much greater than zero at the midpoint.The mean depth profile is a concave function of price.In the medium ε regime the market order removal rateincreases, depleting the average depth near the best ask,and the profile is nearly linear over the range p/pc ≤ 1.In the small ε regime the market order removal rate in-creases even further, making the average depth near theask very close to zero, and the profile is a convex functionover the range p/pc ≤ 1.

The standard deviation of the depth profile is shownin Fig. 4. We see that the standard deviation of thecumulative depth is comparable to the mean depth, andthat as ε increases, near the midpoint there is a similartransition from convex to concave behavior.

The uniform order placement process seems at firstglance one of the most unrealistic assumptions of ourmodel, leading to depth profiles with a finite asymptoticdepth (which also implies that there are an infinite num-ber of orders in the book). However, orders far awayfrom the spread in the asymptotic region almost neverget executed and thus do not affect the market dynam-ics. To demonstrate this in Fig. 5 we show the compari-son between the limit-order depth profile and the depth

0 0.5 1 1.5 2 2.5 30

0.2

0.4

0.6

0.8

1

normalized depth profile

p / pC

n / n

C

a)

0 0.5 1 1.5 2 2.5 30

0.5

1

1.5

2

2.5normalized cumulative depth profile

p / pC

N /

NC

b)

FIG. 3: The mean depth profile and cumulative depth versusp = p/pc = 2αp/µ. The origin p/pc = 0 corresponds to themidpoint. (a) is the average depth profile n in nondimensionalcoordinates n = npc/Nc = nδ/α. (b) is nondimensional cu-mulative depth N(p)/Nc. We show three different values ofthe nondimensional granularity parameter: ε = 0.2 (solid),ε = 0.02 (dash), ε = 0.002 (dot), all with tick size dp = 0.

0 0.5 1 1.5 2 2.5 3 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9 cumulative profile standard deviation

p / pC

(<N

2 > -

<N

>2 )

/

N1/

2C

FIG. 4: Standard deviation of the nondimensionalized cumu-lative depth versus nondimensional price, corresponding toFig. (3).

10

0 0.5 1 1.5 2 2.5 3 3.5 40

0.2

0.4

0.6

0.8

1

1.2

1.4depth and effective depth profile

n δ

/ α

p / pC

0

0.2

0.4

0.6

0.8

1

1.2

1.4

n eδ

/ α

FIG. 5: A comparison between the depth profiles and theeffective depth profiles as defined in the text, for differentvalues of ε. Heavy lines refer to the effective depth profiles ne

and the light lines correspond to the depth profiles.

ne of only those orders which eventually get executed.4

The density ne of executed orders decreases rapidly as afunction of the distance from the mid-price. Thereforewe expect that near the midpoint our results should besimilar to alternative order placement processes, as longas they also lead to an exponentially decaying profile ofexecuted orders (which is what we observe above). How-ever, to understand the behavior further away from themidpoint we are also working on enhancements that in-clude more realistic order placement processes groundedon empirical measurements of market data, as summa-rized in section IV B.

2. Liquidity for market orders: The price impact function

In this sub-section we study the instantaneous priceimpact function φ(t, ω, τ → 0). This is defined as the(logarithm of the) midpoint price shift immediately afterthe arrival of a market order in the absence of any otherevents. This should be distinguished from the asymp-totic price impact φ(t, ω, τ → ∞), which describes thepermanent price shift. While the permanent price shiftis clearly very important, we do not study it here. Thereader should bear in mind that all prices p, a(t), etc.are logarithmic.

The price impact function provides a measure of theliquidity for executing market orders. (The liquidity forlimit orders, in contrast, is given by the probability ofexecution, studied in section II B 5). At any given timet, the instantaneous (τ = 0) price impact function is the

4 Note that the ratio ne/n is not the same as the probability offilling orders (Fig. 12) because in that case the price p/pc refersto the distance of the order from the midpoint at the time whenit was placed.

0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

<dm

> /

p C

N ε / σ

FIG. 6: The average price impact corresponding to the re-sults in Fig. (3). The average instantaneous movement of thenondimensional mid-price, 〈dm〉/pc caused by an order of sizeN/Nc = Nε/σ. ε = 0.2 (solid), ε = 0.02 (dash), ε = 0.002(dot).

inverse of the cumulative depth profile. This follows im-mediately from equations (1) and (2), which in the limitdp → 0 can be replaced by the continuum transactionequation:

ω = N(p, t) =

∫ p

0

n(p, t)dp (3)

This equation makes it clear that at any fixed t the priceimpact can be regarded as the inverse of the cumulativedepth profile N(p, t). When the fluctuations are suffi-ciently small we can replace n(p, t) by its mean valuen(p) = 〈n(p, t)〉. In general, however, the fluctuationscan be large, and the average of the inverse is not equal tothe inverse of the average. There are corrections based onhigher order moments of the depth profile, as given in themoment expansion derived in Appendix A 1. Nonethe-less, the inverse of the mean cumulative depth providesa qualitative approximation that gives insight into thebehavior of the price impact function. (Note that ev-erything becomes much simpler using medians, since themedian of the cumulative price impact function is ex-actly the inverse of the median price impact, as derivedin Appendix A 1).

Mean price impact functions are shown in Fig. 6 andthe standard deviation of the price impact is shown inFig. 7. The price impact exhibits very large fluctuationsfor all values of ε: The standard deviation has the sameorder of magnitude as the mean or even greater for smallNε/σ values. Note that these are actually virtual priceimpact functions. That is, to explore the behavior of theinstantaneous price impact for a wide range of order sizes,we periodically compute the price impact that an orderof a given size would have caused at that instant, if it hadbeen submitted. We have checked that real price impactcurves are the same, but they require a much longer timeto accumulate reasonable statistics.

11

0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

N ε / σ

(<dm

2 > -

<dm

>2 )

/

p C1/

2

FIG. 7: The standard deviation of the instantaneous priceimpact dm/pc corresponding to the means in Fig. 6, as afunction of normalized order size εN/σ. ε = 0.2 (solid), ε =0.02 (dash), ε = 0.002 (dot).

One of the interesting results in Fig. 6 is the scale ofthe price impact. The price impact is measured relativeto the characteristic price scale pc, which as we have men-tioned earlier is roughly equal to the mean spread. Aswe will argue in relation to Fig. 8, the range of nondi-mensional shares shown on the horizontal axis spans therange of reasonable order sizes. This figure demonstratesthat throughout this range the price is the order of mag-nitude (and typically less than) the mean spread size.

Due to the accumulation of orders at the ask in thelarge ε regime, for small p the mean price impact isroughly linear. This follows from equation (3) underthe assumption that n(p) is constant. In the medium εregime, under the assumption that the variance in depthcan be neglected, the mean price impact should increaseas roughly ω1/2. This follows from equation (3) un-der the assumption that n(p) is linearly increasing andn(0) ≈ 0. (Note that we see this as a crude approxima-tion, but there can be substantial corrections caused bythe variance of the depth profile). Finally, in the smallε regime the price impact is highly concave, increasingmuch slower than ω1/2. This follows because n(0) ≈ 0and the depth profile n(p) is convex.

To get a better feel for the functional form of the priceimpact function, in Fig. 8 we numerically differentiate itversus log order size, and plot the result as a function ofthe appropriately scaled order size. (Note that becauseour prices are logarithmic, the vertical axis already incor-porates the logarithm). If we were to fit a local power lawapproximation to the function at each price, this corre-sponds to the exponent of that power law near that price.Notice that the exponent is almost always less than one,so that the price impact is almost always concave. Mak-ing the assumption that the effect of the variance of thedepth is not too large, so that equation (3) is a good as-sumption, the behavior of this figure can be understoodas follows: For N/Nc ≈ 0 the price impact is dominated

10-4

10-3

10-2

10-1

100

101

102

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

N ε / σ

d lo

g (<

dm>

/ pC

) / d

log

(N ε

/ σ)

FIG. 8: Derivative of the nondimensional mean mid-pricemovement, with respect to logarithm of the nondimensionalorder size N/Nc = Nε/σ, obtained from the price impactcurves in Fig. 6.

by n(0) (the constant term in the average depth profile)and so the logarithmic slope of the price impact is alwaysnear to one. As N/Nc increases, the logarithmic slope isdriven by the shape of the average depth profile, which islinear or convex for smaller ε, resulting in concave priceimpact. For large values of N/Nc, we reach the asymp-totic region where the depth profile is flat (and where ourmodel is invalid by design). Of course, there can be devi-ations to this behavior caused by the fact that the meanof the inverse depth profile is not in general the inverseof the mean, i.e. 〈N−1(p)〉 6= 〈N(p)〉−1 (see App. A 1).

To compare to real data, note that N/Nc = Nε/σ.N/σ is just the order size in shares in relation to the av-erage order size, so by definition it has a typical value ofone. For the London Stock Exchange, we have found thattypical values of ε are in the range 0.001−0.1. For a typ-ical range of order sizes from 100− 100, 000 shares, withan average size of 10, 000 shares, the meaningful range forN/Nc is therefore roughly 10−5 to 1. In this range, forsmall values of ε the exponent can reach values as low as0.2. This offers a possible explanation for the previouslymysterious concave nature of the price impact function,and contradicts the linear increase in price impact basedon the naive argument presented in the introduction.

3. Spread

The probability density of the spread is shown in Fig. 9.This shows that the probability density is substantial ats/pc = 0. (Remember that this is in the limit dp → 0).The probability density reaches a maximum at a valueof the spread approximately 0.2pc, and then decays. Itmight seem surprising at first that it decays more slowlyfor large ε, where there is a large accumulation of or-ders at the ask. However, it should be borne in mind

12

0 0.5 1 1.5 2 2.5 3 0

0.01

0.02

0.03

0.04

0.05

0.06

s / pC

PD

F (

s / p

C)

a)

0 0.5 1 1.5 2 2.5 3 3.5 0

0.2

0.4

0.6

0.8

1

s / pC

CD

F (

s / p

C)

b)

FIG. 9: The probability density function (a), and cumulativedistribution function (b) of the nondimensionalized bid-askspread s/pc, corresponding to the results in Fig. (3). ε = 0.2(solid), ε = 0.02 (dash), ε = 0.002 (dot).

that the characteristic price pc = µ/α depends on ε.Since ε = 2δσ/µ, by eliminating µ this can be writtenpc = 2σδ/(αε). Thus, holding the other parameters fixed,large ε corresponds to small pc, and vice versa. So in fact,the spread is very small for large ε, and large for small ε,as expected. The figure just shows the small correctionsto the large effects predicted by the dimensional scalingrelations.

For large ε the probability density of the spread decaysroughly exponentially moving away from the midpoint.This is because for large ε the fluctuations around themean depth are roughly independent. Thus the proba-bility for a market order to penetrate to a given pricelevel is roughly the probability that all the ticks smallerthan this price level contain no orders, which gives riseto an exponential decay. This is no longer true for smallε. Note that for small ε the probability distribution ofthe spread becomes insensitive to ε, i.e. the nondimen-sionalized distribution for ε = 0.02 is nearly the same asthat for ε = 0.002.

It is apparent from Fig. 9 that in nondimensional unitsthe mean spread increases with ε. This is confirmed inFig. 10, which displays the mean value of the spread as a

0 0.05 0.1 0.15 0.2 0.25 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

ε

<s>

/ p C

FIG. 10: The mean value of the spread in nondimensionalunits s = s/pc as a function of ε. This demonstrates that thespread only depends weakly on ε, indicating that the predic-tion from dimensional analysis given in table (III) is a reason-able approximation. .

function of ε. The mean spread increases monotonicallywith ε. It depends on ε as roughly a constant (equal toapproximately 0.45 in nondimensional coordinates) plusa linear term whose slope is rather small. We believethat for most financial instruments ε < 0.3. Thus thevariation in the spread caused by varying ε in the range0 < ε < 0.3 is not large, and the dimensional analy-sis based only on rate parameters given in table IV is agood approximation. We get an accurate prediction ofthe ε dependence across the full range of ε from the In-dependent Interval Approximation technique derived insection III G, as shown in Fig. 24.

4. Volatility and price diffusion

The price diffusion rate, which is proportional to thesquare of the volatility, is important for determining riskand is a property of central interest. From dimensionalanalysis in terms of the order flow rates the price dif-fusion rate has units of price2/time, and so must scaleas µ2δ/α2. We can also make a crude argument for thisas follows: The dimensional estimate of the spread (seeTable IV) is µ/2α. Let this be the characteristic stepsize of a random walk, and let the step frequency be thecharacteristic time 1/δ (which is the average lifetime fora share to be canceled). This argument also gives theabove estimate for the diffusion rate. However, this isnot correct in the presence of negative autocorrelationsin the step sizes. The numerical results make it clearthat there are important ε-dependent corrections to thisresult, as demonstrated below.

In Fig. 11 we plot simulation results for the varianceof the change in the midpoint price at timescale τ ,Var (m (t+ τ) −m (t)). The slope is the diffusion rate,which at any fixed timescale is proportional to the squareof the volatility. It appears that there are at least two

13

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0

0.1

0.2

0.3

0.4

0.5

0.6

<[m

(t+

τ) -

m(t

)]2 >

/ p C

2

τδ

FIG. 11: The variance of the change in the nondimension-alized midpoint price versus the nondimensional time delayinterval τδ. For a pure random walk this would be a straightline whose slope is the diffusion rate, which is proportionalto the square of the volatility. The fact that the slope issteeper for short times comes from the nontrivial temporalpersistence of the order book. The three cases correspond toFig. 3: ε = 0.2 (solid), ε = 0.02 (dash), ε = 0.002 (dot).

timescales involved, with a faster diffusion rate for shorttimescales and a slower diffusion rate for long timescales.Such anomalous diffusion is not predicted by mean-fieldanalysis. Simulation results show that the diffusion rateis correctly described by the product of the estimatefrom dimensional analysis based on order flow parametersalone, µ2δ/α2, and a τ -dependent power of the nondi-mensional granularity parameter ε = 2δσ/µ, as summa-rized in table IV. We cannot currently explain why thispower is −1/2 for short term diffusion and 1/2 for long-term diffusion. However, a qualitative understanding canbe gained based on the conservation law we derive inSection III C. A discussion of how this relates to pricediffusion is given in Section III E.

Note that the temporal structure in the diffusion pro-cess also implies non-zero autocorrelations of the mid-point price m(t). This corresponds to weak negative au-tocorrelations in price differences m(t) − m(t − 1) thatpersist for timescales until the variance vs. τ becomes astraight line. The timescale depends on parameters, butis typically the order of 50 market order arrival times.This temporal structure implies that there exists an ar-bitrage opportunity which, when exploited, would makeprices more random and the structure of the order flownon-random.

5. Liquidity for limit orders: Probability and time to fill.

The liquidity for limit orders depends on the proba-bility that they will be filled, and the time to be filled.This obviously depends on price: Limit orders close tothe current transaction prices are more likely to be filled

-1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 40

0.2

0.4

0.6

0.8

1

execution probablility vs. price

p / pC

Γ

FIG. 12: The probability Γ for filling a limit order placed at aprice p/pc where p is calculated from the instantaneous mid-price at the time of placement. The three cases correspondto Fig. 3: ε = 0.2 (solid), ε = 0.02 (dash), ε = 0.002 (dot).

quickly, while those far away have a lower likelihood tobe filled. Fig. 12 plots the probability Γ of a limit orderbeing filled versus the nondimensionalized price at whichit was placed (as with all the figures in this section, thisis shown in the midpoint-price centered frame). Fig. 12shows that in nondimensional coordinates the probabilityof filling close to the bid for sell limit orders (or the askfor buy limit orders) decreases as ε increases. For largeε, this is less than 1 even for negative prices. This saysthat even for sell orders that are placed close to the bestbid there is a significant chance that the offer is deletedbefore being executed. This is not true for smaller valuesof ε, where Γ(0) ≈ 1. Far away from the spread the fillprobabilities as a function of ε are reversed, i.e. the prob-ability for filling limit orders increases as ε increases. Thecrossover point where the fill probabilities are roughly thesame occurs at p ≈ pc. This is consistent with the depthprofile in Fig. 3 which also shows that depth profiles fordifferent values of ε cross at about p ∼ pc.

Similarly Fig 13 shows the average time τ taken to fillan order placed at a distance p from the instantaneousmid-price. Again we see that though the average time islarger at larger values of ε for small p/pc, this behaviourreverses at p ∼ pc.

C. Varying tick size dp/pc

The dependence on discrete tick size dp/pc, of the cu-mulative distribution function for the spread, instanta-neous price impact, and mid-price diffusion, are shownin Fig. 14. We chose an unrealistically large value ofthe tick size, with dp/pc = 1, to show that, even withvery coarse ticks, the qualitative changes in behavior aretypically relatively minor.

Fig. 14(a) shows the cumulative density function ofthe spread, comparing dp/pc = 0 and dp/pc = 1. It

14

-2 -1 0 1 2 3 4 5 60

0.5

1

1.5

2

2.5

3

3.5time to execution

p / pC

τδ

FIG. 13: The average time τ nondimensionalized by the rateδ, to fill a limit order placed at a distance p/pc from theinstantaneous mid-price.

is apparent from this figure that the spread distributionfor coarse ticks “effectively integrates” the distributionin the limit dp → 0. That is, at integer tick values themean cumulative depth profiles roughly match, and inbetween integer tick values, for coarse ticks the probabil-ity is smaller. This happens for the obvious reason thatcoarse ticks quantize the possible values of the spread,and place a lower limit of one tick on the value the spreadcan take. The shift in the mean spread from this effectis not shown, but it is consistent with this result; thereis a constant offset of roughly 1/2 tick.

The alteration in the price impact is shown inFig. 14(b). Unlike the spread distribution, the averageprice impact varies continuously. Even though the ticksize is quantized, we are averaging over many events andthe probability of a price impact of each tick size is acontinuous function of the order size. Large tick sizeconsistently lowers the price impact. The price impactrises more slowly for small p, but is then similar exceptfor a downward translation.

The effect of coarse ticks is less trivial for mid-pricediffusion, as shown in Fig. 14(c). At ε = 0.002, coarseticks remove most of the rapid short-term volatility ofthe midpoint, which in the continuous-price case arisesfrom price fluctuations smaller than dp/pc = 1. Thislessens the negative autocorrelation of midpoint price re-turns, and reduces the anomalous diffusion. At ε = 0.2,where both early volatility and late negative autocorre-lation are smaller, coarse ticks have less effect. The netresult is that the mid-price diffusion becomes less sensi-tive to the value of ε as tick size increases, and there isless anomalous price diffusion.

0 0.5 1 1.5 2 2.5 3 3.50

0.2

0.4

0.6

0.8

1

s / pC

CD

F (

s / p

C)

a)

0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

<dm

> /

p CN ε / σ

b)

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0

0.1

0.2

0.3

0.4

0.5

0.6

c)<

[m(t

+τ)

- m

(t)]

2 > /

p C2

τδ

FIG. 14: Dependence of market properties on tick size. Heavylines are dp/pc → 0; light lines are dp/pc = 1. Cases cor-respond to Fig. 3, with ε = 0.2 (solid), ε = 0.02 (dash),ε = 0.002 (dot). (a) is the cumulative distribution func-tion for the nondimensionalized spread. (b) is instantaneousnondimensionalized price impact, (c) is diffusion of the nondi-mensionalized midpoint shift, corresponding to Fig. 11.

III. THEORETICAL ANALYSIS

A. Summary of analytic methods

We have investigated this model analytically using twoapproaches. The first one is based on a master equation,given in Section III F. This approach works best in themidpoint centered frame. Here we attempt to solve di-rectly for the average number of shares at each price tick

15

as a function of price. The midpoint price makes a ran-dom walk with a nonstationary distribution. Thus thekey to finding a stationary analytic solution for the aver-age depth is to use comoving price coordinates, which arecentered on a reference point near the center of the book,such as the midpoint or the best bid. In the first approx-imation, fluctuations about the mean depth at adjacentprices are treated as independent. This allows us to re-place the distribution over depth profiles with a simplerprobability density over occupation numbers n at each pand t. We can take a continuum limit by letting the ticksize dp become infinitesimal. With finite order flow rates,this gives vanishing probability for the existence of morethan one order at any tick as dp→ 0. This is described indetail in section III F 3. With this approach we are ableto test the relevance of correlations as a function of theparameter ε as well as predict the functional dependenceof the cumulative distribution of the spread on the depthprofile. It is seen that correlations are negligible for largevalues of ε(ε ∼ 0.2) while they are very important forsmall values (ε ∼ 0.002).

Our second analytic approach which we term the In-dependent Interval Approximation (IIA) is most easilycarried out in the bid-centered frame and is describedin section III G. This approach uses a different repre-sentation, in which the solution is expressed in terms ofthe empty intervals between non-empty price ticks. Thesystem is characterized at any instant of time by a setof intervals {...x−1, x0, x1, x2...} where for example x0 isthe distance between the bid and the ask (the spread),x−1 is the distance between the second buy limit orderand the bid and so on (see Fig. 15). Equations arewritten for how a given interval varies in time. Changesto adjacent intervals are related, giving us an infinite setof coupled non-linear equations. However using a mean-field approximation we are able to solve the equations,albeit only numerically. Besides predicting how the vari-ous intervals (for example the spread) vary with the pa-rameters, this approach also predicts the depth profilesas a function of the parameters. The predictions from theIIA are compared to data from numerical simulations, inSection III G 2. They match very well for large ε and lesswell for smaller values of ε. The IIA can also be mod-ified to incorporate various extensions to the model, asmentioned in Section III G 2.

In both approaches, we use a mean field approxima-tion to get a solution. The approximation basically liesin assuming that fluctuations in adjacent intervals (whichmight be adjacent price ranges in the master equation ap-proach or adjacent empty intervals in the IIA) are inde-pendent. Also, both approaches are most easily tractableonly in the continuum limit dp→ 0, when every tick hasat most only one order. They may however be extendedto general tick size as well. This is explained in the ap-pendix for the Master Equation approach.

Because correlations are important for small ε, bothmethods work well mostly in the large ε limit, thoughqualitative aspects of small ε behavior may also be

gleaned from them. Unfortunately, at least based onour preliminary investigation of London Stock Exchangedata, it seems that it is this small ε limit that real marketsmay tend more towards. So our approximate solutionsmay not be as useful as we would like. Nonetheless, theydo provide some conceptual insights into what determinesdepth and price impact.

In particular, we find that the shape of the mean depthprofile depends on a single parameter ε, and that the rel-ative sizes of its first few derivatives account for boththe order size-dependence of the market impact, and therenormalization of the midpoint diffusivity. A higher rel-ative rate of market versus limit orders depletes the cen-ter of the book, though less than the classical estimatepredicts. This leads to more concave impact (explain-ing Fig. 8) and faster short-term diffusivity. However,the orders pile up more quickly (versus classically nondi-mensionalized price) with distance from the midpoint,causing the rapid early diffusion to suffer larger meanreversion. These are the effects shown in Fig. 11. Wewill elaborate on the above remarks in the following sec-tions, however, the qualitative relation of impact to mid-point autocorrelation supplies a potential interpretationof data, which may be more robust than details of themodel assumptions or its quantitative results.

Both of the treatments described above are approxi-mations. We can derive an exact global conservation lawof order placement and removal whose consequences weelaborate in section III C. This conservation law mustbe respected in any sensible analysis of the model, giv-ing us a check on the approximations. It also providessome insight into the anomalous diffusion properties ofthis model.

B. Characterizing limit-order books: dual

coordinates

We begin with the assumption of a price space. Price isa dimensional quantity, and the space is divided into binsof length dp representing the ticks, which may be finiteor infinitesimal. Prices are then discrete or continuous-valued, respectively.

Statistical properties of interest are computed fromtemporal sequences or ensembles of limit-order book con-figurations. If n is the variable used to denote the num-ber of shares from limit orders in some bin (p, p+ dp)at the beginning t of an elementary time interval, a con-figuration is specified by a function n (p, t). It is conve-nient to take n positive for sell limit orders, and negativefor buy limit orders. Because the model dynamics pre-cludes crossing limit orders, there is in general a high-est instantaneous buy limit-order price, called the bidb (t), and a lowest sell limit-order price, the ask a (t),with b (t) < a (t) always. The midpoint price, defined asm (t) ≡ [a (t) + b (t)] /2, may or may not be the price ofany actual bin, if prices are discrete (m (t) may be a half-integer multiple of dp). These quantities are diagrammed

16

p

dp

a a+dp

b b+dp

x(1)

x(0)

}

}}x(-1)

}

FIG. 15: The price space and order profile. n (p, t) has beenchosen to be 0 or ±1, a restriction that will be convenientlater. Price bins are labeled by their lower boundary price,and intervals x (N) will be defined below.

p

N

b b+dpa a+dp

n(b)

n(a)

FIG. 16: The accumulated order number N (p, t). N (a, t) ≡0, because contributions from all bins cancel in the two sums.N remains zero down to b (t) + dp, because there are no un-canceled, nonzero terms. N (b, t) becomes negative, becausethe second sum in Eq. (4) now contains n (b, t), not canceledby the first.

in Fig. 15.An equivalent specification of a limit-order book con-

figuration is given by the cumulative order count

N (p, t) ≡p−dp∑

−∞

|n (p, t)| −a−dp∑

−∞

|n (p, t)| , (4)

where −∞ denotes the lower boundary of the price space,whose exact value must not affect the results. (Becauseby definition there are no orders between the bid and ask,the bid could equivalently have been used as the originof summation. Because price bins will be indexed hereby their lower boundaries, though, it is convenient hereto use the ask.) The absolute values have been placed sothat N , like n, is negative in the range of buy orders andpositive in the range of sells. The construction of N (p, t)is diagrammed in Fig. 16.

In many cases of either sparse orders or infinitesimaldp, with fixed order size (which we may as well define tobe one share) there will be either zero or one share in anysingle bin, and Eq. (4) will be invertible to an equivalentspecification of the limit-order book configuration

p (N, t) ≡ max {p | N (p, t) = N} , (5)

p

N

bb+dp

aa+dp

n(b)

n(a)

1 2-1

FIG. 17: The inverse function p (N, t). The function is ingeneral defined only on discrete values of N , so this domainis only invariant when order size is fixed, a convenience thatwill be assumed below. Between the discrete domain, and thedefinition of p as a maximum, the inverse function effectivelyinterpolates between vertices of the reflected image of N (p, t),as shown by the dotted line.

shown in Fig. 17. (Strictly, the inversion may be per-formed for any distribution of order sizes, but the re-sulting function is intrinsically discrete, so its domain isonly invariant when order size is fixed. To give p (N, t)the convenient properties of a well-defined function on aninvariant domain, this will be assumed below.)

With definition (5), p (0, t) ≡ a (t), p (−1, t) ≡ b (t),and one can define the intervals between orders as

x (N, t) ≡ p (N, t) − p (N − 1, t) . (6)

Thus x (0, t) = a (t) − b (t), the instantaneous bid-ask spread. The lowest values of x (N, t) bracket-ing the spread are shown in Fig. 15. For symmetricorder-placement rules, probability distributions over con-figurations will be symmetric under either n (p, t) →−n (−p, t), or x (N, t) → x (−N, t). Coordinates N andp furnish a dual description of configurations, and n andx are their associated differences. The Master Equationapproach of section III F assumes independent fluctu-ation in n while the Independent Interval Approxima-tion of Sec. III G assumes independent fluctuation inx (In this section, it will be convenient to abbreviatex (N, t) ≡ xN (t)).

C. Frames and marginals

The x (N, t) specification of limit-order book configu-rations has the property that its distribution is station-ary under the dynamics considered here. The same is nottrue for p (N, t) or n (p, t) directly, because bid, midpoint,and ask prices undergo a random walk, with a renormal-ized diffusion coefficient. Stationary distributions for n-variables can be obtained in co-moving frames, of whichthere are several natural choices.

17

The bid-centered configuration is defined as

nb (p, t) ≡ n (p− b (t) , t) . (7)

If an appropriate rounding convention is adopted in thecase of discrete prices, a midpoint-centered configurationcan also be defined, as

nm (p, t) ≡ n (p−m (t) , t) . (8)

The midpoint-centered configuration has qualitative dif-ferences from the bid-centered configuration, which willbe explored below. Both give useful insights to the orderdistribution and diffusion processes. The ask-centeredconfiguration, na (p, t), need not be considered if orderplacement and removal are symmetric, because it is amirror image of nb (p, t).

The spread is defined as the difference s (t) ≡ a (t) −b (t), and is the value of the ask in bid-centered coordi-nates. In midpoint-centered coordinates, the ask appearsat s (t) /2.

The configurations nb and nm are dynamically corre-lated over short time intervals, but evolve ergodically inperiods longer than finite characteristic correlation times.Marginal probability distributions for these can thereforebe computed as time averages, either as functions on thewhole price space, or at discrete sets of prices. Theirmarginal mean values at a single price p will be denoted〈nb (p)〉, 〈nm (p)〉, respectively.

These means are subject to global balance constraints,between total order placement and removal in the pricespace. Because all limit orders are placed above the bid,the bid-centered configuration obeys a simple balance re-lation:

µ

2=

∞∑

p=b+dp

(α− δ 〈nb (p)〉) . (9)

Eq. (9) says that buy market orders must account, on av-erage, for the difference between all limit orders placed,and all decays. After passing to nondimensional coordi-nates below, this will imply an inverse relation betweencorrections to the classical estimate for diffusivity at earlyand late times, discussed in Sec. III E. In addition, thisconservation law plays an important role in the analysisand determination of the x(N, t)’s, as we will see later inthe text.

The midpoint-centered averages satisfy a different con-straint:

µ

2= α

〈s〉2

+

∞∑

p=b+dp

(α− δ 〈nm (p)〉) . (10)

Market orders in Eq. (10) account not only for the ex-cess of limit order placement over evaporation at pricesabove the midpoint, but also the “excess” orders placedbetween b (t) and m (t). Since these always lead to mid-point shifts, they ultimately appear at positive comov-ing coordinates, altering the shape of 〈nm (p)〉 relativeto 〈nb (p)〉. Their rate of arrival is α 〈m− b〉 = α 〈s〉 /2.These results are also confirmed in simulations.

D. Factorization tests

Whether in the bid-centered frame or the midpointcentered frame, the probability distribution function forthe entire configuration n (p) is too difficult a problemto solve in its entirety. However, an approximate masterequation can be formed for n independently at each p ifall joint probabilities factor into independent marginals,as

Pr ({n (pi)}i) =∏

i

Pr (n (pi)) , (11)

where Pr denotes, for instance, a probability density forn orders in some interval around p.

Whenever orders are sufficiently sparse that the ex-pected number in any price bin is simply the probabilitythat the bin is occupied (up to a constant of propor-tionality), the independence assumption implies a rela-tion between the cumulative distribution for the spreadof the ask and the mean density profile. In units wherethe order size is one, the relation is

Pr (s/2 < p) = 1 − exp

−p−dp∑

p′=b+dp

〈nm (p′)〉

. (12)

This relation is tested against simulation results inFig. 18. One can observe that there are three regimes.

A high-ε regime is defined when the mean density pro-file at the midpoint 〈nm (0)〉 <∼ 1, and strongly concavedownward. In this regime, the approximation of inde-pendent fluctuations is excellent, and a master equationtreatment is expected to be useful. Intermediate-ε is de-fined by 〈nm (0)〉 � 1 and nearly linear, and the approx-imation of independence is marginal. Large-ε is definedby 〈nm (0)〉 � 1 and concave upward, and the approxi-mation of independent fluctuations is completely invalid.These regimes of validity correspond also to the qualita-tive ranges noted already in Sec. II B.

In the bid centered frame however, Eq. 12 never seemsto be valid for any range of parameters. We will discusslater why this might be so. For the present therefore, themaster equation approach is carried out in the midpoint-centered frame. Alternatively, the mean field theory ofthe separations is most convenient in the bid-centeredframe, so that frame will be studied in the dual basis.The relation of results in the two frames, and via the twomethods of treatment, will provide a good qualitative,and for some properties quantitative, understanding ofthe depth profile and its effect on impacts.

It is possible in a modified treatment, to match cer-tain features of simulations at any ε, by limited incorpo-ration of correlated fluctuations. However, the generalmaster equation will be developed independent of these,and tested against simulation results at large ε, where itsdefining assumptions are well met.

18

0 0.5 1 1.5 2 2.5 30

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

a).n

δ / α

and

CD

F (

s / p

C)

p / pC

0 0.2 0.4 0.6 0.8 1 1.2 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

b).

n δ

/ α a

nd C

DF

(s

/ pC

)

p / pC

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

c).

n δ

/ α a

nd C

DF

(s

/ pC

)

p / pC

FIG. 18: CDFs Pr (s/2 < p) from simulations (thin solid),mean density profile 〈nm (p)〉 from simulations (thick solid),and computed CDF of spread (thin dashed) from 〈nm (p)〉,under the assumption of uncorrelated fluctuations, at threevalues of ε. (a): ε = 0.2 (low market order rate); approxima-tion is very good. (b): ε = 0.02 (intermediate market orderrate); approximation is marginal. (c): ε = 0.002 (high marketorder rate); approximation is very poor.

E. Comments on renormalized diffusion

A qualitative understanding of why the diffusivity isdifferent over short and long times scales, as well as whyit may depend on ε, may be gleaned from the following

observations.

First, global order conservation places a strong con-straint on the classically nondimensionalized density pro-file in the bid-centered frame. We have seen that atε � 1, the density profile becomes concave upward nearthe bid, accounting for an increasing fraction of the al-lowed “remainder area” as ε → 0 (see Figs. 3 and 28).Since this remainder area is fixed at unity, it can be con-served only if the density profile approaches one morequickly with increasing price. Low density at low priceappears to lead to more frequent persistent steps in theeffective short-term random walk, and hence large short-term diffusivity. However, increased density far from thebid indicates less impact from market orders relative tothe relaxation time of the Poisson distribution, and thusa lower long-time diffusivity.

The qualitative behavior of the bid-centered densityprofile is the same as that of the midpoint-centered pro-file, and this is expected because the spread distributionis stationary, rather than diffusive. In other words, theonly way the diffusion of the bid or ask can differ fromthat of the midpoint is for the spread to either increaseor decrease for several succeeding steps. Such autocorre-lation of the spread cannot accumulate with time if thespread itself is to have a stationary distribution. Thus,the shift in the midpoint over some time interval can onlydiffer from that of the bid or ask by at most a constant,as a result of a few correlated changes in the spread. Thisdifference cannot grow with time, however, and so doesnot affect the diffusivity at long times.

Indeed, both of the predicted corrections to the classi-cal estimate for diffusivity are seen in simulation resultsfor midpoint diffusion. The simulation results, however,show that the implied autocorrelations change the dif-fusivity by factors of

√ε, suggesting that these correc-

tions require a more subtle derivation than the one at-tempted here. This will be evidenced by the difficultyof obtaining a source term S in density coordinates (sec-tionIII F), which satisfied both the global order conserva-tion law, and the proper zero-price boundary condition,in the midpoint-centered frame.

An interesting speculation is that the subtlety of thesecorrelations also causes the density n (p, t) in bid-centeredcoordinates not to approximate the mean-field condi-tion at any of the parameters studied here, as noted inSec. III D. Since short-term and long-term diffusivity cor-rections are related by a hard constraint, the difficultyof producing the late-time density profile should matchthat of producing the early-time profile. The midpoint-centered profile is potentially easier, in that the late-timecomplexity must be matched by a combination of theearly-time density profile and the scaling of the expectedspread. It appears that the complex scaling is absorbedin the spread, as per Fig. 10 and Fig. 24, leaving a densitythat can be approximately calculated with the methodsused here.

19

F. Master equations and mean-field

approximations

There are two natural limits in which functional config-urations may become simple enough to be tractable prob-abilistically, with analytic methods. They correspond tomean field theories in which fluctuations of the dual dif-ferentials of either N (p, t) or p (N, t) are independent.In the first case, probabilities may be defined for anydensity n (p, t) independently at each p, and in the sec-ond for the separation intervals x (N, t) at each N . Themean field theory from the first approximation will besolved in Subsec. III F 1, and that from the second inSubsec. III G. As mentioned above, because the fluc-tuation independence approximation is only usable in amidpoint-centered frame, n (p, t) will refer always to thisframe. x (N, t) is well-defined without reference to anyframe.

1. A number density master equation

If share-number fluctuations are independent at differ-ent p, a density π (n, p, t) may be defined, which gives theprobability to find n orders in bin (p, p+ dp), at time t.The normalization condition defining π as a probabilitydensity is

∑

n

π (n, p, t) = 1, (13)

for each bin index p and at every t. The index t willbe suppressed henceforth in the notation since we arelooking for time-independent solutions.

Supposing an arbitrary density of order-book config-urations π (n, p) at time t, the stochastic dynamics ofthe configurations causes probability to be redistributedaccording to the master equation

∂

∂tπ (n, p) =

α (p) dp

σ[π (n− σ, p) − π (n, p)]

+δ

σ[(n+ σ) π (n+ σ, p) − nπ (n, p)]

+µ (p)

2σ[π (n+ σ, p) − π (n, p)]

+∑

∆p

P+ (∆p) [π (n, p− ∆p) − π (n, p)]

+∑

∆p

P− (∆p) [π (n, p+ ∆p) − π (n, p)] . (14)

Here ∂π (n, p) /∂t is a continuum notation for[π (n, p, t+ δt) − π (n, p, t)] /δt, where δt is an ele-mentary time step, chosen short enough that at mostone event alters any typical configuration. Eq. (14)represents a general balance between additions and re-movals, without regard to the meaning of n. Thus, α (p)

is a function that must be determined self-consistentlywith the choice of frame. As an example of how thisworks, in a bid-centered frame, α (p) takes a fixed valueα (∞) at all p, because the deposition rate is independentof position and frame shifts. The midpoint-centeredframe is more complicated, because depositions belowthe midpoint cause shifts that leave the depositedorder above the midpoint. The specific consequence forα (p) in this case will be considered below. µ (p) /2 is,similarly, the rate of market orders surviving to cancellimit orders at price p. µ (p) /2 decreases from µ (0) /2at the ask (for buy market orders, because µ total ordersare divided evenly between buys and sells) to zero asp → ∞, as market orders are screened probabilisticallyby intervening limit orders. α (∞) and µ (0) are thus theparameters α and µ of the simulation.

The lines of Eq. (14) correspond to the followingevents. The term proportional to α (p) dp/σ describesdepositions of discrete orders at that rate (because α isexpressed in shares per price per time), which raise con-figurations from n − σ to n shares at price p. The termproportional to δ comes from deletions and has the op-posite effect, and is proportional to n/σ, the number oforders that can independently decay. The term propor-tional to µ (p) /2σ describes market order annihilations.For general configurations, the preceding three effectsmay lead to shifts of the origin by arbitrary intervals ∆p,and P± are for the moment unknown distributions overthe frequency of those shifts. They must be determinedself-consistently with the configuration of the book whichemerges from any solution to Eq. (14).

A limitation of the simple product representation offrame shifts is that it assumes that whole order-bookconfigurations are transported under p ± ∆p → p, in-dependently of the value of n (p). As long as fluctuationsare independent, this is a good approximation for ordersat all p which are not either the bid or the ask, eitherbefore or after the event that causes the shift. The cor-relations are never ignorable for the bins which are thebid and ask, though, and there is some distribution ofinstances in which any p of interest plays those parts.Approximate methods to incorporate those correlationswill require replacing the product form with a sum ofproducts conditioned on states of the order book, as willbe derived below.

The important point is that the order-flow dependenceof Eq. (14) is independent of these self-consistency re-quirements, and may be solved by use of generating func-tionals at general α (p), µ (p), and P±. The solution, ex-act but not analytically tractable at general dp, will bederived in closed form in the next subsection. It has awell-behaved continuum limit at dp→ 0, however, whichis analytically tractable, so that special case will be con-sidered in the following subsection.

20

2. Solution by generating functional

The moment generating functional for π is defined fora parameter λ ∈ [0, 1], as

Π (λ, p) ≡∞∑

n/σ=0

λn/σπ (n, p) . (15)

Introducing a shorthand for its value at λ = 0,

Π (0, p) = π (0, p) ≡ π0 (p) , (16)

while the normalization condition (13) for probabilitiesgives

Π (1, p) = 1 , ∀p. (17)

By definition of the average of n (p) in the distributionπ, denoted 〈n (p)〉,

∂

∂λΠ (λ, p)

∣

∣

∣

∣

λ→1

=〈n (p)〉σ

, (18)

and because Π will be regular in some sufficiently smallneighborhood of λ = 1, one can expand

Π (λ, p) = 1 + (λ− 1)〈n (p)〉σ

+ O(λ− 1)2. (19)

Multiplying Eq. (14) by λn/σ and summing over n,(and suppressing the argument p in the notation every-where; α (p) or α (0) will be used where the distinctionof the function from its boundary value is needed) thestationary solution for Π must satisfy

0 =λ− 1

σ

{

αdpΠ − δ σ∂Π

∂λ− µ

2λ(Π − π0)

}∣

∣

∣

∣

(λ,p)

+∑

∆p

P+ (∆p) [Π (λ, p− ∆p) − Π (λ, p)]

+∑

∆p

P− (∆p) [Π (λ, p+ ∆p) − Π (λ, p)] . (20)

Only the symmetric case with no net drift will be con-sidered here for simplicity, which requires P+ (∆p) =P− (∆p) ≡ P (∆p). In a Fokker-Planck expansion, the(unrenormalized) diffusivity of whatever reference priceis used as coordinate origin, is related to the distributionP by

D ≡∑

∆p

P (∆p) ∆p2. (21)

The rate at which shift events happen is

R ≡∑

∆p

P (∆p) , (22)

and the mean shift amount appearing at linear order inderivatives (relevant at p→ 0), is

〈∆p〉 ≡∑

∆p P (∆p) ∆p∑

∆p P (∆p). (23)

Anywhere in the interior of the price range (where p isnot at any stage the bid, ask, or a point in the spread),Eq. (20) may be written

{

∂

∂λ− D

δ (λ− 1)

∂2

∂p2− αdp− µ/2λ

δ σ

}

Π =µ

2δ σλπ0.

(24)Evaluated at λ → 1, with the use of the expansion (19),this becomes

(

1 − D

δ

∂

∂p2

)

〈n〉 =αdp

δ− µ

2δ(1 − π0) . (25)

At this point it is convenient to specialize to the casedp→ 0, wherein the eligible values of any 〈n (p)〉 becomejust σ and zero. The expectation is then related to theprobability of zero occupancy (at each p) as

〈n〉 = σ [1 − π0] , (26)

yielding immediately

αdp

δ=

[

µ

2δ σ+

(

1− D

δ

d2

dp2

)]

〈n〉 . (27)

Eq. (27) defines the general solution 〈n (p)〉 for themaster equation (14), in the continuum limit 2αdp/µ →0. The shift distribution P (∆p) appears only throughthe diffusivity D, which must be solved self-consistently,along with the otherwise arbitrary functions α and µ.The more general solution at large dp is carried out inApp. B 1.

A first step toward nondimensionalization may betaken by writing Eq. (27) in the form (re-introducingthe indexing of the functions)

α (p)

α (∞)=

[

µ (p)

µ (0)+ ε

(

1 − D

δ

d2

dp2

)]

1

ε

δ 〈n〉 .αdp

. (28)

Far from the midpoint, where only depositions and can-cellations take place, orders in bins of width dp are Pois-son distributed with mean α (∞) dp/δ. Thus, the asymp-totic value of δ 〈n〉 /α (∞) dp at large p is unity. This isconsistent with a limit for α (p) /α (∞) of unity, and alimit for the screened µ (p) /µ (0) of zero. The reason forgrouping the nondimensionalized number density with1/ε, together with the proper normalization of the char-acteristic price scale, will come from examining the decayof the dimensionless function µ (p) /µ (0).

3. Screening of the market-order rate

In the context of independent fluctuations, Eq. (26)implies a relation between the mean density and the rateat which market orders are screened as price increases.The effect of a limit order, resident in the price bin p whena market order survives to reach that bin, is to preventits arriving at the bin at p + dp. Though the nature ofthe shift induced, when such annihilation occurs, depends

21

on the comoving frame being modeled, the change in thenumber of orders surviving is independent of frame, andis given by

dµ = −µ (1 − π0) = −µ 〈n〉 /σ. (29)

Eq. (29) may be rewritten

d log (µ (p) /µ (0)) =

dp− 1

ε

(

2α (∞)

µ (0)

)(

δ 〈n (p)〉α (∞) dp

)

, (30)

identifying the characteristic scale for prices as pc =µ (0) /2α (∞) ≡ µ/2α. Writing p ≡ p/pc, the functionthat screens market orders is the same as the argumentof Eq. (28), and will be denoted

1

ε

δ 〈n (p)〉α (∞) dp

≡ ψ (p) (31)

Defining a nondimensionalized diffusivity β ≡ D/δp2c,

Eq. (27) can then be put in the form

α (p)

α (∞)=

[

µ (p)

µ (0)+ ε

(

1 − βd2

dp2

)]

ψ, (32)

with

µ (p)

µ (0)≡ ϕ (p) = exp

(

−∫ p

0

dp′ψ (p′)

)

, (33)

4. Verifying the conservation laws

Since nothing about the derivation so far has made ex-plicit use of the frame in which n is averaged, the combi-nation of Eq. (32) with Eq. (33) respects the conservationlaws (9) and (10), if appropriate forms are chosen for thedeposition rate α (p).

For example, in the bid-centered frame, α (p) /α (∞) =1 everywhere. Multiplying Eq. (32) by dp and integratingover the whole range from the bid to +∞, we recover thenondimensionalized form of Eq. (9):

∫ ∞

0

dp (1 − εψ) = 1, (34)

iff we are careful with one convention. The integral ofthe diffusion term formally produces the first derivativedψ/dp|∞0 . We must regard this as a true first derivative,and consider its evaluation at zero continued far enoughbelow the bid to capture the identically zero first deriva-tive of the sell order depth profile.

In the midpoint centered frame, the correct formfor the source term should be α (p) /α (∞) = 1 +Pr (s/2 ≥ p), whatever the expression for the cumulativedistribution function. Recognizing that the integral ofthe CDF is, by parts, the mean value of s/2, the sameintegration of Eq. (32) gives

∫ ∞

0

dp (1 − εψ) = 1 − 〈s〉2, (35)

the nondimensionalized form of Eq. (10). Again, thisworks only if the surface contribution from integratingthe diffusion term vanishes.

Neither of these results required the assumption of in-dependent fluctuations, though that will be used belowto give a simple approximate form for Pr (s/2 ≥ p) ≈ϕ (p). They therefore provide a check that the extinctionform (33) propagates market orders correctly into the in-terior of the order-book distribution, to respect globalconservation. They also check the consistency of the in-tuitively plausible form for α in the midpoint-centeredframe. The detailed form is then justified whenever theassumption of independent fluctuations is checked to bevalid.

5. Self-consistent parametrization

The assumption of independent fluctuations of n (p)used above to derive the screening of market orders, isequivalent to a specification of the CDF of the ask. Mar-ket orders are only removed between prices p and p+ dpin those instances when the ask is at p. Therefore

Pr (s/2 ≥ p) = ϕ (p) , (36)

the continuum limit of Eq. (12). Together with the formα (p) /α (∞) = 1 + Pr (s/2 ≥ p), Eq. (32) becomes

1 + ϕ = −[

dϕ

dp+ ε

(

1 − βd2

dp2

)

d logϕ

dp

]

. (37)

(If the assumption of independent fluctuations were validin the bid-centered frame, it would take the same form,but with ϕ removed on the left-hand side.)

To consistently use the diffusion approximation, withthe realization that for p = 0, nπ (n, p− ∆p) = 0 foressentially all ∆p in Eq. (14), it is necessary to set theFokker-Planck approximation to ψ (0 − 〈∆p〉) = 0 as aboundary condition. Nondimensionalized, this gives

β

2

d2ψ

dp2

∣

∣

∣

∣

0

=R

δ

(

〈∆p〉 ddp

− 1

)

ψ

∣

∣

∣

∣

0

, (38)

where R is the rate at which shifts occur (Eq. 22). Inthe solutions below, the curvature will typically be muchsmaller than ψ (0) ∼ 1, so it will be convenient to enforcethe simpler condition

〈∆p〉 dψdp

∣

∣

∣

∣

0

− ψ (0) ≈ 0, (39)

and verify that it is consistent once solutions have beenevaluated.

Self-consistent expressions for β and 〈∆p〉 are then con-structed as follows. Given an ask at some position a (inthe midpoint-centered frame), there is a range from −ato a in which sell limit orders may be placed, which willinduce positive midpoint-shifts. The shift amount is half

22

n δ

/ α a

nd C

DF

(s

/ pC

)

p / pC0 1 2 3 4 5

0

0.2

0.4

0.6

0.8

1

1.2

FIG. 19: Fit of the self-consistent solution with diffusivityterm to simulation results for the midpoint-centered frame.Thin solid line is the analytic solution for the mean numberdensity, and thick solid line is simulation result, at ε = 0.02.Thin dashed line is the analytic prediction for the cumulativedistribution function Pr (s/2 ≤ p), and thick dashed line issimulation result.

as great as the distance from the bid, so the measurefor shifts dP+ (∆p) from sell limit-order addition inher-its a term 2α (0) (d∆p) Pr (a ≥ ∆p), where the last fac-tor counts the instances with asks large enough to admitshifts by ∆p. There is an equal contribution to dP− fromaddition of buy limit orders. Symmetry requires that forevery positive shift due to an addition, there is a nega-tive shift due to evaporation with equal measure, so thecontribution from buy limit order removal should equalthat for sell limit order addition. When these contribu-tions are summed, the measures for positive and negativeshifts both equal

dP± (∆p) = 4α (∞) (d∆p) Pr (a ≥ ∆p) . (40)

Eq. (40) may be inserted into the continuum limit ofthe definition (21) for D, and then nondimensionalizedto give

β =4

ε

∫ ∞

0

d∆p (∆p)2ϕ (∆p) , (41)

where the mean-field substitution of ϕ (∆p) forPr (a ≥ ∆p) has been used. Similarly, the mean shiftamount used in Eq. (39) is

〈∆p〉 =

∫∞

0d∆p (∆p)ϕ (∆p)∫∞

0d∆p ϕ (∆p) .

(42)

A fit of Eq. (37) to simulations, using these self-consistent measures for shifts, is shown in Fig. 19. Thissolution is actually a compromise between approxima-tions with opposing ranges of validity. The diffusionequation using the mean order depth describes nonzerotransport of limit orders through the midpoint, an ap-proximation inconsistent with the correlations of shifts

with states of the order book. This approximation is asmall error only at ε → 0. On the other hand, both theform of α, and the self-consistent solutions for 〈∆p〉 andβ, made use of the mean-field approximation, which wesaw was only valid for ε <∼ 1. The two approximationsappear to create roughly compensating errors in the in-termediate range ε ∼ 0.02.

6. Accounting for correlations

The numerical integral implementing the diffusion so-lution actually doesn’t satisfy the global conservationcondition that the diffusion term integrate to zero overthe whole price range. Thus, it describes diffusive trans-port of orders through the midpoint, and as such alsodoesn’t have the right p = 0 boundary condition. Theeffective absorbing boundary represented by the pure dif-fusion solution corresponds roughly to the approxima-tion made by Bouchaud et al. [23] It differs from theirs,though, in that their method of images effectively ap-proximates the region of the spread as a point, whereasEq. (32) actually resolves the screening of market ordersas the spread fluctuates.

Treating the spread region – roughly defined as therange over which market orders are screened – as a pointis consistent with treating the resulting coarse-grained“midpoint” as an absorbing boundary. If the spread is re-solved, however, it is not consistent for diffusion to trans-port any finite number density through the midpoint, be-cause the midpoint is strictly always in the center of anopen set with no orders, in a continuous price space. Thecorrect behavior in a neighborhood of the “fine-grainedmidpoint” can be obtained by explicitly accounting forthe correlation of the state of orders, with the shifts thatare produced when market or limit order additions occur.

We expect the problem of recovering both the globalconservation law and the correct p = 0 boundary con-dition to be difficult, as it should be responsible for thenon-trivial corrections to short-term and long-term dif-fusion mentioned earlier. We have found, however, thatby explicitly sacrificing the global conservation law, wecan incorporate the dependence of shifts on the positionof the ask, in an interesting range around the midpoint.At general ε, the corrections to diffusion reproduce themean density over the main support of the CDF of thespread. While the resulting density does not predict thatCDF (due to correlated fluctuations), it closely enoughresembles the real density that the independent CDFs ofthe two are similar.

7. Generalizing the shift-induced source terms

Nondimensionalizing the generating-functional masterequation (20) and keeping leading terms in dp at λ→ 1,

23

0 0.5 1 1.5 2 2.5 3 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1 n

δ / α

and

CD

F (

s / p

C)

p / pC

FIG. 20: Reconstruction with source terms S that approxi-mately account for correlated fluctuations near the midpoint.ε = 0.2. Thick solid line is averaged order book depth fromsimulations, and thin solid is the mean field result. Thin dot-ted line is the simulated CDF for s/2, and thick dotted lineis the mean field result. Thick dashed line is the CDF thatwould be produced from the simulated depth, if the mean-field approximation were exact.

get

α (p)

α (∞)=

(

µ (p)

µ(

0) + ε

)

ψ (p)

−∫

dP+ (∆p) [ψ (p− ∆p) − ψ (p)]

−∫

dP− (∆p) [ψ (p+ ∆p) − ψ (p)] (43)

where dP± (∆p) is the nondimensionalized measure thatresults from taking the continuum limit of P± in the vari-able ∆p.

Eq. (43) is inaccurate because the number of ordersshifted into or out of a price bin p, at a given spread,may be identically zero, rather than the unconditionalmean value ψ. We take that into account by replacingthe last two lines of Eq. (43) with lists of source terms,whose forms depend on the position of the ask, weightedby the probability density for that ask. Independent fluc-tuations are assumed by using Eq. (36).

It is convenient at this point to denote the replacementof the last two lines of Eq. (43) with the notation S,yielding

α (p)

α (∞)=

(

µ (p)

µ(

0) + ε

)

ψ − S. (44)

The global conservation laws for orders would be satisfiedif∫

dpS = 0.The source term S is derived approximately in

App. B 2. The solution to Eq. (44) at ε = 0.2, withthe simple-diffusive source term replaced by the evalua-tions (B29 - B37), is compared to the simulated order-

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n δ

/ α a

nd C

DF

(s

/ pC

)

p / pC

FIG. 21: Reconstruction with correlated source terms for ε =0.02. Line style and thickness are the same as in Fig. 20.

0 0.5 1 1.50

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n δ

/ α a

nd C

DF

(s

/ pC

)

p / pC

FIG. 22: Reconstruction with correlated source terms for ε =0.002. Line style and thickness are the same as in Fig. 20.

book depth and spread distribution in Fig. 20. The sim-ulated 〈n (p)〉 satisfies Eq. (35), showing what is the cor-rect “remainder area” below the line 〈n〉 ≡ 1. The nu-merical integral deviates from that value by the incorrectintegral

∫

dpS 6= 0. However, most of the probability forthe spread lies within the range where the source termsS are approximately correct, and as a result the distri-bution for s/2 is predicted fairly well.

Even where the mean-field approximation is known tobe inadequate, the source terms defined here capturemost of the behavior of the order-book distribution inthe region that affects the spread distribution. Fig. 21shows the comparison to simulations for ε = 0.02, andFig. 22 for ε = 0.002. Both cases fail to reproduce thedistribution for the spread, and also fail to capture thelarge-p behavior of ψ. However, they approximate ψ atsmall p well enough that the resulting distribution forthe spread is close to what would be produced by thesimulated ψ if fluctuations were independent.

24

G. A mean-field theory of order separation

intervals: The Independent Interval Approximation

A simplifying assumption that is in some sense dualto independent fluctuations of n (p), is independent fluc-tuations in the intervals x (N) at different N . Here wedevelop a mean-field theory for the order separation in-tervals in this model. From this, we will also be able tomake an estimate of the depth profiles for any value ofthe parameters. For convenience of notation we will usexN to denote x (N).

Limit order placements are considered to take placestrictly on sites which are not occupied. This is the samelevel of approximation as made in the previous section.The time step is normalized to unity, as above, so thatrates are equal to probabilities after one update of thewhole configuration. The rates α and µ used in this sec-tion correspond to α(∞) and µ(0) as defined earlier.

As shown in Fig. 15 the configuration is entirely spec-ified instant by instant if the instantaneous values of theorder separation intervals are known.

Consider now, how these intervals might change dueto various processes. For the spread x0, these processesand the corresponding change in x0, are listed below.

1. x0 → x0 + x1 with rate (δ + µ/2σ) (when the askeither evaporates or is deleted by a market order).

2. x0 → x0 + x−1 with rate (δ + µ/2σ) (when the bideither evaporates or is deleted by the correspondingmarket order).

3. x0 → x′ for any value 1 ≤ x′ ≤ x0 − 1, whena sell limit order is deposited anywhere in thespread. The rate for any single deposition isαdp/σ, so the cumulative rate for some depositionis αdp (x0 − 1) /σ. (The −1 comes from the prohi-bition against depositing on occupied sites.)

4. Similarly x0 → x0 − x′ for any 1 ≤ x′ ≤ x0 − 1,when a buy limit order is deposited in the spread,also with cumulative rate αdp (x0 − 1) /σ.

5. Since the above processes describe all possiblesingle-event changes to the configuration, the prob-ability that it remains unchanged in a single timestep is 1 − 2δ − µ/σ − 2αdp (x0 − 1) /σ.

In all that follows, we will put σ = 1 without loss ofgenerality. If we know x0, x1, and x−1 at time t, theexpected value at time t+ dt is then

x0 (t+ dt) = x0 (t) [1 − 2δ − µ0 − 2α (x0 − 1)]

+ (x0 + x1)(

δ +µ

2

)

+ (x0 + x−1)(

δ +µ

2

)

+ (α0dp)x0 (x0 − 1) (45)

Here, xi(t) represents the value of the interval averagedover many realizations of the process evolved up to timet.

Again representing the finite difference as a timederivative, the change in the expected value, given x0,x1, and x−1, is

dx0

dt= (x1 + x−1)

(

δ +µ

2

)

− (αdp)x0 (x0 − 1) . (46)

Were it not for the quadratic term arising from depo-sition, Eq. (46) would be a linear function of x0, x1, andx−1. However we now need an approximation for

⟨

x20

⟩

,where the angle brackets represent an average over real-izations as before or equivalently a time average in thesteady state. Let us for the moment assume that we canapproximate

⟨

x20

⟩

by a〈x0〉2, where a is some as yet un-determined constant to be determined self-consistently.We will make this approximation for all the xk’s. Thisis clearly not entirely accurate because the PDF of xkcould depend on k (as indeed it does. We will commenton this a little later). However as we will see this is stilla very good approximation.

We will therefore make this approximation in Eq. (46)and everywhere below, and look for steady state solutionswhen the xk’s have reached a time independent averagevalue.

It then follows that,

(δ + µ/2) (x1 + x−1) = aαdpx0 (x0 − 1) (47)

The interval xk may be though of as the inverse of the

density at a distance∑k−1

j=0 xj from the bid. That is,

xi ≈ 1/⟨

n(

∑i−1j=0 xjdp

)⟩

, the dual to the mean depth,

at least at large i. It therefore makes sense to introducea normalized interval

xi ≡ εα

δxidp =

xidp

pc≈ 1

ψ(

∑i−1j=0 xj

) , (48)

the mean-field inverse of the normalized depth ψ. In thisnondimensionalized form, Eq. (47) becomes

(1 + ε) (x1 + x−1) = ax0 (x0 − dp) , (49)

where dp = dp/pc.Since the depth profile is symmetric about the origin,

x1 = x−1. From the equations, it can be seen that thisansatz is self-consistent and extends to all higher xi. Sub-stituting this in Eq. (49) we get

(1 + ε) x1 =a

2x0 (x0 − dp) = (1 + ε) x−1 (50)

Proceeding to the change of x1, the events that can oc-cur, with their probabilities, are shown in Table V, withthe remaining probability that x1 remains unchanged.

The differential equation for the mean change of x1 canbe derived along previous lines and becomes

dx1

dt=(

2δ +µ

2

)

x2 −(

δ +µ

2

)

x1

+ αdp

[

x0 (x0 − 1)

2− x1 (x1 − 1)

2− x1 (x0 − 1)

]

(51)

25

case rate rangex1 → x2 (δ + µ/2)x1 → (x1 + x2) δx1 → x′ αdp x′ ∈ (1, x0 − 1)x1 → x1 − x′ αdp x′ ∈ (1, x1 − 1)

TABLE V: Events that can change the value of x1, with theirrates of occurrence.

Note that in the above equations, the mean-field approx-imation consists of assuming that terms like 〈x0x1〉 areapproximated by the product 〈x0〉〈x1〉. This is thus an‘independent interval’ approximation.

Nondimensionalizing Eq. (51) and combining the resultwith Eq. (50) gives the stationary value for x2 from x0

and x1,

(1 + 2ε) x2 =a

2x1 (x1 − dp) + x1 (x0 − dp) . (52)

Following the same procedure for general k, the nondi-mensionalized recursion relation is

(1 + kε) xk =a

2xk−1 (xk−1 − dp) + xk−1

k−2∑

i=0

(xi − dp) .

(53)

1. Asymptotes and conservation rules

Far from the bid or ask, xk must go to a constantvalue, which we denote x∞. In other words, for large k,xk+1 → xk . Taking the difference of Equation (53) fork + 1 and k in this limit gives the identification

εx∞ = x∞ (x∞ − dp) , (54)

or x∞ = ε+dp. Apart from the factor of dp, arising fromthe exclusion of deposition on already-occupied sites, thisagrees with the limit ψ (∞) → 1/ε found earlier. In thecontinuum limit dp→ 0 at fixed ε, these are the same.

From the large-k limit of Eq. (53), one can also solveeasily for the quantity S∞ ≡

∑∞i=0 (xi − x∞), which is

related to the bid-centered order conservation law men-tioned in Section III C. Dividing by a factor of x∞ atlarge k,

(1 + kε) =a

2(x∞ − dp) +

k−2∑

i=0

(xi − dp) , (55)

or, using Eq. (54) and rewriting the sum on the righthand

side as∑k−2

i=0 (xi − x∞) +∑k−2

i=0 x∞ − dp

1 + (1 − a

2)ε = S∞. (56)

The interpretation of S∞ is straightforward. There are

k+1 orders in the price range∑k

i=0 xi. Their decay rate

ε S∞ from theory S∞ from MCS0.66 1 1.0000.2 1 1.0000.04 1 0.9980.02 1 1.000

TABLE VI: Theoretical vs. results from simulations for S∞.

is δ (k + 1), and the rate of annihilation from market or-ders is µ/2. The rate of additions, up to an uncertaintyabout what should be considered the center of the in-terval, is (αdp)

∑ki=0(xi − 1) in the bid-centered frame

(where effective α is constant and additions on top ofpreviously occupied sites is forbidden). Equality of addi-tion and removal is the bid-centered order conservationlaw (again), in the form

µ

2+ δ (k + 1) = αdp

k∑

i=0

(xi − 1). (57)

Taking k large, nondimensionalizing, and using Eq. (54),Eq. (57) becomes

1 = S∞. (58)

This conservation law is indeed respected to a remark-able accuracy in Monte Carlo simulations of the modelas indicated in table VI.

In order that the equation for the x’s obey this exactconservation law, we require Eq. 56 to be equal to Eq.58. We can hence now self-consistently set the value ofa = 2.

The value of a implies that we have now set⟨

x2k

⟩

∼2 〈xk〉2. This would be strictly true if the probability dis-tribution function of the interval xk were exponentiallydistributed for all k. This is generally a good approxi-mation for large k for any ε. Fig 23 shows the numericalresults from Monte Carlo simulations of the model, forthe probability distribution function for three intervalsx0, x1 and x5 at ε = 0.1. The functional form for P (x0)and P (x1) are better approximated by a Gaussian thanan exponential. However P (x5) is clearly an exponential.

Eq. (58) has an important consequence for the short-term and long-term diffusivities, which can also be seen insimulations, as mentioned in earlier sections. The nondi-mensionalization of the diffusivity D with the rate pa-rameters, suggests a classical scaling of the diffusivity

D ∼ p2cδ =

µ2

4α2δ. (59)

As mentioned earlier, it is observed from simulations thatthe locally best short-time fit to the actual diffusivity ofthe midpoint is ∼

√

1/ε times the estimate (59), and thelong-time diffusivity is ∼ √

ε times the classical estimate.While we do not yet know how to derive this relationanalytically, the fact that early and late-time renormal-izations must have this qualitative relation can be arguedfrom the conservation law (58).

26

1e-06

1e-05

0.0001

0.001

0.01

0.1

1

0 5 10 15 20 25 30 35 40 45

P x (y

)

y

FIG. 23: The probability distribution functions Px(y) vs. yfor the intervals x = x0, x1 and x5 at ε = 0.1, on a semi-logscale. Solid curve is for x0, dashed for x1, and dot for x5. Thefunctional form of the distribution changes from a Gaussianto an exponential.

S∞ is the area enclosed between the actual density andthe asymptotic value. Increases in 1/ε (descaled market-order rate) deplete orders near the spread, diminishingthe mean depth at small p, and induce the upward cur-vature seen in Fig. 3, and even more strongly in Fig. 28below. As noted above, they cause more frequent shifts(more than compensating for the slight decrease in aver-age step size), and increase the classically descaled diffu-sivity β. However, as a result, this increases the fractionof the area in S∞ accumulated near the spread, requir-ing that the mean depth at larger p increase to compen-sate (see Fig. 28). The resulting steeper approach tothe asymptotic depth at prices greater than the meanspread, and the larger negative curvature of the distribu-tion, are fit by an effective diffusivity that decreases withincreasing 1/ε. Since the distribution further from themidpoint represents the imprint of market order activityfurther in the past, this effective diffusivity describes thelong-term evolution of the distribution. The resulting an-ticorrelation of the small-p and large-p effective diffusionconstants implied by conservation of the area S∞ is ex-actly consistent with their respective ∼

√

1/ε and ∼ √ε

scalings. The general idea here is to connect diffusivitiesat short and long time scales to the depth profile nearthe spread and far away from the spread respectively.The conservation law for the depth profile, then impliesa connection between these two diffusivities.

2. Direct simulation in interval coordinates

The set of equations determined by the generalform (53) is ultimately parametrized by the single in-put x0. The correct value for x0 is determined whenthe xk are solved recursively, by requiring convergenceto x∞. We do this recursion numerically, in the samemanner as was done to solve the differential equation for

0.3

0.4

0.5

0.6

0.7

0.8

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35

<s>

/pc

ε

FIG. 24: The mean value of the spread in nondimensionalunits s = s/pc as a function of ε. The numerical valueabove (solid) is compared with the theoretical estimate be-low (dash). .

0.0001

0.01

1

100

1 10

FIG. 25: Four pairs of curves for the quantity xk/x∞ − 1vs. k. The value of ε increases from top to bottom (ε =0.02, 0.04, 0.2, 0.66). In each pair of curves, the markers areobtained from simulations while the solid curve is the predic-tion of Eq. 53 evaluated numerically. The difference betweennumerics and mean-field increases as ε decreases, especiallyfor large k.

the normalized mean density ψ (p).In Fig. 24 we compare the numerical result for x0 with

the analytical estimate generated as explained above.The results are surprisingly good throughout the entirerange. Though the theoretical value consistently under-estimates the numerical value, yet the functional form iscaptured accurately.

In Fig. 25, the values of xk for all k, are compared tothe values determined directly from simulations.

Fig. 26 shows the same data on a semilog scale forxk/x∞ − 1, showing the exponential decay at large ar-gument characteristic of a simple diffusion solution. TheIIA is clearly a good approximation for large ε. How-ever for small ε it starts deviating significantly from the

27

0.0001

0.01

1

100

0 10 20 30 40 50 7060

^

FIG. 26: Same plot as Fig. 25 but on a semi-log scale to showexponential decay at large k.

simulations, especially for large k.

The values of xk computed from the IIA, can be verydirectly used to get an estimation of the price impact.The price impact, as defined in earlier sections, can bethought of as the change in the position of the mid-point (or the bid), consecutive to a certain number oforders being filled. Within the framework of the sim-plified model we study here, this is simply the quantity

〈∆m〉 = 1/2∑kk′=1 xk′ , for k orders. The factor of 1/2

comes from considering the change in the position of themidpoint and not the bid. Fig 27 shows 〈∆m〉 nondi-mensionalized by pc plotted as a function of the numberof orders (multiplied by ε), for three different values of ε.Again, the theory matches quite well with the numerics,qualitatively. For large ε the agreement is quantitativeas well.

The simplest approximation to the density profiles inthe midpoint-centered frame is to continue to approxi-mate the mean density as 1/xk, but to regard that den-

sity as evaluated at position x0/2+∑ik=1 xk. This clearly

is not an adequate treatment in the range of the spread,both because the intervals are discrete, whereas meanψ is continuous, and because the density profiles satisfydifferent global conservation laws associated with non-constancy of α. For large k however, this approximationmight hold. The mean-field values (only) correspondingto a plot of εψ (p) versus p, are shown in Fig. 28. Herethe theoretically estimated xk’s at different parametervalues are used to generate the depth profile using theprocedure detailed above.

A comparison of the theoretically estimated profileswith the results from Monte Carlo simulations of themodel, is shown in Fig. 29. As evident, the theoreti-cal estimate for the density profile is better for large εrather than small ε.

We can also generalize the above analysis to when theorder placement process is no longer uniform. In partic-ular it has been found that a power-law order placementprocess is relevant [23, 26]. We carry out the above analy-

0

0.1

0.2

0.3

0.4

0.5

0.6

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

<∆m

> /p

c

N ε

FIG. 27: Three pairs of curves for the quantity 〈∆m〉/pc vs.

Nε where 〈∆m〉 = 1/2�

N

k=1xk. The value of ε increases

from top to bottom (ε = 0.002, 0.02, 0.2). In each pair ofcurves, the markers are obtained from simulations while thesolid curve is the prediction of the IIA. For ε = 0.002, weshow only the theoretical prediction. The theory capturesthe functional form of the price impact curves for differentε. Quantitatively, its better for larger epsilon, as remarkedearlier.

0

0.2

0.4

0.6

0.8

1

1.2

1 2 3 4

n δ

/ α

p / pC

FIG. 28: Density profiles for different values of ε ranging overthe values 0.2, 0.02, 0.004, 0.001, obtained from the Indepen-dent Interval Approximation.

sis for when α = ∆0β/(∆ + ∆0)

β where ∆ is the distancefrom the current bid and ∆0 determines the ’shoulder’ ofthe power-law. We find an interesting dependence of theexistence of solutions on β. In particular we find that forβ > 1, ∆0 needs to be larger than some value (which de-pends on β as well as other parameters of the model suchas µ and δ) for solutions of the IIA to exist. This mightbe interpreted as a market order wiping out the entirebook, if the exponent is too large. When solutions exist,we find that the the depth profile has a peak , consistentwith the findings of [23]. In Fig. 30 the depth profiles

28

0.8

0.6

0.4

0.2

1

0 1 2 3 4

n δ

/ α

p / pC

FIG. 29: Density profiles from Monte Carlo simulation (mark-ers) and the Independent Interval Approximation (lines).Pluses and dash line are for ε = 0.2, while crosses and dottedline are for ε = 0.02.

0.11 10 100 1000

p

n

0.2

0.4

0.3

0.5

0.6

FIG. 30: Density profiles for a power-law order placementprocess for different values of ∆0.

for three different values of ∆0 are plotted.

IV. CONCLUDING REMARKS

A. Ongoing work on empirical validation

Members of our group are working on the problem ofempirically testing this model. We are using a datasetfrom the London Stock Exchange. We have chosen thisdata because it contains every order and every cancel-lation. This makes it possible to measure all the pa-rameters of this model directly. It is also possible to re-construct the order book and measure all the statisticalproperties we have studied in this paper. Our empiricalwork so far shows that, despite its many limitations, our

model can act as an effective guide to future research.We believe that the main discrepancies between the pre-dictions of our model and the data can be dealt withby using a more sophisticated model of order flow. Wesummarize some of the planned improvements in the fol-lowing subsection.

B. Future Enhancements

As we have mentioned above, the zero intelligence, IIDorder flow model should be regarded as just a startingpoint from which to add more complex behaviors. Weare considering several enhancements to the order flowprocess whose effects we intend to discuss in future pa-pers. Some of the enhancements include:

• Trending of order flow.

We have demonstrated that IID order flow neces-sarily leads to non-IID prices. The converse is alsotrue: Non-IID order flow is necessary for IID prices.In particular, the order flow must contain trends,i.e. if order flow has recently been skewed towardbuying, it is more likely to continue to be skewedtoward buying. If we assume perfect market effi-ciency, in the sense that prices are a random walk,this implies that there must be trends in order flow.

• Power law placement of limit prices

For both the London Stock Exchange and the ParisBourse, the distribution of the limit price relativeto the best bid or ask appears to decay as a power-law [23, 26]. Our investigations of this show thatthis can have an important effect. Exponents largerthan one result in order books with a finite num-ber of orders. In this case, depending on other pa-rameters, there is a finite probability that a singlemarket order can clear the entire book (see Sec-tion III G 2).

• Power law or log-normal order size distribution.

Real order placement processes have order size dis-tributions that appear to be roughly like a log-normal distribution with a power law tail [27]. Thishas important effects on the fluctuations in liquid-ity.

• Non-Poisson order cancellation process.

When considered in real time order placement can-cellation does not appear to be Poisson [21]. How-ever, this may not be a bad approximation in eventtime rather than real time.

• Conditional order placement.

Agents may conditionally place larger market or-ders when the book is deeper, causing the marketimpact function to grow more slowly. We intendto measure this effect and incorporate it into ourmodel.

29

• Feedback between order flow and prices.

In reality there are feedbacks between order flowand price movements, beyond the feedback in thereference point for limit order placement built intothis model. This can induce bursts of trading, caus-ing order flow rates to speed up or slow down, andgive rise to clustered volatility.

The last item is just one of many examples of how onecan surely improve the model by making order flow con-ditional on available information. However, we believe itis important to first gain an understanding of the prop-erties of simple unconditional models, and then build onthis foundation to get a fuller understanding of the prob-lem.

C. Comparison to standard models based on

valuation and information arrival

In the spirit of Gode and Sunder [3], we assume a sim-ple, zero-intelligence model of agent behavior and showthat the market institution exerts considerable power inshaping the properties of prices. While not disputingthat agent behavior might be important, our model sug-gests that, at least on the short timescale many of theproperties of the market are dictated by the market in-stitution, and in particular the need to store supply anddemand. Our model is stochastic and fully dynamic, andmakes predictions that go beyond the realm of experi-mental economics, giving quantitative predictions aboutfundamental properties of a real market. We have devel-oped what were previously conceptual toy models in thephysics literature into a model with testable explanatorypower.

This raises questions about the comparison to standardmodels based on the response of valuations to news. Theidea that news might drive changes in order flow ratesis compatible with our model. That is, news can drivechanges in order flow, which in turn cause the best bidor ask price to change. But notice that in our modelthere are no assumptions about valuations. Instead, ev-erything depends on order flow rates. For example, thediffusion rate of prices increases as the 5/2 power of mar-ket order flow rate, and thus volatility, which depends onthe square root of the diffusion rate, increases as the 5/4power. Of course, order flow rates can respond to infor-mation; an increase in market order rate indicates addedimpatience, which might be driven by changes in valua-tion. But changes in long-term valuation could equallywell cause an increase in limit order flow rate, which de-creases volatility. Valuation per se does not determinewhether volatility will increase or decrease. Our modelsays that volatility does not depend directly on valua-tions, but rather on the urgency with which they arefelt, and the need for immediacy in responding to them.

Understanding the shape of the price impact functionwas one of the motivations that originally set this project

into motion. The price impact function is closely relatedto supply and demand functions, which have been cen-tral aspects of economic theory since the 19th century.Our model suggests that the shape of price impact func-tions in modern markets is significantly influenced notso much by strategic thinking as by an economic funda-mental: The need to store supply and demand in orderto provide liquidity. A priori it is surprising that this re-quirement alone may be sufficient to dictate at least thebroad outlines of the price impact curve.

Our model offers a “divide and conquer” strategyto understanding fundamental problems in economics.Rather than trying to ground our approach directly onassumptions of utility, we break the problem into twoparts. We provide an understanding of how the statisti-cal properties of prices respond to order flow rates, andleave the problem open of how order flow rates dependon more fundamental assumptions about information andutility. Order flow rates have the significant advantagethat, unlike information, utility, or the cognitive powersof an agent, they are directly measurable. We hope thatby breaking the problem into two pieces, and partiallysolving the second piece, we can ultimately help providea deeper understanding of how markets work.

APPENDIX A: RELATIONSHIP OF PRICE

IMPACT TO CUMULATIVE DEPTH

An important aspect of markets is the immediate liq-uidity, by which we mean the immediate response ofprices to incoming market orders. When a market orderenters, its execution range depends both on the spreadand on the depth of the orders in the book. These de-termine the sequence of transaction prices produced bythat order, as well as the instantaneous market impact.Long term liquidity depends on the longer term responseof the limit order book, and is characterized by the priceimpact function φ(ω, τ) for values of τ > 0. Immediateliquidity affects short term volatility, and long term liq-uidity affects volatility measured over longer timescales.In this section we address only short term liquidity. Weaddress volatility on longer timescales in section II B 4.

We characterize liquidity in terms of either the depthprofile or the price impact. The depth profile n(p, t) isthe number of shares n at price p at time t. For manypurposes it is convenient to think in terms of the cumu-lative depth profile N , which is the sum of n values up(or down) to some price. For convenience we establish areference point at the center of the book where we definep ≡ 0 and N(0) ≡ 0). The reference point can be ei-ther the midpoint quote, or the best bid or ask. We alsostudy the price impact function ∆p = φ(ω, τ, t), where∆p is the shift in price at time t+ τ caused by an orderof size ω placed at time t. Typically we define ∆p as theshift in the midpoint price, though it is also possible touse the best bid or ask (Eq. 1).

The price impact function and the depth profile are

30

closely related, but the relationship is not trivial. N(∆p)gives us the average total number of orders upto a dis-tance ∆p away from the origin. Whereas, in order tocalculate the price impact, what we need is the averageshift ∆p caused by a fixed number of orders. Makingthe identifications p = ∆p, N = ω, and choosing a com-mon reference point, the instantaneous price impact isthe inverse of the instantaneous cumulative depth, i.e.φ(ω, 0, t) = N−1(ω, t). This relationship is clearly trueinstant by instant. However it is not true for averages,since the mean of the inverse is not in general equal to theinverse of the means, i.e. 〈φ〉 6= 〈N〉−1

. This is highly rel-evant here, since because the fluctuations in these func-tions are huge, our interest is primarily in their statisticalproperties, and in particular the first few moments.

A relationship between the moments can be derived asfollows:

1. Moment expansion

There is some subtlety in how we relate the marketimpact to the cumulative order count N (p, t). One el-igible definition of market impact ∆p is the movementof the midpoint, following the placement of an orderof size ω. If we define the reference point so thatN (a, t) ≡ 0, and the market order is a buy, this definitionputs ω (∆p, t) = N (a+ 2∆p, t) −N (a, t). In words, themidpoint shift is half the shift in the best offer. An alter-native choice would be to let ω (∆p, t) = N (∆p, t), whichwould include part of the instantaneous spread in thedefinition of impact in midpoint-centered coordinates, ornone of it in ask-centered coordinates. The issue of howimpact is related to N (p, t) is separate from whether thebest ask is set equal to the reference point for prices, andmay be chosen differently to answer different questions.

Under any such definition, however, the impact ∆p is amonotonic function of ω in every instance, so either maybe taken as the independent variable, along with the in-dex t that labels the instance. We wish to account for thedifferences in instance averages 〈〉 of ω and ∆p, regardedrespectively as the dependent variables, in terms of thefluctuations of either.

In spite of the fact that the density n (p, t) is a highlydiscontinuous variable in general, monotonicity of the cu-mulative N (p, t) enables us to picture a power series ex-pansion for ω (p, t) in p, with coefficients that fluctuate intime. The simplest such expansion that captures muchof the behavior of the simulated output is

ω (p, t) = a (t) + b (t) p+c (t)

2p2, (A1)

if p is regarded as the independent variable, or

p (ω, t) =−b (t) +

√

b2 (t) + 2c (t) (ω − a (t))

c (t), (A2)

if ω is. While the variable a (t) would seem unneces-sary since ω is zero at p = 0, empirically we find that

simultaneous fits to both ω and ω2 at low order can bemade better by incorporating the additional freedom offluctuations in a.

We imagine splitting each t-dependent coefficient intoits mean, and a zero-mean fluctuation component, as

a (t) ≡ a+ δa (t) , (A3)

b (t) ≡ b+ δb (t) (A4)

and

c (t) ≡ c+ δc (t) . (A5)

The fluctuation components will in general depend onε. The values of the mean and second moment of thefluctuations can be extracted from the mean distributions〈ω〉 and

⟨

ω2⟩

. The mean values come from the linearexpectation:

〈ω (0)〉 = a, (A6)

∂ 〈ω (p)〉∂p

∣

∣

∣

∣

p=0

= b, (A7)

and

∂2 〈ω (p)〉∂p2

∣

∣

∣

∣

p=0

= c. (A8)

Given these, the fluctuations then come from thequadratic expectation as

⟨

ω2 (0)⟩

= a2 +⟨

δa2⟩

, (A9)

∂⟨

ω2 (p)⟩

∂p

∣

∣

∣

∣

∣

p=0

= 2ab+ 2 〈δaδb〉 , (A10)

∂2⟨

ω2 (p)⟩

∂p2

∣

∣

∣

∣

∣

p=0

= 2(

b2 + ac)

+ 2⟨

δb2 + δaδc⟩

, (A11)

∂3⟨

ω2 (p)⟩

∂p3

∣

∣

∣

∣

∣

p=0

= 6(

bc+ 〈δbδc〉)

, (A12)

and

∂4⟨

ω2 (p)⟩

∂p4

∣

∣

∣

∣

∣

p=0

= 6(

c2 +⟨

δc2⟩)

. (A13)

When ω is given a specific definition in terms of the cu-mulative distribution, its averages become averages overthe density in the order book.

31

The values of the moments as obtained above may thenbe used in a derivative expansion of the inverse func-tion (A2), making the prediction for the averaged impact

〈p (ω)〉 = p+1

2

∂2p

∂a2

⟨

δa2⟩

+1

2

∂2p

∂b2⟨

δb2⟩

+1

2

∂2p

∂c2⟨

δc2⟩

+∂2p

∂a∂b〈δaδb〉 +

∂2p

∂a∂c〈δaδc〉 +

∂2p

∂b∂c〈δbδc〉 ,

(A14)

where overbar denotes the evaluation of the function (A2)or its indicated derivative at b (t) = b, c (t) = c, and ω.The fluctuations

⟨

δb2⟩

and 〈δaδc〉 cannot be determinedindependently from Eq. (A11). However, in keeping withthis fact, their coefficient functions in Eq. (A14) are iden-tical, so the inversion remains fully specified.

If we denote by Z the radical

Z ≡√

b2 + 2c (ω − a), (A15)

the various partial derivative functions in Eq. (A14) eval-uate to

1

2

∂2p

∂a2 =−c2Z

3 , (A16)

∂2p

∂a∂b=

b

Z3, (A17)

1

2

∂2p

∂b2=

∂2p

∂a∂c=ω − a

Z3, (A18)

∂2p

∂b∂c=

1

c2− b

Zc2− (ω − a) b

cZ3, (A19)

and

1

2

∂2p

∂c2=Z − b

c3− ω − a

2c2Z− (ω − a)

2

2cZ3. (A20)

Plugging these into Eq. (A14) gives the predicted meanprice impact, compared to actual mean in Fig. 31. Herethe measure used for price impact is the movement of theask from buy market orders. The cumulative order dis-tribution is computed in ask-centered coordinates, elim-inating the contribution from the half-spread in the pcoordinate. The inverse of the mean cumulative distri-bution (dotted), which corresponds to p in Eq. (A14),clearly underestimates the actual mean impact (solid).However, the corrections from only second-order fluctua-tions in a, b, and c account for much of the difference atall values of ε.

2. Quantiles

Another way to characterize the relationship betweendepth profile and market impact is in terms of their quan-tiles (the fraction greater than a given value, for example

0 0.1 0.2 0.3 0.4 0.5 0.6 0

0.2

0.4

0.6

0.8

1

1.2

<dm

> /

p C

N ε / σ

0 0.05 0.1 0.15 0.2 0.25 0.3 0.350

0.2

0.4

0.6

0.8

1

<dm

> /

p C

N ε / σ

0 0.05 0.1 0.15 0.2 0.25 0.3 0

0.2

0.4

0.6

0.8

1

<dm

> /

p C

N ε / σ

FIG. 31: Comparison of the inverse mean cumulative orderdistribution p (dot), to the actual mean impact (solid), andthe second-order fluctuation expansion (A14, dash). (a) : ε =0.2. (b): ε = 0.02. (c): ε = 0.002.

the median is the 0.5 quantile). Interestingly, the rela-tionship between quantiles is trivial. Letting Qr(x) bethe rth quantile of x, because the the cumulative depthN(p) is a non-decreasing function with inverse p = φ(N),we have the relation

Qr(φ) = (Q1−r(N))−1 (A21)

This provides an easy and accurate way to compare depthand price impact when the tick size is sufficiently small.However, when the tick size is very coarse, the quantiles

32

are in general not very useful, because unlike the mean,the quantiles do not vary continuously, and only take ona few discrete values.

As we have argued in the previous section, in nondi-mensional coordinates all of the properties of the limitorder book are described by the two dimensionless scalefactors ε and dp/pc (see table (II). When expressed indimensionless coordinates, any property, such as depth,spread, or price impact, can only depend on these twoparameters. This reduces the search space from five di-mensions to two, which greatly simplifies the analysis.Any results can easily be re-expressed in dimensional co-ordinates from the definitions of the dimensionless pa-rameters.

APPENDIX B: SUPPORTING CALCULATIONS

IN DENSITY COORDINATES

The following two subsections provide details for themaster equation solution in density coordinates. The firstprovides the generating functional solution for the den-sity at general dp, and the second the approximate sourceterm for correlated fluctuations.

1. Generating functional at general bin width

As in the main text, α and µ represent the functionsof p everywhere in this subsection, because the boundaryvalues do not propagate globally. Eq. (24) can be solvedby assuming there is a convergent expansion in (formal)small D/δ,

Π ≡∑

j

(

D

δ

)j

Πj , (B1)

and it is convenient to embellish the shorthand notationas well, with

Πj (0, p) ≡ π0j (p) . (B2)

It follows that expected number also expands as

〈n〉 ≡∑

j

(

D

δ

)j

〈n〉j . (B3)

Order by order in D/δ, Eq. (24) requires

(

∂

∂λ− αdp− µ/2λ

δ σ

)

Πj =µ

2δ σλπ0j +

∂

∂p2

Πj−1

(λ− 1).

(B4)Because Πj have been introduced in order to be chosenhomogeneous of degree zero in D/δ, the normalizationcondition requires that

Π0 (1, p) = 1 , Πj 6=0 (1, p) = 0 , ∀p (B5)

The implied recursion relations for expected occupationnumbers are

〈n〉0 =αdp

δ− µ

2δ(1 − π00) , (B6)

at j = 0, and

〈n〉j =µ

2δπ0j +

∂

∂p2〈n〉j−1 (B7)

otherwise.Eq. (B4) is solved immediately by use of an integrating

factor, to give the recursive integral relation

Πj (λ) = π0j

[

1 +αdp

δ σI (λ)

]

+ I (λ)

⟨⟨

∂

∂p2

Πj−1

λ− 1

⟩⟩

λ

, (B8)

where

I (λ) ≡ λ

∫ 1

0

dze(αdp/δ σ)(1−z)z(µ/2δ σ), (B9)

and⟨⟨

∂

∂p2

Πj−1

λ− 1

⟩⟩

λ

≡

λ

I (λ)

∫ 1

0

dz e(αdp/δ σ)(1−z)z(µ/2δ σ) ∂2

∂p2

Πj−1 (λz)

λz − 1.

(B10)

The surface condition (B5) provides the starting pointfor this recursion, by giving at j = 0

π00 =1

1 + (αdp/δ σ) I (1). (B11)

Given forms for α and µ, Eq. (B6) may be solved directlyfrom Eq. (B11), and extended by Eq. (B7) to solve for〈n (p)〉. More generally, equations (B9), (B10), and (B11)may be solved to any desired order numerically, to obtainthe fluctuation characteristics of n (p). Finding the solu-tion becomes difficult, however, when α and µ must berelated self-consistently to the solutions for Π. The spe-cial case dp→ 0 admits a drastic simplification, in whichthe whole expansion for 〈n (p)〉 may be directly summed,to recover the result in the main text. In this limit, onegets a single differential equation in p which is solvableby numerical integration. The existence and regularity ofthis solution demonstrates the existence of a continuumlimit on the price space, and can be simulated directlyby allowing orders to be placed at arbitrary real-valuedprices.

a. Recovering the continuum limit for prices

In the limit that the dimensionless quantity αdp/δ σ →0, Eq. (B9) simplifies to

I (λ) → λ

1 + µ/2δ σ+ O (dp) , (B12)

33

from which it follows that

〈n〉0 → αdp/δ

1 + µ/2δ σ+ O

(

dp2)

. (B13)

The important simplification given by vanishing dp, aswill be seen below, is that the expansion (19) collapses,at leading order in dp, to

Π0 (λ) → 1 + (λ− 1)〈n〉0σ

+ O(

dp2)

. (B14)

Eq. (B14) is used as the input to an inductive hypothesis

Πj−1 (λ) → Πj−1 (1)+(λ− 1)〈n〉j−1

σ+O

(

dp2)

, (B15)

(n. b. 〈n〉j−1 ∼ O (dp), Πj−1 (1) = either 1 or 0), which

with Eq. (B10), then recovers the condition at j:

Πj (λ) → (λ− 1)I (1)d2

dp2

〈n〉j−1

σ+ O

(

dp2)

. (B16)

Using Eq. (19) at λ → 1, and Eq. (B12) for I, gives therecursion for the number density

〈n〉j 6=0 → 1

1 + µ/2δ σ

d2

dp2〈n〉j−1 + O

(

dp2)

. (B17)

The sum (B3) for 〈n〉 is then

〈n〉 =∑

j

(

D

δ

1

1 + µ/2δ σ

d2

dp2

)j

〈n〉0. (B18)

Using Eq. (B13) for 〈n〉0 and re-arranging terms,Eq. (B18) is equivalent to

〈n〉 =1

1 + µ/2δ σ

∑

j

(

D

δ

d2

dp2

1

1 + µ/2δ σ

)jαdp

δ.

(B19)The series expansion in the price Laplacian is formally

the geometric sum

(1 + µ/2δ σ) 〈n〉 =

[

1 − D

δ

d2

dp2

1

1 + µ/2δ σ

]−1αdp

δ,

(B20)which can be inverted to give Eq. (27), a relation that islocal in derivatives.

2. Cataloging correlations

A correct source term S must correlate the incidencesof zero occupation with the events producing shifts. Itis convenient to separate these into the four independenttypes of deposition and removal.

First we consider removal of buy limit orders, whichgenerates a negative shift of the midpoint. Let a′ denotethe position of the ask after the shift. Then all possible

case source prob∆p ≤ a′ < p ψ (p− ∆p) − ψ (p) ϕ (∆p) − ϕ (p)

∆p ≤ p < a′ ≤ p+ ∆p 0 − ψ (p) ϕ (p) − ϕ (p+ ∆p)p < ∆p < a′ ≤ p+ ∆p 0 − ψ (p) ϕ (∆p) − ϕ (p+ ∆p)

TABLE VII: Contributions to “effective P−” from removalof a buy limit order, conditioned on the position of the askrelative to p.

case source prob∆p ≤ a′ < p ψ (p+ ∆p) − ψ (p) ϕ (∆p) − ϕ (p)

∆p ≤ p < a′ ≤ p+ ∆p ψ (p+ ∆p) − 0 ϕ (p) − ϕ (p+ ∆p)p < ∆p < a′ ≤ p+ ∆p ψ (p+ ∆p) − 0 ϕ (∆p) − ϕ (p+ ∆p)

TABLE VIII: Contributions to “effective P+” from removalof a sell limit order, conditioned on the position of the askrelative to p.

shifts ∆p are related to a given price bin p and a′ in one ofthree ordering cases, shown in Table VII. For each case,the source term corresponding to [ψ (p− ∆p) − ψ (p)] inEq. (43) is given, together with the measure of order-bookconfigurations for which that case occurs. The mean-fieldassumption (36) is used to estimate these measures.

As argued when defining β in the simpler diffusion ap-proximation for the source terms, the measure of shiftsfrom removal of either buy or sell limit orders should besymmetric with that of their addition within the spread,which is is 2d∆p for either type, in cases when the shift±∆p is consistent with the value of the spread. The onlychange in these more detailed source terms is replace-ment of the simple Pr (a ≥ ∆p) with the entries in thethird column of Table VII. When the ∆p cases are in-tegrated over their range as specified in the first columnand summed, the result is a contribution to S of

∫ p

0

2d∆p ψ (p− ∆p) [ϕ (∆p) − ϕ (p)]

−∫ ∞

0

2d∆p ψ (p) [ϕ (∆p) − ϕ (p+ ∆p)] (B21)

Sell limit-order removals generate another sequence ofcases, symmetric with the buys, but inducing positiveshifts. The cases, source terms, and frequencies are givenin Table VIII. Their contribution to S, after integrationover ∆p, is then

∫ ∞

0

2d∆p ψ (p+ ∆p) [ϕ (∆p) − ϕ (p+ ∆p)]

−∫ p

0

2d∆p ψ (p) [ϕ (∆p) − ϕ (p)] (B22)

Order addition is treated similarly, except that a de-notes the position of the ask before the event. Sell limit-order additions generate negative shifts, with the casesshown in Table IX. Integration over ∆p consistent with

34

case source prob∆p ≤ a < p− ∆p ψ (p− ∆p) − ψ (p) ϕ (∆p) − ϕ (p− ∆p)

∆p ≤ p− ∆p < a ≤ p 0 − ψ (p) ϕ (p− ∆p) − ϕ (p)p− ∆p < ∆p < a ≤ p 0 − ψ (p) ϕ (∆p) − ϕ (p)

TABLE IX: Contributions to “effective P−” from additionof a sell limit order, conditioned on the position of the askrelative to p.

case source prob∆p ≤ a′ < p ψ (p+ ∆p) − ψ (p) ϕ (∆p) − ϕ (p)

∆p ≤ p < a′ ≤ p+ ∆p ψ (p+ ∆p) − 0 ϕ (p) − ϕ (p+ ∆p)p < ∆p < a′ ≤ p+ ∆p ψ (p+ ∆p) − 0 ϕ (∆p) − ϕ (p+ ∆p)

TABLE X: Contributions to “effective P+” from addition of abuy limit order, conditioned on the position of the ask relativeto p.

these cases gives the negative-shift contribution to S∫ p/2

0

2d∆p ψ (p+ ∆p) [ϕ (∆p) − ϕ (p− ∆p)]

−∫ p

0

2d∆p ψ (p) [ϕ (∆p) − ϕ (p+ ∆p)] (B23)

The corresponding buy limit-order addition cases aregiven in Table X, and their positive-shift contribution toS turns out to be the same as that from removal of selllimit orders (B22).

Writing the source as a sum of two terms S ≡ Sbuy +Ssell, the combined contribution from buy limit-order ad-ditions and removals is

Sbuy (p) =

∫ p

0

2d∆p [ψ (p− ∆p) − ψ (p)] [ϕ (∆p) − ϕ (p)]

−∫ ∞

0

2d∆p [ψ (p+ ∆p) − ψ (p)] [ϕ (∆p) − ϕ (p+ ∆p)]

(B24)

The corresponding source term from sell order additionand removal is

Ssell (p) =

∫ p/2

0

2d∆p ψ (p− ∆p) [ϕ (∆p) − ϕ (p− ∆p)]

− 2

∫ p

0

2d∆p ψ (p) [ϕ (∆p) − ϕ (p)]

+

∫ ∞

0

2d∆p ψ (p+ ∆p) [ϕ (∆p) − ϕ (p+ ∆p)]

(B25)

The forms (B24) and (B25) do not lead to∫

dpS =0, and correcting this presumably requires distributingthe orders erroneously transported through the midpointby the diffusion term, to interior locations where theythen influence long-time diffusion autocorrelation. Thesesource terms manifestly satisfy S (0) = 0, though, andthat determines the intercept of the average order depth.

a. Getting the intercept right

Evaluating Eq. (44) with α (p) /α (∞) = 1 +Pr (s/2 ≥ p), at p = 0 gives the boundary value of thenondimensionalized, midpoint-centered, mean order den-sity

ψ (0) =2

1 + ε, (B26)

which dimensionalizes to

〈n (0)〉σdp

=2α (∞) /σ

µ (0) /2σ + δ. (B27)

Eq. (B27), for the total density, is the same as theform (B6) produced by the diffusion solution for thezeroth order density, as should be the case if diffusionno longer transports orders through the midpoint. Thisform is verified in simulations, with midpoint-centeredaveraging.

Interestingly, the same argument for the bid-centeredframe would simply omit the ϕ from α (0) /α (∞), pre-dicting that

ψ (0) =1

1 + ε, (B28)

a result which is not confirmed in simulations. Thus, inaddition to not satisfying the mean-field approximation,the bid-centered density average appears to receive somediffusive transport of orders all the way down to the bid.

b. Fokker-Planck expanding correlations

Equations (B24) and (B25) are not directly easy to usein a numerical integral. However, they can be Fokker-Planck expanded to terms with behavior comparable tothe diffusion equation, and the correct behavior near themidpoint. Doing so gives the nondimensional expansionof the source term S corresponding to the diffusion con-tribution in Eq. (32):

S = R (p)ψ (p) + P (p)dψ (p)

dp+ εβ (p)

d2ψ (p)

dp2 . (B29)

The rate terms in Eq. (B29) are integrals defined as

R (p) =

∫ ∞

0

2d∆p [ϕ (∆p) − ϕ (p+ ∆p)]

− 2

∫ p

0

2d∆p [ϕ (∆p) − ϕ (p)]

+

∫ p/2

0

2d∆p [ϕ (∆p) − ϕ (p− ∆p)] ,(B30)

35

P (p) = 2

∫ ∞

0

2d∆p∆p [ϕ (∆p) − ϕ (p+ ∆p)]

−∫ p

0

2d∆p p [ϕ (∆p) − ϕ (p)]

−∫ p/2

0

2d∆p∆p [ϕ (∆p) − ϕ (p− ∆p)] ,(B31)

and

εβ (p) =

∫ ∞

0

2d∆p (∆p)2[ϕ (∆p) − ϕ (p+ ∆p)]

+1

2

∫ p

0

(∆p)22d∆p [ϕ (∆p) − ϕ (p)]

+1

2

∫ p/2

0

2d∆p (∆p)2[ϕ (∆p) − ϕ (p− ∆p)] .

(B32)

All of the coefficients (B30 - B31) vanish mani-festly as p → 0, and at large p, R,P → 0, whileεβ (p) → 4

∫∞

0 d∆p (∆p)2ϕ (∆p), recovering the diffusion

constant (41) of the simplified source term. However,they are still not convenient for numerical integration,being nonlocal in ϕ.

The exponential form (33) is therefore exploited to ap-proximate ϕ, in the region where its value is largest, withthe expansion

ϕ (p± ∆p) ≈ ϕ (p)ϕ (∆p) e±p∆p ∂ψ/∂p|0 (B33)

In the range where the mean-field approximation is valid,ϕ is dominated by the constant term ψ (0), and even thefactors e±p∆p ∂ψ/∂p|

0 can be approximated as unity. Thisleaves the much-simplified expansions

R (p) = [1 − ϕ (p)] I0 (∞)

− 2 [I0 (p) − 2pϕ (p)]

+ [1 − ϕ (p)] I0 (p/2) , (B34)

P (p) = 2 [1 − ϕ (p)]I1 (∞)

−[

I1 (p) − p2ϕ (p)]

− [1 − ϕ (p)] I1 (p/2) , (B35)

and

εβ (p) = [1 − ϕ (p)]I2 (∞)

+1

2

[

I2 (p) − 2

3p3ϕ (p)

]

+ [1 − ϕ (p)] I2 (p/2) . (B36)

In Equations (B34 - B36),

Ij (p) ≡∫ p

0

2d∆p (∆p)jϕ (∆p) , (B37)

for j = 0, 1, 2. These forms (B34 - B36) are inserted inEq. (B29) for S to produce the mean-field results com-pared to simulations in Fig. 20 - Fig. 22.

ACKNOWLEDGMENTS

We would like to thank the McKinsey Corporation,Credit Suisse First Boston, the McDonnel Foundation,Bob Maxfield, and Bill Miller for supporting this re-search. We would also like to thank Paolo Patelli, R.Rajesh, Spyros Skouras, and Ilija Zovko for helpful dis-cussions, and Marcus Daniels for valuable technical sup-port.

[1] M. Daniels, J.D. Farmer, G. Iori, and D. E. Smith,“How storing supply and demand affects price diffusion”,http://xxx.lanl.gov/cond-mat/0112422 (2001).

[2] L. Bachelier, “Theorie de la speculation”, 1900.Reprinted in P.H. Cootner, The Random Character ofStock Prices, 1964, MIT Press Cambridge.

[3] D. Gode and S. Sunder, “Allocative efficiency of marketswith zero intelligence traders: Markets as a partial substi-tute for individual rationality”, J. of Political Economy,101:119 (1993).

[4] J.D. Farmer, “Market force, ecology, and evolution”, SFIworking paper 98-12-117 or http://xxx.lanl.gov/adapt-org/9812005.

[5] J.-P. Bouchaud and R. Cont, “A Langevin approachto stock market fluctuations and crashes”, EuropeanPhysics Journal B 6, 543 (1998)

[6] J.A. Hausman and A.W. Lo, “An ordered probit anal-

ysis of transaction stock prices”, Journal of FinancialEconomics 31 319-379 (1992)

[7] J.D. Farmer, “Slippage 1996”, Prediction Company in-ternal technical report (1996), http://www.predict.com

[8] N. Torre, BARRA Market Impact Model Handbook,BARRA Inc, Berkeley CA, www.barra.com (1997).

[9] A. Kempf and O. Korn, “Market depth and order size”,University of Mannheim technical report (1998).

[10] V. Plerou, P. Gopikrishnan, X. Gabaix, and H.E. Stan-ley, “Quantifying stock price response to demand fluctu-ations”, http://xxx.lanl.gov/cond-mat/0106657.

[11] F. Lillo, J.D. Farmer, and R. Mantegna, “Single curvecollapse of the price impact function”, to appear inNature. A more detailed version can be found athttp://xxx.lanl.gov/cond-mat/0207438.

[12] H. Mendelson, “Market behavior in a clearing house”,Econometrica 50, 1505-1524 (1982).

36

[13] K.J. Cohen, R.M. Conroy and S.F. Maier, “Order flowand the quality of the market”, in: Y. Amihud, T. Hoand R. Schwartz, eds., Market Making and the Chang-ing Structure of the Securities Industry, 1985, LexingtonBooks, Lexington MA.

[14] I. Domowitz and Jianxin Wang, “Auctions as algo-rithms”, J. of Econ. Dynamics and Control 18, 29 (1994).

[15] T. Bollerslev, I. Domowitz, and J. Wang, “Order flow andthe bid-ask spread: An empirical probability model ofscreen-based trading”, J. of Econ. Dynamics and Control21, 1471 (1997).

[16] P. Bak, M. Paczuski, and M. Shubik, “Price variations ina stock market with many agents”, Physica A 246, 430(1997).

[17] D. Eliezer and I.I. Kogan, “Scaling laws for the mar-ket microstructure of the interdealer broker markets”,http://xxx.lanl.gov/cond-mat/9808240.

[18] L.-H. Tang and G.-S. Tian, “ Reaction-diffusion-branching models of stock price fluctuations”, PhysicaA, 264, 543 (1999).

[19] S. Maslov, “Simple model of a limit order-driven mar-ket”, Physica A 278, 571(2000).

[20] F. Slanina, “Mean-field approximation for a limit or-der driven market model”, http://xxx.lanl.gov/cond-

mat/0104547[21] D. Challet and R. Stinchcombe, “Analyzing and Model-

ing 1+1d markets”, Physica A 300, 285, (2001), preprintcond-mat/0106114

[22] C. Chiarella and G. Iori, “A simulation analysis of themicrostructure of double auction markets”, to appear inQuantitative Finance, October (2002).

[23] J. P. Bouchaud, M. Mezard, and M. Potters, “Statisticalproperties of stock order books: empirical results andmodels”, http://www.arxiv.org/cond-mat/0203511

[24] P. W. Bridgman, Dimensional Analysis, 1922, Yale Uni-versity Press, New Haven.

[25] D. Challet and R. Stinchcombe, “Exclusion par-ticle models of limit order financial markets”,http://xxx.lanl.gov/cond-mat/0106114

[26] I. I. Zovko and J. D. Farmer, “ The power of patience:A behavioral regularity in limit order placement” cond-mat 0206280, to appear in Quantitative Finance, October(2002)

[27] S. Maslov and M. Mills, “Price fluctuations from theorder book perspective — empirical facts and a simplemodel”, Physica A 299, 234 (2001).

Statistical Theory of the Continuous Double Auction · Statistical theory of the continuous double...

Documents

Transcript of Statistical Theory of the Continuous Double Auction · Statistical theory of the continuous double...