SPEEDEX: A Scalable, Parallelizable, and Economically ...

SPEEDEX: A Scalable, Parallelizable, andEconomically Efficient Distributed EXchangeGeoffrey RamseyerStanford University

Stanford, California, [email protected]

Ashish GoelStanford University

Stanford, California, [email protected]

David MazièresStanford University

Stanford, California, USA

AbstractSPEEDEX is a decentralized exchange (DEX) letting partici-pants securely trade assets without giving any single partyundue control over the market. SPEEDEX offers several ad-vantages over prior DEXes. It achieves high throughput—over 100,000 transactions per second on 32-core servers,even with 70M open offers. It eliminates internal arbitrageopportunities, so that a direct trade from asset𝐴 to 𝐵 alwaysreceives as good a price as trading through some third assetsuch as USD. Finally, it prevents frontrunning attacks thatwould otherwise increase the effective bid-ask spread forsmall traders. SPEEDEX’s key design insight is to use anArrow-Debreu exchange market structure that fixes the val-uation of assets for all trades in a given block of transactions.Not only does this market structure provide fairness acrosstrades, it makes trade operations commutative and henceefficiently parallelizable.

1 IntroductionDigital currencies are moving closer to mainstream adoption.Examples include central bank digital currencies (CBDCs)such as China’s DC/EP, commercial efforts such as Diem,and numerous decentralized-blockchain-based stablecoinssuch as Tether, Dai, and USDC. Available digital currenciesdiffer wildly in terms of privacy, openness, smart contractsupport, performance, regulatory risk, solvency guarantees,compliance features, retail vs. wholesale suitability, and cen-tralization of the underlying ledger. Because of these dif-ferences, and because financial stability demands differentmonetary policy in different countries, we cannot hope fora one-size-fits-all global digital currency. Instead, to realizethe full potential of digital currencies (and digital assets ingeneral), we need a trading platform where many digitalcurrencies can efficiently coexist.Trading currencies requires an asset exchange to match

and execute offers from agreeable counterparties. The idealdigital currency exchange would avoid giving any centralauthority undue power over the global flow of currency. Theideal exchange would also be transparent, auditable, andresistant to servers profitably front-running small tradersby siphoning off the bid-ask spread. Finally, the ideal ex-change would present no arbitrage opportunities: when sell-ing currency𝐴 to buy currency 𝐵, there would be no need to

create more complex trades through other currencies (e.g.,𝐴→ 𝐶 → 𝐵) to get a better price.

In a digital asset context, the gold standard for avoidingcentralized control is to operate a decentralized exchange, orDEX: a transparent exchange implemented as a determinis-tic replicated state machine maintained by many differentparties. To prevent theft, a DEX requires all transactionsto be digitally signed by the relevant asset holders. To pre-vent cheating, replicas organize history into an append-onlyblockchain. DEX replicas agree on blockchain state througha Byzantine-fault tolerant consensus protocol, typically somevariant of asynchronous Byzantine agreement [24] for pri-vate blockchains or synchronous mining [47] for public ones.

Unfortunately, conventional wisdom holds that DEXescannot scale to transaction volumes beyond a few thousandtransactions per second. This wisdom has led to many al-ternative blockchain scaling techniques, such as off-chaintrade matching [53], automated market-makers [14], andtransaction rollup systems [9]. However, these approacheseither trust a third party to ensure that orders are matchedwith the best available price, or sacrifice the ability to settraditional limit orders that only sell at or above a certainprice.The challenge with on-chain DEXes is that the precise

quantities spent and gained by each participant depend onhow offers are matched. One pair of trade offers might ex-change 1 EUR for 1.20 USD, while another swaps 1 USD for1.21 EUR. Realistic markets will offer many distinct waysto match buyers and sellers with slightly different results.The multitude of matchings makes it impractical to elimi-nate arbitrage, presents opportunities for DEX servers togain advantage by re-ordering transactions, and precludesparallelization in a deterministic replicated state machine.In particular, a DEX cannot naively employ optimistic con-currency control, as doing so would make order matchingdependent on non-deterministic inter-thread scheduling.This paper disproves the conventional wisdom about on-

chain DEX performance. We present SPEEDEX (Scalable,Parallelizable, and Economicially Efficient Distributed EX-change), a fully on-chain DEX capable of processing over100,000 transactions per second on a 32-core machine (Fig-ure 5) and designed to scale further on more powerful hard-ware. Like most DEXes, SPEEDEX processes transactions

1

https://orcid.org/0000-0003-3205-2457

in blocks—e.g., a block of 500,000 transactions every 5 sec-onds. Its fundamental principle is that all transactions in ablock commute: the block’s result is identical regardless ofthe order in which trades are executed, which allows effi-cient parallelization [27]. Our implementation of SPEEDEXis available at https://github.com/scslab/speedex.To make transactions commutative, SPEEDEX is struc-

tured as an Arrow-Debreu Exchange Market [18]: for a givenblock, each asset 𝐴 has a unique valuation 𝑝𝐴 (denominatedin an arbitrary unit). A block proposed to the consensus al-gorithm contains not only a set of transactions, but also avaluation for each asset. In-the-money offers are executedat the prices implied by these valuations (i.e., 1 unit of 𝐴trades for 2 units of 𝐵 if the valuation of𝐴 is twice that of 𝐵).This structure eliminates arbitrage, eliminates the need fora reserve currency, and enables us to adjust balances withhardware-level atomics, rather than locking.Implementing SPEEDEX introduces both theoretical, al-

gorithmic challenges and systems design challenges. Theprimary algorithmic challenge is efficient price computa-tion. The agents in our Arrow-Debreu markets have linearutilities, and there are many existing theoretical algorithmsfor computing equilbria in linear exchange markets. Theseinclude convex programs such as [32], combinatorial algo-rithms such as [37, 39, 42], and iterative methods such as a“Tâtonnement” process [30]. However, the performance ofall of these algorithms, directly applied, scales poorly, bothempirically and asymptotically, as the number of open of-fers to trade increases. We show that the theoretical marketinstances derived from our exchange have additional struc-ture that we can use to make them efficiently employablein practice. We explicitly correct approximation errors witha 2nd-stage linear program. To implement this exchange,we design a set of transaction semantics that enable naturalinteraction between users and an exchange while allowingconcurrent transaction processing, and while keeping dataorganized to permit the efficient answering of queries aboutthe state of the system from the price computation algorithm.

2 Related Work2.1 (Distributed) ExchangesSome blockchain platforms, like Stellar [10], provide a nativemechanism for posting and clearing trade offers. Automatedmarket makers, like Uniswap [14] or Bancor [41], are smartcontracts that trade with users directly, instead of actingas a matching intermediary. These contracts compute anexchange rate based on their currency reserve contents andoffer this rate to any trader [15]. The offered price typicallychanges after every trade.

0x [53] provides an interface for performing atomic assetswaps between untrusted parties, which can be used as abuilding block for exchanges that match offers off-chain.Loopring [12] gives a similar interface, but allows offers to

be matched in cycles, instead of just in pairs. StarkEx [9,20] provides a set of cryptographic primitives to enable acentralized exchange to match trades off-chain and provethat these operations were conducted correctly.To combat front-running, Clockwork [28] uses timelock

puzzles to allow an exchange to commit to processing an offerbefore it can see the offer’s contents. Zhang et. al. [57] andKelkar et. al. [44] combat front-running by limiting the powerof a byzantine replica to choose the ordering of transactionswithin a block.

Binance operates a distributed exchange [2] that computesper-asset-pair exchange rates. Offers created in that blocktrade at that ratio, while pre-existing offers only trade attheir limit prices. The exchange currently handles 10-30operations per second [3].The Gnosis exchange [6] also clears offers in batches

by solving an optimization problem. Gnosis uses a mixed-integer programming problem and lets solvers compete toproduce good solutions. A forthcoming deployment plans tohandle batches of 100 offers [13].

Budish, Cramton, and Shim [23] argue that outside of theblockchain context, markets should process orders in batchesto combat automated arbitrage.

2.2 Price ComputationThis work is primarily concerned with the special case of theArrow-Debreu exchange market [18] where every agent hasa linear utility function. Equilibria can be computed approxi-mately in these markets using combinatorial algorithms suchas those of Jain et. al [43] and Devanur et. al [33] and ex-actly via the ellipsoid method and simultaneous diophantineapproximation [42]. Duan et. al [37] constructs an exact com-binatorial algorithm, which Garg et. al. [39] extends to con-struct a combinatorial algorithm with strongly-polynomialrunning time.

Ye [54] describes a path-following interior point method,and Devanur et. al. [32] identify a straightforward convexprogram for computing equilibria.Others study natural iterative process, known as Tâton-

nement [17]. In simple terms, if demand for a good exceedssupply of the good, then the price of this good rises, andvice versa. Codenotti et. al. [29, 30] show that this processconverges to an approximate equilibrium in polynomial time.

2.3 Blockchain Scaling and ConcurrencyThe Lightning network [49] moves transactions off of theblockchain into a network of bilateral channels, where eachchannel provides a secure method for two parties to trans-act at high rate. This notion can be extended to “Layer-2channels” with arbitrary state and arbitrary state updates.Systems such as Plasma [48] extend the Layer-2 channel

model. Users lock funds within a root contract on a mainblockchain, then send transactions to Plasma aggregators.There are many variants on this model, such as [8, 11], each

2

https://github.com/scslab/speedex

with different capabilities, performance, and security prop-erties. System security requires fraud-proof mechanisms foridentifying malicious channel operators and (in the case ofZK Rollups) some requirements that users remain online.Ethereum’s 2.0 launch is planned to include blockchain

sharding [5], which would split the blockchain into semi-independently running chains. If different shards can processtransactions in parallel, then the system as a whole can pro-cess more transactions per second.

Saraph and Herlihy [50] argue that historically, optimisticconcurrency control could have made the Ethereum VirtualMachine 2 to 8 times faster. They find that a few contracts,such as token contracts, are responsible for a large fractionof concurrency conflicts.

Dickerson et. al. [35] allow concurrent transaction process-ing by using software transactional memory and by includ-ing in a new block enough information to deterministicallyresolve concurrency conflicts. An implementation using 3CPU cores for transaction processing gave a roughly 1.5x-2.5x speedup.Anjana et. al. [16] add a directed graph to each block

denoting transaction-ordering dependencies. Yu et. al. [56]analyze transactions before execution to identify conflictingtransaction pairs. Bartoletti et. al. [19] develop a frameworkfor static analysis of smart contracts, with a focus on staticidentification of transaction pairs that commute.Our approach is inspired by that of Clements et. al. [27],

who improve performance in the Linux kernel by designingcommutative syscall semantics.

3 System ArchitectureSPEEDEX is an asset exchange implemented as a replicatedstate machine in a blockchain architecture. Assets are issuedand traded by accounts. Accounts have public signature keysauthorized to spend their assets. Signed transactions are mul-ticast among block producers. At each round, one ormore pro-ducers propose candidate blocks extending the blockchainhistory. A set of validator nodes validates and selects oneof the blocks through a consensus mechanism. SPEEDEXis suitable for integration into a variety of blockchains, butbenefits from a consensus layer with relatively low latency(on the order of seconds), such as BA★ [40], SCP [45], orHotStuff [55].

Most central banks and digital currency issuers maintaina ledger tracking their currency holdings. SPEEDEX is notintended to replace these primary ledgers. Rather, we expectbanks and other regulated financial institutions to issue 1:1backed token deposits onto a SPEEDEX exchange and pro-vide interfaces for moving money on and off the exchange.

SPEEDEX supports four operations: account creation, of-fer creation, offer cancellation, and direct payment. Theirsemantics ensure that all operations in the same block com-mute, at the cost of a few restrictions on block contents—for

instance, one cannot send payments from an account in thesame block that the account is created. Specifically, at mostone transaction per block may alter an account’s metadata(such as the account’s public key or existence), and meta-data changes take effect only at the end of block execution.As payments and trading are the common case, we do notconsider this restriction a serious limitation.

To make trades commutative, SPEEDEX computes a fixed“valuation” 𝑝𝐴 for every asset 𝐴 traded in a block. The unitsof 𝑝𝐴 are meaningless, and can be thought of as a fictionalvaluation asset that exists only for the duration of a singleblock. However, valuations imply exchange rates betweendifferent assets—every sale of asset 𝐴 for asset 𝐵 occurs ata price of 𝑝𝐴/𝑝𝐵 . Since every exchange of a given asset pairoccurs at the same price, trading becomes commutative. Ineffect, users trade not with other offers, but with a fictional“market” entity using a fictional valuation asset. The mainalgorithmic challenge is computing valuations such that theexchange “clears”—i.e., the amount of each asset sold to the“market” equals the amount bought from the “market.”

Offers on SPEEDEX are “Sell Offers,” analogous to tradi-tional limit orders. For example, the sell offer (EUR,USD,100, 1.20) proposes to sell 100 EUR at a price no lower than$1.20/EUR.

A set of valuations clears a collection of sell offers if everyoffer can independently respond optimally to said valua-tions, and the total amount of each asset sold to the “market”equals the amount bought from the “market.” The optimalresponse of an offer (𝑆, 𝐵, 𝑒, 𝛼) to a set of valuations is totrade if 𝑝𝑆/𝑝𝐵 > 𝛼 (the trade price is greater than the offer’slimit price) and not to trade if 𝑝𝑆/𝑝𝐵 < 𝛼 . The exchange maypartially execute offers for which 𝑝𝑆/𝑝𝐵 = 𝛼 .Perhaps surprisingly, exact clearing prices provably al-

ways exist (see Appendix A). SPEEDEX approximates theseclearing prices in every block. Approximation error comesin two forms: first, like many exchanges, SPEEDEX chargesa small commission on every trade. Second, the amountof trade volume in one round is approximately optimal; atsome set of prices, there might be an imbalance betweenthe amount of an asset sold to the market and the amountbought from themarket. SPEEDEX corrects these imbalancesby possibly not executing some sell offers with in-the-moneylimit prices very close to the market exchange rate.Our implementation simply burns collected transaction

fees (effectively returning them to the issuer by reducingthe issuer’s liabilities). A real-world deployment could alter-natively compensate traders to provide liquidity (or othersocially beneficial behavior).

Internally, SPEEDEX views trade offers as agents in a vir-tual Arrow-Debreu exchange market [18]. SPEEDEX cantherefore use existing literature on polynomial-time equilib-rium computation in Arrow-Debreu exchange markets as astarting point for its price computation algorithm. Detailson the mapping from collections of sell offers to exchange

3

markets are in Appendix A. Interestingly, integrating offersto buy fixed amounts of an asset makes the equilibrium com-putation problem PPAD-hard, a complexity class conjecturedto be intractable (see Appendix B). However, if market mak-ers employ sell offers, we can set valuations based only onsell offers, but use those valuations to fill buy offers used forsimple cross-currency payments (Appendix E).

Beyond computational speed, SPEEDEX has several desir-able economic properties.

No front running. In real world financial markets, well-placed agents can spy on submitted offers, notice a newtransaction 𝑇 , and then submit a transaction 𝑇 ′ that likelyexecutes before 𝑇 and buys some asset 𝐴 only to re-sell itto 𝑇 at a slightly higher price. In a blockchain setting, 𝑇 ′can sometimes even be done as a single atomic action [31].However, since every transaction sees the same equilibriumprices in SPEEDEX, back-to-back buy and sell offers for asset𝐴 would simply cancel each other out. Relatedly, becauseevery offer sees the same prices, a user that wishes to tradeimmediately can set a very low minimum price and be allbut guaranteed to have their trade executed, but still at thecurrent market price.

No (internal) arbitrage. An agent selling asset 𝐴 in ex-change for asset 𝐵 will see a price of 𝑝𝐴/𝑝𝐵 . An agent trading𝐴 for 𝐵 via some intermediary asset 𝐶 will see exactly thesame price, since 𝑝𝐴/𝑝𝐶 ·𝑝𝐶/𝑝𝐵 = 𝑝𝐴/𝑝𝐵 . Hence, one can effi-ciently trade between assets without much pairwise liquiditywith no need to search for an optimal path. By contrast, manyinternational payments today go through USD because of alack of pairwise liquidity. Of course, there may be arbitrageopportunities between SPEEDEX and external markets. Butat least market makers on SPEEDEX can react to externalevents by adjusting limit prices, unlike automated marketmakers [14].

4 Commutative Exchange LogicTo build a new block of transactions (or validate an existingblock), SPEEDEX performs the following three actions:

1 Iterate over a pool of uncommitted transactions togather a new transaction block (or iterate over thetransactions in an existing block to validate said block).Nodes check transaction validitity, adjust account bal-ances, and collect new offers to trade.

2 Compute a market equilibrium (market-clearing assetprices).

3 Iterate over every trade offer, possibly executing thetrade offer based on the specification of the marketequilibrium.

We design transaction semantics so that (almost) everyoperation in the first step commutes, and thus the iterationcan effectively use many CPU cores in parallel.

Our exchange does not directly match offers with eachother. Instead, the exchange computes asset valuations, orprices, and every offer trades with the “market” at the ex-change ratio implied by the computed valuations. For exam-ple, if the exchange computes that the valuation of one unitof asset𝐴 is twice that of one unit of asset 𝐵, then every offerwould be offered the opportunity to trade some number 𝑥units of 𝐴 in exchange for 2𝑥 units of 𝐵. An offer (that isinterested in trading 𝐴 for 𝐵) accepts this offer if the offer’sminimum exchange rate is less than the offered ratio.The main challenge for the exchange is in computing

prices such that the market clears; that is, for every asset,the amount sold to the “market” is equal to the amountbought from the “market.” In fact, our exchange computesclearing prices approximately, and so these clearing pricesare accompanied by some additional metadata to accountfor approximation error.

The last piece of work done in a block is an iteration overall offers that execute (i.e. that accept the trade offered bythe “market”).Because offers all trade at one common set ofprices, and whether or not an offer executes depends only onthese prices (and some approximation metadata), the stepsof this iteration commute, and thus can be done in parallel.

The number of assets traded on SPEEDEX is much smallerthan the number of users. There might be, for example, afew dozen (𝑁 ) commonly traded assets but millions (𝑀) ofopen trade offers. Our price computation algorithm is notas effectively parallelizable as the rest of the computationinvolved in our exchange. Instead, we design our algorithmto have only a logarithmic dependence on the number ofopen offers to trade. The rest of this section discusses thisalgorithm.

4.1 Computation of Market EquilibriaThe core algorithmic challenge for SPEEDEX is the com-putation of market clearing prices. The open set of offersunder consideration are the new offers created in one blockof transactions, plus the unexecuted offers carried forwardfrom previous blocks. The output of our algorithm is “marketequilibrium”:

Definition 4.1 (Market Equilibrium). Let there be 𝑁 assetstraded by𝑀 offers.

A specification of a market equilibrium consists of:1. A valuation 𝑝𝐴 ∈ R>0 for each asset 𝐴.2. For each pair of assets (𝐴, 𝐵), an amount 𝑥𝐴,𝐵 ∈ R≥0

of asset 𝐴 sold by offers to directly purchase asset 𝐵.

Note that the units of these valuations are arbitrary; of-fers only interact with quotients of valuations. No reservecurrency is needed.

For a pair of assets (𝐴, 𝐵), our exchange sorts offers selling𝐴 to buy 𝐵 by their minimum prices; orders with higherminimum prices only are allowed to trade if all offers withlower minimum prices also trade. The quantities 𝑥𝐴,𝐵 thus

4

specify which offers get to trade. Note that the size of thisoutput has no dependence on𝑀 .

An equilibrium must satisfy certain constraints.

Asset Conservation. For each asset 𝐴, the amount of 𝐴sold to the market should not be less than the amount boughtfrom the market. In other words, Σ𝐵𝑥𝐴,𝐵 ≥ Σ𝐵𝑥𝐵,𝐴𝑝𝐵/𝑝𝐴.

Respect Offer Parameters. No offer is required to tradeat rates below its minimum rate threshold.

Optimal Response to Prices. Every offer selling 𝐴 in ex-change for 𝐵 with a minimum price strictly below 𝑝𝐴/𝑝𝐵fully executes. Furthermore, all trades that can happen atprices 𝑝 must happen.The last contraint ensures that our exchange actually

moves goods between different traders. Without it, we couldtrivially construct an “equilibrium” with any set of prices bysimply setting 𝑥𝐴,𝐵 = 0 for all (𝐴, 𝐵).One technical point is that offers with a minimum price

exactly equal to the market price might only partially exe-cute.In practice, we compute equilibria approximately. There

are two natural ways to approximate an market equilibrium.The first is to charge a small transaction commission, which,throughout this work, we denote as Y.

Definition 4.2 (Transaction Commission (Y)). An offer sell-ing𝑥 units of asset𝐴 in exchange for asset𝐵 receives𝑥 (𝑝𝐴/𝑝𝐵)(1−Y) units of 𝐵.

Wewill also allow the equilibrium to be “approximately op-timal.” We might want a constraint of the form “the amountof traded value is close to the maximum amount possibletrading at an exact equilibrium”. Working with such a con-straint would require knowledge of an exact equilibrium.Instead, we say that the behavior of individual offers is

approximately optimal. Intuitively, offers whose minimumprices are very close to the market prices are approximatelyindifferent to trading or not trading. We let the price com-putation algorithm specify whether or not these offers trade(which gives the algorithm some flexibility to ensure that nogoods are overdemanded). Throughout this work, we denotethis approximation metric by `.

Definition 4.3 (Approximately Optimal Offer Response (`)).If an offer’s minimum price 𝛼 is above the equilibrium ex-change rate 𝛽 , then the offer cannot execute. However, givena parameter `, we say that if 𝛼 < (1− `)𝛽 , it must trade fully,but if (1 − `)𝛽 ≤ 𝛼 ≤ 𝛽 , then the equilibrium is allowed tospecify that the offer executes fully, in part, or not at all.

4.2 Algorithm OutlineComputation of market equilibria is a well-studied problemin the theoretical literature; however, all of these theoreticalalgorithms, directly applied, would be far too slow for prac-tical usage. The reason is that these algorithms assume that

the number of open trade offers (𝑀) in the market is equal tothe number of goods being traded (𝑁 ). This is without lossof generality in a mathematical context.However, in a real-world asset exchange, the number of

assets traded is vastly smaller than the number of open tradeoffers. Our exchange supports trading up to few dozen dif-ferent assets (the experiments of §7 trade 𝑁 = 20 assets), butsupports millions of open trade offers (the experiments of§7 test𝑀 up to roughly 70 million). Our price computationalgorithm can tolerate a reasonable polynomial dependenceon 𝑁 , but any nontrivial dependence on 𝑀 will be far tooslow. Our algorithm has only a logarithmic dependence on𝑀 .Appendix I compares Tâtonnement with a convex programof [32] of size linear in𝑀 .

The Tâtonnement algorithm of Codenotti et. al. [29] is thestarting point for our implementation. Tâtonnement is aniterative algorithm; starting at some set of candidate prices,at every timestep, it computes the Aggregate Demand of themarket at that set of prices, and adjusts prices accordingly(i.e. the price of a good rises if demand exceeds supply).

Definition 4.4 (Aggregate Demand). The Aggregate De-mand of a set of open offers at prices 𝑝 is a vector 𝑍 (𝑝) ∈ R𝑁 .If every offer responds optimally to the market prices 𝑝 , thenthe 𝑖th coordinate 𝑍 (𝑝)𝑖 is the amount of good 𝑖 bought fromthe market minus the amount of good 𝑖 sold to the market.

For some good 𝑖 , if𝑍 (𝑝)𝑖 > 0, then 𝑝 cannot be an exact setof clearing prices, as more of good 𝑖 is demanded in paymentfrom the market than is sold to the market.

Tâtonnement captures the common intuition that raisingthe price of a good increases the supply of said good and de-creases demand, and vice versa.Given these prices, we thenuse the linear program of Appendix C to compute the max-imum trade volume that can happen at these prices (whileensuring that the market clears).SPEEDEX runs Tâtonnement until it produces a (Y, `)-

approximate equilibrium, or a time limit is reached. BecauseTâtonnement iteratively improves a candidate set of prices,Tâtonnement can be terminated at any point to get a candi-date set of prices. The output of a timed-out Tâtonnement runis a set of prices that the algorithm has made as close to mar-ket clearing prices as it is able, and this set of prices is still a(Y, ` ′)-approximate equilibrium for some ` ′ greater than thetarget parameter `.Therefore, in the rare cases when Tâtonnement does not

directly compute an equilibrium, our exchange still usesthe output prices of Tâtonnement to solve a linear program(the program of Appendix C, with small modifications) thatmaximizes the amount of value that changes hands.

4.3 Algorithmic Improvements4.3.1 Logarithmic Demand Oracle. Tâtonnement asksmany aggregate demand queries. To compute aggregate de-mand, our algorithm need only work with offers that sell one

5

good in exchange for one other good, and thus can do muchbetter than a naive iteration over all open trade offers. Wegroup offers by the assets they buy and sell, then sort offersin each category by price. An offer with a lower minimumprice will always be satisfied by a set of prices if an offer witha higher minimum is satisfied. Hence, the offers satisfied bymarket prices can be identified by binary searches.The aggregate demand function, as stated in Definition

4.4, is not continuous. Tâtonnement works best when theaggregate demand function is continuous and has few sharpderivatives; we approximate aggregate demand with a con-tinuous function with fewer sharp derivatives by allowingoffers to execute `-optimally (instead of exactly optimally).This approximation requires an extra binary search per assetpair. For details, see Appendix F.

4.3.2 Price Update Rule. Codenotti et. al [29] analyze aversion of Tâtonnement where prices are adjusted additively.That is, for some step size 𝛿 , [29] uses the update rule 𝑝𝑖 ←𝑝𝑖 + 𝛿𝑍 (𝑝). However, in our exchange, multiplying all pricesby a constant does not change the behavior of any trade offer.Updating prices multiplicatively, instead of additively, meansthat scaling all prices by a constant does not change the realeffect of a price update on the aggregate demand. This makesTâtonnement steps more consistent, and can dramaticallyreduce the number of steps required when some assets arevalued much more highly than others.

Although assets in our exchange have a smallest indivisi-ble unit, we treat them as fully divisible goods when runningTâtonnement. As such, redenominating the units of one assetdoes not change the structure of a problem instance. That isto say, selling one unit of an asset for price 𝑝 is equivalent toselling two halves of an asset, with each unit priced as 𝑝/2. Tomake our update rule invariant against redenomination, wemultiply asset amounts by prices. This gives a price updaterule of 𝑝𝑖 ← 𝑝𝑖 (1 +𝛿𝑝𝑖𝑍 (𝑝)). This last normalization brings asignificant performance improvement to Tâtonnement whensome assets units have very small valuations.One remaining question is the choice of the step size 𝛿 .

Codenotti et. al. in [30] show that there exists a polynomially-small 𝛿 such that taking a 𝛿-sized step always brings Tâton-nement closer to an equilibrium point. This 𝛿 is too small tobe effectively usable in practice (and the proof applies onlyto their additive price update rule). Instead, we choose a stepsize via a backtracking line search [22], a standard techniquein the optimization literature. When minimizing some func-tion, this technique picks the (approximately) largest stepsize possible such that a step in some descent direction ofthat step size leads to a point with a lower objective value.

Our computation problem lacks a convex objective to min-imize. We use as our objective the 𝑙2 norm of the aggregatedemand vector, weighted by asset price (

(Σ𝑁𝑖=1(𝑝𝑖𝑍 (𝑝)𝑖 )2)1/2).

There is not a theoretical motivation for this choice; rather,

we need a function that is smooth, easy to evaluate, nonneg-ative, and approaches 0 if and only if Tâtonnement is nearequilibrium. Qualitatively, many other heuristics (such asthe 𝑙2 norm squared, or other 𝑙𝑝 norms) perform similarly.

Specifically, in each round, we attempt to use the step sizeof the previous round. If this step does not cause the objectiveto increase too much, Tâtonnement takes a step of that sizeand increases the step size (line searches typically requirea nontrivial decrease in the objective. Tâtonnement doesnot appear to benefit from stricter conditions). Otherwise,Tâtonnement decreases step size and tries again.

4.3.3 ProblemNormalization. Many numerical optimiza-tion problems run most quickly when gradients are normal-ized (e.g. see [21]); Tâtonnement is no exception. The mainnormalization factors we have not yet taken into account arerelative differences in asset trade amounts. That is to say, Tâ-tonnement requires the fewest rounds of computation whenthe traded amounts of each asset, weighted by their prices,are roughly the same. We can normalize trade amounts bymultiplying by a normalizing constant; that is to say, if weknew in advance that one asset’s equilibrium trade volumewas 100x lower than another’s, we could multiply the as-set amounts of the less-traded asset by 100 when runningTâtonnement.

Of course, we cannot know the asset’s equilibrium tradevolume until we have computed an equilibrium. However,our exchange runs Tâtonnement in every block. In the realworld, most of the time, the traded volume of any asset staysroughly constant minute to minute. As such, we can derivenormalization constants based on the equilibria recordedin blocks in the recent past. We cannot necessarily rely onthis normalization in the case of market shocks, so we runsome copies of Tâtonnement using volume normalizationand some without. A real-world deployment could also in-corporate real-time price or volume data from other assetexchanges.Rather than choose one normalization strategy and step

size, our implementation runs several copies of Tâtonnementwithvarying strategies in parallel. As discussed in §5.2, there isa tradeoff between the number of different copies and thespeed of each copy. For details, see Appendix D

4.4 Rounding Tâtonnement via a Linear ProgramThe goal of Tâtonnement is to find a set of prices such thatthe linear program of Appendix C has a feasible solution.Rather than wait for Tâtonnement to reach what it knowsto be a feasible set of prices, we periodically check the linearprogram for feasibility directly. Solving a linear program ismore expensive than one round of Tâtonnement computa-tion, so we only check the linear program once every 1000rounds.

Solving the linear program after running Tâtonnement (asopposed to clearing offers via the raw output of Tâtonnement ,

6

e.g. according to the `-approximately optimal semantics dis-cussed in Appendix F) has some additional useful economicproperties.

First, we ensure that a minimal number of offers sell onlya fraction of their assets. Although assets in our exchangeare fungible currencies, we store assets as integers, so assetsnecessarily have some smallest indivisible unit. When ex-changes explicitly match offers, they often round in favorof one party at the expense of the other. However, offers inour exchange trade with a central market, so to avoid inad-vertently creating new copies of an asset, we must roundagainst every offer in favor of the market. Offers that exe-cute partially could be rounded against multiple times, sowe would like to minimize the total number of offers thatexecute only in part.

Secondly, the linear program ensures no more trades couldhappen at the market prices after executing a block of trades.

Finally, by giving as output, for each asset pair (𝐴, 𝐵), onlyan amount of 𝐴 traded in exchange for 𝐵, the exchange caneasily ensure that offers with the lowest minimum pricesare guaranteed to trade before offers with higher minimumprices.

5 System DesignWe implemented SPEEDEX (as evaluated in §6 and §7) usinga simple version of chain replication [51]; namely, one replicais declared the block producer, and the other replicas aredeclared block validators. Replicas are organized in a chain;when one replica accepts a block of transactions, it passes theblock on to the next replica in the line. We do not implementleader election. We use a simple consensus protocol becauseSPEEDEX does not rely on any specific consensus detail, andwe wish to measure precisely the performance our exchange,not a consensus protocol.

SPEEDEX is implemented using about 35000 lines of codewritten in C++20 (available open source at https://github.com/scslab/speedex), using Intel’s TBB library [7] for much ofthe parallel work coordination. We use the GNU Linear Pro-gramming Kit [46] for our linear program solver and LMDB[26] to manage data persistence and crash recovery.

5.1 Data OrganizationAccount balances are stored in a Merkle-Patricia trie. Butwhile batch trie updates and rehashes (given knowledge ofwhich entries are modified) can be done relatively quickly,individual random accesses can be slow. As such, we alsostore account balances in an in-memory vector, with updatespushed to the trie once per block.

For each pair of assets (𝐴, 𝐵), we build a trie storing offersselling asset 𝐴 in exchange for 𝐵. Finally, in each block, webuild a trie logging which accounts were modified.

Storing information in hashable tries both ensures thatnodes can efficiently compare database state (and catch er-rors) and prove to users information about the database state.Careful trie design accelerates several expensive databaseoperations (see §5.2, §5.3).

5.2 TâtonnementFractional values in Tâtonnement’s inner loops are repre-sented as fixed-point values, rather than floating-point val-ues. This substantially accelerates the arithmetic of eachround, and avoids (nonassociative) accumulation of floating-point operation error.The running times of Section 6 do not include times to

sort or preprocess offers. Naively sorting large lists takes along time; a fast sorting method is critical.

Note that our exchange runs on a blockchain, and thus (forreasons exogenous to Tâtonnement) it hashes its internalstate. We store offer minimum prices as fixed-point fractionalvalues. Therefore, we can use an offer’s price, written in big-endian, as the leading bits of the offer’s key within its trie.Constructing the trie thus automatically sorts offers by price.Additionally, the exchange executes offers with the lowestminimum prices; as such, the set of offers that execute inone round form a (small number of) dense subtrie(s). Postoffer execution, the remaining offers remain in a sorted order.Not re-sorting the entire list every block saves a substantialamount of time.

Sorting the set of newly created offers in a block can alsobe done largely in parallel. Each thread, for example, pri-vately accumulates tries recording newly created offers be-fore merging those tries into the main offer tries.

As mentioned in §4.3, there is a tradeoff between runningmany copies of Tâtonnement with different settings and theperformance of each copy of Tâtonnement. Each round ofTâtonnement requires many binary searches; as such, theruntime of one round of Tâtonnement largely depends onmemory latency and cache performance. Especially whenTâtonnement takes small steps (i.e, towards the end of Tâton-nement), one round reads almost exactly the same memorylocations as the previous round, and thus the cache missrate can be extremely low. More concurrent replicas of Tâ-tonnement mean more cache traffic and higher cache missrates.

Each of the binary searches in an aggregate demand queryruns independently, so parallelizing these searches acceler-ates each round of Tâtonnement. One primary thread com-putes price updates, waking helper threads when computingaggregate demand. However, each round of Tâtonnement isextremely fast even on one thread; for example, with 20 as-sets and tens of millions of offers to trade, one round requires100 to 200 microseconds of computation. Using the kernelto put helper threads to sleep when not in use introducessignificant synchronization overhead and can harm cacheperformance if helper threads migrate between cores. We

7



instead synchronize threads via spinlocks and a few memoryfences. In the 20 asset tests of §6, we see minimal benefitbeyond 4-6 helper threads, but this is enough to bring theaverage runtime per round down to 30 to 50 microseconds.Finally, as mentioned in §7, instead of directly using the

offer tries in aggregate demand queries, we precompute (byone pass over the offer trie) a list recording, for each uniqueprice, the total amount of the asset in question available forsale below the current price. Laying out all of the informa-tion for Tâtonnement contiguously in memory improves thecache performance of each aggregate demand query.

5.3 Fast Merkle-Patricia TriesSPEEDEX performs several expensive trie manipulations inevery block. These tries must often be written to or modifiedby many different threads. Designing trie manipulations tominimize memory contention was one of the most importantimplementation challenges for achieving scalability.We use tries with a fan-out of 16 (half a byte) and fixed-

length keys. Values only exist at the leaves, so we allocatepointers to child nodes and trie values as a union to save1-2 cache lines. Most trie operations require comparing twokeys to find the longest common prefix. Working with 64-bitintegers is faster than working with sequences of bytes.When iterating over a set of transactions, each thread

logs which accounts it modifies. We find it more efficientfor threads to build log tries privately, and then merge thesetries together after the iteration, than for threads to competefor a central log structure. One important optimization whenmerging a batch of tries is to re-divide the tries by prefixrange and merge each range separately. Merging tries con-sisting of disjoint key ranges is trivial, and different threadsresponsible for different prefix ranges never have to write tothe same memory locations.

Each trie node stores some metadata, such as the numberof leaves below a node. Tries that store offers also store thenumber of cancelled offers below each node and the totalamount of the asset offered for sale by offers below eachnode. Offers are cancelled via metadata updates (hardware-level atomics). The actual work of removing cancelled offerscan wait until after the exchange is done iterating over alist of transactions. This lets us avoid the more expensivesynchronization primitives we might need to concurrentlymodify a trie. The metadata makes finding and removingcancelled offers simple.

Worker threads also build a set of tries containing newlycreated offers. When one thread finishes its assigned work,it could immediately merge its private tries into the main setof offer tries. Instead, overall system performance improvesif threads wait (do other work) until all threads are doneworking, the privately created tries are all gathered together,and then the work of merging sets of tries is divided by assetpair.

The trie that records account state has the same key space(account ID) as the trie that records which accounts havebeen modified. This means that every node in the accountmodification trie has a corresponding node (with the sameprefix) in the account state trie. Updating and rehashingthe account database trie (operations that require lookingonly at accounts that were modified) can be done quicklyby assigning different threads to work on disjoint subtriesof the account modification trie. The correspondence thusmeans that each thread can work on disjoint subtries of theaccount state trie, and different threads will not contend forthe same piece of memory. Dividing work according to theaccount modification trie helps spreak work evenly acrossavailable CPU cores.

Leaves of the account modification trie contain the trans-actions sent by an account. This trie, therefore, implicitlysorts transactions by account ID. Block validation on sortedlists of transactions can be faster than on unsorted lists. Ifone account sends multiple transactions, these transactionsare likely to be handled by the same worker thread in thevalidator, reducing memory contention. Validators do notactually check to see if a block’s transactions are sorted,but an honest block producer could possibly get its blocksconfirmed slightly more quickly.

5.4 Data Storage and PersistenceProcessing transactions in a nondeterministic order makesrecovery from a database snapshot where a block has beenpartially applied quite difficult. Therefore, we would like asystem that can update all account balances modified in oneblock in one atomic transaction. We also want a databasewhere elements can be accessed directly in memory, andwhere database transactions support concurrent modifica-tion. Accounts in our database are frequently modified butonly created once. We therefore need a key-value store thatis designed primarily around supporting a mostly fixed setof fixed-length keys.

We did not find an existing database implementation wellsuited to this workload, so SPEEDEX uses a combinationof an in-memory cache and an ACID-compliant database(LMDB[26]). This combination suffices for our experiments,but construction of a database precisely designed for ourparticular workload is an interesting line of future work.

5.5 Block Headers Include Market EquilibriaChecking correctness of a market equilibrium is much fasterthan computing an equilibrium. Block producers, therefore,include in their block header a market equilibrium. This alsoallows nondeterminism in Tâtonnement (which we use whenwe run multiple instances of Tâtonnement in parallel withdifferent operational parameters).Block headers also include the transaction commission Y.

Fixed in the exchange protocol is a maximum commissionrate, and validators check that Y is smaller than themaximum.

8

The commission rate of one block might vary because ofthe linear programming step of price computation. Smallconstraint violations by the solving library (GLPK [46]), androunding error when converting between fixed-point andfloating-point values can cause a query with a commissionof 2−𝑘 to retur a solution with a 2−(𝑘−1) commission.We therefore run price computation at a target commis-

sion rate of half the maximum allowed rate. The factor twodifference is an artifact of our choice to represent Y by itsnegative log base 2 (which makes commission computationeasy).

We do not include the approximation parameter ` becauseTâtonnement does not always return a (Y, `)-approximateequilibrium. Furthermore, subtle differences between floating-point and fixed-point division can cause the LP solver toviolate the `-approximation guarantee (even when Tâton-nement has converged). In a real-world deployment, nodescan choose between competing blocks by the quality (i.emost trade volume) of the blocks’ equilibria.A node can plausibly deny misbehavior if it fails to oc-

casionally produce a (Y, `) equilibrium, but a node (or thebroader community) can detect if a node fails to producegood prices more than other nodes (and sanction the misbe-having node out of band). SPEEDEX, therefore, may benefitfrom a consensus model that includes a notion of trust rela-tionships or identity.Our block headers also include, for every pair of assets,

the minimum price of the offer with the highest minimumprice that trades in that block. Validators can compare theminimum price of a newly created offer with this marginalminimum price and decide immediately whether or not theoffer executes. Offers that are executed immediately neednot be loaded into the Merkle-Patricia tries.

5.6 Exclusion of Failed TransactionsIf a block of transaction contained a pair of conflicting trans-actions, then a validator would need some extra conflict-resolution information to determine which transaction suc-ceeds and which fails. Most blockchains use transaction or-dering for conflict-resolution.SPEEDEX leaves the choice of conflicting transactions

to the block producer. Specifically, the block producer isresponsible for identifying a set of transactions such thatthe entire block is valid (all transactions are valid, and afterprocessing the block, all account balances are nonnegative).Failed transactions are no-ops, so rather than send bothtransactions with a conflict-resolution bit, the block produceromits failed transactions to save network bandwidth andvalidator CPU time.

5.7 Sequence NumbersTo ensure each transaction executes only once, users marktheir transactions with sequence numbers. Many existing

blockchains require the sequence numbers from one accountto increase strictly sequentially.Our implementation allows small gaps in sequence num-

bers. This makes it easier for a block producer to accept manytransactions from one account in one block. Also, when vali-dating a block, a worker thread that finds a sequence numbergap cannot easily determine whether a transaction that fillsthe gap is present in said block.We also require that sequence numbers increase by at

most 64 within one block. The choice of the constant 64is arbitrary. However, limiting sequence numbers by somefixed constant enables a replica to track sequence numberswith only a bitvector and hardware-level atomics.

5.8 Side Effect VisibilitySPEEDEX adjusts asset amounts only through hardware-level atomic operations (i.e. operations on 8-byte chunks).One thread might read database writes from another thread’sonly partially processed transaction. Additions and subtrac-tions to a counter commute, but a block producer must takecare to ensure that partial database writes do not cause erro-neous acceptance of an invalid transaction.

Many operations in a transaction might fail if an accountbalance is too low; none fail, however, if an account balanceis too high. If some candidate transaction fails and is rolledback, but this candidate transaction increased an accountbalance in the shared database, some other thread could haveread the incorrectly high balance and accepted an invalidtransaction. As such, when building a new block of trans-actions, decreases to account balances are written to thedatabase immediately, while increases are buffered privatelyuntil the transaction is guaranteed to commit. Validators canskip this step.

Other transaction side effects are not made visible until theentire block of transactions commits. Thus, an offer cannotbe created and cancelled in the same block, and accountscannot send transactions in the block in which they arecreated.

6 Empirical Evaluation of TâtonnementWe find that the runtime of Tâtonnement depends primarilyon the target approximation accuracy, the number of opentrade offers, and the distribution of the open trade offers. Theruntime of Tâtonnement increases as the desired accuracyincreases (as Y and ` decrease). Surprisingly, the runtimeactually decreases as the number of open offers increases.And as discussed, Tâtonnement performs best when thevolume traded of each asset is roughly the same.

Our goal is to run Tâtonnement once in every block ofour exchange. If our goal is to produce one block every fewseconds, then Tâtonnement should run in well under onesecond most of the time. Unless otherwise specified, our im-plementation runs Tâtonnement with a timeout of 2 seconds.

9

Figure 1. Runtime of Tâtonnement on instances with500,000 offers and 20 assets with varying approximationparameters. The horizontal axis plots the commission rate(− log2(Y)), while each line represents a different accuracy ofapproximation for optimal offer behavior (− log2(`)).

6.1 Approximation Accuracy and Orderbook SizeFigure 1 plots the runtimes of Tâtonnement on a set of500,000 offers generated from a synthetic model (AppendixG), with a variety of accuracy targets. Runtimes are averagedover 5 runs. Any target approximation level where not everyrun fully converged within the time limit is excluded fromthe graph. Asset trade volumes are not normalized, but theyare generated so that equilibrium trade amounts are roughlyequal.

For comparison, BinanceDex [2] charges either a 0.1% fee(− log2(0.001) = 10) or a 0.04% fee (− log2(0.0004) = 11.3)[1]. Uniswap [14] charges a 0.3% fee (− log2(0.003) = 8.4).Coinbase charges between 0.5% and 4%, depending on thetransaction [4]. A comparable target of Y = ` = 10 is feasiblefor Tâtonnement. Runtimes increase slowly as target accu-racy increases, until a point where required runtime startsto rise dramatically.

As discussed in §4.3.1, Tâtonnement works best when theaggregate demand function has relatively few sharp deriva-tives. For small `, the behavior of one offer is almost exactlya step function; at ` = 0, an offer executes if and only if mar-ket prices are above the offer’s minimum price. The stepsof Tâtonnement often cross these minimum price thresh-olds. When there are few offers, crossing a single offer’sthreshold causes a comparatively large change in the ag-gregate demand of the market. This can mean that evenvery small Tâtonnement steps will overshoot an equilib-rium, leading to oscillation. Alternatively, as discussed in§4.3.2, Tâtonnement’s choice of step size is heuristic-guided.When very small Tâtonnement steps cause comparatively

Figure 2. Minimum number of offers for Tâtonnement torun in under 0.25 seconds (times averaged over 5 runs).

large heuristic changes (especially heuristic increases), Tâ-tonnement might decrease its step size down to its minimumbefore proceeding. The more frequently this happens, theslower Tâtonnement runs.In other words, Tâtonnement works best when there are

many offers (so each offer is a relatively small part of thetotal volume on the exchange) or when ` is large (in thegraphs, when − log2(`) is small). In the real world, SPEEDEXinstances with more participants might be able to run Tâton-nement to a more accurate level than less active instances.Figure 1 is running on an easy case for Tâtonnement.

There are many open offers, and the total volume traded foreach asset (price times amount) is roughly equal. Holdingthe offer distribution constant, Figure 2 displays the tradeoffbetween Tâtonnement runtime, approximation accuracy, andthe number of open trade offers. Each cell represents a onesetting of the approximation parameters (Y, `), and contains(an upper bound on) the minimum number of offers requiredfor Tâtonnement to run in under 0.25 seconds and producean (Y, `)-approximate equilibrium.In a time period when very few offers are sent to the ex-

change, the exchange might not be able to find prices suchthat the maximum possible number of offers can find a coun-terparty. When fewer offers clear in one block, more offersare left to the next block, which may make Tâtonnement eas-ier in the next block.

Note that while very small ` or Y requires a large numberof offers to trade, reasonable approximation settings such asY = ` = 2−10 or 2−15 do not.

10

6.2 Imbalanced DistributionsWe demonstrate in Appendix H that the performance ofTâtonnement is robust to varying the parameters of our datageneration model.In the real world, most trading data involves a reserve

currency (i.e. buying or selling assets denominated in USD),or, in the cryptocurrency world, have very low transactionrates. As such, there does not appear to be a good set of dataon which we can directly test Tâtonnement. As a next-bestalternative, we generate a dataset for Tâtonnement based offof historical price and market volume data.

We took the 20 assets that had the largest market volumeon April 29, 2021 (as reported by coingecko.com)1 and foreach asset, gathered 500 days of price and trade volume his-tory. We then generated 500 batches of 100,000 transactions.A new offer in batch 𝑖 sells asset 𝐴 (and buys asset 𝐵) withprobability proportional to the relative volume of asset 𝐴(and asset 𝐵, conditioned on 𝐵 ̸= 𝐴) on day 𝑖 , and demands aminimum price close to the real-world market price on day𝑖 . We also set our block size to be only 50,000 (as comparedto the 500,000+ transaction blocks of §7). This increases thedifficulty of Tâtonnement.

This dataset is particularly challenging for Tâtonnement,in that many assets are comparatively rarely traded, andcryptocurrency asset volumes and prices fluctuate signif-icantly. 3 of the currencies did not exist (i.e. 0 trade vol-ume) in the first half of the experiment. Our experimentran for 820 blocks, and Tâtonnement did not converge to a(Y = 2−10 ≈ 0.1%, ` = 2−7 ≈ 1%) equilibrium in 31 blocks.

When Tâtonnement does not quickly converge, its outputcandidate prices are close enough to a market equilibriumthat the linear programming step can still match a largeamount of trade volume. To quantify the effect of Tâton-nement nonconvergence on trade volume, we compute ineach block a “price asymmetry metric.” For each asset pair (𝐴,𝐵), we compute the minimum `𝐴,𝐵 such that all offers selling𝐴 for 𝐵 with minimum price below (1− `𝐴,𝐵)(𝑝𝐴/𝑝𝐵) execute.The price asymmetry metric is the volume-weighted averageof these `𝐴,𝐵s. Note that Tâtonnement convergence is akinto guaranteeing that `𝐴,𝐵 ≤ ` for all (𝐴, 𝐵).

In blocks where Tâtonnement converges, the price asym-metry average is 0.1% (the maximum observed was 0.2%),while when Tâtonnement does not converge within the timelimit, the price asymmetry average is 0.2% (with a maximumof 1.2%). In other words, even when Tâtonnement does notquickly converge on particularly difficult datasets, the outputprices are still close enough to equilibrium that the linearprogramming phase can match most of the open offers.

1The currencies are CAKE, ADA, BCH, BNB, BTC, DOGE, ETH, ETC, HT,LTC, MATIC, TRX, UNI, USDT, VET, XRP, ZEC, and USDT. We droppedBUSD because the list already included two other USD stablecoins, onewith lower volume and one with higher volume.

Qualitatively, Tâtonnement does not converge quicklywhen one asset is highly volatile. Tâtonnement may mispricethis volatile asset for a short time, but the other assets arerelatively unaffected.

7 Scalability EvaluationWe ran SPEEDEX on a cluster of four nodes in a Cloudlab[38] datacenter. One machine is designated the block pro-ducer, and the other machines validate blocks of transactionscreated by the producer. Each machine (Cloudlab’s machinetype “rs440”)has two 16-core Intel Xeon Gold 6130 CPUsrunning at 2.10 Ghz with hyperthreading enabled, 192 GBof memory, and a 240GB SSD (Intel series S4600) using theXFS filesystem.

For this experiment, we generated synthetic transactiondata according to the data model outlined in Appendix G.Each set of 100,000 transactions is generated as though theassets have some underlying valuations, and these valuationsare modified (akin to a Brownian motion) after every set.We generate 5000 sets of transactions, which are read intoa “mempool” by the block producer. The block producerattempts to keep the size of this mempool at about 2 million,reading in new sets of transactions as necessary, and theexperiments ran until all sets of transactions were exhausted.To show performance with as many open trade offers aspossible, accounts in this simulation do not cancel old offers.A small number of transactions in the dataset were generatedto be invalid noise.

The block producer targets each block to include 500,000to 600,000 transactions. To build blocks quickly, we do nottarget an exact amount; instead, threads “reserve” space toadd transactions to a new block, and make a best (but notperfect) effort to “release” unused space back to other threads(if, for example, a thread finds many failed transactions).This means our implementation builds smaller blocks whenoperating with many threads. When running with 4 threads,blocks included roughly 600,000, transactions, while whenrunning with 64 threads, blocks included 540,000-580,000transactions.Most of the expensive work is done by a dedicated set of

worker threads. To demonstrate scalability, we measure theruntime of the nodes in our exchange with varying numbersof these worker threads. This may artificially improve theperformance of the experiments using fewer threads. Ourimplementation is designed around having access to manycpu cores, so running SPEEDEX unmodified on a systemwith few CPU cores would not be a fair comparison.

Figure 3 displays the time that the block producer nodetook to produce each block in our experiments. Users trade20 different assets, and Tâtonnement is run with approxi-mation parameters (Y = 2−15 and ` = 2−10). The exchange’saccount database contains 10 million accounts at the start ofthe experiment. Trade volumes are roughly the same across

11

coingecko.com

0.0M 10.0M 20.0M 30.0M 40.0M 50.0M 60.0M 70.0M#Open Offers (Millions)

5

10

15

20

25

30

Runtime (s)

#Threads48163264

Figure 3. Runtime of block producer node, with varyingnumbers of worker threads, plotted over the number of openoffers on the exchange.

different assets, and prices vary as data is generated. Tâton-nement did not time out in any block during this experiment.The x-axis of Figure 3 is the number of open offers on

the exchange. The number of open offers on the exchangeis the biggest factor (other than the number of transactionsin a block) determining a block’s production time. Figure 3demonstrates that SPEEDEX can scale to support very largenumbers of open offers to trade, and that its performancebenefits from additional CPU cores.Figure 3 also displays a negative side effect of our data

persistence mechanism. The time to produce a block whenrunning with many threads occasionally is very large. Dur-ing these spikes, the node is doing nothing but waiting topersist a snapshot of the account database to disk. This snap-shot is produced every five rounds, and the disk writingis continually performed in the background. Running withfewer threads does not help the disk operate faster or givethe disk less work, but simply gives the disk more time towork.

Our experiments generate a sustained write workloadof 200-230 MB/s, which is roughly the speed our of SSD inisolated benchmarks. We were unable to install faster storagedevices on our machines, but standard techniques, such asstriping data over multiple, faster drives, could alleviate thisbottleneck. As discussed in §5.4, existing databases are notdesigned for our workload. Scaling to 100 million accountsor a billion accounts could require a new database design.The increase in runtime associated with an increase in

the number of open offers stems from a Tâtonnement op-timization. The one part of SPEEDEX that we cannot eas-ily accelerate by buying fancier hardware is Tâtonnement,and so we design our implementation towards making Tâ-tonnement as fast as possible. When running an aggregate


2.5

5.0

7.5

10.0

12.5

15.0

17.5

20.0

Runtime (s)

#Threads48163264

Figure 4. Runtime of block validator node, with varyingnumbers of worker threads, plotted over the number of openoffers on the exchange.

demand query, Tâtonnement could query the tries that storeoffers. Instead, Tâtonnement queries a prebuilt list, laid outcontiguously in memory, of tuples denoting “at price 𝑝 , theamount of the asset available for sale is 𝑥 .” This gives mem-ory accesses within one Tâtonnement query much bettercache locality. An implementation might choose to skip thisexpensive precomputation in some parameter regimes.

Nodes can always skip this work when validating a blockproduced by a different node. Figure 4 plots the runtimes fora node validating blocks of transactions. Note that validatorslargely avoid the increase in runtime associated with moreoffers open on the exchange. Not included is time idlingwhile waiting for the block producer to produce blocks.

Figure 4 plots the runtime for the first validator node inthe chain. The other validator nodes perform similarly.

These numbers do not include signature verification. Trans-actions are signed and the account database stores publickeys; our implementation simply does not actually checkthe signature. The block producer node can check signatureson incoming transactions before adding them to its mem-pool, and a block validator can check signatures entirelyin parallel to the rest of the block validation logic. Signa-ture checking could even be offloaded to a separate physicalmachine. Checking the signatures on 500,000 transactionson our hardware (including looking up public keys in ourdatabase) takes 1.5-2 seconds.

Note that the transaction rate of this system is limited bythe slowest replica, i.e. the block producer. Figure 5 plotsthe end-to-end transaction rate of our system. When usingthe full CPU resources of our hardware, our exchange canprocess more than 120,000 transactions per second.

For comparison,Wang et. al [52] benchmark the raw trans-action throughput of the Ethereum Virtual Machine; with

12


50000

100000

150000

200000

Tran

sact

ions

Per

Sec

ond

#Threads48163264

Figure 5. Transactions per second on SPEEDEX with vary-ing numbers of worker threads, plotted over the number ofopen offers on the exchange.

10,000 open accounts, the EVM processed transactions thatjust transfer tokens between accounts at a rate of only 13thousand per second (although this number includes signa-ture checking). Transactions with more complex logic areprocessed substantially more slowly.

A worst-case lower bound is that if Tâtonnement were toalways time out, our top-line transaction rate would fall tobetween 80,000 and 150,000 transactions per second.The persistence caveats of figure 3 also apply here. If

SPEEDEX is limited to not run faster than our SSD, it wouldstill achieve at least 70-100,000 transactions per second.

8 ConclusionThis work presents SPEEDEX, a fully on-chain distributed as-set exchange capable of processing more than 100,000 trans-actions per second on 32-core servers with tens of millionsof open offers. It achieves scalability by designing transac-tion semantics and an offer matching system so that trans-actions in a block commute; this lets the exchange effec-tively use many CPU cores while operating deterministically.SPEEDEX also displays several economic properties of inde-pendent interest, including elimination of front-running andof (internal) arbitrage.

AcknowledgmentsThis research was supported by the Stanford Future of Digi-tal Currency Initiative, the Stanford Center for BlockchainResearch, and the Army Research Office.

References[1] [n.d.]. Binance Chain Docs - Fees. https://docs.binance.org/guides/

concepts/fees.html Accessed 04/10/2021.

[2] [n.d.]. Binance Chain Docs - Match Engine. https://docs.binance.org/match.html Accessed 03/14/2021.

[3] [n.d.]. Binance Chain Explorer. Accessed 03/14/2021.[4] [n.d.]. Coinbase Pricing and Fees Disclosures. https:

//help.coinbase.com/en/coinbase/trading-and-funding/pricing-and-fees/fees Accessed 04/10/2021.

[5] [n.d.]. Eth2 Shard Chains. https://ethereum.org/en/eth2/shard-chains/ Accessed 03/11/2021.

[6] [n.d.]. An Exchange Protocol for the DecentralizedWeb. ([n. d.]). https://docs.gnosis.io/protocol/docs/introduction1/ and https://github.com/gnosis/dex-research/blob/08204510e3047c533ba9ee42bf24f980d087fa78/dFusion/dfusion.v1.pdf and https://github.com/gnosis/dex-research/blob/master/BatchAuctionOptimization/batchauctions.pdf.

[7] [n.d.]. Intel oneAPI Threading Building Blocks. https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onetbb.html Accessed 5/6/2021.

[8] [n.d.]. Optimistic Rollups. https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/optimisticrollups/ Accessed 03/11/2021.

[9] [n.d.]. STARKEx. https://starkware.co/product/starkex/[10] [n.d.]. Stellar. https://www.stellar.org/[11] [n.d.]. ZK Rollups. https://docs.ethhub.io/ethereum-roadmap/layer-

2-scaling/zk-rollups/ Accessed 03/11/2021.[12] 2018. Loopring: A Decentralized Token Exchange Protocol. (8 Septem-

ber 2018). https://loopring.org/resources/enwhitepaper.pdf[13] 2021. GPv2 Objective Criterion. https://forum.gnosis.io/t/gpv2-

objective-criterion/1254 Accessed 04/30/2021.[14] Hayden Adams, Noah Zinsmeister, and Dan Robinson. 2020. Uniswap

v2 core. (2020).[15] Guillermo Angeris, Hsien-Tang Kao, Rei Chiang, Charlie Noyes, and

Tarun Chitra. 2019. An analysis of Uniswap markets. CryptoeconomicSystems Journal (2019).

[16] Parwat Singh Anjana, Sweta Kumari, Sathya Peri, Sachin Rathor, andArchit Somani. 2019. An efficient framework for optimistic concurrentexecution of smart contracts. In 2019 27th Euromicro InternationalConference on Parallel, Distributed and Network-Based Processing (PDP).IEEE, 83–92.

[17] Kenneth J Arrow, Henry D Block, and Leonid Hurwicz. 1959. On thestability of the competitive equilibrium, II. Econometrica: Journal ofthe Econometric Society (1959), 82–109.

[18] Kenneth J Arrow and Gerard Debreu. 1954. Existence of an equilibriumfor a competitive economy. Econometrica: Journal of the EconometricSociety (1954), 265–290.

[19] Massimo Bartoletti, Letterio Galletta, and Maurizio Murgia. 2020. Atrue concurrent model of smart contracts executions. In InternationalConference on Coordination Languages and Models. Springer, 243–260.

[20] Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev.2018. Scalable, transparent, and post-quantum secure computationalintegrity. (2018).

[21] Michele Benzi. 2002. Preconditioning techniques for large linear sys-tems: a survey. Journal of computational Physics 182, 2 (2002), 418–477.

[22] Stephen P Boyd and Lieven Vandenberghe. 2004. Convex optimization.Cambridge university press.

[23] Eric Budish, Peter Cramton, and John Shim. 2015. The high-frequencytrading arms race: Frequent batch auctions as amarket design response.The Quarterly Journal of Economics 130, 4 (2015), 1547–1621.

[24] Miguel Castro and Barbara Liskov. 1999. Practical byzantine faulttolerance. In 3rd Symposium on Operating Systems Design and Imple-mentation. New Orleans, LA, 173–186.

[25] Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. 2017. The com-plexity of non-monotone markets. Journal of the ACM (JACM) 64, 3(2017), 1–56.

[26] Howard Chu and Symas Corporation. [n.d.]. Lightning Memory-Mapped Database Manager (LMDB). http://www.lmdb.tech/doc/

13

https://docs.binance.org/guides/concepts/fees.html

https://docs.binance.org/guides/concepts/fees.html

https://docs.binance.org/match.html

https://docs.binance.org/match.html

https://help.coinbase.com/en/coinbase/trading-and-funding/pricing-and-fees/fees



https://ethereum.org/en/eth2/shard-chains/

https://ethereum.org/en/eth2/shard-chains/

https://docs.gnosis.io/protocol/docs/introduction1/

https://docs.gnosis.io/protocol/docs/introduction1/

https://github.com/gnosis/dex-research/blob/08204510e3047c533ba9ee42bf24f980d087fa78/dFusion/dfusion.v1.pdf



https://github.com/gnosis/dex-research/blob/master/BatchAuctionOptimization/batchauctions.pdf

https://github.com/gnosis/dex-research/blob/master/BatchAuctionOptimization/batchauctions.pdf

https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onetbb.html



https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/optimistic_rollups/

https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/optimistic_rollups/

https://starkware.co/product/starkex/

https://www.stellar.org/

https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/zk-rollups/

https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/zk-rollups/

https://loopring.org/resources/en_whitepaper.pdf

https://forum.gnosis.io/t/gpv2-objective-criterion/1254

https://forum.gnosis.io/t/gpv2-objective-criterion/1254

http://www.lmdb.tech/doc/

Accessed 04/29/2021.[27] Austin T Clements, M Frans Kaashoek, Nickolai Zeldovich, Robert T

Morris, and Eddie Kohler. 2015. The scalable commutativity rule: De-signing scalable software for multicore processors. ACM Transactionson Computer Systems (TOCS) 32, 4 (2015), 1–47.

[28] Dan Cline, Thaddeus Dryja, and Neha Narula. [n.d.]. ClockWork: AnExchange Protocol for Proofs of Non Front-Running. ([n. d.]).

[29] Bruno Codenotti, Benton McCune, and Kasturi Varadarajan. 2005.Market equilibrium via the excess demand function. In Proceedingsof the thirty-seventh annual ACM symposium on Theory of computing.74–83.

[30] Bruno Codenotti, Sriram V Pemmaraju, and Kasturi R Varadarajan.[n.d.]. On the polynomial time computation of equilibria for certainexchange economies.

[31] Philip Daian, Steven Goldfeder, Tyler Kell, Yunqi Li, Xueyuan Zhao,Iddo Bentov, Lorenz Breidenbach, and Ari Juels. 2019. Flash boys 2.0:Frontrunning, transaction reordering, and consensus instability indecentralized exchanges. arXiv preprint arXiv:1904.05234 (2019).

[32] Nikhil R Devanur, Jugal Garg, and László A Végh. 2016. A rationalconvex program for linear Arrow-Debreu markets. ACM Transactionson Economics and Computation (TEAC) 5, 1 (2016), 1–13.

[33] Nikhil R Devanur and Vijay V Vazirani. 2003. An improved approxi-mation scheme for computing Arrow-Debreu prices for the linear case.In International Conference on Foundations of Software Technology andTheoretical Computer Science. Springer, 149–155.

[34] Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. The Journal ofMachine Learning Research 17, 1 (2016), 2909–2913.

[35] Thomas Dickerson, Paul Gazzillo, Maurice Herlihy, and Eric Koskinen.2019. Adding concurrency to smart contracts. Distributed Computing(2019), 1–17.

[36] Alexander Domahidi, Eric Chu, and Stephen Boyd. 2013. ECOS: AnSOCP solver for embedded systems. In 2013 European Control Confer-ence (ECC). IEEE, 3071–3076.

[37] Ran Duan and Kurt Mehlhorn. 2015. A combinatorial polynomialalgorithm for the linear Arrow–Debreu market. Information and Com-putation 243 (2015), 112–132.

[38] Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong,Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David John-son, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart,Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet,Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Opera-tion of CloudLab. In Proceedings of the USENIX Annual Technical Confer-ence (ATC). 1–14. https://www.flux.utah.edu/paper/duplyakin-atc19

[39] Jugal Garg and László A Végh. 2019. A strongly polynomial algorithmfor linear exchange markets. In Proceedings of the 51st Annual ACMSIGACT Symposium on Theory of Computing. 54–65.

[40] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nick-olai Zeldovich. 2017. Algorand: Scaling Byzantine Agreements forCryptocurrencies. In Proceedings of the 26th Symposium on Operat-ing Systems Principles (Shanghai, China) (SOSP ’17). Association forComputing Machinery, New York, NY, USA, 51–68. https://doi.org/10.1145/3132747.3132757

[41] Eyal Hertzog, Guy Benartzi, and Galia Benartzi. 2018. Bancor Protocol.(2018).

[42] Kamal Jain. 2007. A polynomial time algorithm for computing anArrow–Debreu market equilibrium for linear utilities. SIAM J. Comput.37, 1 (2007), 303–318.

[43] Kamal Jain, Mohammad Mahdian, and Amin Saberi. 2003. Approx-imating market equilibria. In Approximation, Randomization, andCombinatorial Optimization.. Algorithms and Techniques. Springer, 98–108.

[44] Mahimna Kelkar, Fan Zhang, Steven Goldfeder, and Ari Juels. 2020.Order-fairness for byzantine consensus. In Annual International Cryp-tology Conference. Springer, 451–480.

[45] Marta Lokhava, Giuliano Losa, David Mazières, Graydon Hoare, Nico-las Barry, Eli Gafni, Jonathan Jove, Rafał Malinowsky, and Jed McCaleb.2019. Fast and Secure Global Payments with Stellar. In Proceedings ofthe 27th ACM Symposium on Operating Systems Principles (Huntsville,Ontario, Canada) (SOSP ’19). Association for Computing Machinery,New York, NY, USA, 80–96. https://doi.org/10.1145/3341301.3359636

[46] Andrew Makhorin. 2008. GLPK (GNU linear programming kit).http://www.gnu.org/s/glpk/glpk.html (2008).

[47] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash sys-tem. http://bitcoin.org/bitcoin.pdf.

[48] Joseph Poon and Vitalik Buterin. 2017. Plasma: Scalable autonomoussmart contracts. White paper (2017), 1–47.

[49] Joseph Poon and Thaddeus Dryja. 2016. The bitcoin lightning network:Scalable off-chain instant payments.

[50] Vikram Saraph and Maurice Herlihy. 2019. An empirical study ofspeculative concurrency in ethereum smart contracts. arXiv preprintarXiv:1901.01376 (2019).

[51] Robbert Van Renesse and Fred B Schneider. 2004. Chain Replicationfor Supporting High Throughput and Availability.. In OSDI, Vol. 4.

[52] Gerui Wang, Shuo Wang, Vivek Bagaria, David Tse, and PramodViswanath. 2020. Prism removes consensus bottleneck for smartcontracts. In 2020 Crypto Valley Conference on Blockchain Technology(CVCBT). IEEE, 68–77.

[53] Will Warren and Amir Bandeali. 2017. 0x: An open protocol for de-centralized exchange on the Ethereum blockchain. (2017).

[54] Yinyu Ye. 2008. A path to the Arrow–Debreu competitive marketequilibrium. Mathematical Programming 111, 1-2 (2008), 315–348.

[55] Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, andIttai Abraham. 2019. HotStuff: BFT Consensus with Linearity andResponsiveness. In Proceedings of the 2019 ACM Symposium on Prin-ciples of Distributed Computing (Toronto ON, Canada) (PODC ’19).Association for Computing Machinery, New York, NY, USA, 347–356.https://doi.org/10.1145/3293611.3331591

[56] Wei Yu, Kan Luo, Yi Ding, Guang You, and Kai Hu. 2018. A ParallelSmart Contract Model. In Proceedings of the 2018 International Confer-ence on Machine Learning and Machine Intelligence. 72–77.

[57] Yunhao Zhang, Srinath Setty, Qi Chen, Lidong Zhou, and LorenzoAlvisi. 2020. Byzantine Ordered Consensus without Byzantine Oli-garchy. In 14th {USENIX} Symposium on Operating Systems Designand Implementation ({OSDI} 20). 633–649.

Appendix A Arrow-Debreu ExchangeMarkets

Internally, SPEEDEX views a set of trade offers as an Arrow-Debreu exchange market [18]; the clearing price computa-tion problem maps to the Arrow-Debreu market equilibriumcomputation problem.

The Arrow-Debreu Exchange Market is a well studied con-cept, and as such there are many variants on the definition.Relevant to this work is a simple version of the model, wherethere are no stock dividends and no production. Further, ourexchange will look only at snapshots of the market, (i.e. onceper block), and thus we do not consider time-varying modelsor strategic agents.Agents in an exchange market come to the market with

some set of goods. They sell their goods to the market atthe market’s prices in exchange for “money,” which they

14

https://www.flux.utah.edu/paper/duplyakin-atc19

https://doi.org/10.1145/3132747.3132757

https://doi.org/10.1145/3132747.3132757

https://doi.org/10.1145/3341301.3359636

http://bitcoin.org/bitcoin.pdf

https://doi.org/10.1145/3293611.3331591

immediately spend at the market to buy their preferred setof goods. Their preferred set of goods is determined by theagent’s utility function. Specifically, agents buy the collectionof goods that maximizes their utility subject to their budgetconstraint.SPEEDEX users do not submit utility functions to the

exchange. But most natural offer types implicitly encode asimple utility function. Recall the sell offer.

Definition A.1 (Sell Offer). A Sell Offer (𝑆 , 𝐵, 𝑒 , 𝛼) is requestto sell 𝑒 units of good 𝑆 in exchange for some number 𝑘 unitsof good 𝐵, subject to the condition that 𝑘 ≥ 𝛼𝑒 .

The user who submits this offer implicitly says that theyvalue 𝑘 units of 𝐵 more than 𝑒 units of 𝑆 if and only if 𝑘 ≥ 𝛼𝑒 .Thus, the user’s preferences are representable as a simpleutility function.

Example A.2 (Sell Offer). Suppose a user submits a selloffer (𝑆 , 𝐵, 𝑒 , 𝛼). The optimal behavior of this offer (and theuser’s implicit preferences) is equivalent to maximizing thefunction 𝑢(𝑥𝑆 , 𝑥𝐵) = 𝛼𝑥𝐵 + 𝑥𝑆 (for 𝑥𝑆 , 𝑥𝐵 amounts of goods 𝑆and 𝐵).

Moreover, the utilites of agents derived from sell offers arelinear, and every offer has a nonzero valuation of the goodwhich it sells. This means that our market instances satisfycondition (*) of Devanur et. al [32], and thus clearing pricesalways exist.

To restate the earlier intuition, anArrow-Debreu ExchangeMarket (as discussed in this work) is the following:

DefinitionA.3 (Arrow-Debreu ExchangeMarket). AnArrow-Debreu Exchange Market consists of a set of goods [𝑁 ] anda set of agents [𝑀]. Every agent 𝑗 has a utility function 𝑢 𝑗

and an endowment 𝑒 𝑗 ∈ R𝑁≥0.When the market trades at prices 𝑝 ∈ R𝑁≥0, every agent

sells their endowment to the market in exchange for revenue𝑠 𝑗 = 𝑝 ·𝑒 𝑗 , which the agent immediately spends at the marketto buy back an optimal bundle of goods 𝑥 𝑗 ∈ R𝑁≥0 - that is,𝑥 𝑗 = arg max𝑥 :Σ𝑥𝑖𝑝𝑖 ≤𝑠 𝑗 𝑢 𝑗 (𝑥 ).

Definition A.4 (Market Equilibrium). An equilibrium of anArrow-Debreu market is a set of prices 𝑝 and an allocation𝑥 𝑗 for every agent 𝑗 , such that for all goods 𝑖 , Σ𝑗𝑒𝑖 𝑗 ≥ Σ𝑗𝑥𝑖 𝑗 ,and 𝑥 𝑗 is an optimal bundle for agent 𝑗 .

SPEEDEX uses two notions of approximation. The transac-tion commission Y maps onto exchange markets in a naturalway (agents receive only a (1 − Y) fraction of what they pur-chase). Our requirements for optimal offer execution (`) areslightly different from what typically appears in the theoret-ical literature.

Common to the theoretical literature (e.g. Definition 1 of[29]) is the following definition:

Definition A.5 ((`)-Approximately Optimal Bundle). Sup-pose an agent has utility 𝑢(·) and initial endowment 𝑒 , and

the market prices are 𝑝 . Let 𝑥∗ maximize 𝑢(·) subject to𝑥∗ · 𝑝 ≤ (𝑒 · 𝑝).

A bundle 𝑦 is (`)-approximate if 𝑢(𝑦) ≥ 𝑢(𝑥∗)(1 − `)

This definition is not strong enough for a real-world ex-change. In particular, it does not disallow execution of offerswith minimum prices higher than the equilibrium prices.Such orders should remain unchanged until market pricesrise.SPEEDEX uses the following stronger definition of ap-

proximately optimal behavior.

Definition A.6 (Strongly (Y, `)-Approximately Optimal Bun-dle for a Sell Offer). Consider a Sell offer selling asset 𝑒 unitsof 𝐴 in exchange for asset 𝐵 with minimum price 𝛼 .

Its behavior in amarket equilibriumwith prices 𝑝 is Strongly(`, Y)-Approximately Optimal if one of the following condi-tions holds.

1. If 𝑝𝐴/𝑝𝐵 > 𝛼 , the offer executes fully. That is, theoffer sells 𝑒 units of 𝐴 to the market, in exchange for𝑒(𝑝𝐴/𝑝𝐵)(1 − Y) units of 𝐵.

2. If 𝑝𝐴/𝑝𝐵(1 − `) < 𝛼 , the offer does not execute. Thatis, the offer retains its entire stock of 𝑒 units of 𝐴.

3. Otherwise, the offer executes partially at some rate_ ∈ [0, 1]. That is, the offer sells _𝑒 units of 𝐴 to themarket, in exchange for _𝑒(𝑝𝐴/𝑝𝐵)(1 − Y) units of 𝐵.

We say that an agent in an Arrow-Debreu market that isderived from an Sell Offer gets a strongly (`, Y)-approximatelyoptimal bundle if its bundle is strongly-(`, Y)-approximatelyoptimal for the original Sell Offer.

Combining these together gives the following definition.

DefinitionA.7 (Strongly (Y, `)-ApproximateMarket Equilib-rium). A strongly (Y, `)-aproximate equilibrium of an Arrow-Debreu market is a set of prices 𝑝 and an allocation 𝑥 𝑗 forevery agent 𝑗 , such that for all goods 𝑖 , Σ𝑗𝑒𝑖 𝑗 ≥ Σ𝑗𝑥𝑖 𝑗 , and 𝑥 𝑗is a strongly (Y, `)-approximately optimal bundle.

This definition corresponds to Definition 4.1 and its asso-ciated constraints.

Appendix B Buy and Sell Offers Togetherare PPAD-Complete

Integrating buy offers moves equilibrium computation to amuch harder complexity class.

An offer to buy a fixed amount of some asset would looklike the following:

Definition B.1. A Buy Offer (𝑆 , 𝐵, 𝑒 , 𝛼 , 𝑘) is a request tosell 𝑒 units of good 𝑆 for up to 𝑘 units of good 𝐵. This saleis subject to the condition that the actual amount sold tothe market is at least 𝛼 times the amount bought from themarket. (i.e., the price of an 𝑆 is at least 𝛼𝐵).

Tâtonnement requires that the market instance satisfya property known as “Weak Gross Substitutability (WGS)”

15

An instance of an Arrow-Debreu exchange market satisfiesWGS if an increase to the valuation of some good 𝑖 doesnot cause a decrease in the aggregate demand for good 𝑗 .Intuitively, if the valuation of one good rises, agents canpurchase less of this good from the market, and might chooseto buy something else instead.Exchange market instances based on collections of sell

offers satisfy WGS, but instances based on buy and sell offersdo not.

Example B.2. Suppose one user submits the buy offer (EUR,USD, 100, 1.10, 60), and this is the only buy offer on theexchange. This user wishes to buy up to 60 Dollars, so longas the price of the Euro is at least $1.10.

If the price of the Euro is $1.20, then this offer will purchase60 Dollars from the market and sell 50 Euros to the market.The aggregate demand, therefore, is (60 𝑈𝑆𝐷,−50 𝐸𝑈𝑅).

If the valuation of the Dollar rises, so that the price of aEuro is only 1.10, then the offer will still purchase 60 Dollars,but now needs will need to spend 5̃4 Euros to do so. Theaggregate demand, therefore, changes to (60𝑈𝑆𝐷,−54 𝐸𝑈𝑅).

In other words, a rise in the valuation of the Dollar causeda decrease in the aggregate demand for Euros.

Theorem B.3. Computing approximate equilibrium pricesin an Arrow-Debreu market generated by Sell Offers and BuyOffers is PPAD-complete.

Proof. The theorem is a restatement of Corollary 2.4 of [25].Corollary 2.4 follows from Theorem 7 of [25], and re-

quires agents with general linear utility functions, not justthe sparse ones generated by Sell Offers. However, the con-struction in Theorem 7 is quite general; when applied tothe context of agents with utilities generated by Buy or SellOffers, this requirement is not necessary.

In fact, the construction in this context uses only utilitiesthat could be generated by Sell or Buy Offers.

One technical detail is that some agents have utility onlyfor one good, and might have multiple types of goods inits endowment. An agent with linear utility and multipletypes of goods in its endowment can be replaced by multipleagents with the same utility, each one with only one typeof good, to the same overall effect on the market. An agentwith utility for only one type of good is akin to a Sell offerwith minimum price 0. □

Appendix C Rounding via LinearProgramming

Suppose our exchange currenty trades 𝑁 distinct assets. Re-call that the output of Tâtonnement run with approximationparameters (`, Y) is a set of prices 𝑝𝑖 for 𝑖 ∈ [𝑁 ].For any orderbook (say, selling asset 𝐴 in exchange for

asset 𝐵), let the constant 𝐿𝐴𝐵 be the minimum amount of 𝐴that must be sold at the given prices in order to satisfy the`-approximation guarantee, and let the constant 𝐻𝐴𝐵 be the

maximum amount of 𝐴 that can be sold at the given prices(i.e. the total amount of 𝐴 available for sale by offers withminimum prices below 𝑝𝐴/𝑝𝐵). Let the variable 𝑦𝐴𝐵 denotethe amount of 𝐴 sold for 𝐵. To satisfy our `-approximationguarantee, the linear program must maintain 𝐿𝐴𝐵 ≤ 𝑦𝐴𝐵 ≤𝐻𝐴𝐵 .To maximize the amount of “value” that trades hands,

our program maximizes Σ𝑖, 𝑗 ∈𝑁 :𝑖 ̸=𝑗𝑦𝑖 𝑗𝑝𝑖 . Finally, the linearprogram needs a constraint to make sure no units of anyasset are inadvertently created.Maximizing this objective means that no offers are left

crossing after executing all offers in a block according to theresults of this linear program - any crossing orders couldexecute to increase this objective.

The full linear program is as follows:

max Σ𝑎,𝑏∈[𝑁 ],𝑎 ̸=𝑏𝑝𝑎𝑦𝑎𝑏 (1)𝑠 .𝑡 . 𝐿𝑎𝑏 (𝑝) ≤ 𝑦𝑎𝑏 ≤ 𝐻𝑎𝑏 (𝑝) ∀(𝑎, 𝑏) ∈ [𝑁 ] × [𝑁 ], (𝑎 ̸= 𝑏)

(2)𝑝𝑎Σ𝑏∈[𝑁 ]𝑦𝑎𝑏 ≥ (1 − Y)Σ𝑏∈[𝑁 ]𝑝𝑏𝑦𝑏𝑎 ∀𝑎 ∈ [𝑁 ] (3)

(4)

In cases where Tâtonnement does not converge within atime limit, we modify this linear program by dropping thelower bounds on the 𝑦𝑎𝑏 variables.

Appendix D MultipleTâtonnement Instances

Our implementation of Tâtonnement chooses a step size viaan empirically chose heuristic. As such, the step direction inTâtonnement, given by the result of an aggregate demandcomputation, is not guaranteed to be a descent direction ofthe heuristic. In this case, the heuristic value will alwaysincrease, no matter how small a step size is chosen. Code-notti et. al [30] show that there is a step size that alwaysbrings Tâtonnement closer to an equilibrium. To bypass thiscase, we declare a minimum step size and ensure that if theline search would drop the step size below this minimumthreshold, Tâtonnement takes a step anyway and progresses.Choosing a minimum size that is too small can mean Tâ-

tonnement needsmore rounds to finish its computation, but aminimum size that is too large can result in Tâtonnement os-cillating around an equilibrium without terminating. Ratherthan encode a fixed minimum size, we run several parallelinstances of Tâtonnement, each with a different minimumstep size.

Our implementation runs 6 copies of Tâtonnement, threeusing volume normalization and three without. Within aset of three copies, we vary the minimum step size by afactor of 211. These numbers are chosen arbitrarily, and areal-world deployment might choose a different parameterregime. These parameters are chosen to balance inclusion of

16

a wide variety of parameter regimes (for robustness) againstCPU cache performance (for speed).

Appendix E ExtensionsWe chose not to implement buy offers because they movethe price computation problem to a (conjectured to be) in-tractable complexity class (Appendix B). It is still possiblethat Tâtonnement could handle a small number of buy of-fers. A logarithmic demand oracle with buy offers can beimplemented analogously to our demand oracle for sell of-fers. Alternatively, one could compute prices only using selloffers, and then offer the computed prices to the buy offersas well in the linear programming step.Tâtonnement requires the aggregate demand of the mar-

ket to satisy a property called “Weak Gross Substitutability(WGS)” (Appendix B). Some natural types of offers (beyondsell offers) and automated agents satisfy this property. Oneoffer might sell multiple types of goods at once, or might liketo buy any one of multiple digital assets that are backed bythe same real-world currency. Note also that an agent basedon Uniswap’s [14] constant-product rule (i.e. an agent thattrades to maximize the product of its held asset amounts)satisfies WGS. It may be difficult to evaluate the actions ofmany complicated agents efficiently within the inner loopof Tâtonnement, but an exchange could implement a smallnumber of automated market-makers to provide a baselinelevel of market liquidity.

Appendix F Continuous Approximation ofAggregate Demand

Recall that every round of Tâtonnement computes the ag-gregate demand (Definition 4.4) of the set of open offers, inresponse to the current candidate set of prices. The goal ofTâtonnement is to find some price 𝑝 such that the amountof every good supplied to the market exceeds the amountbought from the market. But when the aggregate demand𝑍 (·) is not a continuous function, such a point may not exist.

Example F.1. Suppose there are two assets 𝐴 and 𝐵. Sup-pose also that one offer wishes to sell 100 units of 𝐴 in ex-change for 𝐵 if 𝑝𝐴/𝑝𝐵 ≥ 1, and another offer wishes to sell 1unit of 𝐵 in exchange for 𝐴 if 𝑝𝐵/𝑝𝐴 ≥ 1.

Then no matter the value of 𝑝𝐴/𝑝𝐵 , one of the goods willbe overdemanded. Clearly this holds if 𝑝𝐴/𝑝𝐵 ̸= 1. At 𝑝𝐴 = 𝑝𝐵 ,both offers execute, but 𝐵 is still overdemanded by 99 units.

One technicality is that a specification of an exact marketequilibrium needs to to specify the behavior of offers whoseminimum prices are equal to the market prices. For example,in the above situation, setting 𝑝𝐴 = 𝑝𝐵 and that both offerssell only one unit (and not necessarily their entire stock) oftheir respective goods in exchange for one unit of the otherwould constitute an equilibrium.

The aggregate demand function is essentially a summa-tion of the behavior of every offer. To make the 𝑍 (·) into acontinuous function, we approximate the behavior of eachindividual offer by a continuous function.

Definition F.2 (Linearly-Interpolated `-Approximate OfferSemantics). Suppose an offer demands a minimum priceratio of 𝛼 , and the equilibrium prices give a price ratio of 𝛽 .As in the exact case, if𝛼 > 𝛽 , the offer does not execute, and if𝛼 < 𝛽(1− `), the offer does execute in full. In the interveninggap, offers linearly interpolate—i.e. a (𝛽 − 𝛼)/(`𝛽)-fractionof the offer executes.

Applying this approximation gives a continuous function𝑍` (𝑝). Furthermore, any point 𝑝 such that 𝑍` (𝑝)𝑖 ≤ 0 for all𝑖 constitutes a (Y = 0, `)-approximate equilibrium. AllowingY > 0 is akin to allowing 𝑍` (𝑝)𝑖 to be merely close to 0,instead of requiring that it be strictly below 0 for all 𝑖 .As mentioned in §4.3, the aggregate demand of a set of

offers that are all selling asset 𝐴 in exchange for asset 𝐵 canbe computed via a binary search, if the offers are sorted bytheir minimum exchange rates.Specifically, for a collection of offers 𝑋 , say that an offer

𝑥 ∈ 𝑋 sells 𝑒𝑥 units of 𝐴 and demands a minimum price of𝑟𝑥 , and let 𝑟 = 𝑝𝐴/𝑝𝐵 be a queried exchange rate. A binarysearch computes 𝐸1 = Σ𝑥 ∈𝑋 :𝑟𝑥 ≤𝑟𝑒𝑥 .To compute the aggregate demand approximation, we

need to also compute the quantities 𝐸2 = Σ𝑥 ∈𝑋 :𝑟𝑥 ≤𝑟 (1−`)𝑒𝑥 ,𝐹1 = Σ𝑥 ∈𝑋 :𝑟𝑥 ≤𝑟𝑒𝑥𝑟𝑥 , and 𝐹2 = Σ𝑥 ∈𝑋 :𝑟𝑥 ≤𝑟 (1−`)𝑒𝑥𝑟𝑥 . With someadditional preprocessing, 𝐹𝑖 can be computed with the samebinary search as 𝐸𝑖 for 𝑖 = 1, 2.The aggregate demand function reduces to:

Σ𝑥 ∈𝑋 :𝑟𝑥<𝑟 (1−`)𝑒𝑥 + Σ𝑥 ∈𝑋 :𝑟 (1−`)≤𝑟𝑥 ≤𝑟𝑒𝑥 (𝑟 − 𝑟𝑥 )

𝑟 `

= 𝐸2 +𝐸1 − 𝐸2

`− 𝐹1 − 𝐹2

𝑟 `

Appendix G Synthetic Data ModelThe transactions used in our experiments are (unless other-wise discussed) drawn from the following data model. De-fault parameters, used in our experiments unless otherwisenoted, are noted in parentheses.

Transactions are generated in batches. Every transactionhas some chance of either creating an offer( 90%), sendinga payment ( 10%), or creating an account ( .1%) (and givingthat account some funds). Newly created offers have somechance (1%) of being cancelled at some point in the next fewbatches (between 5 and 50).

All create offer operations are drawn relative to the sameset of underlying asset valuations. Supposing that these valu-ations were computed as a market equilibrium, some fraction(90%) are generated to accept these valuations. These “good”offers are generated in cycles; that is, the generator chooses

17

10, 10 15, 10 15, 15 20, 15Some Large Offers, 500000 Offers 0.005 0.006 0.006 0.037Some Large Offers, 50000 Offers 0.008 0.009 0.027 0.038Some Large Offers, 5000 Offers 0.006 0.007 0.048 0.106Some Large Offers, 500 Offers 0.060 0.066 1.826 1.877Outlier Prices, 500000 Offers 0.008 0.012 0.017 0.036Outlier Prices, 50000 Offers 0.012 0.014 0.030 0.053Outlier Prices, 5000 Offers 0.015 0.020 * *Outlier Prices, 500 Offers 0.033 0.037 * *10% Good Offers, 500000 Offers 0.005 0.006 0.006 0.03910% Good Offers, 50000 Offers 0.008 0.009 0.009 0.03710% Good Offers, 5000 Offers 0.008 0.008 0.035 0.05310% Good Offers, 500 Offers 0.024 0.034 0.820 1.277None Near Market Prices, 500000 Offers 0.002 0.002 0.001 0.002None Near Market Prices, 50000 Offers 0.004 0.006 0.020 0.028None Near Market Prices, 5000 Offers 0.006 0.009 0.105 0.188None Near Market Prices, 500 Offers 0.024 0.066 1.708 2.487Close to Market Prices, 500000 Offers 0.008 0.046 0.046 0.044Close to Market Prices, 50000 Offers 0.011 0.044 0.042 0.040Close to Market Prices, 5000 Offers 0.013 0.039 0.035 0.031Close to Market Prices, 500 Offers 0.030 0.037 0.078 0.073Dispersed from Market Prices, 500000 Offers 0.002 0.003 0.003 0.005Dispersed from Market Prices, 50000 Offers 0.005 0.006 0.009 0.042Dispersed from Market Prices, 5000 Offers 0.005 0.009 0.099 0.279Dispersed from Market Prices, 500 Offers 0.022 0.064 * *Uneved Trade Volumes, 500000 Offers 0.180 0.180 0.133 0.145Uneved Trade Volumes, 50000 Offers 0.169 0.170 0.129 0.152Uneved Trade Volumes, 5000 Offers 0.257 0.280 0.976 1.045Uneved Trade Volumes, 500 Offers * * * *

Figure 6. Runtimes of Tâtonnement with varying datasets,averaged over 5 runs. (*) denotes that not all runs termi-nated within a 5 second timeout. Column headers denoteapproximation parameters, as (− log2(Y),− log2(`)).

a random set of assets (of size between 2 and 7) and a tradeamount, then constructs offers that send this trade amountalong the cycle. “Bad” offers are generated individually, andtrade between random assets.

The minimum exchange rate on “good” offers is set to beslightly lower than the underlying exchange rate (between0% and 2% lower) and the minimum rate on “bad” offersis set to be slightly higher than the underlying exchangerate(between 0% and 2% higher).

Offer trade amounts are randomly drawn (between 1,000and 1,000,000) and normalzed (i.e. divided) by asset price.

In the first batch, the underlying prices are random values(between 1 and 1,000). After each batch of transactions, theprices are updated according to a brownian drift formula(𝑝 → 𝑝𝑒N(0,𝜎)) (𝜎 = 0.05).

Source accounts behind transactions are drawn from anexponential distribution (with parameter 10−6).

Transactions sequence numbers are set sequentially.Transactions within a batch are shuffled. Some transac-

tions (10%) are shuffled between adjacent batches.

Appendix H Data Generation RobustnessChecks

In this section, we vary some of the parameters of our datagenerationmodel to study howTâtonnement’s runtime varieswith the problem input distribution. Runtimes are averagedover 5 trials (where each trial has a different dataset, drawnfrom the same distribution).

Figure 6 shows Tâtonnement runtimes after varying sev-eral different parameters within our synthetic data model.In particular:

Some Large Offers 10% of the trade offers are 10x the size(on average) of the rest.

Outlier Prices One asset has valuation set to 10,000, anotherhas valuation set to 0.1. The rest are distributed be-tween 1 and 1,000.

10% Good Offers Only 10% of the offers accept the marketprices (default is 90%)

None Near Market Prices No offers are generatedwithin 10%of the market prices.

Close To Market Prices All offers haveminimumpriceswithin1% of market prices.

Dispersed From Market Prices Offers have minimum pricesin an especially large range around the market prices.

Uneven Trade Volumes Asset volume is heavily skewed to-wards some assets (asset volume normalization is notactivated).

The first point of note is that the only parameter changewith a significant, consistent effect on Tâtonnement runtimeis the skew in the asset volume.

As discussed, Tâtonnement does sometimes time out. Thistends to happen when there are few offers to trade and withhigh accuracy requirements.

Tâtonnement also seems to perform slightly better whenthere are at least some offers near to the market clearingprices. Offers near the clearing prices can give the linearprogram more flexibility. This explains the slowdown when` is very small (the rightmost column) in the “None NearMarket Prices” and ”Dispersed FromMarket Prices” datasets.

Appendix I Comparison with ConvexSolver

As a comparison point for Tâtonnement, we also imple-mented the convex program of [32] using the CVXPY li-brary [34] backed by the ECOS convex solver [36]. Tâton-nement was run with Y = 2−15 and ` = 2−10.The exact runtimes are not directly comparable; the two

systems measure error in different ways, and the convexsolver is a general tool, while Tâtonnement is specificallydesigned for one application. The takeaway from Figure 7are the scalability trends. The size of the problem given tothe convex solver scales linearly with the number of open of-fers, and unsurprisingly, so does the runtime. An asymptoticimprovement to the runtime of the theoretical price compu-tation algorithms, as we implemented for Tâtonnement withthe logarithmic demand oracle (§4.3.1), is crucial for con-structing an algorithm that can run quickly on large probleminstances.

That said, it is possible that when the number of transac-tions is very small (i.e. only a few hundred), a pricing algo-rithm based on solving a convex program could outperformTâtonnement.

18

Figure 7. Runtimes of a convex solver compared againstruntimes for Tâtonnement. The convex program on 20 and50 asset instances performed very similarly as when run on10 asset instances. These instances are not included on thegraph for visual clarity.

Interestingly, using a similar technique (sorting offers byminimum price), the convex program of [32] can be simpli-fied to a program whose size does not scale with the numberof open offers, and whose objective can be computed intime logarithmic in the number of open offers. The catch isthat this transformation (done naively) makes the objectivenondifferentiable, and nondifferentiable objectives can bevery difficult for many convex optimization algorithms. Alogarithmic-size transformation for this program that pre-serves the differentiability (or twice-differentiability) of theobjective is an interesting direction for future work.

19

SPEEDEX: A Scalable, Parallelizable, and Economically ...

Documents

Transcript of SPEEDEX: A Scalable, Parallelizable, and Economically ...