SPEEDEX: A Scalable, Parallelizable, and Economically ...
Transcript of SPEEDEX: A Scalable, Parallelizable, and Economically ...
SPEEDEX: A Scalable, Parallelizable, andEconomically Efficient Distributed EXchangeGeoffrey RamseyerStanford University
Stanford, California, [email protected]
Ashish GoelStanford University
Stanford, California, [email protected]
David MaziรจresStanford University
Stanford, California, USA
AbstractSPEEDEX is a decentralized exchange (DEX) letting partici-pants securely trade assets without giving any single partyundue control over the market. SPEEDEX offers several ad-vantages over prior DEXes. It achieves high throughputโover 100,000 transactions per second on 32-core servers,even with 70M open offers. It eliminates internal arbitrageopportunities, so that a direct trade from asset๐ด to ๐ต alwaysreceives as good a price as trading through some third assetsuch as USD. Finally, it prevents frontrunning attacks thatwould otherwise increase the effective bid-ask spread forsmall traders. SPEEDEXโs key design insight is to use anArrow-Debreu exchange market structure that fixes the val-uation of assets for all trades in a given block of transactions.Not only does this market structure provide fairness acrosstrades, it makes trade operations commutative and henceefficiently parallelizable.
1 IntroductionDigital currencies are moving closer to mainstream adoption.Examples include central bank digital currencies (CBDCs)such as Chinaโs DC/EP, commercial efforts such as Diem,and numerous decentralized-blockchain-based stablecoinssuch as Tether, Dai, and USDC. Available digital currenciesdiffer wildly in terms of privacy, openness, smart contractsupport, performance, regulatory risk, solvency guarantees,compliance features, retail vs. wholesale suitability, and cen-tralization of the underlying ledger. Because of these dif-ferences, and because financial stability demands differentmonetary policy in different countries, we cannot hope fora one-size-fits-all global digital currency. Instead, to realizethe full potential of digital currencies (and digital assets ingeneral), we need a trading platform where many digitalcurrencies can efficiently coexist.Trading currencies requires an asset exchange to match
and execute offers from agreeable counterparties. The idealdigital currency exchange would avoid giving any centralauthority undue power over the global flow of currency. Theideal exchange would also be transparent, auditable, andresistant to servers profitably front-running small tradersby siphoning off the bid-ask spread. Finally, the ideal ex-change would present no arbitrage opportunities: when sell-ing currency๐ด to buy currency ๐ต, there would be no need to
create more complex trades through other currencies (e.g.,๐ดโ ๐ถ โ ๐ต) to get a better price.
In a digital asset context, the gold standard for avoidingcentralized control is to operate a decentralized exchange, orDEX: a transparent exchange implemented as a determinis-tic replicated state machine maintained by many differentparties. To prevent theft, a DEX requires all transactionsto be digitally signed by the relevant asset holders. To pre-vent cheating, replicas organize history into an append-onlyblockchain. DEX replicas agree on blockchain state througha Byzantine-fault tolerant consensus protocol, typically somevariant of asynchronous Byzantine agreement [24] for pri-vate blockchains or synchronous mining [47] for public ones.
Unfortunately, conventional wisdom holds that DEXescannot scale to transaction volumes beyond a few thousandtransactions per second. This wisdom has led to many al-ternative blockchain scaling techniques, such as off-chaintrade matching [53], automated market-makers [14], andtransaction rollup systems [9]. However, these approacheseither trust a third party to ensure that orders are matchedwith the best available price, or sacrifice the ability to settraditional limit orders that only sell at or above a certainprice.The challenge with on-chain DEXes is that the precise
quantities spent and gained by each participant depend onhow offers are matched. One pair of trade offers might ex-change 1 EUR for 1.20 USD, while another swaps 1 USD for1.21 EUR. Realistic markets will offer many distinct waysto match buyers and sellers with slightly different results.The multitude of matchings makes it impractical to elimi-nate arbitrage, presents opportunities for DEX servers togain advantage by re-ordering transactions, and precludesparallelization in a deterministic replicated state machine.In particular, a DEX cannot naively employ optimistic con-currency control, as doing so would make order matchingdependent on non-deterministic inter-thread scheduling.This paper disproves the conventional wisdom about on-
chain DEX performance. We present SPEEDEX (Scalable,Parallelizable, and Economicially Efficient Distributed EX-change), a fully on-chain DEX capable of processing over100,000 transactions per second on a 32-core machine (Fig-ure 5) and designed to scale further on more powerful hard-ware. Like most DEXes, SPEEDEX processes transactions
1
in blocksโe.g., a block of 500,000 transactions every 5 sec-onds. Its fundamental principle is that all transactions in ablock commute: the blockโs result is identical regardless ofthe order in which trades are executed, which allows effi-cient parallelization [27]. Our implementation of SPEEDEXis available at https://github.com/scslab/speedex.To make transactions commutative, SPEEDEX is struc-
tured as an Arrow-Debreu Exchange Market [18]: for a givenblock, each asset ๐ด has a unique valuation ๐๐ด (denominatedin an arbitrary unit). A block proposed to the consensus al-gorithm contains not only a set of transactions, but also avaluation for each asset. In-the-money offers are executedat the prices implied by these valuations (i.e., 1 unit of ๐ดtrades for 2 units of ๐ต if the valuation of๐ด is twice that of ๐ต).This structure eliminates arbitrage, eliminates the need fora reserve currency, and enables us to adjust balances withhardware-level atomics, rather than locking.Implementing SPEEDEX introduces both theoretical, al-
gorithmic challenges and systems design challenges. Theprimary algorithmic challenge is efficient price computa-tion. The agents in our Arrow-Debreu markets have linearutilities, and there are many existing theoretical algorithmsfor computing equilbria in linear exchange markets. Theseinclude convex programs such as [32], combinatorial algo-rithms such as [37, 39, 42], and iterative methods such as aโTรขtonnementโ process [30]. However, the performance ofall of these algorithms, directly applied, scales poorly, bothempirically and asymptotically, as the number of open of-fers to trade increases. We show that the theoretical marketinstances derived from our exchange have additional struc-ture that we can use to make them efficiently employablein practice. We explicitly correct approximation errors witha 2nd-stage linear program. To implement this exchange,we design a set of transaction semantics that enable naturalinteraction between users and an exchange while allowingconcurrent transaction processing, and while keeping dataorganized to permit the efficient answering of queries aboutthe state of the system from the price computation algorithm.
2 Related Work2.1 (Distributed) ExchangesSome blockchain platforms, like Stellar [10], provide a nativemechanism for posting and clearing trade offers. Automatedmarket makers, like Uniswap [14] or Bancor [41], are smartcontracts that trade with users directly, instead of actingas a matching intermediary. These contracts compute anexchange rate based on their currency reserve contents andoffer this rate to any trader [15]. The offered price typicallychanges after every trade.
0x [53] provides an interface for performing atomic assetswaps between untrusted parties, which can be used as abuilding block for exchanges that match offers off-chain.Loopring [12] gives a similar interface, but allows offers to
be matched in cycles, instead of just in pairs. StarkEx [9,20] provides a set of cryptographic primitives to enable acentralized exchange to match trades off-chain and provethat these operations were conducted correctly.To combat front-running, Clockwork [28] uses timelock
puzzles to allow an exchange to commit to processing an offerbefore it can see the offerโs contents. Zhang et. al. [57] andKelkar et. al. [44] combat front-running by limiting the powerof a byzantine replica to choose the ordering of transactionswithin a block.
Binance operates a distributed exchange [2] that computesper-asset-pair exchange rates. Offers created in that blocktrade at that ratio, while pre-existing offers only trade attheir limit prices. The exchange currently handles 10-30operations per second [3].The Gnosis exchange [6] also clears offers in batches
by solving an optimization problem. Gnosis uses a mixed-integer programming problem and lets solvers compete toproduce good solutions. A forthcoming deployment plans tohandle batches of 100 offers [13].
Budish, Cramton, and Shim [23] argue that outside of theblockchain context, markets should process orders in batchesto combat automated arbitrage.
2.2 Price ComputationThis work is primarily concerned with the special case of theArrow-Debreu exchange market [18] where every agent hasa linear utility function. Equilibria can be computed approxi-mately in these markets using combinatorial algorithms suchas those of Jain et. al [43] and Devanur et. al [33] and ex-actly via the ellipsoid method and simultaneous diophantineapproximation [42]. Duan et. al [37] constructs an exact com-binatorial algorithm, which Garg et. al. [39] extends to con-struct a combinatorial algorithm with strongly-polynomialrunning time.
Ye [54] describes a path-following interior point method,and Devanur et. al. [32] identify a straightforward convexprogram for computing equilibria.Others study natural iterative process, known as Tรขton-
nement [17]. In simple terms, if demand for a good exceedssupply of the good, then the price of this good rises, andvice versa. Codenotti et. al. [29, 30] show that this processconverges to an approximate equilibrium in polynomial time.
2.3 Blockchain Scaling and ConcurrencyThe Lightning network [49] moves transactions off of theblockchain into a network of bilateral channels, where eachchannel provides a secure method for two parties to trans-act at high rate. This notion can be extended to โLayer-2channelsโ with arbitrary state and arbitrary state updates.Systems such as Plasma [48] extend the Layer-2 channel
model. Users lock funds within a root contract on a mainblockchain, then send transactions to Plasma aggregators.There are many variants on this model, such as [8, 11], each
2
with different capabilities, performance, and security prop-erties. System security requires fraud-proof mechanisms foridentifying malicious channel operators and (in the case ofZK Rollups) some requirements that users remain online.Ethereumโs 2.0 launch is planned to include blockchain
sharding [5], which would split the blockchain into semi-independently running chains. If different shards can processtransactions in parallel, then the system as a whole can pro-cess more transactions per second.
Saraph and Herlihy [50] argue that historically, optimisticconcurrency control could have made the Ethereum VirtualMachine 2 to 8 times faster. They find that a few contracts,such as token contracts, are responsible for a large fractionof concurrency conflicts.
Dickerson et. al. [35] allow concurrent transaction process-ing by using software transactional memory and by includ-ing in a new block enough information to deterministicallyresolve concurrency conflicts. An implementation using 3CPU cores for transaction processing gave a roughly 1.5x-2.5x speedup.Anjana et. al. [16] add a directed graph to each block
denoting transaction-ordering dependencies. Yu et. al. [56]analyze transactions before execution to identify conflictingtransaction pairs. Bartoletti et. al. [19] develop a frameworkfor static analysis of smart contracts, with a focus on staticidentification of transaction pairs that commute.Our approach is inspired by that of Clements et. al. [27],
who improve performance in the Linux kernel by designingcommutative syscall semantics.
3 System ArchitectureSPEEDEX is an asset exchange implemented as a replicatedstate machine in a blockchain architecture. Assets are issuedand traded by accounts. Accounts have public signature keysauthorized to spend their assets. Signed transactions are mul-ticast among block producers. At each round, one ormore pro-ducers propose candidate blocks extending the blockchainhistory. A set of validator nodes validates and selects oneof the blocks through a consensus mechanism. SPEEDEXis suitable for integration into a variety of blockchains, butbenefits from a consensus layer with relatively low latency(on the order of seconds), such as BAโ [40], SCP [45], orHotStuff [55].
Most central banks and digital currency issuers maintaina ledger tracking their currency holdings. SPEEDEX is notintended to replace these primary ledgers. Rather, we expectbanks and other regulated financial institutions to issue 1:1backed token deposits onto a SPEEDEX exchange and pro-vide interfaces for moving money on and off the exchange.
SPEEDEX supports four operations: account creation, of-fer creation, offer cancellation, and direct payment. Theirsemantics ensure that all operations in the same block com-mute, at the cost of a few restrictions on block contentsโfor
instance, one cannot send payments from an account in thesame block that the account is created. Specifically, at mostone transaction per block may alter an accountโs metadata(such as the accountโs public key or existence), and meta-data changes take effect only at the end of block execution.As payments and trading are the common case, we do notconsider this restriction a serious limitation.
To make trades commutative, SPEEDEX computes a fixedโvaluationโ ๐๐ด for every asset ๐ด traded in a block. The unitsof ๐๐ด are meaningless, and can be thought of as a fictionalvaluation asset that exists only for the duration of a singleblock. However, valuations imply exchange rates betweendifferent assetsโevery sale of asset ๐ด for asset ๐ต occurs ata price of ๐๐ด/๐๐ต . Since every exchange of a given asset pairoccurs at the same price, trading becomes commutative. Ineffect, users trade not with other offers, but with a fictionalโmarketโ entity using a fictional valuation asset. The mainalgorithmic challenge is computing valuations such that theexchange โclearsโโi.e., the amount of each asset sold to theโmarketโ equals the amount bought from the โmarket.โ
Offers on SPEEDEX are โSell Offers,โ analogous to tradi-tional limit orders. For example, the sell offer (EUR,USD,100, 1.20) proposes to sell 100 EUR at a price no lower than$1.20/EUR.
A set of valuations clears a collection of sell offers if everyoffer can independently respond optimally to said valua-tions, and the total amount of each asset sold to the โmarketโequals the amount bought from the โmarket.โ The optimalresponse of an offer (๐, ๐ต, ๐, ๐ผ) to a set of valuations is totrade if ๐๐/๐๐ต > ๐ผ (the trade price is greater than the offerโslimit price) and not to trade if ๐๐/๐๐ต < ๐ผ . The exchange maypartially execute offers for which ๐๐/๐๐ต = ๐ผ .Perhaps surprisingly, exact clearing prices provably al-
ways exist (see Appendix A). SPEEDEX approximates theseclearing prices in every block. Approximation error comesin two forms: first, like many exchanges, SPEEDEX chargesa small commission on every trade. Second, the amountof trade volume in one round is approximately optimal; atsome set of prices, there might be an imbalance betweenthe amount of an asset sold to the market and the amountbought from themarket. SPEEDEX corrects these imbalancesby possibly not executing some sell offers with in-the-moneylimit prices very close to the market exchange rate.Our implementation simply burns collected transaction
fees (effectively returning them to the issuer by reducingthe issuerโs liabilities). A real-world deployment could alter-natively compensate traders to provide liquidity (or othersocially beneficial behavior).
Internally, SPEEDEX views trade offers as agents in a vir-tual Arrow-Debreu exchange market [18]. SPEEDEX cantherefore use existing literature on polynomial-time equilib-rium computation in Arrow-Debreu exchange markets as astarting point for its price computation algorithm. Detailson the mapping from collections of sell offers to exchange
3
markets are in Appendix A. Interestingly, integrating offersto buy fixed amounts of an asset makes the equilibrium com-putation problem PPAD-hard, a complexity class conjecturedto be intractable (see Appendix B). However, if market mak-ers employ sell offers, we can set valuations based only onsell offers, but use those valuations to fill buy offers used forsimple cross-currency payments (Appendix E).
Beyond computational speed, SPEEDEX has several desir-able economic properties.
No front running. In real world financial markets, well-placed agents can spy on submitted offers, notice a newtransaction ๐ , and then submit a transaction ๐ โฒ that likelyexecutes before ๐ and buys some asset ๐ด only to re-sell itto ๐ at a slightly higher price. In a blockchain setting, ๐ โฒcan sometimes even be done as a single atomic action [31].However, since every transaction sees the same equilibriumprices in SPEEDEX, back-to-back buy and sell offers for asset๐ด would simply cancel each other out. Relatedly, becauseevery offer sees the same prices, a user that wishes to tradeimmediately can set a very low minimum price and be allbut guaranteed to have their trade executed, but still at thecurrent market price.
No (internal) arbitrage. An agent selling asset ๐ด in ex-change for asset ๐ต will see a price of ๐๐ด/๐๐ต . An agent trading๐ด for ๐ต via some intermediary asset ๐ถ will see exactly thesame price, since ๐๐ด/๐๐ถ ยท๐๐ถ/๐๐ต = ๐๐ด/๐๐ต . Hence, one can effi-ciently trade between assets without much pairwise liquiditywith no need to search for an optimal path. By contrast, manyinternational payments today go through USD because of alack of pairwise liquidity. Of course, there may be arbitrageopportunities between SPEEDEX and external markets. Butat least market makers on SPEEDEX can react to externalevents by adjusting limit prices, unlike automated marketmakers [14].
4 Commutative Exchange LogicTo build a new block of transactions (or validate an existingblock), SPEEDEX performs the following three actions:
1 Iterate over a pool of uncommitted transactions togather a new transaction block (or iterate over thetransactions in an existing block to validate said block).Nodes check transaction validitity, adjust account bal-ances, and collect new offers to trade.
2 Compute a market equilibrium (market-clearing assetprices).
3 Iterate over every trade offer, possibly executing thetrade offer based on the specification of the marketequilibrium.
We design transaction semantics so that (almost) everyoperation in the first step commutes, and thus the iterationcan effectively use many CPU cores in parallel.
Our exchange does not directly match offers with eachother. Instead, the exchange computes asset valuations, orprices, and every offer trades with the โmarketโ at the ex-change ratio implied by the computed valuations. For exam-ple, if the exchange computes that the valuation of one unitof asset๐ด is twice that of one unit of asset ๐ต, then every offerwould be offered the opportunity to trade some number ๐ฅunits of ๐ด in exchange for 2๐ฅ units of ๐ต. An offer (that isinterested in trading ๐ด for ๐ต) accepts this offer if the offerโsminimum exchange rate is less than the offered ratio.The main challenge for the exchange is in computing
prices such that the market clears; that is, for every asset,the amount sold to the โmarketโ is equal to the amountbought from the โmarket.โ In fact, our exchange computesclearing prices approximately, and so these clearing pricesare accompanied by some additional metadata to accountfor approximation error.
The last piece of work done in a block is an iteration overall offers that execute (i.e. that accept the trade offered bythe โmarketโ).Because offers all trade at one common set ofprices, and whether or not an offer executes depends only onthese prices (and some approximation metadata), the stepsof this iteration commute, and thus can be done in parallel.
The number of assets traded on SPEEDEX is much smallerthan the number of users. There might be, for example, afew dozen (๐ ) commonly traded assets but millions (๐) ofopen trade offers. Our price computation algorithm is notas effectively parallelizable as the rest of the computationinvolved in our exchange. Instead, we design our algorithmto have only a logarithmic dependence on the number ofopen offers to trade. The rest of this section discusses thisalgorithm.
4.1 Computation of Market EquilibriaThe core algorithmic challenge for SPEEDEX is the com-putation of market clearing prices. The open set of offersunder consideration are the new offers created in one blockof transactions, plus the unexecuted offers carried forwardfrom previous blocks. The output of our algorithm is โmarketequilibriumโ:
Definition 4.1 (Market Equilibrium). Let there be ๐ assetstraded by๐ offers.
A specification of a market equilibrium consists of:1. A valuation ๐๐ด โ R>0 for each asset ๐ด.2. For each pair of assets (๐ด, ๐ต), an amount ๐ฅ๐ด,๐ต โ Rโฅ0
of asset ๐ด sold by offers to directly purchase asset ๐ต.
Note that the units of these valuations are arbitrary; of-fers only interact with quotients of valuations. No reservecurrency is needed.
For a pair of assets (๐ด, ๐ต), our exchange sorts offers selling๐ด to buy ๐ต by their minimum prices; orders with higherminimum prices only are allowed to trade if all offers withlower minimum prices also trade. The quantities ๐ฅ๐ด,๐ต thus
4
specify which offers get to trade. Note that the size of thisoutput has no dependence on๐ .
An equilibrium must satisfy certain constraints.
Asset Conservation. For each asset ๐ด, the amount of ๐ดsold to the market should not be less than the amount boughtfrom the market. In other words, ฮฃ๐ต๐ฅ๐ด,๐ต โฅ ฮฃ๐ต๐ฅ๐ต,๐ด๐๐ต/๐๐ด.
Respect Offer Parameters. No offer is required to tradeat rates below its minimum rate threshold.
Optimal Response to Prices. Every offer selling ๐ด in ex-change for ๐ต with a minimum price strictly below ๐๐ด/๐๐ตfully executes. Furthermore, all trades that can happen atprices ๐ must happen.The last contraint ensures that our exchange actually
moves goods between different traders. Without it, we couldtrivially construct an โequilibriumโ with any set of prices bysimply setting ๐ฅ๐ด,๐ต = 0 for all (๐ด, ๐ต).One technical point is that offers with a minimum price
exactly equal to the market price might only partially exe-cute.In practice, we compute equilibria approximately. There
are two natural ways to approximate an market equilibrium.The first is to charge a small transaction commission, which,throughout this work, we denote as Y.
Definition 4.2 (Transaction Commission (Y)). An offer sell-ing๐ฅ units of asset๐ด in exchange for asset๐ต receives๐ฅ (๐๐ด/๐๐ต)(1โY) units of ๐ต.
Wewill also allow the equilibrium to be โapproximately op-timal.โ We might want a constraint of the form โthe amountof traded value is close to the maximum amount possibletrading at an exact equilibriumโ. Working with such a con-straint would require knowledge of an exact equilibrium.Instead, we say that the behavior of individual offers is
approximately optimal. Intuitively, offers whose minimumprices are very close to the market prices are approximatelyindifferent to trading or not trading. We let the price com-putation algorithm specify whether or not these offers trade(which gives the algorithm some flexibility to ensure that nogoods are overdemanded). Throughout this work, we denotethis approximation metric by `.
Definition 4.3 (Approximately Optimal Offer Response (`)).If an offerโs minimum price ๐ผ is above the equilibrium ex-change rate ๐ฝ , then the offer cannot execute. However, givena parameter `, we say that if ๐ผ < (1โ `)๐ฝ , it must trade fully,but if (1 โ `)๐ฝ โค ๐ผ โค ๐ฝ , then the equilibrium is allowed tospecify that the offer executes fully, in part, or not at all.
4.2 Algorithm OutlineComputation of market equilibria is a well-studied problemin the theoretical literature; however, all of these theoreticalalgorithms, directly applied, would be far too slow for prac-tical usage. The reason is that these algorithms assume that
the number of open trade offers (๐) in the market is equal tothe number of goods being traded (๐ ). This is without lossof generality in a mathematical context.However, in a real-world asset exchange, the number of
assets traded is vastly smaller than the number of open tradeoffers. Our exchange supports trading up to few dozen dif-ferent assets (the experiments of ยง7 trade ๐ = 20 assets), butsupports millions of open trade offers (the experiments ofยง7 test๐ up to roughly 70 million). Our price computationalgorithm can tolerate a reasonable polynomial dependenceon ๐ , but any nontrivial dependence on ๐ will be far tooslow. Our algorithm has only a logarithmic dependence on๐ .Appendix I compares Tรขtonnement with a convex programof [32] of size linear in๐ .
The Tรขtonnement algorithm of Codenotti et. al. [29] is thestarting point for our implementation. Tรขtonnement is aniterative algorithm; starting at some set of candidate prices,at every timestep, it computes the Aggregate Demand of themarket at that set of prices, and adjusts prices accordingly(i.e. the price of a good rises if demand exceeds supply).
Definition 4.4 (Aggregate Demand). The Aggregate De-mand of a set of open offers at prices ๐ is a vector ๐ (๐) โ R๐ .If every offer responds optimally to the market prices ๐ , thenthe ๐th coordinate ๐ (๐)๐ is the amount of good ๐ bought fromthe market minus the amount of good ๐ sold to the market.
For some good ๐ , if๐ (๐)๐ > 0, then ๐ cannot be an exact setof clearing prices, as more of good ๐ is demanded in paymentfrom the market than is sold to the market.
Tรขtonnement captures the common intuition that raisingthe price of a good increases the supply of said good and de-creases demand, and vice versa.Given these prices, we thenuse the linear program of Appendix C to compute the max-imum trade volume that can happen at these prices (whileensuring that the market clears).SPEEDEX runs Tรขtonnement until it produces a (Y, `)-
approximate equilibrium, or a time limit is reached. BecauseTรขtonnement iteratively improves a candidate set of prices,Tรขtonnement can be terminated at any point to get a candi-date set of prices. The output of a timed-out Tรขtonnement runis a set of prices that the algorithm has made as close to mar-ket clearing prices as it is able, and this set of prices is still a(Y, ` โฒ)-approximate equilibrium for some ` โฒ greater than thetarget parameter `.Therefore, in the rare cases when Tรขtonnement does not
directly compute an equilibrium, our exchange still usesthe output prices of Tรขtonnement to solve a linear program(the program of Appendix C, with small modifications) thatmaximizes the amount of value that changes hands.
4.3 Algorithmic Improvements4.3.1 Logarithmic Demand Oracle. Tรขtonnement asksmany aggregate demand queries. To compute aggregate de-mand, our algorithm need only work with offers that sell one
5
good in exchange for one other good, and thus can do muchbetter than a naive iteration over all open trade offers. Wegroup offers by the assets they buy and sell, then sort offersin each category by price. An offer with a lower minimumprice will always be satisfied by a set of prices if an offer witha higher minimum is satisfied. Hence, the offers satisfied bymarket prices can be identified by binary searches.The aggregate demand function, as stated in Definition
4.4, is not continuous. Tรขtonnement works best when theaggregate demand function is continuous and has few sharpderivatives; we approximate aggregate demand with a con-tinuous function with fewer sharp derivatives by allowingoffers to execute `-optimally (instead of exactly optimally).This approximation requires an extra binary search per assetpair. For details, see Appendix F.
4.3.2 Price Update Rule. Codenotti et. al [29] analyze aversion of Tรขtonnement where prices are adjusted additively.That is, for some step size ๐ฟ , [29] uses the update rule ๐๐ โ๐๐ + ๐ฟ๐ (๐). However, in our exchange, multiplying all pricesby a constant does not change the behavior of any trade offer.Updating prices multiplicatively, instead of additively, meansthat scaling all prices by a constant does not change the realeffect of a price update on the aggregate demand. This makesTรขtonnement steps more consistent, and can dramaticallyreduce the number of steps required when some assets arevalued much more highly than others.
Although assets in our exchange have a smallest indivisi-ble unit, we treat them as fully divisible goods when runningTรขtonnement. As such, redenominating the units of one assetdoes not change the structure of a problem instance. That isto say, selling one unit of an asset for price ๐ is equivalent toselling two halves of an asset, with each unit priced as ๐/2. Tomake our update rule invariant against redenomination, wemultiply asset amounts by prices. This gives a price updaterule of ๐๐ โ ๐๐ (1 +๐ฟ๐๐๐ (๐)). This last normalization brings asignificant performance improvement to Tรขtonnement whensome assets units have very small valuations.One remaining question is the choice of the step size ๐ฟ .
Codenotti et. al. in [30] show that there exists a polynomially-small ๐ฟ such that taking a ๐ฟ-sized step always brings Tรขton-nement closer to an equilibrium point. This ๐ฟ is too small tobe effectively usable in practice (and the proof applies onlyto their additive price update rule). Instead, we choose a stepsize via a backtracking line search [22], a standard techniquein the optimization literature. When minimizing some func-tion, this technique picks the (approximately) largest stepsize possible such that a step in some descent direction ofthat step size leads to a point with a lower objective value.
Our computation problem lacks a convex objective to min-imize. We use as our objective the ๐2 norm of the aggregatedemand vector, weighted by asset price (
(ฮฃ๐๐=1(๐๐๐ (๐)๐ )2)1/2).
There is not a theoretical motivation for this choice; rather,
we need a function that is smooth, easy to evaluate, nonneg-ative, and approaches 0 if and only if Tรขtonnement is nearequilibrium. Qualitatively, many other heuristics (such asthe ๐2 norm squared, or other ๐๐ norms) perform similarly.
Specifically, in each round, we attempt to use the step sizeof the previous round. If this step does not cause the objectiveto increase too much, Tรขtonnement takes a step of that sizeand increases the step size (line searches typically requirea nontrivial decrease in the objective. Tรขtonnement doesnot appear to benefit from stricter conditions). Otherwise,Tรขtonnement decreases step size and tries again.
4.3.3 ProblemNormalization. Many numerical optimiza-tion problems run most quickly when gradients are normal-ized (e.g. see [21]); Tรขtonnement is no exception. The mainnormalization factors we have not yet taken into account arerelative differences in asset trade amounts. That is to say, Tรข-tonnement requires the fewest rounds of computation whenthe traded amounts of each asset, weighted by their prices,are roughly the same. We can normalize trade amounts bymultiplying by a normalizing constant; that is to say, if weknew in advance that one assetโs equilibrium trade volumewas 100x lower than anotherโs, we could multiply the as-set amounts of the less-traded asset by 100 when runningTรขtonnement.
Of course, we cannot know the assetโs equilibrium tradevolume until we have computed an equilibrium. However,our exchange runs Tรขtonnement in every block. In the realworld, most of the time, the traded volume of any asset staysroughly constant minute to minute. As such, we can derivenormalization constants based on the equilibria recordedin blocks in the recent past. We cannot necessarily rely onthis normalization in the case of market shocks, so we runsome copies of Tรขtonnement using volume normalizationand some without. A real-world deployment could also in-corporate real-time price or volume data from other assetexchanges.Rather than choose one normalization strategy and step
size, our implementation runs several copies of Tรขtonnementwithvarying strategies in parallel. As discussed in ยง5.2, there isa tradeoff between the number of different copies and thespeed of each copy. For details, see Appendix D
4.4 Rounding Tรขtonnement via a Linear ProgramThe goal of Tรขtonnement is to find a set of prices such thatthe linear program of Appendix C has a feasible solution.Rather than wait for Tรขtonnement to reach what it knowsto be a feasible set of prices, we periodically check the linearprogram for feasibility directly. Solving a linear program ismore expensive than one round of Tรขtonnement computa-tion, so we only check the linear program once every 1000rounds.
Solving the linear program after running Tรขtonnement (asopposed to clearing offers via the raw output of Tรขtonnement ,
6
e.g. according to the `-approximately optimal semantics dis-cussed in Appendix F) has some additional useful economicproperties.
First, we ensure that a minimal number of offers sell onlya fraction of their assets. Although assets in our exchangeare fungible currencies, we store assets as integers, so assetsnecessarily have some smallest indivisible unit. When ex-changes explicitly match offers, they often round in favorof one party at the expense of the other. However, offers inour exchange trade with a central market, so to avoid inad-vertently creating new copies of an asset, we must roundagainst every offer in favor of the market. Offers that exe-cute partially could be rounded against multiple times, sowe would like to minimize the total number of offers thatexecute only in part.
Secondly, the linear program ensures no more trades couldhappen at the market prices after executing a block of trades.
Finally, by giving as output, for each asset pair (๐ด, ๐ต), onlyan amount of ๐ด traded in exchange for ๐ต, the exchange caneasily ensure that offers with the lowest minimum pricesare guaranteed to trade before offers with higher minimumprices.
5 System DesignWe implemented SPEEDEX (as evaluated in ยง6 and ยง7) usinga simple version of chain replication [51]; namely, one replicais declared the block producer, and the other replicas aredeclared block validators. Replicas are organized in a chain;when one replica accepts a block of transactions, it passes theblock on to the next replica in the line. We do not implementleader election. We use a simple consensus protocol becauseSPEEDEX does not rely on any specific consensus detail, andwe wish to measure precisely the performance our exchange,not a consensus protocol.
SPEEDEX is implemented using about 35000 lines of codewritten in C++20 (available open source at https://github.com/scslab/speedex), using Intelโs TBB library [7] for much ofthe parallel work coordination. We use the GNU Linear Pro-gramming Kit [46] for our linear program solver and LMDB[26] to manage data persistence and crash recovery.
5.1 Data OrganizationAccount balances are stored in a Merkle-Patricia trie. Butwhile batch trie updates and rehashes (given knowledge ofwhich entries are modified) can be done relatively quickly,individual random accesses can be slow. As such, we alsostore account balances in an in-memory vector, with updatespushed to the trie once per block.
For each pair of assets (๐ด, ๐ต), we build a trie storing offersselling asset ๐ด in exchange for ๐ต. Finally, in each block, webuild a trie logging which accounts were modified.
Storing information in hashable tries both ensures thatnodes can efficiently compare database state (and catch er-rors) and prove to users information about the database state.Careful trie design accelerates several expensive databaseoperations (see ยง5.2, ยง5.3).
5.2 TรขtonnementFractional values in Tรขtonnementโs inner loops are repre-sented as fixed-point values, rather than floating-point val-ues. This substantially accelerates the arithmetic of eachround, and avoids (nonassociative) accumulation of floating-point operation error.The running times of Section 6 do not include times to
sort or preprocess offers. Naively sorting large lists takes along time; a fast sorting method is critical.
Note that our exchange runs on a blockchain, and thus (forreasons exogenous to Tรขtonnement) it hashes its internalstate. We store offer minimum prices as fixed-point fractionalvalues. Therefore, we can use an offerโs price, written in big-endian, as the leading bits of the offerโs key within its trie.Constructing the trie thus automatically sorts offers by price.Additionally, the exchange executes offers with the lowestminimum prices; as such, the set of offers that execute inone round form a (small number of) dense subtrie(s). Postoffer execution, the remaining offers remain in a sorted order.Not re-sorting the entire list every block saves a substantialamount of time.
Sorting the set of newly created offers in a block can alsobe done largely in parallel. Each thread, for example, pri-vately accumulates tries recording newly created offers be-fore merging those tries into the main offer tries.
As mentioned in ยง4.3, there is a tradeoff between runningmany copies of Tรขtonnement with different settings and theperformance of each copy of Tรขtonnement. Each round ofTรขtonnement requires many binary searches; as such, theruntime of one round of Tรขtonnement largely depends onmemory latency and cache performance. Especially whenTรขtonnement takes small steps (i.e, towards the end of Tรขton-nement), one round reads almost exactly the same memorylocations as the previous round, and thus the cache missrate can be extremely low. More concurrent replicas of Tรข-tonnement mean more cache traffic and higher cache missrates.
Each of the binary searches in an aggregate demand queryruns independently, so parallelizing these searches acceler-ates each round of Tรขtonnement. One primary thread com-putes price updates, waking helper threads when computingaggregate demand. However, each round of Tรขtonnement isextremely fast even on one thread; for example, with 20 as-sets and tens of millions of offers to trade, one round requires100 to 200 microseconds of computation. Using the kernelto put helper threads to sleep when not in use introducessignificant synchronization overhead and can harm cacheperformance if helper threads migrate between cores. We
7
instead synchronize threads via spinlocks and a few memoryfences. In the 20 asset tests of ยง6, we see minimal benefitbeyond 4-6 helper threads, but this is enough to bring theaverage runtime per round down to 30 to 50 microseconds.Finally, as mentioned in ยง7, instead of directly using the
offer tries in aggregate demand queries, we precompute (byone pass over the offer trie) a list recording, for each uniqueprice, the total amount of the asset in question available forsale below the current price. Laying out all of the informa-tion for Tรขtonnement contiguously in memory improves thecache performance of each aggregate demand query.
5.3 Fast Merkle-Patricia TriesSPEEDEX performs several expensive trie manipulations inevery block. These tries must often be written to or modifiedby many different threads. Designing trie manipulations tominimize memory contention was one of the most importantimplementation challenges for achieving scalability.We use tries with a fan-out of 16 (half a byte) and fixed-
length keys. Values only exist at the leaves, so we allocatepointers to child nodes and trie values as a union to save1-2 cache lines. Most trie operations require comparing twokeys to find the longest common prefix. Working with 64-bitintegers is faster than working with sequences of bytes.When iterating over a set of transactions, each thread
logs which accounts it modifies. We find it more efficientfor threads to build log tries privately, and then merge thesetries together after the iteration, than for threads to competefor a central log structure. One important optimization whenmerging a batch of tries is to re-divide the tries by prefixrange and merge each range separately. Merging tries con-sisting of disjoint key ranges is trivial, and different threadsresponsible for different prefix ranges never have to write tothe same memory locations.
Each trie node stores some metadata, such as the numberof leaves below a node. Tries that store offers also store thenumber of cancelled offers below each node and the totalamount of the asset offered for sale by offers below eachnode. Offers are cancelled via metadata updates (hardware-level atomics). The actual work of removing cancelled offerscan wait until after the exchange is done iterating over alist of transactions. This lets us avoid the more expensivesynchronization primitives we might need to concurrentlymodify a trie. The metadata makes finding and removingcancelled offers simple.
Worker threads also build a set of tries containing newlycreated offers. When one thread finishes its assigned work,it could immediately merge its private tries into the main setof offer tries. Instead, overall system performance improvesif threads wait (do other work) until all threads are doneworking, the privately created tries are all gathered together,and then the work of merging sets of tries is divided by assetpair.
The trie that records account state has the same key space(account ID) as the trie that records which accounts havebeen modified. This means that every node in the accountmodification trie has a corresponding node (with the sameprefix) in the account state trie. Updating and rehashingthe account database trie (operations that require lookingonly at accounts that were modified) can be done quicklyby assigning different threads to work on disjoint subtriesof the account modification trie. The correspondence thusmeans that each thread can work on disjoint subtries of theaccount state trie, and different threads will not contend forthe same piece of memory. Dividing work according to theaccount modification trie helps spreak work evenly acrossavailable CPU cores.
Leaves of the account modification trie contain the trans-actions sent by an account. This trie, therefore, implicitlysorts transactions by account ID. Block validation on sortedlists of transactions can be faster than on unsorted lists. Ifone account sends multiple transactions, these transactionsare likely to be handled by the same worker thread in thevalidator, reducing memory contention. Validators do notactually check to see if a blockโs transactions are sorted,but an honest block producer could possibly get its blocksconfirmed slightly more quickly.
5.4 Data Storage and PersistenceProcessing transactions in a nondeterministic order makesrecovery from a database snapshot where a block has beenpartially applied quite difficult. Therefore, we would like asystem that can update all account balances modified in oneblock in one atomic transaction. We also want a databasewhere elements can be accessed directly in memory, andwhere database transactions support concurrent modifica-tion. Accounts in our database are frequently modified butonly created once. We therefore need a key-value store thatis designed primarily around supporting a mostly fixed setof fixed-length keys.
We did not find an existing database implementation wellsuited to this workload, so SPEEDEX uses a combinationof an in-memory cache and an ACID-compliant database(LMDB[26]). This combination suffices for our experiments,but construction of a database precisely designed for ourparticular workload is an interesting line of future work.
5.5 Block Headers Include Market EquilibriaChecking correctness of a market equilibrium is much fasterthan computing an equilibrium. Block producers, therefore,include in their block header a market equilibrium. This alsoallows nondeterminism in Tรขtonnement (which we use whenwe run multiple instances of Tรขtonnement in parallel withdifferent operational parameters).Block headers also include the transaction commission Y.
Fixed in the exchange protocol is a maximum commissionrate, and validators check that Y is smaller than themaximum.
8
The commission rate of one block might vary because ofthe linear programming step of price computation. Smallconstraint violations by the solving library (GLPK [46]), androunding error when converting between fixed-point andfloating-point values can cause a query with a commissionof 2โ๐ to retur a solution with a 2โ(๐โ1) commission.We therefore run price computation at a target commis-
sion rate of half the maximum allowed rate. The factor twodifference is an artifact of our choice to represent Y by itsnegative log base 2 (which makes commission computationeasy).
We do not include the approximation parameter ` becauseTรขtonnement does not always return a (Y, `)-approximateequilibrium. Furthermore, subtle differences between floating-point and fixed-point division can cause the LP solver toviolate the `-approximation guarantee (even when Tรขton-nement has converged). In a real-world deployment, nodescan choose between competing blocks by the quality (i.emost trade volume) of the blocksโ equilibria.A node can plausibly deny misbehavior if it fails to oc-
casionally produce a (Y, `) equilibrium, but a node (or thebroader community) can detect if a node fails to producegood prices more than other nodes (and sanction the misbe-having node out of band). SPEEDEX, therefore, may benefitfrom a consensus model that includes a notion of trust rela-tionships or identity.Our block headers also include, for every pair of assets,
the minimum price of the offer with the highest minimumprice that trades in that block. Validators can compare theminimum price of a newly created offer with this marginalminimum price and decide immediately whether or not theoffer executes. Offers that are executed immediately neednot be loaded into the Merkle-Patricia tries.
5.6 Exclusion of Failed TransactionsIf a block of transaction contained a pair of conflicting trans-actions, then a validator would need some extra conflict-resolution information to determine which transaction suc-ceeds and which fails. Most blockchains use transaction or-dering for conflict-resolution.SPEEDEX leaves the choice of conflicting transactions
to the block producer. Specifically, the block producer isresponsible for identifying a set of transactions such thatthe entire block is valid (all transactions are valid, and afterprocessing the block, all account balances are nonnegative).Failed transactions are no-ops, so rather than send bothtransactions with a conflict-resolution bit, the block produceromits failed transactions to save network bandwidth andvalidator CPU time.
5.7 Sequence NumbersTo ensure each transaction executes only once, users marktheir transactions with sequence numbers. Many existing
blockchains require the sequence numbers from one accountto increase strictly sequentially.Our implementation allows small gaps in sequence num-
bers. This makes it easier for a block producer to accept manytransactions from one account in one block. Also, when vali-dating a block, a worker thread that finds a sequence numbergap cannot easily determine whether a transaction that fillsthe gap is present in said block.We also require that sequence numbers increase by at
most 64 within one block. The choice of the constant 64is arbitrary. However, limiting sequence numbers by somefixed constant enables a replica to track sequence numberswith only a bitvector and hardware-level atomics.
5.8 Side Effect VisibilitySPEEDEX adjusts asset amounts only through hardware-level atomic operations (i.e. operations on 8-byte chunks).One thread might read database writes from another threadโsonly partially processed transaction. Additions and subtrac-tions to a counter commute, but a block producer must takecare to ensure that partial database writes do not cause erro-neous acceptance of an invalid transaction.
Many operations in a transaction might fail if an accountbalance is too low; none fail, however, if an account balanceis too high. If some candidate transaction fails and is rolledback, but this candidate transaction increased an accountbalance in the shared database, some other thread could haveread the incorrectly high balance and accepted an invalidtransaction. As such, when building a new block of trans-actions, decreases to account balances are written to thedatabase immediately, while increases are buffered privatelyuntil the transaction is guaranteed to commit. Validators canskip this step.
Other transaction side effects are not made visible until theentire block of transactions commits. Thus, an offer cannotbe created and cancelled in the same block, and accountscannot send transactions in the block in which they arecreated.
6 Empirical Evaluation of TรขtonnementWe find that the runtime of Tรขtonnement depends primarilyon the target approximation accuracy, the number of opentrade offers, and the distribution of the open trade offers. Theruntime of Tรขtonnement increases as the desired accuracyincreases (as Y and ` decrease). Surprisingly, the runtimeactually decreases as the number of open offers increases.And as discussed, Tรขtonnement performs best when thevolume traded of each asset is roughly the same.
Our goal is to run Tรขtonnement once in every block ofour exchange. If our goal is to produce one block every fewseconds, then Tรขtonnement should run in well under onesecond most of the time. Unless otherwise specified, our im-plementation runs Tรขtonnement with a timeout of 2 seconds.
9
Figure 1. Runtime of Tรขtonnement on instances with500,000 offers and 20 assets with varying approximationparameters. The horizontal axis plots the commission rate(โ log2(Y)), while each line represents a different accuracy ofapproximation for optimal offer behavior (โ log2(`)).
6.1 Approximation Accuracy and Orderbook SizeFigure 1 plots the runtimes of Tรขtonnement on a set of500,000 offers generated from a synthetic model (AppendixG), with a variety of accuracy targets. Runtimes are averagedover 5 runs. Any target approximation level where not everyrun fully converged within the time limit is excluded fromthe graph. Asset trade volumes are not normalized, but theyare generated so that equilibrium trade amounts are roughlyequal.
For comparison, BinanceDex [2] charges either a 0.1% fee(โ log2(0.001) = 10) or a 0.04% fee (โ log2(0.0004) = 11.3)[1]. Uniswap [14] charges a 0.3% fee (โ log2(0.003) = 8.4).Coinbase charges between 0.5% and 4%, depending on thetransaction [4]. A comparable target of Y = ` = 10 is feasiblefor Tรขtonnement. Runtimes increase slowly as target accu-racy increases, until a point where required runtime startsto rise dramatically.
As discussed in ยง4.3.1, Tรขtonnement works best when theaggregate demand function has relatively few sharp deriva-tives. For small `, the behavior of one offer is almost exactlya step function; at ` = 0, an offer executes if and only if mar-ket prices are above the offerโs minimum price. The stepsof Tรขtonnement often cross these minimum price thresh-olds. When there are few offers, crossing a single offerโsthreshold causes a comparatively large change in the ag-gregate demand of the market. This can mean that evenvery small Tรขtonnement steps will overshoot an equilib-rium, leading to oscillation. Alternatively, as discussed inยง4.3.2, Tรขtonnementโs choice of step size is heuristic-guided.When very small Tรขtonnement steps cause comparatively
Figure 2. Minimum number of offers for Tรขtonnement torun in under 0.25 seconds (times averaged over 5 runs).
large heuristic changes (especially heuristic increases), Tรข-tonnement might decrease its step size down to its minimumbefore proceeding. The more frequently this happens, theslower Tรขtonnement runs.In other words, Tรขtonnement works best when there are
many offers (so each offer is a relatively small part of thetotal volume on the exchange) or when ` is large (in thegraphs, when โ log2(`) is small). In the real world, SPEEDEXinstances with more participants might be able to run Tรขton-nement to a more accurate level than less active instances.Figure 1 is running on an easy case for Tรขtonnement.
There are many open offers, and the total volume traded foreach asset (price times amount) is roughly equal. Holdingthe offer distribution constant, Figure 2 displays the tradeoffbetween Tรขtonnement runtime, approximation accuracy, andthe number of open trade offers. Each cell represents a onesetting of the approximation parameters (Y, `), and contains(an upper bound on) the minimum number of offers requiredfor Tรขtonnement to run in under 0.25 seconds and producean (Y, `)-approximate equilibrium.In a time period when very few offers are sent to the ex-
change, the exchange might not be able to find prices suchthat the maximum possible number of offers can find a coun-terparty. When fewer offers clear in one block, more offersare left to the next block, which may make Tรขtonnement eas-ier in the next block.
Note that while very small ` or Y requires a large numberof offers to trade, reasonable approximation settings such asY = ` = 2โ10 or 2โ15 do not.
10
6.2 Imbalanced DistributionsWe demonstrate in Appendix H that the performance ofTรขtonnement is robust to varying the parameters of our datageneration model.In the real world, most trading data involves a reserve
currency (i.e. buying or selling assets denominated in USD),or, in the cryptocurrency world, have very low transactionrates. As such, there does not appear to be a good set of dataon which we can directly test Tรขtonnement. As a next-bestalternative, we generate a dataset for Tรขtonnement based offof historical price and market volume data.
We took the 20 assets that had the largest market volumeon April 29, 2021 (as reported by coingecko.com)1 and foreach asset, gathered 500 days of price and trade volume his-tory. We then generated 500 batches of 100,000 transactions.A new offer in batch ๐ sells asset ๐ด (and buys asset ๐ต) withprobability proportional to the relative volume of asset ๐ด(and asset ๐ต, conditioned on ๐ต ฬธ= ๐ด) on day ๐ , and demands aminimum price close to the real-world market price on day๐ . We also set our block size to be only 50,000 (as comparedto the 500,000+ transaction blocks of ยง7). This increases thedifficulty of Tรขtonnement.
This dataset is particularly challenging for Tรขtonnement,in that many assets are comparatively rarely traded, andcryptocurrency asset volumes and prices fluctuate signif-icantly. 3 of the currencies did not exist (i.e. 0 trade vol-ume) in the first half of the experiment. Our experimentran for 820 blocks, and Tรขtonnement did not converge to a(Y = 2โ10 โ 0.1%, ` = 2โ7 โ 1%) equilibrium in 31 blocks.
When Tรขtonnement does not quickly converge, its outputcandidate prices are close enough to a market equilibriumthat the linear programming step can still match a largeamount of trade volume. To quantify the effect of Tรขton-nement nonconvergence on trade volume, we compute ineach block a โprice asymmetry metric.โ For each asset pair (๐ด,๐ต), we compute the minimum `๐ด,๐ต such that all offers selling๐ด for ๐ต with minimum price below (1โ `๐ด,๐ต)(๐๐ด/๐๐ต) execute.The price asymmetry metric is the volume-weighted averageof these `๐ด,๐ตs. Note that Tรขtonnement convergence is akinto guaranteeing that `๐ด,๐ต โค ` for all (๐ด, ๐ต).
In blocks where Tรขtonnement converges, the price asym-metry average is 0.1% (the maximum observed was 0.2%),while when Tรขtonnement does not converge within the timelimit, the price asymmetry average is 0.2% (with a maximumof 1.2%). In other words, even when Tรขtonnement does notquickly converge on particularly difficult datasets, the outputprices are still close enough to equilibrium that the linearprogramming phase can match most of the open offers.
1The currencies are CAKE, ADA, BCH, BNB, BTC, DOGE, ETH, ETC, HT,LTC, MATIC, TRX, UNI, USDT, VET, XRP, ZEC, and USDT. We droppedBUSD because the list already included two other USD stablecoins, onewith lower volume and one with higher volume.
Qualitatively, Tรขtonnement does not converge quicklywhen one asset is highly volatile. Tรขtonnement may mispricethis volatile asset for a short time, but the other assets arerelatively unaffected.
7 Scalability EvaluationWe ran SPEEDEX on a cluster of four nodes in a Cloudlab[38] datacenter. One machine is designated the block pro-ducer, and the other machines validate blocks of transactionscreated by the producer. Each machine (Cloudlabโs machinetype โrs440โ)has two 16-core Intel Xeon Gold 6130 CPUsrunning at 2.10 Ghz with hyperthreading enabled, 192 GBof memory, and a 240GB SSD (Intel series S4600) using theXFS filesystem.
For this experiment, we generated synthetic transactiondata according to the data model outlined in Appendix G.Each set of 100,000 transactions is generated as though theassets have some underlying valuations, and these valuationsare modified (akin to a Brownian motion) after every set.We generate 5000 sets of transactions, which are read intoa โmempoolโ by the block producer. The block producerattempts to keep the size of this mempool at about 2 million,reading in new sets of transactions as necessary, and theexperiments ran until all sets of transactions were exhausted.To show performance with as many open trade offers aspossible, accounts in this simulation do not cancel old offers.A small number of transactions in the dataset were generatedto be invalid noise.
The block producer targets each block to include 500,000to 600,000 transactions. To build blocks quickly, we do nottarget an exact amount; instead, threads โreserveโ space toadd transactions to a new block, and make a best (but notperfect) effort to โreleaseโ unused space back to other threads(if, for example, a thread finds many failed transactions).This means our implementation builds smaller blocks whenoperating with many threads. When running with 4 threads,blocks included roughly 600,000, transactions, while whenrunning with 64 threads, blocks included 540,000-580,000transactions.Most of the expensive work is done by a dedicated set of
worker threads. To demonstrate scalability, we measure theruntime of the nodes in our exchange with varying numbersof these worker threads. This may artificially improve theperformance of the experiments using fewer threads. Ourimplementation is designed around having access to manycpu cores, so running SPEEDEX unmodified on a systemwith few CPU cores would not be a fair comparison.
Figure 3 displays the time that the block producer nodetook to produce each block in our experiments. Users trade20 different assets, and Tรขtonnement is run with approxi-mation parameters (Y = 2โ15 and ` = 2โ10). The exchangeโsaccount database contains 10 million accounts at the start ofthe experiment. Trade volumes are roughly the same across
11
0.0M 10.0M 20.0M 30.0M 40.0M 50.0M 60.0M 70.0M#Open Offers (Millions)
5
10
15
20
25
30
Runtime (s)
#Threads48163264
Figure 3. Runtime of block producer node, with varyingnumbers of worker threads, plotted over the number of openoffers on the exchange.
different assets, and prices vary as data is generated. Tรขton-nement did not time out in any block during this experiment.The x-axis of Figure 3 is the number of open offers on
the exchange. The number of open offers on the exchangeis the biggest factor (other than the number of transactionsin a block) determining a blockโs production time. Figure 3demonstrates that SPEEDEX can scale to support very largenumbers of open offers to trade, and that its performancebenefits from additional CPU cores.Figure 3 also displays a negative side effect of our data
persistence mechanism. The time to produce a block whenrunning with many threads occasionally is very large. Dur-ing these spikes, the node is doing nothing but waiting topersist a snapshot of the account database to disk. This snap-shot is produced every five rounds, and the disk writingis continually performed in the background. Running withfewer threads does not help the disk operate faster or givethe disk less work, but simply gives the disk more time towork.
Our experiments generate a sustained write workloadof 200-230 MB/s, which is roughly the speed our of SSD inisolated benchmarks. We were unable to install faster storagedevices on our machines, but standard techniques, such asstriping data over multiple, faster drives, could alleviate thisbottleneck. As discussed in ยง5.4, existing databases are notdesigned for our workload. Scaling to 100 million accountsor a billion accounts could require a new database design.The increase in runtime associated with an increase in
the number of open offers stems from a Tรขtonnement op-timization. The one part of SPEEDEX that we cannot eas-ily accelerate by buying fancier hardware is Tรขtonnement,and so we design our implementation towards making Tรข-tonnement as fast as possible. When running an aggregate
0.0M 10.0M 20.0M 30.0M 40.0M 50.0M 60.0M 70.0M#Open Offers (Millions)
2.5
5.0
7.5
10.0
12.5
15.0
17.5
20.0
Runtime (s)
#Threads48163264
Figure 4. Runtime of block validator node, with varyingnumbers of worker threads, plotted over the number of openoffers on the exchange.
demand query, Tรขtonnement could query the tries that storeoffers. Instead, Tรขtonnement queries a prebuilt list, laid outcontiguously in memory, of tuples denoting โat price ๐ , theamount of the asset available for sale is ๐ฅ .โ This gives mem-ory accesses within one Tรขtonnement query much bettercache locality. An implementation might choose to skip thisexpensive precomputation in some parameter regimes.
Nodes can always skip this work when validating a blockproduced by a different node. Figure 4 plots the runtimes fora node validating blocks of transactions. Note that validatorslargely avoid the increase in runtime associated with moreoffers open on the exchange. Not included is time idlingwhile waiting for the block producer to produce blocks.
Figure 4 plots the runtime for the first validator node inthe chain. The other validator nodes perform similarly.
These numbers do not include signature verification. Trans-actions are signed and the account database stores publickeys; our implementation simply does not actually checkthe signature. The block producer node can check signatureson incoming transactions before adding them to its mem-pool, and a block validator can check signatures entirelyin parallel to the rest of the block validation logic. Signa-ture checking could even be offloaded to a separate physicalmachine. Checking the signatures on 500,000 transactionson our hardware (including looking up public keys in ourdatabase) takes 1.5-2 seconds.
Note that the transaction rate of this system is limited bythe slowest replica, i.e. the block producer. Figure 5 plotsthe end-to-end transaction rate of our system. When usingthe full CPU resources of our hardware, our exchange canprocess more than 120,000 transactions per second.
For comparison,Wang et. al [52] benchmark the raw trans-action throughput of the Ethereum Virtual Machine; with
12
0.0M 10.0M 20.0M 30.0M 40.0M 50.0M 60.0M 70.0M#Open Offers (Millions)
50000
100000
150000
200000
Tran
sact
ions
Per
Sec
ond
#Threads48163264
Figure 5. Transactions per second on SPEEDEX with vary-ing numbers of worker threads, plotted over the number ofopen offers on the exchange.
10,000 open accounts, the EVM processed transactions thatjust transfer tokens between accounts at a rate of only 13thousand per second (although this number includes signa-ture checking). Transactions with more complex logic areprocessed substantially more slowly.
A worst-case lower bound is that if Tรขtonnement were toalways time out, our top-line transaction rate would fall tobetween 80,000 and 150,000 transactions per second.The persistence caveats of figure 3 also apply here. If
SPEEDEX is limited to not run faster than our SSD, it wouldstill achieve at least 70-100,000 transactions per second.
8 ConclusionThis work presents SPEEDEX, a fully on-chain distributed as-set exchange capable of processing more than 100,000 trans-actions per second on 32-core servers with tens of millionsof open offers. It achieves scalability by designing transac-tion semantics and an offer matching system so that trans-actions in a block commute; this lets the exchange effec-tively use many CPU cores while operating deterministically.SPEEDEX also displays several economic properties of inde-pendent interest, including elimination of front-running andof (internal) arbitrage.
AcknowledgmentsThis research was supported by the Stanford Future of Digi-tal Currency Initiative, the Stanford Center for BlockchainResearch, and the Army Research Office.
References[1] [n.d.]. Binance Chain Docs - Fees. https://docs.binance.org/guides/
concepts/fees.html Accessed 04/10/2021.
[2] [n.d.]. Binance Chain Docs - Match Engine. https://docs.binance.org/match.html Accessed 03/14/2021.
[3] [n.d.]. Binance Chain Explorer. Accessed 03/14/2021.[4] [n.d.]. Coinbase Pricing and Fees Disclosures. https:
//help.coinbase.com/en/coinbase/trading-and-funding/pricing-and-fees/fees Accessed 04/10/2021.
[5] [n.d.]. Eth2 Shard Chains. https://ethereum.org/en/eth2/shard-chains/ Accessed 03/11/2021.
[6] [n.d.]. An Exchange Protocol for the DecentralizedWeb. ([n. d.]). https://docs.gnosis.io/protocol/docs/introduction1/ and https://github.com/gnosis/dex-research/blob/08204510e3047c533ba9ee42bf24f980d087fa78/dFusion/dfusion.v1.pdf and https://github.com/gnosis/dex-research/blob/master/BatchAuctionOptimization/batchauctions.pdf.
[7] [n.d.]. Intel oneAPI Threading Building Blocks. https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onetbb.html Accessed 5/6/2021.
[8] [n.d.]. Optimistic Rollups. https://docs.ethhub.io/ethereum-roadmap/layer-2-scaling/optimisticrollups/ Accessed 03/11/2021.
[9] [n.d.]. STARKEx. https://starkware.co/product/starkex/[10] [n.d.]. Stellar. https://www.stellar.org/[11] [n.d.]. ZK Rollups. https://docs.ethhub.io/ethereum-roadmap/layer-
2-scaling/zk-rollups/ Accessed 03/11/2021.[12] 2018. Loopring: A Decentralized Token Exchange Protocol. (8 Septem-
ber 2018). https://loopring.org/resources/enwhitepaper.pdf[13] 2021. GPv2 Objective Criterion. https://forum.gnosis.io/t/gpv2-
objective-criterion/1254 Accessed 04/30/2021.[14] Hayden Adams, Noah Zinsmeister, and Dan Robinson. 2020. Uniswap
v2 core. (2020).[15] Guillermo Angeris, Hsien-Tang Kao, Rei Chiang, Charlie Noyes, and
Tarun Chitra. 2019. An analysis of Uniswap markets. CryptoeconomicSystems Journal (2019).
[16] Parwat Singh Anjana, Sweta Kumari, Sathya Peri, Sachin Rathor, andArchit Somani. 2019. An efficient framework for optimistic concurrentexecution of smart contracts. In 2019 27th Euromicro InternationalConference on Parallel, Distributed and Network-Based Processing (PDP).IEEE, 83โ92.
[17] Kenneth J Arrow, Henry D Block, and Leonid Hurwicz. 1959. On thestability of the competitive equilibrium, II. Econometrica: Journal ofthe Econometric Society (1959), 82โ109.
[18] Kenneth J Arrow and Gerard Debreu. 1954. Existence of an equilibriumfor a competitive economy. Econometrica: Journal of the EconometricSociety (1954), 265โ290.
[19] Massimo Bartoletti, Letterio Galletta, and Maurizio Murgia. 2020. Atrue concurrent model of smart contracts executions. In InternationalConference on Coordination Languages and Models. Springer, 243โ260.
[20] Eli Ben-Sasson, Iddo Bentov, Yinon Horesh, and Michael Riabzev.2018. Scalable, transparent, and post-quantum secure computationalintegrity. (2018).
[21] Michele Benzi. 2002. Preconditioning techniques for large linear sys-tems: a survey. Journal of computational Physics 182, 2 (2002), 418โ477.
[22] Stephen P Boyd and Lieven Vandenberghe. 2004. Convex optimization.Cambridge university press.
[23] Eric Budish, Peter Cramton, and John Shim. 2015. The high-frequencytrading arms race: Frequent batch auctions as amarket design response.The Quarterly Journal of Economics 130, 4 (2015), 1547โ1621.
[24] Miguel Castro and Barbara Liskov. 1999. Practical byzantine faulttolerance. In 3rd Symposium on Operating Systems Design and Imple-mentation. New Orleans, LA, 173โ186.
[25] Xi Chen, Dimitris Paparas, and Mihalis Yannakakis. 2017. The com-plexity of non-monotone markets. Journal of the ACM (JACM) 64, 3(2017), 1โ56.
[26] Howard Chu and Symas Corporation. [n.d.]. Lightning Memory-Mapped Database Manager (LMDB). http://www.lmdb.tech/doc/
13
Accessed 04/29/2021.[27] Austin T Clements, M Frans Kaashoek, Nickolai Zeldovich, Robert T
Morris, and Eddie Kohler. 2015. The scalable commutativity rule: De-signing scalable software for multicore processors. ACM Transactionson Computer Systems (TOCS) 32, 4 (2015), 1โ47.
[28] Dan Cline, Thaddeus Dryja, and Neha Narula. [n.d.]. ClockWork: AnExchange Protocol for Proofs of Non Front-Running. ([n. d.]).
[29] Bruno Codenotti, Benton McCune, and Kasturi Varadarajan. 2005.Market equilibrium via the excess demand function. In Proceedingsof the thirty-seventh annual ACM symposium on Theory of computing.74โ83.
[30] Bruno Codenotti, Sriram V Pemmaraju, and Kasturi R Varadarajan.[n.d.]. On the polynomial time computation of equilibria for certainexchange economies.
[31] Philip Daian, Steven Goldfeder, Tyler Kell, Yunqi Li, Xueyuan Zhao,Iddo Bentov, Lorenz Breidenbach, and Ari Juels. 2019. Flash boys 2.0:Frontrunning, transaction reordering, and consensus instability indecentralized exchanges. arXiv preprint arXiv:1904.05234 (2019).
[32] Nikhil R Devanur, Jugal Garg, and Lรกszlรณ A Vรฉgh. 2016. A rationalconvex program for linear Arrow-Debreu markets. ACM Transactionson Economics and Computation (TEAC) 5, 1 (2016), 1โ13.
[33] Nikhil R Devanur and Vijay V Vazirani. 2003. An improved approxi-mation scheme for computing Arrow-Debreu prices for the linear case.In International Conference on Foundations of Software Technology andTheoretical Computer Science. Springer, 149โ155.
[34] Steven Diamond and Stephen Boyd. 2016. CVXPY: A Python-embedded modeling language for convex optimization. The Journal ofMachine Learning Research 17, 1 (2016), 2909โ2913.
[35] Thomas Dickerson, Paul Gazzillo, Maurice Herlihy, and Eric Koskinen.2019. Adding concurrency to smart contracts. Distributed Computing(2019), 1โ17.
[36] Alexander Domahidi, Eric Chu, and Stephen Boyd. 2013. ECOS: AnSOCP solver for embedded systems. In 2013 European Control Confer-ence (ECC). IEEE, 3071โ3076.
[37] Ran Duan and Kurt Mehlhorn. 2015. A combinatorial polynomialalgorithm for the linear ArrowโDebreu market. Information and Com-putation 243 (2015), 112โ132.
[38] Dmitry Duplyakin, Robert Ricci, Aleksander Maricq, Gary Wong,Jonathon Duerig, Eric Eide, Leigh Stoller, Mike Hibler, David John-son, Kirk Webb, Aditya Akella, Kuangching Wang, Glenn Ricart,Larry Landweber, Chip Elliott, Michael Zink, Emmanuel Cecchet,Snigdhaswin Kar, and Prabodh Mishra. 2019. The Design and Opera-tion of CloudLab. In Proceedings of the USENIX Annual Technical Confer-ence (ATC). 1โ14. https://www.flux.utah.edu/paper/duplyakin-atc19
[39] Jugal Garg and Lรกszlรณ A Vรฉgh. 2019. A strongly polynomial algorithmfor linear exchange markets. In Proceedings of the 51st Annual ACMSIGACT Symposium on Theory of Computing. 54โ65.
[40] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nick-olai Zeldovich. 2017. Algorand: Scaling Byzantine Agreements forCryptocurrencies. In Proceedings of the 26th Symposium on Operat-ing Systems Principles (Shanghai, China) (SOSP โ17). Association forComputing Machinery, New York, NY, USA, 51โ68. https://doi.org/10.1145/3132747.3132757
[41] Eyal Hertzog, Guy Benartzi, and Galia Benartzi. 2018. Bancor Protocol.(2018).
[42] Kamal Jain. 2007. A polynomial time algorithm for computing anArrowโDebreu market equilibrium for linear utilities. SIAM J. Comput.37, 1 (2007), 303โ318.
[43] Kamal Jain, Mohammad Mahdian, and Amin Saberi. 2003. Approx-imating market equilibria. In Approximation, Randomization, andCombinatorial Optimization.. Algorithms and Techniques. Springer, 98โ108.
[44] Mahimna Kelkar, Fan Zhang, Steven Goldfeder, and Ari Juels. 2020.Order-fairness for byzantine consensus. In Annual International Cryp-tology Conference. Springer, 451โ480.
[45] Marta Lokhava, Giuliano Losa, David Maziรจres, Graydon Hoare, Nico-las Barry, Eli Gafni, Jonathan Jove, Rafaล Malinowsky, and Jed McCaleb.2019. Fast and Secure Global Payments with Stellar. In Proceedings ofthe 27th ACM Symposium on Operating Systems Principles (Huntsville,Ontario, Canada) (SOSP โ19). Association for Computing Machinery,New York, NY, USA, 80โ96. https://doi.org/10.1145/3341301.3359636
[46] Andrew Makhorin. 2008. GLPK (GNU linear programming kit).http://www.gnu.org/s/glpk/glpk.html (2008).
[47] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash sys-tem. http://bitcoin.org/bitcoin.pdf.
[48] Joseph Poon and Vitalik Buterin. 2017. Plasma: Scalable autonomoussmart contracts. White paper (2017), 1โ47.
[49] Joseph Poon and Thaddeus Dryja. 2016. The bitcoin lightning network:Scalable off-chain instant payments.
[50] Vikram Saraph and Maurice Herlihy. 2019. An empirical study ofspeculative concurrency in ethereum smart contracts. arXiv preprintarXiv:1901.01376 (2019).
[51] Robbert Van Renesse and Fred B Schneider. 2004. Chain Replicationfor Supporting High Throughput and Availability.. In OSDI, Vol. 4.
[52] Gerui Wang, Shuo Wang, Vivek Bagaria, David Tse, and PramodViswanath. 2020. Prism removes consensus bottleneck for smartcontracts. In 2020 Crypto Valley Conference on Blockchain Technology(CVCBT). IEEE, 68โ77.
[53] Will Warren and Amir Bandeali. 2017. 0x: An open protocol for de-centralized exchange on the Ethereum blockchain. (2017).
[54] Yinyu Ye. 2008. A path to the ArrowโDebreu competitive marketequilibrium. Mathematical Programming 111, 1-2 (2008), 315โ348.
[55] Maofan Yin, Dahlia Malkhi, Michael K. Reiter, Guy Golan Gueta, andIttai Abraham. 2019. HotStuff: BFT Consensus with Linearity andResponsiveness. In Proceedings of the 2019 ACM Symposium on Prin-ciples of Distributed Computing (Toronto ON, Canada) (PODC โ19).Association for Computing Machinery, New York, NY, USA, 347โ356.https://doi.org/10.1145/3293611.3331591
[56] Wei Yu, Kan Luo, Yi Ding, Guang You, and Kai Hu. 2018. A ParallelSmart Contract Model. In Proceedings of the 2018 International Confer-ence on Machine Learning and Machine Intelligence. 72โ77.
[57] Yunhao Zhang, Srinath Setty, Qi Chen, Lidong Zhou, and LorenzoAlvisi. 2020. Byzantine Ordered Consensus without Byzantine Oli-garchy. In 14th {USENIX} Symposium on Operating Systems Designand Implementation ({OSDI} 20). 633โ649.
Appendix A Arrow-Debreu ExchangeMarkets
Internally, SPEEDEX views a set of trade offers as an Arrow-Debreu exchange market [18]; the clearing price computa-tion problem maps to the Arrow-Debreu market equilibriumcomputation problem.
The Arrow-Debreu Exchange Market is a well studied con-cept, and as such there are many variants on the definition.Relevant to this work is a simple version of the model, wherethere are no stock dividends and no production. Further, ourexchange will look only at snapshots of the market, (i.e. onceper block), and thus we do not consider time-varying modelsor strategic agents.Agents in an exchange market come to the market with
some set of goods. They sell their goods to the market atthe marketโs prices in exchange for โmoney,โ which they
14
immediately spend at the market to buy their preferred setof goods. Their preferred set of goods is determined by theagentโs utility function. Specifically, agents buy the collectionof goods that maximizes their utility subject to their budgetconstraint.SPEEDEX users do not submit utility functions to the
exchange. But most natural offer types implicitly encode asimple utility function. Recall the sell offer.
Definition A.1 (Sell Offer). A Sell Offer (๐ , ๐ต, ๐ , ๐ผ) is requestto sell ๐ units of good ๐ in exchange for some number ๐ unitsof good ๐ต, subject to the condition that ๐ โฅ ๐ผ๐ .
The user who submits this offer implicitly says that theyvalue ๐ units of ๐ต more than ๐ units of ๐ if and only if ๐ โฅ ๐ผ๐ .Thus, the userโs preferences are representable as a simpleutility function.
Example A.2 (Sell Offer). Suppose a user submits a selloffer (๐ , ๐ต, ๐ , ๐ผ). The optimal behavior of this offer (and theuserโs implicit preferences) is equivalent to maximizing thefunction ๐ข(๐ฅ๐ , ๐ฅ๐ต) = ๐ผ๐ฅ๐ต + ๐ฅ๐ (for ๐ฅ๐ , ๐ฅ๐ต amounts of goods ๐and ๐ต).
Moreover, the utilites of agents derived from sell offers arelinear, and every offer has a nonzero valuation of the goodwhich it sells. This means that our market instances satisfycondition (*) of Devanur et. al [32], and thus clearing pricesalways exist.
To restate the earlier intuition, anArrow-Debreu ExchangeMarket (as discussed in this work) is the following:
DefinitionA.3 (Arrow-Debreu ExchangeMarket). AnArrow-Debreu Exchange Market consists of a set of goods [๐ ] anda set of agents [๐]. Every agent ๐ has a utility function ๐ข ๐
and an endowment ๐ ๐ โ R๐โฅ0.When the market trades at prices ๐ โ R๐โฅ0, every agent
sells their endowment to the market in exchange for revenue๐ ๐ = ๐ ยท๐ ๐ , which the agent immediately spends at the marketto buy back an optimal bundle of goods ๐ฅ ๐ โ R๐โฅ0 - that is,๐ฅ ๐ = arg max๐ฅ :ฮฃ๐ฅ๐๐๐ โค๐ ๐ ๐ข ๐ (๐ฅ ).
Definition A.4 (Market Equilibrium). An equilibrium of anArrow-Debreu market is a set of prices ๐ and an allocation๐ฅ ๐ for every agent ๐ , such that for all goods ๐ , ฮฃ๐๐๐ ๐ โฅ ฮฃ๐๐ฅ๐ ๐ ,and ๐ฅ ๐ is an optimal bundle for agent ๐ .
SPEEDEX uses two notions of approximation. The transac-tion commission Y maps onto exchange markets in a naturalway (agents receive only a (1 โ Y) fraction of what they pur-chase). Our requirements for optimal offer execution (`) areslightly different from what typically appears in the theoret-ical literature.
Common to the theoretical literature (e.g. Definition 1 of[29]) is the following definition:
Definition A.5 ((`)-Approximately Optimal Bundle). Sup-pose an agent has utility ๐ข(ยท) and initial endowment ๐ , and
the market prices are ๐ . Let ๐ฅโ maximize ๐ข(ยท) subject to๐ฅโ ยท ๐ โค (๐ ยท ๐).
A bundle ๐ฆ is (`)-approximate if ๐ข(๐ฆ) โฅ ๐ข(๐ฅโ)(1 โ `)
This definition is not strong enough for a real-world ex-change. In particular, it does not disallow execution of offerswith minimum prices higher than the equilibrium prices.Such orders should remain unchanged until market pricesrise.SPEEDEX uses the following stronger definition of ap-
proximately optimal behavior.
Definition A.6 (Strongly (Y, `)-Approximately Optimal Bun-dle for a Sell Offer). Consider a Sell offer selling asset ๐ unitsof ๐ด in exchange for asset ๐ต with minimum price ๐ผ .
Its behavior in amarket equilibriumwith prices ๐ is Strongly(`, Y)-Approximately Optimal if one of the following condi-tions holds.
1. If ๐๐ด/๐๐ต > ๐ผ , the offer executes fully. That is, theoffer sells ๐ units of ๐ด to the market, in exchange for๐(๐๐ด/๐๐ต)(1 โ Y) units of ๐ต.
2. If ๐๐ด/๐๐ต(1 โ `) < ๐ผ , the offer does not execute. Thatis, the offer retains its entire stock of ๐ units of ๐ด.
3. Otherwise, the offer executes partially at some rate_ โ [0, 1]. That is, the offer sells _๐ units of ๐ด to themarket, in exchange for _๐(๐๐ด/๐๐ต)(1 โ Y) units of ๐ต.
We say that an agent in an Arrow-Debreu market that isderived from an Sell Offer gets a strongly (`, Y)-approximatelyoptimal bundle if its bundle is strongly-(`, Y)-approximatelyoptimal for the original Sell Offer.
Combining these together gives the following definition.
DefinitionA.7 (Strongly (Y, `)-ApproximateMarket Equilib-rium). A strongly (Y, `)-aproximate equilibrium of an Arrow-Debreu market is a set of prices ๐ and an allocation ๐ฅ ๐ forevery agent ๐ , such that for all goods ๐ , ฮฃ๐๐๐ ๐ โฅ ฮฃ๐๐ฅ๐ ๐ , and ๐ฅ ๐is a strongly (Y, `)-approximately optimal bundle.
This definition corresponds to Definition 4.1 and its asso-ciated constraints.
Appendix B Buy and Sell Offers Togetherare PPAD-Complete
Integrating buy offers moves equilibrium computation to amuch harder complexity class.
An offer to buy a fixed amount of some asset would looklike the following:
Definition B.1. A Buy Offer (๐ , ๐ต, ๐ , ๐ผ , ๐) is a request tosell ๐ units of good ๐ for up to ๐ units of good ๐ต. This saleis subject to the condition that the actual amount sold tothe market is at least ๐ผ times the amount bought from themarket. (i.e., the price of an ๐ is at least ๐ผ๐ต).
Tรขtonnement requires that the market instance satisfya property known as โWeak Gross Substitutability (WGS)โ
15
An instance of an Arrow-Debreu exchange market satisfiesWGS if an increase to the valuation of some good ๐ doesnot cause a decrease in the aggregate demand for good ๐ .Intuitively, if the valuation of one good rises, agents canpurchase less of this good from the market, and might chooseto buy something else instead.Exchange market instances based on collections of sell
offers satisfy WGS, but instances based on buy and sell offersdo not.
Example B.2. Suppose one user submits the buy offer (EUR,USD, 100, 1.10, 60), and this is the only buy offer on theexchange. This user wishes to buy up to 60 Dollars, so longas the price of the Euro is at least $1.10.
If the price of the Euro is $1.20, then this offer will purchase60 Dollars from the market and sell 50 Euros to the market.The aggregate demand, therefore, is (60 ๐๐๐ท,โ50 ๐ธ๐๐ ).
If the valuation of the Dollar rises, so that the price of aEuro is only 1.10, then the offer will still purchase 60 Dollars,but now needs will need to spend 5ฬ4 Euros to do so. Theaggregate demand, therefore, changes to (60๐๐๐ท,โ54 ๐ธ๐๐ ).
In other words, a rise in the valuation of the Dollar causeda decrease in the aggregate demand for Euros.
Theorem B.3. Computing approximate equilibrium pricesin an Arrow-Debreu market generated by Sell Offers and BuyOffers is PPAD-complete.
Proof. The theorem is a restatement of Corollary 2.4 of [25].Corollary 2.4 follows from Theorem 7 of [25], and re-
quires agents with general linear utility functions, not justthe sparse ones generated by Sell Offers. However, the con-struction in Theorem 7 is quite general; when applied tothe context of agents with utilities generated by Buy or SellOffers, this requirement is not necessary.
In fact, the construction in this context uses only utilitiesthat could be generated by Sell or Buy Offers.
One technical detail is that some agents have utility onlyfor one good, and might have multiple types of goods inits endowment. An agent with linear utility and multipletypes of goods in its endowment can be replaced by multipleagents with the same utility, each one with only one typeof good, to the same overall effect on the market. An agentwith utility for only one type of good is akin to a Sell offerwith minimum price 0. โก
Appendix C Rounding via LinearProgramming
Suppose our exchange currenty trades ๐ distinct assets. Re-call that the output of Tรขtonnement run with approximationparameters (`, Y) is a set of prices ๐๐ for ๐ โ [๐ ].For any orderbook (say, selling asset ๐ด in exchange for
asset ๐ต), let the constant ๐ฟ๐ด๐ต be the minimum amount of ๐ดthat must be sold at the given prices in order to satisfy the`-approximation guarantee, and let the constant ๐ป๐ด๐ต be the
maximum amount of ๐ด that can be sold at the given prices(i.e. the total amount of ๐ด available for sale by offers withminimum prices below ๐๐ด/๐๐ต). Let the variable ๐ฆ๐ด๐ต denotethe amount of ๐ด sold for ๐ต. To satisfy our `-approximationguarantee, the linear program must maintain ๐ฟ๐ด๐ต โค ๐ฆ๐ด๐ต โค๐ป๐ด๐ต .To maximize the amount of โvalueโ that trades hands,
our program maximizes ฮฃ๐, ๐ โ๐ :๐ ฬธ=๐๐ฆ๐ ๐๐๐ . Finally, the linearprogram needs a constraint to make sure no units of anyasset are inadvertently created.Maximizing this objective means that no offers are left
crossing after executing all offers in a block according to theresults of this linear program - any crossing orders couldexecute to increase this objective.
The full linear program is as follows:
max ฮฃ๐,๐โ[๐ ],๐ ฬธ=๐๐๐๐ฆ๐๐ (1)๐ .๐ก . ๐ฟ๐๐ (๐) โค ๐ฆ๐๐ โค ๐ป๐๐ (๐) โ(๐, ๐) โ [๐ ] ร [๐ ], (๐ ฬธ= ๐)
(2)๐๐ฮฃ๐โ[๐ ]๐ฆ๐๐ โฅ (1 โ Y)ฮฃ๐โ[๐ ]๐๐๐ฆ๐๐ โ๐ โ [๐ ] (3)
(4)
In cases where Tรขtonnement does not converge within atime limit, we modify this linear program by dropping thelower bounds on the ๐ฆ๐๐ variables.
Appendix D MultipleTรขtonnement Instances
Our implementation of Tรขtonnement chooses a step size viaan empirically chose heuristic. As such, the step direction inTรขtonnement, given by the result of an aggregate demandcomputation, is not guaranteed to be a descent direction ofthe heuristic. In this case, the heuristic value will alwaysincrease, no matter how small a step size is chosen. Code-notti et. al [30] show that there is a step size that alwaysbrings Tรขtonnement closer to an equilibrium. To bypass thiscase, we declare a minimum step size and ensure that if theline search would drop the step size below this minimumthreshold, Tรขtonnement takes a step anyway and progresses.Choosing a minimum size that is too small can mean Tรข-
tonnement needsmore rounds to finish its computation, but aminimum size that is too large can result in Tรขtonnement os-cillating around an equilibrium without terminating. Ratherthan encode a fixed minimum size, we run several parallelinstances of Tรขtonnement, each with a different minimumstep size.
Our implementation runs 6 copies of Tรขtonnement, threeusing volume normalization and three without. Within aset of three copies, we vary the minimum step size by afactor of 211. These numbers are chosen arbitrarily, and areal-world deployment might choose a different parameterregime. These parameters are chosen to balance inclusion of
16
a wide variety of parameter regimes (for robustness) againstCPU cache performance (for speed).
Appendix E ExtensionsWe chose not to implement buy offers because they movethe price computation problem to a (conjectured to be) in-tractable complexity class (Appendix B). It is still possiblethat Tรขtonnement could handle a small number of buy of-fers. A logarithmic demand oracle with buy offers can beimplemented analogously to our demand oracle for sell of-fers. Alternatively, one could compute prices only using selloffers, and then offer the computed prices to the buy offersas well in the linear programming step.Tรขtonnement requires the aggregate demand of the mar-
ket to satisy a property called โWeak Gross Substitutability(WGS)โ (Appendix B). Some natural types of offers (beyondsell offers) and automated agents satisfy this property. Oneoffer might sell multiple types of goods at once, or might liketo buy any one of multiple digital assets that are backed bythe same real-world currency. Note also that an agent basedon Uniswapโs [14] constant-product rule (i.e. an agent thattrades to maximize the product of its held asset amounts)satisfies WGS. It may be difficult to evaluate the actions ofmany complicated agents efficiently within the inner loopof Tรขtonnement, but an exchange could implement a smallnumber of automated market-makers to provide a baselinelevel of market liquidity.
Appendix F Continuous Approximation ofAggregate Demand
Recall that every round of Tรขtonnement computes the ag-gregate demand (Definition 4.4) of the set of open offers, inresponse to the current candidate set of prices. The goal ofTรขtonnement is to find some price ๐ such that the amountof every good supplied to the market exceeds the amountbought from the market. But when the aggregate demand๐ (ยท) is not a continuous function, such a point may not exist.
Example F.1. Suppose there are two assets ๐ด and ๐ต. Sup-pose also that one offer wishes to sell 100 units of ๐ด in ex-change for ๐ต if ๐๐ด/๐๐ต โฅ 1, and another offer wishes to sell 1unit of ๐ต in exchange for ๐ด if ๐๐ต/๐๐ด โฅ 1.
Then no matter the value of ๐๐ด/๐๐ต , one of the goods willbe overdemanded. Clearly this holds if ๐๐ด/๐๐ต ฬธ= 1. At ๐๐ด = ๐๐ต ,both offers execute, but ๐ต is still overdemanded by 99 units.
One technicality is that a specification of an exact marketequilibrium needs to to specify the behavior of offers whoseminimum prices are equal to the market prices. For example,in the above situation, setting ๐๐ด = ๐๐ต and that both offerssell only one unit (and not necessarily their entire stock) oftheir respective goods in exchange for one unit of the otherwould constitute an equilibrium.
The aggregate demand function is essentially a summa-tion of the behavior of every offer. To make the ๐ (ยท) into acontinuous function, we approximate the behavior of eachindividual offer by a continuous function.
Definition F.2 (Linearly-Interpolated `-Approximate OfferSemantics). Suppose an offer demands a minimum priceratio of ๐ผ , and the equilibrium prices give a price ratio of ๐ฝ .As in the exact case, if๐ผ > ๐ฝ , the offer does not execute, and if๐ผ < ๐ฝ(1โ `), the offer does execute in full. In the interveninggap, offers linearly interpolateโi.e. a (๐ฝ โ ๐ผ)/(`๐ฝ)-fractionof the offer executes.
Applying this approximation gives a continuous function๐` (๐). Furthermore, any point ๐ such that ๐` (๐)๐ โค 0 for all๐ constitutes a (Y = 0, `)-approximate equilibrium. AllowingY > 0 is akin to allowing ๐` (๐)๐ to be merely close to 0,instead of requiring that it be strictly below 0 for all ๐ .As mentioned in ยง4.3, the aggregate demand of a set of
offers that are all selling asset ๐ด in exchange for asset ๐ต canbe computed via a binary search, if the offers are sorted bytheir minimum exchange rates.Specifically, for a collection of offers ๐ , say that an offer
๐ฅ โ ๐ sells ๐๐ฅ units of ๐ด and demands a minimum price of๐๐ฅ , and let ๐ = ๐๐ด/๐๐ต be a queried exchange rate. A binarysearch computes ๐ธ1 = ฮฃ๐ฅ โ๐ :๐๐ฅ โค๐๐๐ฅ .To compute the aggregate demand approximation, we
need to also compute the quantities ๐ธ2 = ฮฃ๐ฅ โ๐ :๐๐ฅ โค๐ (1โ`)๐๐ฅ ,๐น1 = ฮฃ๐ฅ โ๐ :๐๐ฅ โค๐๐๐ฅ๐๐ฅ , and ๐น2 = ฮฃ๐ฅ โ๐ :๐๐ฅ โค๐ (1โ`)๐๐ฅ๐๐ฅ . With someadditional preprocessing, ๐น๐ can be computed with the samebinary search as ๐ธ๐ for ๐ = 1, 2.The aggregate demand function reduces to:
ฮฃ๐ฅ โ๐ :๐๐ฅ<๐ (1โ`)๐๐ฅ + ฮฃ๐ฅ โ๐ :๐ (1โ`)โค๐๐ฅ โค๐๐๐ฅ (๐ โ ๐๐ฅ )
๐ `
= ๐ธ2 +๐ธ1 โ ๐ธ2
`โ ๐น1 โ ๐น2
๐ `
Appendix G Synthetic Data ModelThe transactions used in our experiments are (unless other-wise discussed) drawn from the following data model. De-fault parameters, used in our experiments unless otherwisenoted, are noted in parentheses.
Transactions are generated in batches. Every transactionhas some chance of either creating an offer( 90%), sendinga payment ( 10%), or creating an account ( .1%) (and givingthat account some funds). Newly created offers have somechance (1%) of being cancelled at some point in the next fewbatches (between 5 and 50).
All create offer operations are drawn relative to the sameset of underlying asset valuations. Supposing that these valu-ations were computed as a market equilibrium, some fraction(90%) are generated to accept these valuations. These โgoodโoffers are generated in cycles; that is, the generator chooses
17
10, 10 15, 10 15, 15 20, 15Some Large Offers, 500000 Offers 0.005 0.006 0.006 0.037Some Large Offers, 50000 Offers 0.008 0.009 0.027 0.038Some Large Offers, 5000 Offers 0.006 0.007 0.048 0.106Some Large Offers, 500 Offers 0.060 0.066 1.826 1.877Outlier Prices, 500000 Offers 0.008 0.012 0.017 0.036Outlier Prices, 50000 Offers 0.012 0.014 0.030 0.053Outlier Prices, 5000 Offers 0.015 0.020 * *Outlier Prices, 500 Offers 0.033 0.037 * *10% Good Offers, 500000 Offers 0.005 0.006 0.006 0.03910% Good Offers, 50000 Offers 0.008 0.009 0.009 0.03710% Good Offers, 5000 Offers 0.008 0.008 0.035 0.05310% Good Offers, 500 Offers 0.024 0.034 0.820 1.277None Near Market Prices, 500000 Offers 0.002 0.002 0.001 0.002None Near Market Prices, 50000 Offers 0.004 0.006 0.020 0.028None Near Market Prices, 5000 Offers 0.006 0.009 0.105 0.188None Near Market Prices, 500 Offers 0.024 0.066 1.708 2.487Close to Market Prices, 500000 Offers 0.008 0.046 0.046 0.044Close to Market Prices, 50000 Offers 0.011 0.044 0.042 0.040Close to Market Prices, 5000 Offers 0.013 0.039 0.035 0.031Close to Market Prices, 500 Offers 0.030 0.037 0.078 0.073Dispersed from Market Prices, 500000 Offers 0.002 0.003 0.003 0.005Dispersed from Market Prices, 50000 Offers 0.005 0.006 0.009 0.042Dispersed from Market Prices, 5000 Offers 0.005 0.009 0.099 0.279Dispersed from Market Prices, 500 Offers 0.022 0.064 * *Uneved Trade Volumes, 500000 Offers 0.180 0.180 0.133 0.145Uneved Trade Volumes, 50000 Offers 0.169 0.170 0.129 0.152Uneved Trade Volumes, 5000 Offers 0.257 0.280 0.976 1.045Uneved Trade Volumes, 500 Offers * * * *
Figure 6. Runtimes of Tรขtonnement with varying datasets,averaged over 5 runs. (*) denotes that not all runs termi-nated within a 5 second timeout. Column headers denoteapproximation parameters, as (โ log2(Y),โ log2(`)).
a random set of assets (of size between 2 and 7) and a tradeamount, then constructs offers that send this trade amountalong the cycle. โBadโ offers are generated individually, andtrade between random assets.
The minimum exchange rate on โgoodโ offers is set to beslightly lower than the underlying exchange rate (between0% and 2% lower) and the minimum rate on โbadโ offersis set to be slightly higher than the underlying exchangerate(between 0% and 2% higher).
Offer trade amounts are randomly drawn (between 1,000and 1,000,000) and normalzed (i.e. divided) by asset price.
In the first batch, the underlying prices are random values(between 1 and 1,000). After each batch of transactions, theprices are updated according to a brownian drift formula(๐ โ ๐๐N(0,๐)) (๐ = 0.05).
Source accounts behind transactions are drawn from anexponential distribution (with parameter 10โ6).
Transactions sequence numbers are set sequentially.Transactions within a batch are shuffled. Some transac-
tions (10%) are shuffled between adjacent batches.
Appendix H Data Generation RobustnessChecks
In this section, we vary some of the parameters of our datagenerationmodel to study howTรขtonnementโs runtime varieswith the problem input distribution. Runtimes are averagedover 5 trials (where each trial has a different dataset, drawnfrom the same distribution).
Figure 6 shows Tรขtonnement runtimes after varying sev-eral different parameters within our synthetic data model.In particular:
Some Large Offers 10% of the trade offers are 10x the size(on average) of the rest.
Outlier Prices One asset has valuation set to 10,000, anotherhas valuation set to 0.1. The rest are distributed be-tween 1 and 1,000.
10% Good Offers Only 10% of the offers accept the marketprices (default is 90%)
None Near Market Prices No offers are generatedwithin 10%of the market prices.
Close To Market Prices All offers haveminimumpriceswithin1% of market prices.
Dispersed From Market Prices Offers have minimum pricesin an especially large range around the market prices.
Uneven Trade Volumes Asset volume is heavily skewed to-wards some assets (asset volume normalization is notactivated).
The first point of note is that the only parameter changewith a significant, consistent effect on Tรขtonnement runtimeis the skew in the asset volume.
As discussed, Tรขtonnement does sometimes time out. Thistends to happen when there are few offers to trade and withhigh accuracy requirements.
Tรขtonnement also seems to perform slightly better whenthere are at least some offers near to the market clearingprices. Offers near the clearing prices can give the linearprogram more flexibility. This explains the slowdown when` is very small (the rightmost column) in the โNone NearMarket Pricesโ and โDispersed FromMarket Pricesโ datasets.
Appendix I Comparison with ConvexSolver
As a comparison point for Tรขtonnement, we also imple-mented the convex program of [32] using the CVXPY li-brary [34] backed by the ECOS convex solver [36]. Tรขton-nement was run with Y = 2โ15 and ` = 2โ10.The exact runtimes are not directly comparable; the two
systems measure error in different ways, and the convexsolver is a general tool, while Tรขtonnement is specificallydesigned for one application. The takeaway from Figure 7are the scalability trends. The size of the problem given tothe convex solver scales linearly with the number of open of-fers, and unsurprisingly, so does the runtime. An asymptoticimprovement to the runtime of the theoretical price compu-tation algorithms, as we implemented for Tรขtonnement withthe logarithmic demand oracle (ยง4.3.1), is crucial for con-structing an algorithm that can run quickly on large probleminstances.
That said, it is possible that when the number of transac-tions is very small (i.e. only a few hundred), a pricing algo-rithm based on solving a convex program could outperformTรขtonnement.
18
Figure 7. Runtimes of a convex solver compared againstruntimes for Tรขtonnement. The convex program on 20 and50 asset instances performed very similarly as when run on10 asset instances. These instances are not included on thegraph for visual clarity.
Interestingly, using a similar technique (sorting offers byminimum price), the convex program of [32] can be simpli-fied to a program whose size does not scale with the numberof open offers, and whose objective can be computed intime logarithmic in the number of open offers. The catch isthat this transformation (done naively) makes the objectivenondifferentiable, and nondifferentiable objectives can bevery difficult for many convex optimization algorithms. Alogarithmic-size transformation for this program that pre-serves the differentiability (or twice-differentiability) of theobjective is an interesting direction for future work.
19