When the cookie meets the blockchain: Privacy risks of web ... · the fundamental tension between...

19
When the cookie meets the blockchain: Privacy risks of web payments via cryptocurrencies Steven Goldfeder * , Harry Kalodner * , Dillon Reisman , Arvind Narayanan * Princeton University * {stevenag, kalodner, arvindn}@cs.princeton.edu [email protected] Abstract—We show how third-party web trackers can deanonymize users of cryptocurrencies. We present two distinct but complementary attacks. On most shopping websites, third party trackers receive information about user purchases for purposes of advertising and analytics. We show that, if the user pays using a cryptocurrency, trackers typically possess enough information about the purchase to uniquely identify the transaction on the blockchain, link it to the user’s cookie, and further to the user’s real identity. Our second attack shows that if the tracker is able to link two purchases of the same user to the blockchain in this manner, it can identify the user’s entire cluster of addresses and transactions on the blockchain, even if the user employs blockchain anonymity techniques such as CoinJoin. The attacks are passive and hence can be retroactively applied to past purchases. We discuss several mitigations, but none are perfect. I. I NTRODUCTION Eight years after Bitcoin’s introduction, the ability to pay online using cryptocurrencies is common: prominent mer- chants such as Microsoft, Newegg, and Overstock support it. Cryptocurrency users tend to value financial privacy, and it is a major reason for choosing to pay with Bitcoin [1]. Yet, websites including shopping sites are known to be rife with third-party tracking [2]. In this paper, we study the impact of online tracking on the privacy of Bitcoin users. First, we show that online trackers are able to see sensitive details of payment flows, such as the identities and prices of items added to shopping carts. Crucially, in many cases they receive sufficient information about a purchase to link it uniquely to a transaction on the Bitcoin blockchain. 1 This core linkage can be expanded in both directions: based on tracking cookies, the transaction can be linked to the user’s activities across the web. And based on well-known Bitcoin address clustering techniques [3], [4], it can be linked to their other Bitcoin transactions. This basic attack can be made worse in several ways. We find that many merchant sites send even more information to trackers, such as the transaction-specific Bitcoin address. This acts as a high-entropy identifier and makes linking to the blockchain trivial. We also show that many merchants addi- tionally leak users’ PII (name, email address, etc.) to trackers, allowing trackers to link not only users’ web profiles but also blockchain transactions to their identities. Finally, malicious trackers may use JavaScript to extract Bitcoin addresses or PII 1 Throughout we study Bitcoin since it has the most support for online payments, but our findings apply to many other cryptocurrencies. from web pages even if it is not leaked to them by default. We show that this is possible on the vast majority of merchant sites. Of course, Bitcoin does not guarantee unlinkability of transactions. But while linking of a user’s Bitcoin addresses with each other is well known [3]–[6], our attack shows how to link addresses to external information, including identity. The main defense against linkage attacks is mixing [7], [8]. The best known mixing technique is CoinJoin, in which users send coins to each other in a way that hides the link between their old and new coins. Our second main contribution is showing the effectiveness of the cluster intersection attack, a previously known attack against mixing. Specifically, we show that a small amount of additional information, namely that two (or more) transactions were made by the same entity, is sufficient to undo the effect of mixing (see Figure 1). While such auxiliary information is available to many potential entities — merchants, other counterparties such as websites that accept donations, intermediaries such as payment processors, and potentially network eavesdroppers — web trackers are in the ideal position to carry out this attack. Based on the above two attacks, we present the following findings. We present a taxonomy of information leaks to trackers on e-commerce websites. We focus on leaks that allow linking a payment flow to a blockchain transaction. We compiled a list of 130 online merchants that accept Bitcoin, and analyzed their websites by extending the functionality of the open-source OpenWPM web privacy measurement tool [2]. We find that at least 53/130 of merchants leak payment information to a total of at least 40 third parties, most frequently from shopping cart pages. The vast majority of these represent intentional sharing of purchase data with third parties for advertising and analytics purposes. In addition, we find that many merchant websites have far more serious (and likely unintentional) information leaks that directly reveal the exact transaction on the blockchain to dozens of trackers. Turning to the Bitcoin blockchain, we use empirical mea- surement to estimate the uniqueness of transactions as a function of the adversary’s uncertainty about the transaction’s timestamp and value (Section V). We find that unique linkage is possible in over 60% of cases for realistic values of these pa- rameters, and that in the vast majority of cases, the anonymity set size is 5 or less. The attack degrades gracefully as the adversary’s uncertainty increases. Note that in the case of the 1 arXiv:1708.04748v1 [cs.CR] 16 Aug 2017

Transcript of When the cookie meets the blockchain: Privacy risks of web ... · the fundamental tension between...

  • When the cookie meets the blockchain:Privacy risks of web payments via cryptocurrencies

    Steven Goldfeder∗, Harry Kalodner∗, Dillon Reisman†, Arvind Narayanan∗Princeton University

    ∗{stevenag, kalodner, arvindn}@cs.princeton.edu†[email protected]

    Abstract—We show how third-party web trackers candeanonymize users of cryptocurrencies. We present two distinctbut complementary attacks. On most shopping websites, thirdparty trackers receive information about user purchases forpurposes of advertising and analytics. We show that, if theuser pays using a cryptocurrency, trackers typically possessenough information about the purchase to uniquely identify thetransaction on the blockchain, link it to the user’s cookie, andfurther to the user’s real identity. Our second attack shows thatif the tracker is able to link two purchases of the same user to theblockchain in this manner, it can identify the user’s entire clusterof addresses and transactions on the blockchain, even if the useremploys blockchain anonymity techniques such as CoinJoin. Theattacks are passive and hence can be retroactively applied to pastpurchases. We discuss several mitigations, but none are perfect.

    I. INTRODUCTION

    Eight years after Bitcoin’s introduction, the ability to payonline using cryptocurrencies is common: prominent mer-chants such as Microsoft, Newegg, and Overstock support it.Cryptocurrency users tend to value financial privacy, and itis a major reason for choosing to pay with Bitcoin [1]. Yet,websites including shopping sites are known to be rife withthird-party tracking [2]. In this paper, we study the impact ofonline tracking on the privacy of Bitcoin users.

    First, we show that online trackers are able to see sensitivedetails of payment flows, such as the identities and pricesof items added to shopping carts. Crucially, in many casesthey receive sufficient information about a purchase to linkit uniquely to a transaction on the Bitcoin blockchain.1 Thiscore linkage can be expanded in both directions: based ontracking cookies, the transaction can be linked to the user’sactivities across the web. And based on well-known Bitcoinaddress clustering techniques [3], [4], it can be linked to theirother Bitcoin transactions.

    This basic attack can be made worse in several ways. Wefind that many merchant sites send even more informationto trackers, such as the transaction-specific Bitcoin address.This acts as a high-entropy identifier and makes linking to theblockchain trivial. We also show that many merchants addi-tionally leak users’ PII (name, email address, etc.) to trackers,allowing trackers to link not only users’ web profiles but alsoblockchain transactions to their identities. Finally, malicioustrackers may use JavaScript to extract Bitcoin addresses or PII

    1Throughout we study Bitcoin since it has the most support for onlinepayments, but our findings apply to many other cryptocurrencies.

    from web pages even if it is not leaked to them by default.We show that this is possible on the vast majority of merchantsites.

    Of course, Bitcoin does not guarantee unlinkability oftransactions. But while linking of a user’s Bitcoin addresseswith each other is well known [3]–[6], our attack shows howto link addresses to external information, including identity.

    The main defense against linkage attacks is mixing [7], [8].The best known mixing technique is CoinJoin, in which userssend coins to each other in a way that hides the link betweentheir old and new coins. Our second main contribution isshowing the effectiveness of the cluster intersection attack,a previously known attack against mixing. Specifically, weshow that a small amount of additional information, namelythat two (or more) transactions were made by the sameentity, is sufficient to undo the effect of mixing (see Figure1). While such auxiliary information is available to manypotential entities — merchants, other counterparties such aswebsites that accept donations, intermediaries such as paymentprocessors, and potentially network eavesdroppers — webtrackers are in the ideal position to carry out this attack.

    Based on the above two attacks, we present the followingfindings. We present a taxonomy of information leaks totrackers on e-commerce websites. We focus on leaks thatallow linking a payment flow to a blockchain transaction. Wecompiled a list of 130 online merchants that accept Bitcoin,and analyzed their websites by extending the functionality ofthe open-source OpenWPM web privacy measurement tool[2]. We find that at least 53/130 of merchants leak paymentinformation to a total of at least 40 third parties, mostfrequently from shopping cart pages. The vast majority ofthese represent intentional sharing of purchase data with thirdparties for advertising and analytics purposes. In addition, wefind that many merchant websites have far more serious (andlikely unintentional) information leaks that directly reveal theexact transaction on the blockchain to dozens of trackers.

    Turning to the Bitcoin blockchain, we use empirical mea-surement to estimate the uniqueness of transactions as afunction of the adversary’s uncertainty about the transaction’stimestamp and value (Section V). We find that unique linkageis possible in over 60% of cases for realistic values of these pa-rameters, and that in the vast majority of cases, the anonymityset size is 5 or less. The attack degrades gracefully as theadversary’s uncertainty increases. Note that in the case of the

    1

    arX

    iv:1

    708.

    0474

    8v1

    [cs

    .CR

    ] 1

    6 A

    ug 2

    017

  • Fig. 1. An illustration of the full scope of our attack.Consider three websites that happen to have the same embedded tracker. Alice makes purchases and pays with Bitcoin on thefirst two sites, and logs in on the third. Merchant A leaks a QR code of the transaction’s Bitcoin address to the tracker, merchantB leaks a purchase amount, and merchant C leaks Alice’s PII. Such leaks are commonplace today, and usually intentional(Section IV). The tracker links these three purchases based on Alice’s browser cookie. Further, the tracker obtains enoughinformation to uniquely (or near-uniquely) identify coins on the Bitcoin blockchain that correspond to the two purchases.However, Alice took the precaution of putting her bitcoins through CoinJoin before making purchases. Thus, either transactionindividually could not have been traced back to Alice’s wallet, but there is only one wallet that participated in both CoinJoins,and is hence revealed to be Alice’s.

    unintentional leaks mentioned above, there is no uncertainty,and unique linkage is always possible.

    Next, we evaluate the efficacy of the cluster intersectionattack against CoinJoin (Section VI; in Section VIII we discussthe applicability to other types of mixing). By identifyinga corpus of 78,697 CoinJoin transactions on the Bitcoinblockchain over a two-year period, we present realistic simu-lations of a victim who mixes coins from her wallet and thenmakes payment transactions that are observed by the adver-sary. For example, if the victim employs 3 rounds of CoinJoinand the adversary observes two of the victim’s payments, hecan link them back to her wallet (despite mixing) with 98%accuracy. Multiple rounds of mixing increase privacy, but thosegains are quickly stripped away if the adversary observes morethan 2 payments.

    Finally, we evaluate our attack end to end (Section VII). Wemade 21 purchases on 20 merchant websites. For 11 of thesepurchases, we used freshly mixed coins to attempt to deterlinkage. There were 25 pairs of purchases made with mixedcoins for which there was at least one tracker that receivedleaked data about both purchases. We find that in 20 of these25 cases, the tracker can identify the user’s wallet despite theuse of mixing.

    Our attack highlights the dangers of pervasive web tracking:Bitcoin is often used for sensitive activities, making thecompromise of Bitcoin privacy a far more serious threat thantargeted advertising. In Section VIII we discuss mitigationsthat merchants can deploy. None is a complete solution, giventhe fundamental tension between privacy and the analyticsneeds of modern e-commerce. Indeed, most of the privacy-

    breaching data flows we identify are intentional and notaccidental (Section IV).

    The main self-defense available to users today is to usetracking-protection tools such as Ghostery or uBlock Origin,but we note several limitations. First, since our attack is pas-sive, trackers have already accumulated data in their logs thatenable them to retrospectively carry out the attack. Second,tracking protection tools aren’t perfect and contain both falsepositives (resulting in broken functionality) and false negatives(resulting in missed trackers). In Section IV we show thateven with tracking protection enabled, 25 merchants still leaksensitive information to third parties. Third, merchants, pay-ment processors, and even network eavesdroppers are potentialadversaries for some of the attacks we describe, and trackingprotection does not help against these adversaries. Finally, inSection VIII we also discuss how our techniques can aid lawenforcement investigations.

    II. BACKGROUND AND RELATED WORK

    Our work brings together two previously unrelated areas ofprivacy research: web tracking and anonymity of cryptocur-rencies. We describe each in turn.

    Online tracking. Since the web’s inception, the number ofthird parties that track and record user activity has exploded.[9]–[12]. In this paper we use the terms third party and trackerinterchangeably. Some trackers have a substantial view ofusers’ activities across the web: Google, for instance, hasa tracking presence on roughly 80% of sites [13]. Trackingmethods have also become more sophisticated over time [14]–

    2

  • [18]. The effectiveness of tracker-blocking tools has beenstudied by various authors [19]–[21].

    Some trackers like Google and Facebook are known totie their tracking profiles to identities directly disclosed byusers, but most trackers have no direct relationship with users.However, even such trackers acquire PII, often accidentally.Various studies starting in 2009 have shown that the leakageof PII from first parties to third parties is rampant [22], [23],and the problem remains severe today.

    Most trackers are legitimate businesses, but are known touse intrusive means to track users. These include misuseof HTML5 APIs for fingerprinting, such as Canvas, AudioContext, and Battery Status [2]; cross-device tracking [24];workarounds to browser privacy features [25], and sniffingdata from unsubmitted forms [26]. Many trackers have poorsecurity on their servers and are a target for compromise formalvertising and other purposes [27], [28].

    The problem of trackers observing shopping and paymentflows is unlikely to go away. Consider retargeting, which isthe ability to serve ads to users for items they are known tohave shown an interest in purchasing. It is one of the mostvaluable forms of advertising [29]. The farther into a paymentflow a tracker can observe a user (cart page, checkout page,etc.) the greater the interest signaled. Another major benefit isconversion tracking of ad campaigns. Having trackers on thepayment flows is needed to help analyze whether a user whowas served an ad actually follows through with a purchase.Other applications include fraud/abuse detection and consumerinsights.

    Cryptocurrencies. In Bitcoin-like cryptocurrencies, userspay by broadcasting transactions to a peer-to-peer network.Transactions are signed statements authorizing transfers fromone address to another. Addresses are public keys that act aspseudonymous “account” identifiers. Transactions are recordedin an immutable, global ledger called the blockchain [30], [31].

    Address clustering and mixing. It is trivial to generate newBitcoin addresses, and most wallet software takes advantageof this feature to improve user privacy. In the normal courseof operation, users end up with coins split between numerousaddresses, and it may not be obvious which addresses belongto the same user (or entity). However, there are well knownand well understood attacks to infer links between suchaddresses [3]–[6]. These techniques have been improved uponand implemented by companies such as Chainalysis and madeavailable via easily accessible APIs. Address clustering is notperfect, but it is a powerful attack, and wallet addresses mustbe considered clusterable unless additional privacy-protectiontechniques are employed to break the link between thoseaddresses.

    Many such privacy-protection techniques are known [32];the ones readily deployable on existing Bitcoin-like cryptocur-rencies are all variants of the idea of mixing. The best knownand most used technique is known as CoinJoin [8], [33], inwhich different users coordinate in order to jointly create atransaction that spends a coin of equal value from each ofthem, and from which each of them receives a coin of the same

    value. The order of outputs is randomly permuted so that themapping between inputs and outputs cannot be deduced fromthe public blockchain. Services such as JoinMarket providethe ability for users to coordinate to mix their coins [34].

    CoinJoin improves unlinkability by breaking the multi-inputheuristic, one of the main heuristics used in address clustering.However, the susceptibility of CoinJoin (and other mixingtechniques) to clustering has not yet been rigorously studied.It is known that CoinJoin transactions are at least detectableas such, since they involve many inputs and outputs with thesame value, a highly unlikely pattern in a regular paymenttransaction. In other words, CoinJoin improves anonymity butdoes not provide unobservability [34].

    Intersection attacks date back to the communicationsanonymity literature and are well known. Their applicability tocryptocurrency mixing is also generally understood. At leasttwo papers mention it explicitly [35], [36], but they focus onmix participants and other intermediaries as adversaries. A2015 blog post also mentions the attack [37]. What’s new inour work is the idea that auxiliary information to link differentmixed coins is readily available to web adversaries (as opposedto behavioral patterns in earlier work, which is a much lessreliable linkage mechanism). Further, we are able to empiri-cally evaluate the attack using recently proposed techniquesfor identifying CoinJoin transactions on the blockchain [34](Section VI).

    Other research on cryptocurrency privacy and forensics.Gervais et al. present an intriguing attack on e-commerce pur-chases using cryptocurrencies: since prices are denominatedin local currencies, and are usually close to integer multiplesof the unit of currency, blockchain transaction amounts couldreveal the currency and hence the location of the purchase [38].Our work is complementary; their attack is stronger than oursin that the adversary can be anyone examining the blockchain,whereas our attack is stronger in the sense that much moreinformation is leaked, and not just the location.

    Another major route to compromise of cryptocurrency pri-vacy, orthogonal to ours, is the linkage of transactions to thesender’s IP address. An adversary who is well connected tothe Bitcoin peer-to-peer network might be able to do so [39],[40]; even users who connect to the Bitcoin network over Torare potentially vulnerable [41]. In response to these attacks,Bitcoin Core changed the protocol for how transactions aredisseminated across the network in 2015. However, recentwork showed weaknesses in the updated protocol [42]–[44]. Are-designed P2P networking protocol with strong anonymityguarantees has been proposed [45], but not yet adopted by anycryptocurrency.

    In concurrent work, Portnoff et al. explore a techniquesimilar to our transaction linkage attack [46]. In their work,linkage is a forensic technique to help identify entities behindillegal activities (sex trafficking). It is enabled by a specificfeature of a specific website, backpage.com: classified ads paidfor by users are posted on the website along with an accuratetimestamp. This allows anyone (e.g., researchers, NGOs, lawenforcement) to link an ad to the transaction on the bitcoin

    3

  • blockchain that represents the payment for the ad. In ourwork, linkage can be carried out only by specific entities,such as trackers, but we extend the linkage via cookies, PII,and blockchain analysis, none of which are applicable to thesetting of Portnoff et al. Of course, their work can be viewedas a demonstration of a privacy breach affecting Backpageusers, including the majority not engaging in illegal activities;similarly, our attack can be turned around into a forensictechnique (Section VIII).

    III. THREAT MODEL AND ATTACKS

    Merchant, payment processor, and trackers. A typicalcryptocurrency-based e-commerce flow consists of a user, amerchant, a payment processor, and one or more trackers.The merchant is the website where the user is shopping. Mostmerchants make use of payment processors such as BitPay andCoinbase to handle the details of processing cryptocurrencies.When the user pays with Bitcoin or another cryptocurrency,the transaction is received by the payment processor, whothen usually credits the merchant’s account with an equivalentamount of dollars or other local currency. Trackers are “thirdparties” on web pages, often invisible, that track users’ actionsfor purposes of advertising, analytics and so on (Section II).Doubleclick, Google Analytics, and Facebook are commonexamples. Merchants, payment processors, and trackers areall potential adversaries in our attack, although we are mostinterested in the latter.

    Information flows to third parties. Users take actions onshopping sites such as logging in, viewing items, adding itemsto their cart, checking out, and making a payment. See Figure2 for an illustration of a typical payment flow on a merchantsite. The more of these actions a third party learns about, themore feasible the attack. The types of information useful tothe third party are:• Payment timestamp: the third party learns the approxi-

    mate payment time simply by virtue of being embeddedon merchant website, especially on pages constituting thecheckout process. Checkout pages usually require the userto complete payment (i.e., broadcast the cryptocurrencytransaction) within a short time window, typically 15minutes. Trackers embedded on payment receipt pagesare in an even better position, as they learn the paymenttime to within a few seconds. Note that assuming the userincluded a reasonable transaction fee, payment processorsconsider payments received as soon as the transaction isbroadcast to the peer-to-peer network and received bythe payment processor’s node. This involves a latencyof only a few seconds. The transaction may not beconfirmed until it is incorporated into the blockchain,which may take tens of minutes depending on the degreeof confirmation that the payment processor requires. Thetransaction confirmation time is largely irrelevant to ourattack.

    • Payment address: the payment address is the destinationto send coins. Recipients (payment processors) will typ-ically generate a fresh address specific to the transaction

    — new Bitcoin addresses are trivial to generate. Althoughthere is no business reason for trackers to receive thepayment address, we find that this does happen often(Section IV). Since payment addresses are unique, at leastwithin the time scale of interest to us, a leak of thepayment address trivially allows the tracker to link theweb user to the blockchain transaction.

    • Price: Depending on the merchant website, trackers maybe able to see the prices of items viewed by users,items added to the cart, or even the final price aftershipping and taxes have been calculated. Note that theseprices are almost always denominated in USD, EUR, oranother fiat currency, even on websites that only acceptcryptocurrencies as payment, due to the extreme volatilityof cryptocurrency exchange rates. However, once theuser checks out, the amount is calculated in BTC (oranother cryptocurrency) based on the exchange rate atthat instant. In some cases, this BTC-denominated priceis also revealed to the tracker, which is more useful forlinking than the price before applying the conversion.

    • Personally Identifiable Information (PII): By PII we meanany information related to the user’s real identity oraccount on the merchant website, such as name, emailaddress, username, and shipping address. Trackers’ accessto PII exacerbates the attack. In this paper we analyzeleaks of PII from merchant websites to trackers, but weemphasize that since trackers are widely present on theweb, the link to PII can be acquired on any websitewhatsoever. Leaks of PII to trackers are known to berampant across the web (Section II).

    In our measurements in Section IV, we focus on passiveattacks where trackers obtain this information in the normalcourse of operation. Except for (some) PII leaks, most otherinformation flows to trackers are intentional: trackers use thisinformation for advertising and analytics purposes. However,we note that in many cases, tracking scripts are in a positionto carry out an active attack and extract all of the aboveinformation from web pages even if they don’t obtain itpassively. This is because third-party scripts are typicallyembedded without any isolation, in a way that gives themfull access to the content on the page. Sandboxing techniquessuch as iframes are readily available, but only infrequentlyemployed since they interfere with some of the functionalityprovided by trackers.

    Attack 1: single transaction linkage. In this attack, theadversary (tracker) seeks to link a web user (as identifiedby the user’s cookie or PII) to a transaction on the cryp-tocurrency blockchain. The merchant and payment processorare not interesting adversaries for this attack, because it isunsurprising that they can carry out this linkage (but seeSection IV-C). We assume that the user is aware of thispossibility, and potentially takes necessary precautions, such asmixing to unlink the transaction on the blockchain from herother blockchain transactions and addresses. Attack 2 seeksto overcome such defenses. But the tracker’s ability to link

    4

  • Fig. 2. An illustration of a typical payment flow on a merchant site. Each step of this flow presents opportunities for leaking transaction-relevantinformation to embedded third-party trackers.

    to even a single transaction is a privacy breach, because theuser has no business relationship with the tracker and manyusers are in fact unaware of the existence of trackers (or atleast their prevalence and sophistication). It is also worrisomebecause trackers compile profiles of users’ activities across theweb.

    If the tracker has access to the receiving address, it triviallyenables linkage, as noted above. The more interesting case iswhen the tracker knows the approximate price and time. Thenthe tracker’s task is to search the logs of transactions that werebroadcast to the peer-to-peer network to identify those that fallwithin the window of uncertainty both in terms of transactionvalue and time. To quantify the tracker’s success, then, wemust model the uncertainty in the tracker’s knowledge of priceand time.• Price uncertainty: The tracker’s uncertainty around price

    arises primarily from shipping. If the tracker knows theadversary’s location (either based on a leak of PII orbased on IP address), this uncertainty can be minimized,although there might still be a small number of possiblevalues of the shipping fee based on the shipping speedselected by the user.

    • Exchange rate uncertainty: The second source of uncer-tainty is the exchange rate: the tracker usually sees pricesdenominated in USD (or another fiat currency) and not inBTC. Most payment processors use exchange rates basedon trading data publicly released by an exchange, whichmeans the tracker can always reconstruct the exchangerate at any given point in the past. However, since tradeshappen several times per second, the exchange rate variesrapidly and hence some uncertainty will still remain.

    • Payment time uncertainty: this arises because of thegap between the user checking out, the user’s walletbroadcasting the transaction, and that broadcast beingrecorded by the adversary or another node. The adversarymay run his own peer-to-peer Bitcoin node, or maysimply obtain the transaction broadcast timestamp frompublicly available sources such as blockchain.info. If thetracker is present on the transaction receipt page, then thelatency is minimized, and is of the order of the networkpropagation delay, i.e., a few seconds.

    Attack 2: Cluster intersection. This is a complementaryattack where the adversary aims to identify the cluster ofaddresses in the victim’s Bitcoin wallet. Recall from SectionII that wallets can (and do) easily create numerous addresses,

    but in the normal course of operation these addresses can stillbe linked together via various heuristics. Mixing techniquessuch as CoinJoin are thought to protect against such linkage,although this has not yet been studied rigorously. We assumethat the victim uses a desktop (local) wallet rather than anonline wallet provider. Privacy-conscious users tend not touse online wallets, since that would allow the wallet providerto trivially track all of the user’s activities. We also assumethat the user employs effective communications anonymitytechniques to mask the IP addresses of their wallet, as that isa well-known way for anonymity to be compromised (SectionII).

    In our attack, the victim interacts with the adversary multi-ple times. The adversary could be a merchant, payment pro-cessor, or (especially) a tracker who only indirectly observesthe victim. Knowing that the adversary might learn one of hisaddresses, the victim employs mixing to prevent the adversaryfrom learning the rest of his addresses and transactions. Hedoesn’t spend coins directly from his wallet, but only afterfirst mixing them. In Figure 1, after the victim has shoppedon merchantA.com the adversary is unable to determine whichof the three wallet clusters belongs to the victim. But after asecond interaction with the same victim on merchantB.com,the adversary simply finds the intersection of the two sets ofclusters, which leads him to a unique cluster.

    Web trackers passively observe users’ web purchases andare able to link them together, via cookies or device fin-gerprinting, even if the merchant and payment processor aredifferent in every case. Thus, this attack is complementary toAttack 1, and would take as input two blockchain transactionsidentified via Attack 1. Note that even if Attack 1 is imperfect,and returns a set of transactions instead of a single one, Attack2 will still succeed. The intersection size rapidly decreases asa function of the number of observations, and even if twoobservations aren’t sufficient to uniquely identify the wallet,it is likely that a small number of additional observations willsuffice. We quantify this in Section VI.

    IV. WEB MEASUREMENT: LEAKS OF SENSITIVE DATA

    In this section we analyze leaks of sensitive data on mer-chant sites. In sections V and VII we examine how trackerscan actually use this data to identify transactions on theblockchain. We also show in this section how trackers canconnect this information to users’ identities.

    5

  • A. Method

    To identify leaks of sensitive data, we performed a webcrawl of popular merchants that accept Bitcoin. To create a listof merchant sites, we began by combining popular community-maintained lists of merchants [47], [48], which gave us 1438sites. We then pruned the list to those domains that werefound in the Alexa top 1 million websites, which left 283sites. As we crawled the sites, we discovered that about halfof the merchants no longer accepted Bitcoin. This left 130merchants in our crawl that accepted Bitcoin at the time ofour measurements, and we focus on these 130 sites. Thesemerchants were geographically distributed over 21 countries,with 64 based in the United States and 20 based in the UnitedKingdom.

    Typical merchant payment flows allow us to complete mostof the steps — viewing products, adding them to the cart,initiating checkout, and receiving a payment address and price— before actually having to make a payment. This allowed usto collect data on almost the entire payment flow on a largenumber of websites. However, to analyze payment receiptpages, we need to actually make purchases. Thus we analyzetransaction receipt pages on a smaller scale based on actualpurchases. We made purchases from 20 distinct merchants intotal.

    To collect data on web tracking we used a modified versionof the open-source web privacy measurement tool OpenWPM[2]. Using the tool we collected all HTTP(S) requests andresponses. We also manually marked any PII and payment-related information that we encountered on the pages wevisited; we added functionality to the tool to automaticallyrecord this information when marked.

    Throughout our measurements, we are interested in theprivacy risk both for a regular user and for a user who employstracking-protection tools. Most such tools (e.g., Adblock Plus,uBlock Origin) use standard, community-maintained filterlists: EasyList and EasyPrivacy. To measure the privacy-riskfor users of tracking protection lists, we simply re-run ouranalysis after deleting those third-party URLs in our crawldatabases that appear in the lists.

    B. Findings

    Based on our measurements of 130 Bitcoin-acceptingmerchants, we found numerous third parties that receivetransaction-relevant information by virtue of their businessrelationship with the merchant in the normal course of atransaction. We define transaction-relevant information as anyinformation that could help identify the transaction on theblockchain. These potential adversaries could retroactivelyperform the transaction-linkage attack using data alreadypresent in their HTTP logs or databases. We measure in-formation received through either unintentional leakage, viathe referer field of an HTTP GET or POST request [23], orthrough intentional information sharing via an HTTP POSTaction or a GET URL parameter.

    BitPay 70Coinbase 24Coinpayments 3Stripe 2Other 31Total 130

    TABLE IPREVALENCE OF PAYMENT PROCESSOR IN 130-SITE CRAWL

    Info type w/o protection w/ protectionNon-BTC-denominated price,incl. shipping 24 12

    Non-BTC-denominated price,pre-shipping 23 5

    Non-BTC-denominated price,either 43 16

    Bitcoin address 12 12Bitcoin price 11 9Bitcoin address or price 17 15Add-to-cart events 28 2Total merchants sharinginfo 53 25

    TABLE IINUMBER OF MERCHANT SITES SENDING TRANSACTION-RELEVANT

    INFORMATION TO THIRD PARTIES, WITH OR WITHOUT TRACKINGPROTECTION

    1) Third parties that receive Bitcoin address or BTC-denominated price: Almost all merchants use third-party pay-ment processors; this helps them avoid the security, volatility,and legal risks of receiving and holding bitcoins. Table I liststhe prevalence of different payment processors in our crawlof bitcoin-accepting merchants. The payment processor eithersits in an iframe on the checkout page, or on a separate pagein the payment flow. The frame or page will display the exactBitcoin amount the user should send, to an address controlledby the payment processor.

    We found that 17 of the 130 Bitcoin-accepting merchantwebsites send the receiving Bitcoin address or BTC-denominated price to a third party (Table II). With thisinformation, linking the payment to the blockchain is trivial.The leaks were found on less-popular payment processors andwebsites that implement their own Bitcoin payment process-ing.

    We can also break it down by third parties instead ofmerchants: see Table III. In both tables, we also present thecorresponding measurements for users of tracking-protectiontools (via the EasyList and EasyPrivacy2 filter lists). Those re-sults are presented in the “with protection” columns. Table VIin Appendix B lists the third parties that receive transaction-relevant information despite the use of tracking protection.

    On 11 out of the 12 websites that leak the Bitcoin address,the leaks were to third-party services that render QR codesto facilitate payment. Providing a QR code encoding thepayment recipient’s Bitcoin address and the Bitcoin pricemakes payment easier for the user. QR-code generator servicesgenerally work by accepting a GET request with the dataencoded as a query parameter, and returning the rendered

    2https://easylist.to/

    6

  • Info type w/o protection w/ protectionNon-BTC-denominated price,incl. shipping 29 11

    Non-BTC-denominated price,pre-shipping 18 3

    Non-BTC-denominated price,either 38 13

    Bitcoin address 5 4Bitcoin price 4 2Bitcoin address or price 9 6Add-to-cart events 9 2Total third parties receivinginfo 40 13

    TABLE IIINUMBER OF THIRD PARTIES RECEIVING TRANSACTION-RELEVANT

    INFORMATION, WITH OR WITHOUT TRACKING PROTECTION

    QR code image. Thus, transaction-relevant information iscontained in the GET request (e.g., https://blockchain.info/qr?data=bitcoin://[address]?amount=[price]&size=180). If theQR-code generator service stores HTTP requests in their logsthey will have passively collected sufficient information toperform the blockchain analysis attack. We saw three domainsproviding this service: chart.googleapis.com, qrserver.com,and blockchain.info. Three payment processors in particularuse chart.googleapis.com to generate QR codes: coingate.com,litepaid.com, gourl.io.3

    We made purchases on a subset of 20 merchants websites,which allowed us to examine trackers on payment receiptpages. Table IX in Appendix B presents the number of thirdparties found on each merchant’s payment receipt page, andthe number of third parties that also receive transaction-relevant information in the course of the payment flow. Em-bedded third parties are common on receipt pages: in total,there were 245 third parties on the 20 merchant receiptpages we visited.

    We found further serious leaks of sensitive information onsome of these pages. In particular, the payment processorCoinbase redirects to receipt page URL on the merchant web-site (such as https://www.overstock.com/bitcoinprocessed/?...),and appends to this merchant URL a long string of queryparameters that include the Bitcoin payment address. If theresulting payment receipt page embeds third parties, thenthe merchant will (likely inadvertently) leak the paymentaddress via the HTTP referer header. We found this to bethe case on multiple merchant websites that use Coinbase.The overstock.com receipt page alone leaked the paymentaddress to 42 distinct third parties via this referer leakage.

    Additionally, we found that many merchant websites leakpayment processor invoice page URLs to third parties. This isa different type of leak from the one in the previous paragraph.The URLs themselves do not contain sensitive information,but the contents of the invoice pages do, in the case of bothCoinbase and Bitpay. In both cases, the invoice page is notprotected by access control and the content can be viewed by

    3Google’s policy is to retain these log for 2 weeks, for debugging anddevelopment purposes [49]. We could not find the retention policies for theother two service providers.

    anyone who has the URL. Of the sites from which we madepurchases, 12 of the 20 merchants included leaks of theseURLs to a total of 25 third parties.

    2) Third parties that receive non-BTC-denominated cartprices: The cart page displays each product in a user’s shop-ping cart, along with the non-BTC denominated (e.g., USDor EUR denominated) subtotal of the cart. This subtotal willoften exclude taxes and shipping. The user is often directedto enter their shipping address at the following checkout page,which will then calculate the shipping fee and add it to thecart subtotal.

    From our crawl data we identify a second set of thirdparties that receive the non-BTC-denominated cart price. Ifthe received cart price is missing shipping and handling,it will increase the adversary’s uncertainty about the BTC-denominated price (see Section V). As seen in Table II, 43out of the 130 bitcoin-accepting merchants we visited sendsome form of non-BTC-denominated cart price data tothird parties — many more than share BTC price or addresswith third parties.

    Based on the type of HTTP request that sent the transaction-relevant information to a third-party, we can categorizewhether the sharing of data was intentional or unintentional.We consider an unintentional data leakage a sharing of datathat happens solely via referer leakage. While it is possiblethat the third party parses the referer for the price informationon the backend, we find it is useful to separate these casesfrom an intentional sharing of data. An intentional share witha third party means that the price was sent in the URLof a GET request or the body of a POST request — inthat case, the request was intentionally constructed. In ourcrawl, we found that the overwhelming majority of requeststhat shared transaction-relevant data with a third partywere intentional: of the 312 requests we observed on 53merchant sites sharing transaction-relevant information, 295intentionally shared data.

    To perform the cluster intersection attack, an adversary musthave transaction-relevant information for at least two separatepurchases. A third party positioned on more than one websiteis in a better position to have the necessary data. Table VII inAppendix B contains the prevalence of third parties receivingtransaction-relevant information that appeared on at least twomerchant sites. As one might expect, Google Analytics andFacebook are pervasive.

    3) Third parties receiving product page visits: At mini-mum, on an e-commerce site a product page will displaythe non-BTC denominated price of an item that a user canpurchase. A product page will also often embed resourcesfrom many third-party domains. As an illustrative example, theproduct page https://missionbelt.com/collections/solid-color-40mm-belts/products/vader-40, includes resources from 31distinct third-party domains.

    Thus, third-party trackers could infer the user’s cart subtotalbased on the product pages they visit. In our data we foundseveral examples of third parties that not only see the productpages a user visits, but know exactly when the user adds an

    7

    https://blockchain.info/qr?data=bitcoin://[address]?amount=[price]&size=180https://blockchain.info/qr?data=bitcoin://[address]?amount=[price]&size=180chart.googleapis.comqrserver.comblockchain.infocoingate.comlitepaid.comgourl.iohttps://www.overstock.com/bitcoinprocessed/?...https://missionbelt.com/collections/solid-color-40mm-belts/products/vader-40https://missionbelt.com/collections/solid-color-40mm-belts/products/vader-40

  • item to their cart. In total, 28 bitcoin-accepting merchantsin our crawl shared add-to-cart events with third parties.The most common third party to receive this information wasFacebook, which received add-to-cart events on 26 merchants’sites. Being able to receive add-to-cart events is essentiallyequivalent to being able to see pre-shipping cart prices. Onlytwo of those merchant sites send add-to-cart events to thirdparties if the consumer uses browser tracking protection.

    4) Third parties receiving transaction timing: As discussedin Section V, an adversary needs to know the approximatetiming of a Bitcoin transaction in addition to its value. Thereare several ways in which third parties already have thisinformation stored in logs.

    As discussed earlier, when a user completes a Bitcoin trans-action, the payment processor typically redirects the user backto a receipt page. Any third parties loaded on the receipt pagewho had previously seen transaction-relevant data can then usethe receipt page load-time as the approximate timestamp of thebitcoin transaction (except during periods of anomalously highnetwork load). In our data, 245 distinct third-party domainshad resources loaded on merchant receipt pages on sites forwhich we made a purchase. 16 out of 20 of those merchantsites embedded on the receipt page at least one thirdparty which had previously received transaction-relevantinformation. This knowledge makes blockchain linking mucheasier. A more detailed breakdown of third parties by merchantreceipt page can be found in Table IX in Appendix B.

    Even without seeing a bitcoin transaction receipt page, thirdparties could still estimate the time a transaction took placebased on whether or not they know when a user starts thecheckout process. From our data we measured the extentto which Facebook trackers, for instance, explicitly track auser through their API’s “InitiateCheckout” event. TheFacebook InitiateCheckout event was found on 15 sitesof the 130 bitcoin-accepting merchants we visited in our crawl.

    5) Third parties receiving PII: A leak of transaction-relevant information coinciding with a leak of PII allows theadversary to attach a real world identity to a Bitcoin address.In our crawl, 49 bitcoin-accepting merchants leak someform of PII to a total of 137 third parties. Table IV liststhe number of merchants that share each type of PII with athird party. Table VIII in Appendix B lists the number of thirdparties receiving each type of PII.

    We also found that 21 third parties that receive PII alsoreceive transaction-relevant information. Those third partiescan conduct the blockchain analysis attack and add a label tothe resulting cluster.

    C. Other adversaries

    Some entities do not yet collect the necessary informationto conduct the blockchain attack, but could be in the positionto collect the information through an active attack.

    1) Active attacks by third parties: Third-party JavaScripthas access to the complete DOM of whatever frame the scriptis embedded in: if that frame contains a piece of transaction-

    PII type w/o protection w/ protectionemail 32 25firstname 27 20lastname 25 19username 15 12address 13 9name 11 4phone 10 4company 5 4Merchantssharing PII 49 38

    TABLE IVNUMBER OF MERCHANTS SHARING EACH TYPE OF PII, WITH OR

    WITHOUT TRACKING PROTECTION

    Bitcoin address 31Non-BTC-denominated price 104Bitcoin price 30Total sites leaking transaction-relevant info 107

    TABLE VACTIVE ATTACKS: NUMBER OF MERCHANTS THAT ALLOW THIRD-PARTY

    SCRIPT ACCESS TO TRANSACTION-RELEVANT INFORMATION

    relevant data, then a script that turned malicious could go outof its way to collect the data if it wasn’t already collecting it.

    OpenWPM allows us to match third-party scripts to thepieces of information they can read. In short, 107 sites inour crawl grant third-party scripts access to transaction-relevant information (Table V). The most-prevalent thirdparty by far is google-analytics.com, which appears to bewidely trusted: for example, a google-analytics.com script isoften the only script found on Coinbase’s payment processingpage.

    Potential third-party access to PII was even more prevalent:on the 130 bitcoin-accepting merchants we crawled, 125merchants granted third-party scripts access to some formof PII. This included the scripts of payment processors thatmay not otherwise receive PII.

    2) Network adversaries: While most of the merchants wevisited in our crawl use HTTPS, some failed to do so. In ourcrawl, we found 36 merchants that did not use HTTPS forcrucial parts of the payment flow, including cart pages. Anetwork adversary could thus see the page load in cleartext,and parse transaction-relevant information or PII from thepage.

    3) Payment processors: While users are arguably awarethat payment processors will receive transaction-related infor-mation in the course of a payment, they probably do not expectthat payment processors receive PII, since creating an accountwith the payment processor is generally not required. Yet, wefound that at least 24 merchants share some form of PIIwith BitPay, even though BitPay does not require that themerchant send them PII.

    4) Merchants: Conversely, users may expect that whilemerchants necessarily receive PII, they may not be able toeasily identify the transaction on the blockchain. But we mademerchant accounts with BitPay and Coinbase, and found thatthey both share the full details of the Bitcoin transaction withthe merchant.

    8

  • V. BLOCKCHAIN ANALYSIS: ESTIMATION OF LINKABILITY

    Having shown that trackers obtain payment-related infor-mation from online purchases, we now present empiricalanalyses of the Bitcoin blockchain to show that trackers canuse this information to uniquely identify the transaction on theblockchain. We reiterate that many trackers will have no needfor the techniques in this section since merchants share uniquetransaction-specific information with them.

    A. Blockchain analysis infrastructure and algorithms

    First we describe our blockchain analysis tool, BlockSci,and the algorithms that we used for the results reportedthroughout the rest of this paper. We plan to release BlockScias open-source software.

    Overview. BlockSci is an in-memory blockchain graphdatabase and query interface. A pre-processing step con-verts Bitcoin’s blockchain and several other Bitcoin-likeblockchains into a uniform and optimized graph representa-tion. BlockSci exposes a Jupyter notebook interface to the an-alyst, and analyst queries are written in Python. The interfaceis object-oriented for expressiveness, but the Transactionobjects are not instantiated in memory when the blockchainis loaded. Instead, they are instantiated “just in time” whenaccessed.

    Features. We describe some features of BlockSci that arerelevant to the analyses reported in this paper.• General graph queries: for example, we can conveniently

    express queries of the form “find all transactions at adistance ≤ d from transaction T in the graph.”

    • Identifying CoinJoin transactions: The CoinJoin mixingtechnique aims to provide unlinkability, but not unob-servability. Identifying which transactions represent Coin-Joins is a key part of our attacks. Our algorithm is adaptedfrom [34]; see Algorithm 3.

    • Address clustering: There are well-known techniques foridentifying address clusters belonging to a single walletas long as mixing techniques are not used. We adaptour clustering algorithm from [3], but linking addressesconnected by a CoinJoin transaction. This allows us toavoid minimize positives. See Algorithm 1.

    • P2P network data collection: Some transactions broad-cast on the P2P network may never be included in theblockchain (for example, if the transaction fee provided istoo low). BlockSci includes a network daemon that moni-tors and logs all broadcast transactions. This informationis necessary for the adversary to minimize transactiontime uncertainty (Section III). Note that the adversarydoesn’t necessarily need to collect the data himself;broadcast timestamps for all historical transactions areavailable from public sources such as blockchain.info.

    • Exchange rate data collection: BlockSci incorporates ex-change rate data for various cryptocurrency / fiat currencypairs of interest, and exposes them via the same interface.This information is also necessary for the adversary in ourattacks.

    • Support for altcoins: BlockSci supports a range ofblockchains, which are currently Bitcoin, Namecoin, Lite-coin, and Dash. It is straightforward to support at leastthe eight other Bitcoin-derived blockchains supported bythe custom parsing tool that we use. Blockchains withcomplex scripts, notably Ethereum, are out of scope. Mostof our analyses in this paper were done on the Bitcoinblockchain, but we do present an analysis for Litecoin(Section V-C).

    B. Method

    We seek to answer the question: for a given level of priceuncertainty, exchange rate uncertainty, and transaction timeuncertainty, what is the distribution of anonymity set sizes ofthe transaction? The anonymity set contains candidate trans-actions, one of which represents the actual payment. Based onthe anonymity set sizes, we also compute adversary successprobabilities. Throughout, we average our measurements overa set of prices (obtained from actual sites, as described below),and a set of random points in time over a two-year period frommid-2015 to mid-2017.

    Exchange rate data. Recall from Section II that paymentprocessors use price data from exchanges, which is alsoavailable to adversaries. In our measurements, we use publiclyavailable historical data from BitStamp made available bybitcoincharts.com.4 The data contains the prices of all tradesexecuted on the exchange, starting in September 2011. As ofJune 2017 it contains 11.6 million trades. During the timeperiod of interest to us it contains about 4.4 million trades, orabout 4.2 trades per minute.

    Sampling prices. To obtain a representative sample ofprices of user purchases, we sampled 100 item prices fromour dataset of merchants. We sampled merchants randomlyand then sampled items randomly from those listed on thehomepages of those merchant websites, ensuring a maximumof 10 from any one merchant. When converted to USD, theprices ranged from a minimum of 1.52 to a maximum of359.00, with a mean of 51.27 and a median of 24.99.

    Sampling actual prices is important, because the distributionof values of e-commerce payments is different from that ofother transactions on the blockchain. For example, prices areoften close to integer multiples of the currency of account(USD, EUR, etc.) [38]. Therefore, if we sample prices directlyfrom blockchain transactions, we might obtain unrepresenta-tive results.

    Sampling times. We pick 100 random timestamps fromour time period of interest. By picking these randomly insteadof periodically, we ensure that there are no patterns such asspecific times of day or day of week.

    Modeling price uncertainty. As we showed in Section IV,the adversary typically sees the USD (or other local currency)price in the shopping cart, but only some trackers see the finalprice after shipping and any taxes. By analyzing the behaviorof various merchants, we make the following key observation:

    4https://api.bitcoincharts.com/v1/csv/

    9

    https://api.bitcoincharts.com/v1/csv/

  • 1 2 3 4 5 6 7 8 10Anonymity Set Size

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Fract

    ion o

    f to

    tal

    Fig. 3. Anonymity set size of the single-transaction linkage attack that aimsto link a web transaction to the blockchain. Here the price uncertainty setsize is 5, the payment time uncertainty is 15 minutes, and the exchange rateuncertainty is 5 minutes.

    if the adversary knows only the country of the user’s shippingaddress, there are only a small number of possible values(typically fewer than 10) for the difference between the cartprice and the final price. For example, there are no instanceswhere each US state has a distinct shipping fee. Based on ourobservations, the vast majority of merchants do not collect tax.Note that often the shipping address is directly revealed to thetracker (Section IV). Even otherwise, it is easy for trackers tolearn or infer the users country, whether based on IP address,locale, or language. Thus, we model price uncertainty as asmall set of possible values. We sample these values fromthe list of actual shipping rates on the merchant websiteswe analyzed. We always include the value 0 in this sample,because free shipping is (unsurprisingly) a popular option.

    Modeling time uncertainty. The tracker may observe theuser take one or more of the following actions: view theshopping cart, initiate checkout, and view the transactionreceipt page. In the first case, the tracker can guess that the usermay have initiated payment within a few minutes (of course,this guess might be incorrect). In the second case, the trackerknows that if the payment was made at all, it would have to bewithin a time window set by the payment processor, typically15 minutes. In the third case, the tracker knows the transactiontimestamp to within a few seconds (network latency).

    Modeling exchange rate uncertainty. The exchange ratedata used by the payment processor is always available tothe adversary. However, there is potentially some uncertaintyintroduced by the lag between the tracker observing the userinitiate the checkout process and the user being actually quotedan exchange rate. We model the adversary’s uncertainty asa time interval. If this window is 5 minutes, it means thatthe exchange rate that was applied to the transaction couldtake any value from the published time-series of exchange-rate values during a 5-minute period that begins when theadversary observed the user initiate checkout.

    Modeling the victim and adversary. We simulate 10,000payment flows based on all combinations of the 100 prices and100 timestamps sampled as above. For each flow, we considertwo cases: the victim either does, or doesn’t, complete thepayment within the windows of uncertainty assumed by theadversary.

    We posit an adversary that behaves as follows:• if there is exactly one transaction that falls within the

    uncertainty intervals, output that transaction.• if there is more than one such transaction, output a

    random transaction from that set.• if there are no such transactions, output “no such trans-

    action”.The last point is important, because in many cases the

    adversary observes the user on the shopping cart page or thecheckout page, but does not know for sure that the paymentwas made (some adversaries are also present on transactionreceipt pages and will have this additional information).

    Quantifying success. We quantify the adversary’s successin terms of two numbers: the true positive rate and the truenegative rate. A true positive occurs when the victim com-pleted the transaction, and the adversary outputs the correcttransaction. A true negative occurs when the victim did notcomplete the transaction, and the adversary correctly outputs“no such transaction”.

    For each simulated payment flow and each set of uncer-tainty parameters, we search the log of broadcast transactions(“mempool” log) for transactions that match the price andtime within the specified uncertainty windows. Any transac-tions found, in addition to the payment itself, constitute theanonymity set. Naturally, since the payments are simulated,we do not expect to find them on the blockchain, but in areal attack the payment would be part of the blockchain. Inother words, our measurements answer the question: “had apayment of a given value been made at a given time, how manytransactions on the blockchain could it be confused with?” Theanonymity set size is 1 more than this value.

    Additional heuristics. E-commerce payment transactionshave several other characteristics that enable the adversaryto distinguish them from (some) other transactions on theblockchain. We incorporate several such heuristics in ourattack.• Payments are always made to regular addresses rather

    than high-security “multisignature” addresses. This is trueacross almost all 130 merchants that we analyzed. Theuse of multisignature addresses would make our attack farstronger since the attacker, knowing the type of addressused by any given merchant, would be able to greatlylimit the set of candidate transactions on the blockchain.

    • Payment transactions almost always have two outputs —the recipient’s output and the change output — and nevermore than two. This behavior is consistent across all butone user wallet software that we are aware of; the excep-tion is Samourai Wallet (https://samouraiwallet.com/).

    • Fresh addresses are used, both for change and for therecipient’s output. This is a conservative assumption; al-

    10

    https://samouraiwallet.com/

  • Fig. 4. Effect of price uncertainty on transactionlinkage

    Fig. 5. Effect of payment time uncertainty ontransaction linkage

    Fig. 6. Effect of exchange rate uncertainty ontransaction linkage

    ternative behavior would make our attack stronger. If theuser’s wallet reuses addresses for change, that would undothe effect of mixing. If the recipient reuses addresses,it would make it easier for the adversary to associatespecific addresses with recipients, and thus further filterthe set of candidate transactions on the blockchain.

    C. Results

    Anonymity set size. Figure 3 shows the distribution ofanonymity set sizes under default values of various param-eters: payment time uncertainty of 15 minutes, exchange rateuncertainty of 5 minutes, and a price uncertainty set size of5. The most common value of the anonymity set is 1, whichshows that the attack is powerful under this default set ofparameters. Based on the anonymity set size distribution, thetrue positive rate is 76% and the true negative rate is 62%.

    Impact of uncertainty. Having shown that the attack issuccessful under a default set of parameters for uncertainty, weexamine the impact of each uncertainty parameter. In Figure4 we see that the accuracy remains high even if there are10 possible values for the price. As we observed earlier, priceuncertainty arises due to shipping options, and there are rarelymore than 10 possible values for it for any given country.

    Similarly, the attack degrades gracefully when we increasethe adversary’s time uncertainty or exchange rate uncertainty(Figure 5, Figure 6). Note that if the payment processorautomatically redirects to the payment receipt page, and theadversary is embedded on this page, then the time uncertaintyis on the order of seconds, and the success rate is extremelyhigh.

    More generally, the adversary will have a high success rateif his uncertainty on at least one of the three dimensions islow (Figures 4, 5, 6), as this greatly cuts down the number ofpossible matching transactions.

    Robustness of the results. While we took care to sampleprices from the actual distribution of prices on merchantwebsites, we find that our results are robust in terms ofthe sampling strategy. For example, we repeated our experi-ments with prices sampled from the distribution of transaction

    amounts on the blockchain (Figure 10 in Appendix C). Theresults are very similar; the accuracy improves slightly. Wealso repeated our experiments with all prices doubled, i.e.,with each sampled price replaced by twice its value (Figure 11in Appendix C). Again the results are essentially unchanged.

    We also repeated our experiments on the Litecoinblockchain instead of Bitcoin. Litecoin is the original altcoin— the first fork of Bitcoin — and is the altcoin with themost adoption for online payments, in terms of support bymerchants and payment processors. Again we find that thesuccess rate is high (Figure 12 in Appendix C); in fact, itis higher than the success rate for Bitcoin, likely due toLitecoin’s lower transaction volume, and therefore smalleranonymity sets. Litecoin had a volume of 3,605,028 transac-tions in the two-year period of interest, as opposed to Bitcoin’s150,614,721.

    Further improvements. So far, we have made conservativeassumptions about the adversary’s knowledge. The success ofthe attack in practice may in fact be much higher, either dueto idiosyncratic behavior by payment processors or due toadditional information available to the adversary.

    BitPay, one of the two main payment processors, rounds itstransaction amounts (in Satoshis) to a multiple of 100. Sincethe adversary knows the identity of the payment processor,whenever that processor is BitPay, he can eliminate a largefraction of possible transactions — any transaction amount thatis not a multiple of 100 Satoshis can be eliminated. Applyingthis heuristic, the accuracy improves substantially (Figure 13in Appendix C).

    Even if there is no discernible pattern in the transactionamount, the adversary may be able to tell which (if any)payment processor was involved in any given transaction onthe blockchain. Such address tagging heuristics are well known[3], and are applied at scale by companies such as Chainalysis.Tagging is not always accurate, but it can help the adversarygreatly decrease the anonymity set. This technique was usedfor Bitcoin forensics in a recent paper [46].

    11

  • VI. THE CLUSTER INTERSECTION ATTACK

    We now turn to our second attack, the cluster intersectionattack (Algorithm 2). To recap, the attack is applicable whenthe adversary has auxiliary information revealing that two(or more) transactions made with mixed coins trace back tothe same wallet (address cluster). Web trackers who observemultiple purchases may have this information.

    1: function EXPANDCLUSTER(addr)2: C ← {addr}3: for all tx in TXSFROM(a) do4: if not ISMIXTX(tx) then5: C ← C ∪ FROMADDRESSES(tx)6: C ← C ∪ {CHANGEADDRESS(tx)}7: for all tx in TXSTO(a) do8: if not ISMIXTX(tx) and9: CHANGEADDRESS(tx) = addr then

    10: C ← C ∪ FROMADDRESSES(tx)return C

    Algorithm 1: Clustering. One step of the address cluster-ing algorithm. We invoke this function recursively to findall addresses associated with a given coin or address. Thealgorithm incorporates the multi-input and change-address de-tection heuristics from [3]. Bitcoin mixing today is dominatedby JoinMarket, so we use JoinMarket detection (Appendix A)in place of IsMixTx.

    Inputs:• a set of mixed coins C known to be controlled by the

    same user.• an integer r, representing the adversary’s (possibly incor-

    rect) assumption that the victim did at most r rounds ofmixing.

    Output: a wallet cluster.1) Identify all join transactions on the blockchain.2) For each coin c ∈ C:

    • Identify all coins x such that there is a directed pathfrom x to c of length at most r consisting only ofjoin transactions. Call this set of coins Xc.

    • For each coin x ∈ Xc, identify the wallet cluster itbelongs to (Algorithm 1). Call the resulting set ofwallet clusters Wc.

    3) Compute the intersection of wallet clusters:⋂

    c∈C Wc.4) If it results in a unique wallet cluster, output it. Other-

    wise output “incorrect assumptions”.

    Algorithm 2: Cluster Intersection Attack. Step 1 can beamortized over multiple invocations of the algorithm; alter-nately step 2 can be modified so that join detection can beperformed only as needed.

    In this section, we present a large-scale simulation of theeffectiveness of this attack. An empirical validation, where wede-anonymize our own wallets, is deferred to the next section.

    A. Method

    Identifying joins. First we identify existing CoinJoin trans-actions on the Bitcoin blockchain. We focus on JoinMarket,since it is (to our knowledge) the only decentralized mixingservice that is currently operating and has a usable levelof liquidity. We adapt Möser et al.’s algorithm to identifyJoinMarket transactions [34], and it is shown in Algorithm3. We found 95,239 such transactions, of which 78,697 areduring the period of interest to us (mid 2015–mid 2017). Thenumber of coins mixed in one of these transactions has a meanof 3.98 and a standard deviation of 1.72.

    Simulating the victim. We consider a victim with a wal-let of clusterable addresses who obtains 100 distinct mixedcoins over the two-year period of interest. We sample 100timestamps (block heights) uniformly during this period; ateach of these times, the victim initiates mixing of a coinfrom her wallet and completes r ≤ 5 rounds of mixing. 5rounds represents a very high degree of anonymity based onJoinMarket’s advice to users [50]. The victim retains the mixedcoins until the end of the period of interest. The values of thesecoins don’t matter since this information is not used by ourdeanonymization algorithm.

    To simulate the mixing of one coin (with r rounds ofmixing) starting from a given timestamp (block height), wesample from the JoinMarket transactions on the blockchainthat have this timestamp. With this node as the starting pointin the graph of JoinMarket transactions, we sample a pathof length r from among all such paths. If there are no suchpaths, we repeat the procedure starting from a different initialtransaction.

    Attack. At this point the victim has 100 mixed distinctcoins in her (simulated) wallet. Now we simulate the webtracker’s view, that is, we simulate the victim making twotransactions in a way that reveal to the adversary that twoof these coins trace from the same wallet. Then we executethe cluster intersection attack (Algorithm 2). We repeat theprocedure with different values of the number of rounds rand the number of transactions t observed by the adversary.

    B. Results

    Figure 7 shows the adversary’s success rate as a functionof the number of rounds of mixing and the number oftransactions observed by the adversary. By construction ofthe experiment, the cluster intersection attack has the sametrue positive rate and true negative rate. Thus the graph alsorepresents the probability that, if the adversary’s assumptionsare incorrect about the number of rounds of mixing, it willoutput “incorrect assumptions”. With one or two rounds ofmixing, just two observed transactions are sufficient for theadversary to identify the wallet cluster. Even with four roundsof mixing, a small number of observations is sufficient forhigh accuracy.

    Figure 8 helps explain why the attack succeeds: the successrate is strongly dependent on the difference in age between thedifferent mixed coins. This is intuitive: if the victim mixed acoin a year ago and another coin today, the anonymity sets

    12

  • 2 3 4 5Transactions Observed

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Advers

    ary

    Succ

    ess

    Cluster Intersection Attack

    3 Rounds

    4 Rounds

    5 Rounds

    Fig. 7. Effect of the number of rounds of mixing on success rate of clusterintersection attack. For r < 3 rounds, the success rate is 100%.

    0 50 100 150 200Time between tx's (days)

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Advers

    ary

    Succ

    ess

    Cluster Intersection Attack

    2 tx's

    3 tx's

    4 tx's

    Fig. 8. Effect of age of mixed coins on success rate of cluster intersectionattack (with 5 rounds of mixing). The X axis is the maximum difference inage between coins known to the adversary to be associated with the sameuser.

    of the two coins are much less likely to intersect, comparedto two coins both mixed today. In other words, users whohave a long history of making e-commerce purchases usingmixed coins are at a greater risk of deanonymization, not justbecause of the number of purchases but also because of thegap between them.

    VII. EMPIRICAL VALIDATION OF ATTACKS

    In this section we describe how we validated our attacksempirically by making actual purchases and participating inCoinJoin transactions. Naturally, the scale of these experimentswas more limited than our previous measurements.

    A. Setup.

    We began by purchasing bitcoins from exchange serviceCoinbase and routing it to a set of six addresses. We ensuredthat these six addresses are clearly clusterable by our clusteringalgorithm (Algorithm 1). This simulates a user with a walletcontaining addresses that are linkable to each other, beforeemploying mixing. As discussed in Section 2, we believe thatthis is a conservative and realistic assumption. Furthermore,the clusterability of the user’s wallet is affected by factorsnot in the user’s control. For example, if a payment processorprovides a payment address that has been used before, ratherthan a freshly generated address, then the user’s changeaddress will be linked to her wallet.

    Next, we participated in JoinMarket CoinJoin transactions tocreate 11 coins which are not linkable to the main cluster usingknown techniques. We participated in one round of CoinJoinfor 6 of the transactions and two rounds of CoinJoin for theother 5.

    Finally, we made a set of 21 purchases on 20 merchantsites. We sampled these sites from among those that leakedtransaction-relevant information to at least one tracker (asmeasured in Section IV). For 11 of these purchases, we used

    coins that had been mixed in the previous step, and we ensuredthat these addresses as well as the change addresses for thesepurchases did not get linked to our cluster. For the other 10purchases, we used coins directly from our cluster. The finalprices of these items ranged from a minimum of 3.28 to amaximum of 46.40 when converted to USD, with a mean of13.67.

    B. Validating transaction linkability.

    We calculate the anonymity sets of these 21 transactionsbased on our default values of the adversary’s uncertainty: a5-minute exchange rate uncertainty and a 15-minute paymenttime uncertainty. The uncertainty windows are centered aroundthe true values of the payment time and the exchange ratedetermination time. As for price uncertainty, we use the actuallist of shipping options (and resulting list of final prices) thatwe recorded while making purchases. The number of possiblepre-BTC prices is typically 5 or fewer per purchase. We findthat in 10 out of 21 cases (48%), the anonymity set size is 1(Figure 9).

    For 17 of the 21 purchases, there was at least one trackerthat received transaction-relevant information and was alsopresent on the payment receipt page, which means that forthese adversaries the payment time uncertainty is very low.In this scenario, the anonymity set sizes are much lower: infact, it is 1 for 14 of 17 purchases. The adversary behaving asdescribed in Section VI would identify the correct transaction90% of the time. The true negative rate is also high: 82%. Thismeans that if the adversary’s assumption about the paymenttime was incorrect for whatever reason — if the P2P networkload is too high or the user’s wallet software included too lowa transaction fee, the receipt may not happen quickly — theadversary will be able to correctly deduce this.

    13

  • Fig. 9. Observed anonymity set sizes in empirical evaluation of the transactionlinkage attack

    C. Validating the cluster intersection attack.

    Next we validate the cluster intersection attack. Out ofthe 11 purchases we made using mixed coins, we consideradversaries that observe a random t of those purchases andknow which transaction on the blockchain corresponds to eachof them. We find that for t = 2, the adversary described inAlgorithm 2 has an 89% chance of correctly identifying ourwallet cluster, and for t = 3, this goes up to 99%.

    If more rounds of mixing are used, the attack will not workas well. On the other hand, we mixed all our coins during a3-day period, and realistic users who mix coins over a periodof months or years will have worse privacy. In Section V weevaluated the effect of these factors as well as the number oftransactions observed by the adversary. For our experimentsin this section, we limited the number of parameters becauseof the expensive nature of the mixing market.

    D. Validating the attack end-to-end.

    The two attacks are especially powerful when combined.Even if the single-transaction linkage doesn’t produce a uniquetransaction, we can run cluster intersection starting from everypossible combination of the candidate transactions producedby it. Most combinations will produce an intersection ofsize zero, and can be discarded. If exactly one combinationproduces one cluster and all the rest produce zero clusters,then the adversary outputs that cluster.

    From our purchase records, we determined that there were11 trackers in a position to observe more than one purchase:americanexpress.com, chatid.com, criteo.com, doubleclick.net,facebook.com, google-analytics.com, google.com, mone-tate.net, revjet.com, steelhousemedia.com, tealiumiq.com.Overall there are 25 pairs of purchases for which there wasa tracker that could observe both purchases. We ran the end-to-end attack on all 25 pairs, again using default values ofthe adversary’s payment time uncertainty and exchange rateuncertainty. We found that the attack succeeds in identifyingour wallet cluster in 20 cases.

    Again, these numbers reflect conservative assumptionsabout the adversary’s knowledge. If the Bitcoin transactiondetails are directly leaked to the tracker, or if the trackeris present on the receipt page, or observes more than twopayments, the success rate will be much higher. Similarly,many Bitcoin users do not employ mixing. We do not knowwhat fraction of e-commerce purchases are made with mixedcoins, but we observe that only .05% of transactions on theblockchain over the past two years are CoinJoin transactions. Ifthe user doesn’t employ mixing, then it will be straightforwardfor the adversary to identify her wallet cluster, even with a highdegree of uncertainty in the payment amount and time.

    VIII. MITIGATION AND DISCUSSION

    Our findings are a reminder that systems without provable(or at least well-defined) privacy properties may have infor-mation leaks and privacy breaches in unexpected ways. Whenmultiple such systems interact, the leaks can be even moresubtle. For another example of the difficulty of composingsystems with complex privacy properties, see [41].

    Cryptocurrency anonymity is a new research topic, but itsits at the intersection of anonymous communication and dataanonymization, both well-established fields. Unfortunately,it seems inherit the worst of these two worlds. Like dataanonymization (and unlike anonymous communication), sen-sitive data must be publicly and permanently stored, avail-able to any adversary, and de-anonymization may happenretroactively. And like anonymous communication systems(and unlike data anonymization), privacy depends on subtle in-teractions arising from the behavior of users and applications.Worse, realistic traces of the system may not be available atthe time of designing and implementing the privacy defenses.

    Turning to defenses, we observe that our first attack exploitsthe inherent tension between privacy and e-commerce, and oursecond attack exploits the inherent tension between privacyand the public nature of the blockchain. Thus, all mitigationstrategies come with tradeoffs. The available mitigations breakdown into three categories: self-defense by users, techniquesthat merchants can use, and alternative cryptocurrencies orcryptocurrency-based payment methods.

    Mitigation by merchants. There are a few straightforwardmitigations that merchants could deploy: (1) enabling HTTPSon all shopping (and especially payment-related) pages — thiswould protect against network adversaries, but not third-partytrackers, our main adversary of interest (2) generating Bitcoin-address QR codes internally instead of outsourcing it to a thirdparty; (3) avoiding leaks of the Bitcoin address from paymentreceipt pages; and (4) avoiding unintentional PII leaks. Asto the last point, however, note that the attack succeeds aslong as some first party website visited by the user leaks PIIto third parties, and at least some PII leakage is for cross-device linking purposes [24], and thus intentional. Beyondthese obvious steps, merchants could share less data with thirdparties, and with fewer of them, but this would come at theexpense of their advertising and analytics objectives.

    14

  • Self-defense. Web tracking is a well-known privacy threat,and the main defense is to use browser extensions such asuBlock Origin, Adblock Plus, or Ghostery to block trackers.Such defenses can be quite effective, but they are far fromperfect [2], [20]. Our measurements in Section IV confirmthe partial effectiveness of these tools. Note that these toolsare of no help when the adversary is a network eavesdropper(for either attack) or the payment processor (for the clusterintersection attack).

    On the cryptocurrency side, the main self-defense is to useimproved mixing techniques, especially multi-round mixing.We showed in Section VI that this is effective (but not perfect)as long as the adversary observes only 2 or 3 transactions. Inour end-to-end evaluation in Section VII, we carried out only 1or 2 rounds of mixing, and this a limitation of our experiments.Increasing the number of rounds comes at the expense of cost(transaction fees and mixing fees) and convenience (due totransaction confirmation time). A more through evaluation ofthe trade-offs is a topic for future work.

    While we have focused on CoinJoin or decentralized mix-ing, in principle the cluster intersection attack should also workagainst centralized mixes. If a mixing service introduces adelay of (say) up to 6 blocks, then for a given coin that wasmixed at a given block height, all mix outputs produced in thenext 6 blocks can be considered part of its anonymity set. Themain complication is the extent to which mix transactions aredistinguishable from other transactions, which is likely highlyimplementation-dependent. Of course, centralized mixing isadditionally vulnerable to the mix operator colluding with theadversary or stealing the funds. Evaluating the attack againstcentralized mixes (as well as other anonymity techniquesincluding TumbleBit [51]) is an avenue for future work.

    Alternative cryptocurrencies and payment mechanisms.Unlike the Bitcoin approach of anonymity as an overlay, othercryptocurrencies bake privacy into the protocol, and promiseuntraceability of transactions. The most well known of theseare Zcash, based on the Zerocash protocol [52], [53], andMonero, based on the Cryptonote protocol [54]. Zcash is morecomputationally expensive but comes with more rigorous se-curity properties. Of the two, Monero has more vendor supportat the time of writing, but still far less than Bitcoin or evenLitecoin, and primarily on hidden-service sites merchandisingillicit goods. While some anonymity weaknesses have recentlybeen revealed in Monero [55], [56], we believe that it is notsusceptible to the cluster intersection attack.

    The lightning network [57] is a proposal for a fast micro-payments. It is a network of two-party bidirectional paymentchannels. If Alice wants to pay Bob, she finds a path ofsuch channels that link her to Bob, through which she canroute the payment. Although the lightning network relies onBitcoin (or another underlying cryptocurrency) for its security,the vast majority of transactions are off-chain. There is noglobal ledger of all lightning payments, rendering our attackineffective. If and when the lightning network is deployedon Bitcoin, it would be an effective defense. However, otherprivacy concerns have been identified [58], [59], and the issues

    that arise are analogous to communications anonymity [60].Finally, like virtually all deanonymization attacks on cryp-

    tocurrencies, our techniques could be used to build forensictools for law enforcement use. In past investigations, agentshave sought to find the identity behind specific blockchaintransactions that were known to represent thefts, funding ofunlawful activities, or earnings from unlawful activities, as inthe case of ransomware. Alternatively, agents may have anidentified person of interest and may wish to scrutinize theircryptocurrency dealings for evidence of money laundering orother financial crimes. Thus, both blockchain→ web and web→ blockchain linking techniques are of potential interest tolaw enforcement. Agents might subpoena a tracker or paymentprocessor for information that might allow such linkage, oreven use network surveillance techniques.

    Acknowledgements. We are grateful to Malte Möser andBen Burgess for feedback on a draft. This work is supportedby NSF awards CNS-1421689, CNS-1526353, CNS-1651938,and an NSF Graduate Research Fellowship under grant numberDGE-1148900.

    REFERENCES

    [1] K. Krombholz, A. Judmayer, M. Gusenbauer, and E. Weippl, “The otherside of the coin: User experiences with bitcoin security and privacy,” inInternational Conference on Financial Cryptography and Data Security.Springer, 2016, pp. 555–580.

    [2] S. Englehardt and A. Narayanan, “Online tracking: A 1-million-sitemeasurement and analysis,” in ACM Conference on Computer andCommunications Security, 2016.

    [3] S. Meiklejohn, M. Pomarole, G. Jordan, K. Levchenko, D. McCoy, G. M.Voelker, and S. Savage, “A fistful of bitcoins: Characterizing paymentsamong men with no names,” in Proceedings of the 2013 conference onInternet measurement conference. ACM, 2013, pp. 127–140.

    [4] E. Androulaki, G. O. Karame, M. Roeschlin, T. Scherer, and S. Capkun,“Evaluating user privacy in bitcoin,” in International Conference onFinancial Cryptography and Data Security. Springer, 2013, pp. 34–51.

    [5] F. Reid and M. Harrigan, “An analysis of anonymity in the bitcoinsystem,” in Security and Privacy in Social Networks. Springer, 2013,pp. 197–223.

    [6] D. Ron and A. Shamir, “Quantitative analysis of the full bitcoin trans-action graph,” in International Conference on Financial Cryptographyand Data Security. Springer, 2013, pp. 6–24.

    [7] J. Bonneau, A. Narayanan, A. Miller, J. Clark, J. A. Kroll, and E. W.Felten, “Mixcoin: Anonymity for bitcoin with accountable mixes,”in Financial Cryptography and Data Security, ser. Lecture Notes inComputer Science, N. Christin and R. Safavi-Naini, Eds. SpringerBerlin Heidelberg, 2014, vol. 8437, pp. 486–504.

    [8] T. Ruffing, P. Moreno-Sanchez, and A. Kate, “Coinshuffle: Practicaldecentralized coin mixing for bitcoin.” ESORICS (2), vol. 8713, pp.345–364, 2014.

    [9] A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner, “Internet jones andthe raiders of the lost trackers: An archaeological study of web trackingfrom 1996 to 2016,” in 25th USENIX Security Symposium, 2016.

    [10] J. R. Mayer and J. C. Mitchell, “Third-party web tracking: Policy andtechnology,” in 2012 IEEE Symposium on Security and Privacy, 2012.

    [11] F. Roesner, T. Kohno, and D. Wetherall, “Detecting and defendingagainst third-party tracking on the web,” in Proceedings of the 9thUSENIX conference on Networked Systems Design and Implementation.USENIX Association, 2012, pp. 12–12.

    [12] C. Budak, S. Goel, J. Rao, and G. Zervas, “Understanding emergingthreats to online advertising,” in Proceedings of the ACM Conferenceon Economics and Computation, 2016.

    [13] T. Libert, “Exposing the invisible web: An analysis of third-party httprequests on 1 million websites,” International Journal of Communica-tion, vol. 9, p. 18, 2015.

    15

  • [14] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle, “Flashcookies and privacy.” in AAAI spring symposium: intelligent informationprivacy management, vol. 2010, 2010, pp. 158–163.

    [15] S. Englehardt, D. Reisman, C. Eubank, P. Zimmerman, J. Mayer,A. Narayanan, and E. W. Felten, “Cookies that give you away: Thesurveillance implications of web tracking,” in Proceedings of the 24thConference on World Wide Web, 2015.

    [16] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, andC. Diaz, “The web never forgets: Persistent tracking mechanisms inthe wild,” in Proceedings of ACM CCS. ACM, 2014, pp. 674–689.

    [17] P. Eckersley, “How unique is your web browser?” in InternationalSymposium on Privacy Enhancing Technologies Symposium. Springer,2010, pp. 1–18.

    [18] P. Laperdrix, W. Rudametkin, and B. Baudry, “Beauty and the beast:Diverting modern web browsers to build unique browser fingerprints,”in 37th IEEE Symposium on Security and Privacy, 2016.

    [19] Z. Yu, S. Macbeth, K. Modi, and J. M. Pujol, “Tracking the trackers,” inProceedings of the 25th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee, 2016,pp. 121–132.

    [20] G. Merzdovnik, M. Huber, D. Buhov, N. Nikiforakis, S. Neuner,M. Schmiedecker, and E. Weippl, “Block me if you can: A large-scale study of tracker-blocking tools,” in Proceedings of the 2nd IEEEEuropean Symposium on Security and Privacy (IEEE EuroS&P), 2017.

    [21] A. Gervais, A. Filios, V. Lenders, and S. Capkun, “Quantifying webadblocker privacy.” IACR Cryptology ePrint Archive, 2016.

    [22] B. Krishnamurthy, K. Naryshkin, and C. Wills, “Privacy leakage vs.protection measures: the growing disconnect,” in W2SP, 2011.

    [23] B. Krishnamurthy and C. E. Wills, “On the leakage of personallyidentifiable information via online social networks,” in Proceedings ofthe 2nd ACM workshop on Online social networks. ACM, 2009, pp.7–12.

    [24] J. Brookman, P. Rouge, A. A. Alva, and C. Yeung, “Cross-devicetracking: Measurement and disclosures,” in Proceedings of the PrivacyEnhancing Technologies Symposium, 2018.

    [25] J. Angwin and J. Valentino-Devries, “Google’s iphone tracking,” WallStreet Journal, 2012.

    [26] K. Hill and S. Mattu, “Before You Hit ’Submit,’ This Company HasAlready Logged Your Personal Data,” https://gizmodo.com/before-you-hit-submit-this-company-has-already-logge-1795906081, 2017.

    [27] N. Nikiforakis, L. Invernizzi, A. Kapravelos, S. Van Acker, W. Joosen,C. Kruegel, F. Piessens, and G. Vigna, “You are what you include:large-scale evaluation of remote javascript inclusions,” in Proceedingsof the 2012 ACM conference on Computer and communications security.ACM, 2012, pp. 736–747.

    [28] A. K. Sood and R. J. Enbody, “Malvertising–exploiting web advertising,”Computer Fraud & Security, vol. 2011, no. 4, pp. 11–16, 2011.

    [29] J. Koetsier, “90% of marketers say retargeting now as good as search ads,email marketing,” VentureBeat, https://venturebeat.com/2014/12/16/90-of-marketers-say-retargeting-now-as-good-as-search-ads-email-marketing/, 2014.

    [30] S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System,” 2008.[Online]. Available: https://bitcoin.org/bitcoin.pdf

    [31] J. Bonneau, A. Miller, J. Clark, A. Narayanan, J. A. Kroll, andE. W. Felten, “Research perspectives and challenges for bitcoin andcryptocurrencies,” in IEEE Symposium on Security and Privacy, 2015.

    [32] D. Yang, J. Gavigan, and Z. Wilcox-OHearn, “Survey of confidentialityand privacy preserving technologies for blockchains,” https://z.cash/static/R3 Confidentiality and Privacy Report.pdf, 2016.

    [33] G. Maxwell, “CoinJoin: Bitcoin Privacy for the Real World,” 2013.[Online]. Available: https://bitcointalk.org/index.php?topic=279249.0

    [34] M. Möser and R. Böhme, “Join me on a market for anonymity,” inWorkshop on Privacy in the Electronic Society, 2016.

    [35] G. Bissias, A. P. Ozisik, B. N. Levine, and M. Liberatore, “Sybil-resistant mixing for bitcoin,” in Proceedings of the 13th Workshop onPrivacy in the Electronic Society. ACM, 2014, pp. 149–158.

    [36] E. Heilman, F. Baldimtsi, and S. Goldberg, “Blindly signed contracts:Anonymous on-blockchain and off-blockchain bitcoin transactions.” inFinancial Cryptography Workshops, 2016, pp. 43–60.

    [37] E. Ordano, “We need more coinjoin,” https://medium.com/@eordano/we-need-more-coinjoin-c7fefd12dc5e, 2015.

    [38] A. Gervais, H. Ritzdorf, M. Lucic, and S. Capkun, “Quantifying locationprivacy leakage from transaction prices.” European Symposium onResearch in Computer Security, vol. 2016.

    [39] P. Koshy, D. Koshy, and P. D. McDaniel, “An analysis of anonymityin bitcoin using P2P network traffic,” in Financial Cryptography, ser.Lecture Notes in Computer Science, vol. 8437. Springer, 2014, pp.469–485.

    [40] A. Biryukov, D. Khovratovich, and I. Pustogarov, “Deanonymisation ofclients in bitcoin p2p network,” in Proceedings of the 2014 ACM SIGSACConference on Computer and Communications Security. ACM, 2014,pp. 15–29.

    [41] A. Biryukov and I. Pustogarov, “Bitcoin over tor isn’t a good idea,” inSecurity and Privacy (SP), 2015 IEEE Symposium on. IEEE, 2015, pp.122–134.

    [42] G. Fanti and P. Viswanath, “Anonymity properties of the bitcoin p2pnetwork,” arXiv preprint arXiv:1703.08761, 2017.

    [43] P. L. Juhász, J. Stéger, D. Kondor, and G. Vattay, “A bayesian approachto identify bitcoin users,” arXiv preprint arXiv:1612.06747, 2016.

    [44] T. Neudecker and H. Hartenstein, “Could network information facilitateaddress clustering in bitcoin?”

    [45] S. B. Venkatakrishnan, G. Fanti, and P. Viswanath, “Dandelion:Redesigning the bitcoin network for anonymity,” arXiv preprintarXiv:1701.04439, 2017.

    [46] R. S. Portnoff, D. Y. Huang, P. Doerfler