
The spread of low-credibility content by social bots

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol, Kaicheng Yang, Alessandro Flammini, and Filippo Menczer

Indiana University, Bloomington

arXiv:1707.07592v4 [cs.SI] 24 May 2018

Abstract

The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

1 Introduction

If you get your news from social media, as most Americans do [15], you are exposed to a daily dose of false or misleading content — hoaxes, conspiracy theories, fabricated reports, click-bait headlines, and even satire. We refer to such content collectively as “misinformation.” The incentives are well understood: traffic to fake news sites is easily monetized through ads [33], but political motives can be equally or more powerful [36, 42]. The massive spread of digital misinformation has been identified as a major global risk [22]. Claims that so-called “fake news” can influence elections and threaten democracies [16], however, are hard to prove [2].


Yet we have witnessed abundant demonstrations of real harm caused by misinformation and disinformation spreading on social media, from dangerous health decisions [21] to manipulations of the stock market [14].

A complex mix of cognitive, social, and algorithmic biases contributes to our vulnerability to manipulation by online misinformation [27]. These include information overload and finite attention [41], the novelty of false news [57], the selective exposure [52, 40, 39] caused by polarized and segregated online social networks [9, 8], algorithmic popularity bias [45, 20, 37], and other cognitive vulnerabilities such as confirmation bias and motivated reasoning [50, 26, 29].

Abuse of online information ecosystems can both exploit and reinforce these vulnerabilities. While fabricated news is not a new phenomenon [31], the ease with which social media can be manipulated [42] creates novel challenges and particularly fertile ground for sowing disinformation. Public opinion can be influenced thanks to the low cost of producing fraudulent websites and high volumes of software-controlled profiles or pages, known as social bots [14, 54]. These fake accounts can post content and interact with each other and with legitimate users via social connections, just like real people [51]. Bots can tailor misinformation and target those who are most likely to believe it, taking advantage of our tendencies to attend to what appears popular, to trust information in a social setting [25], and to trust social contacts [23]. Bots alone may not entirely explain the success of false news, but they do contribute to it [57]. Since the earliest manifestations uncovered in 2010 [36, 42], we have seen influential bots affect online debates about vaccination policies [14] and participate actively in political campaigns, both in the U.S. [4] and in other countries [62, 13].

The fight against online misinformation requires a grounded assessment of the relative impact of the different mechanisms by which it spreads. If the problem is mainly driven by cognitive limitations, we need to invest in news literacy education; if social media platforms are fostering the creation of echo chambers, algorithms can be tweaked to broaden exposure to diverse views; and if malicious bots are responsible for many of the falsehoods, we can focus attention on detecting this kind of abuse. Here we focus on gauging the latter effect. Most of the literature about the role played by social bots in the spread of misinformation is based on anecdotal or limited evidence; a quantitative understanding of the effectiveness of misinformation-spreading attacks based on social bots is still missing. A large-scale, systematic analysis of the spread of low-credibility content by social bots is now feasible thanks to two tools developed in our lab: the Hoaxy platform to track the online spread of claims [47] and the Botometer machine learning algorithm to detect social bots [54]. Here we examine social bots and how they promote this class of content spreading through millions of Twitter posts during and following the 2016 U.S. presidential campaign.

2 Results

Our analysis is based on a large corpus of news stories posted on Twitter. Rather than focusing on individual stories that have been flagged as misinformation by fact-checkers, we track the complete production of a number of sources known to fact-checkers for their low credibility.


There are two reasons for this approach [27]. First, these sources have intent and processes for the deception and manipulation of public opinion. Second, fact-checking millions of individual articles is unfeasible. As a result, this approach is widely adopted in the literature (see Supplementary Materials). The links to the articles considered here were crawled from 120 low-credibility sources that, according to lists compiled by reputable third-party news and fact-checking organizations, routinely publish various types of false and/or misleading news. Our own analysis of a sample of articles confirms that the vast majority of their content is some type of misinformation (see Methods). We also crawled the articles published by seven independent fact-checking organizations for the purpose of comparison. The present analysis focuses on the period from mid-May 2016 to the end of March 2017. During this time, we collected 389,569 articles from low-credibility sources and 15,053 articles from fact-checking sources. We further collected from Twitter all of the public posts linking to these articles: 13,617,425 tweets linked to low-credibility sources and 1,133,674 linked to fact-checking sources. See Methods and Supplementary Materials for details.

On average, a low-credibility source published approximately 100 articles per week. By the end of the study period, the mean popularity of those articles was approximately 30 tweets per article per week (see Supplementary Materials). However, as shown in Fig. 1, success is extremely heterogeneous across articles. Whether we measure success by the number of accounts sharing an article or by the number of posts containing a link, we find a very broad distribution of popularity spanning several orders of magnitude: while the majority of articles go unnoticed, a significant fraction go “viral.” Unfortunately, and consistent with prior analysis of Facebook data [41], we observe that the popularity distribution of low-credibility articles is indistinguishable from that of fact-checking articles. This result is not as bleak as that of a similar analysis based only on fact-checked claims, which found false news to be even more viral than real news [57]. However, the qualitative conclusion is the same: massive numbers of people are exposed to low-credibility content.

Even though low-credibility and fact-checking sources show similar popularity distributions, we observe some anomalous patterns in the spread of low-credibility content. First, most articles by low-credibility sources spread through original tweets and retweets, while few are shared in replies (Fig. 2(a)); this is different from articles by fact-checking sources, which are shared mainly via retweets but also via replies (Fig. 2(b)). In other words, the spreading patterns of low-credibility content are less “conversational.” Second, the more a story was tweeted, the more the tweets were concentrated in the hands of a few accounts, which act as “super-spreaders” (Fig. 2(c)). This goes against the intuition that, as a story goes “viral” and thus reaches a broader audience, the contribution of any individual account or group of accounts should matter less. In fact, a single account can post the same low-credibility article hundreds or even thousands of times (see Supplementary Materials). This could suggest that the spread is amplified through automated means.



Figure 1: Online virality of content. (a) Probability distribution (density function) of the number of tweets for articles from both low-credibility and fact-checking sources. The distributions of the number of accounts sharing an article are very similar (see Supplementary Materials). As illustrations, the diffusion networks of two stories are shown: (b) a medium-virality misleading article titled “FBI just released the Anthony Weiner warrant, and it proves they stole election,” published a month after the 2016 U.S. election and shared in over 400 tweets; and (c) a highly viral fabricated news report titled “‘Spirit cooking’: Clinton campaign chairman practices bizarre occult ritual,” published four days before the 2016 U.S. election and shared in over 30 thousand tweets. In both cases, only the largest connected component of the network is shown. Nodes and links represent Twitter accounts and retweets of the article, respectively. Node size indicates account influence, measured by the number of times an account was retweeted. Node color represents bot score, from blue (likely human) to red (likely bot); yellow nodes cannot be evaluated because they have either been suspended or deleted all their tweets. An interactive version of the larger network is available online (iunetsci.github.io/HoaxyBots/). Note that Twitter does not provide data to reconstruct a retweet tree; all retweets point to the original tweet. The retweet networks shown here combine multiple cascades (each a “star network” originating from a different tweet) that all share the same article link.



Figure 2: Anomalies. The distributions of types of tweets spreading articles from (a) low-credibility and (b) fact-checking sources are quite different. Each article is mapped along three axes representing the percentages of different types of messages that share it: original tweets, retweets, and replies. When user Alice retweets a tweet by user Bob, the tweet is rebroadcast to all of Alice’s followers, whereas when she replies to Bob’s tweet, the reply is only seen by Bob. Color represents the number of articles in each bin, on a log scale. (c) Correlation between popularity of articles from low-credibility sources and concentration of posting activity. We consider a collection of articles shared by a minimum number of tweets as a popularity group. For articles in each popularity group, a violin plot shows the distribution of Gini coefficients, which measure the concentration of posts by a few accounts (see Supplementary Materials). In violin plots, the width of a contour represents the probability of the corresponding value, and the median is marked by a colored line. (d) Bot score distributions for a random sample of 915 accounts that posted at least one link to a low-credibility source, and for the 961 “super-spreaders” that most actively shared content from low-credibility sources. The two groups have significantly different scores (p < 10^-4 according to a Mann-Whitney U test): super-spreaders are more likely bots.


We hypothesize that the “super-spreaders” of low-credibility content are social bots, which automatically post links to articles, retweet other accounts, or perform more sophisticated autonomous tasks, like following and replying to other users. To test this hypothesis, we used Botometer to evaluate the Twitter accounts that posted links to articles from low-credibility sources. For each account we computed a bot score (a number in the unit interval), which can be interpreted as the level of automation of that account. We used a threshold of 0.5 to classify an account as bot or human. Details about the Botometer system and the threshold can be found in Methods. We first considered a random sample of the general population of accounts that shared at least one link to a low-credibility article. Only 6% of the accounts in the sample are labeled as bots using this method, but they are responsible for spreading 31% of all tweets linking to low-credibility content and 34% of all articles from low-credibility sources. We then compared this group with a sample of the top most active accounts (“super-spreaders”), 33% of which have been labeled as bots — over five times as many (details in Supplementary Materials). Fig. 2(d) confirms that the super-spreaders are significantly more likely to be bots compared to the general population of accounts that share low-credibility content. Because these results are based on a classification model, it could be possible that what we see in Fig. 2(d) is due to bias in the way Botometer has been trained. For example, the model could learn to assign higher scores to more active accounts. We rule out this competing explanation by showing that higher bot scores cannot be attributed to this kind of bias in the learning model (see Supplementary Materials).
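
The group comparison above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors’ pipeline: the two score lists are hypothetical inputs, and a one-sided test variant is shown (the paper does not specify the test’s sidedness).

    # Minimal sketch: compare bot-score distributions of a random sample of
    # sharers vs. the most active "super-spreaders", as in Fig. 2(d).
    from scipy.stats import mannwhitneyu

    def bot_fraction(scores, threshold=0.5):
        """Fraction of accounts classified as bots at the given threshold."""
        return sum(s >= threshold for s in scores) / len(scores)

    def compare_groups(random_scores, superspreader_scores):
        # One-sided test: are super-spreader scores stochastically larger?
        _, p_value = mannwhitneyu(superspreader_scores, random_scores,
                                  alternative='greater')
        return {'bots_random': bot_fraction(random_scores),
                'bots_superspreaders': bot_fraction(superspreader_scores),
                'p_value': p_value}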

Given these anomalies, we submit that bots may play a critical role in driving the viral spread of content from low-credibility sources. To test this hypothesis, we examined whether bots tend to get involved at particular times in the spread of popular articles. As shown in Fig. 3(a), bots are prevalent in the first few seconds after an article is first published on Twitter. We conjecture that this early intervention exposes many users to low-credibility articles, increasing the chances that an article goes “viral.”
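
The lag analysis behind Fig. 3(a) can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions: `shares` is a hypothetical list of (article_id, timestamp_in_seconds, bot_score) tuples, and lags are binned logarithmically over the first hour of spreading.

    import numpy as np
    from collections import defaultdict

    def bot_scores_by_lag(shares, n_bins=12, max_lag=3600):
        # Time of first appearance of each article on Twitter.
        first_seen = {}
        for article, ts, _ in sorted(shares, key=lambda s: s[1]):
            first_seen.setdefault(article, ts)
        # Logarithmic lag intervals from 1 second to one hour; the +1 shift
        # avoids log(0) for the very first share of each article.
        edges = np.logspace(0, np.log10(max_lag + 1), n_bins + 1)
        binned = defaultdict(list)
        for article, ts, score in shares:
            lag = ts - first_seen[article]
            if 0 <= lag <= max_lag:
                idx = min(np.searchsorted(edges, lag + 1, side='right') - 1,
                          n_bins - 1)
                binned[idx].append(score)
        # Median bot score of accounts sharing within each lag interval.
        return {i: float(np.median(s)) for i, s in sorted(binned.items())}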

We find that another strategy often used by bots is to mention influential users in tweets that link to low-credibility content. Bots seem to employ this targeting strategy repetitively; for example, a single account mentioned @realDonaldTrump in 19 tweets, each linking to the same false claim about millions of votes by illegal immigrants (see details in Supplementary Materials). For a systematic investigation, let us consider all tweets that mention or reply to a user and include a link to a viral article from a low-credibility source in our corpus. The number of followers is often used as a proxy for the influence of a Twitter user. As shown in Fig. 3(b), tweets in general tend to mention popular people. However, accounts with the largest bot scores tend to mention users with a larger number of followers (median and average). A possible explanation for this strategy is that bots (or rather, their operators) target influential users with content from low-credibility sources, creating the appearance that it is widely shared. The hope is that these targets will then reshare the content with their followers, thus boosting its credibility.
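
A minimal pandas sketch of the tercile analysis in Fig. 3(b), assuming a hypothetical table with one row per mention and columns named 'mentioner_bot_score' and 'mentioned_followers' (names not from the original code):

    import pandas as pd

    def followers_by_bot_tercile(mentions):
        """mentions: DataFrame with one row per mention (assumed layout)."""
        df = mentions.copy()
        # Group mentioning accounts into bot-score terciles.
        df['tercile'] = pd.qcut(df['mentioner_bot_score'], 3,
                                labels=['lower', 'middle', 'top'])
        # Median and mean follower count of mentioned users, per tercile.
        return df.groupby('tercile')['mentioned_followers'].agg(['median', 'mean'])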



Figure 3: Bot strategies. (a) Early bot support after a viral low-credibility article is first shared. We consider a sample of 60,000 accounts that participate in the spread of the 1,000 most viral stories from low-credibility sources. We align the times when each article first appears. We focus on a one-hour early spreading phase following each of these events, and divide it into logarithmic lag intervals. The plot shows the bot score distribution for accounts sharing the articles during each of these lag intervals. (b) Targeting of influentials. We plot the average number of followers of Twitter users who are mentioned (or replied to) by accounts that link to the 1,000 most viral stories. The mentioning accounts are aggregated into three groups by bot score percentile. Error bars indicate standard errors. Inset: Distributions of follower counts for users mentioned by accounts in each percentile group.


We examined whether bot operators tended to target people in certain states by creating the appearance of users posting articles from those locations. We did find that the location patterns produced by bots are inconsistent with the geographic distribution of conversations on Twitter. However, we did not find evidence that bots sharing articles from low-credibility sources used this particular strategy to target swing states. Details of this analysis can be found in Supplementary Materials.

Having found that bots are employed in ways that appear to drive the viral spread of low-credibility articles, let us explore how humans interact with the content shared by bots, which may provide insight into whether and how bots are able to affect public opinion. Fig. 4(a) shows who retweets whom: humans do most of the retweeting (Fig. 4(b)), and they retweet articles posted by bots as much as those posted by other humans (Fig. 4(c)). This suggests that, collectively, people are not able to discriminate between low-credibility content shared by humans and that shared by social bots. It also means that when we observe many accounts exposed to low-credibility information, these are not just bots (re)tweeting it. In fact, we find that the volume of tweets by likely humans scales super-linearly with the volume by likely bots, suggesting that the reach of these articles among humans is amplified by social bots. In other words, a given amount of sharing activity by bots tends to trigger a disproportionate amount of human engagement. The same amplification effect is not observed for articles from fact-checking sources. Details are presented in Supplementary Materials.
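
The super-linear scaling claim can be checked with a log-log regression. The sketch below is not the original analysis code; it assumes per-article tweet counts attributed to likely bots and likely humans, and interprets a fitted slope above 1 as super-linear scaling.

    import numpy as np

    def scaling_exponent(bot_tweets, human_tweets):
        """Per-article tweet counts by likely bots and likely humans."""
        x = np.log10(np.asarray(bot_tweets, dtype=float))
        y = np.log10(np.asarray(human_tweets, dtype=float))
        mask = np.isfinite(x) & np.isfinite(y)  # drop articles with zero counts
        slope, _ = np.polyfit(x[mask], y[mask], 1)
        return slope  # slope > 1: human volume grows faster than bot volume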

Another way to assess the impact of bots on the spread of low-credibility content is to examine their critical role within the diffusion network. Let us focus on the retweet network [47], where nodes are accounts and connections represent retweets of messages with links to stories — just like the networks in Fig. 1(b,c), but aggregated across all articles from low-credibility sources. We apply a network dismantling procedure [1]: we disconnect one node at a time and analyze the resulting decrease in the total volume of retweets and in the total number of unique articles. The more these quantities are reduced by disconnecting a small number of nodes, the more critical those nodes are in the network. We prioritize accounts to disconnect based on bot score and, for comparison, also based on retweeting activity and influence. Further details can be found in Methods. Unsurprisingly, Fig. 4(d,e) shows that influential nodes are the most critical. The most influential nodes are unlikely to be bots. Disconnecting nodes with high bot scores is the second-best strategy for reducing low-credibility articles (Fig. 4(d)). For reducing overall post volume, this strategy performs well when more than 10% of nodes are disconnected (Fig. 4(e)). Disconnecting active nodes is not as efficient a strategy for reducing low-credibility articles. These results show that bots are critical in the diffusion network: disconnecting likely bots results in the removal of most low-credibility content.

Finally, we compared the extent to which social bots disseminate content from different low-credibility sources. We considered the most popular sources in terms of median and aggregate article posts, and measured the bot scores of the accounts that most actively spread their content. As shown in Fig. 5, one site (beforeitsnews.com) stands out for its high degree of automation, but other popular low-credibility sources also have many bots among their promoters. The dissemination of content from satire sites like The Onion and from fact-checking websites does not display the same level of automation.



Figure 4: Impact of bots. (a) Joint distribution of bot scores of accounts that retweeted links to low-credibility articles and accounts that had originally posted the links. Color represents the number of retweeted messages in each bin, on a log scale. (b) The top projection shows the distribution of bot scores for retweeters, who are mostly human. (c) The left projection shows the distribution of bot scores for accounts retweeted by likely humans (score below 0.5), with a significant portion of likely bots. (d,e) Dismantling the low-credibility content diffusion network. The priority of disconnected nodes is determined by ranking accounts on the basis of different characteristics. The remaining fraction of (d) unique articles from low-credibility sources and (e) retweets linking to those articles is plotted versus the number of disconnected nodes.


[Figure 5 scatter plot: median bot score of active accounts (vertical axis) versus total tweet volume (horizontal axis). Data point labels, 24 sources: infowars, breitbart, politicususa, theonion, politifact, theblaze, beforeitsnews, occupydemocrats, snopes, redflagnews, twitchy, dcclothesline, clickhole, bipartisanreport, wnd, naturalnews, redstate, yournewswire, addictinginfo, activistpost, usuncut, veteranstoday, opensecrets, factcheck.]

Figure 5: Popularity and bot support for the top sources. Satire websites are shown in orange, fact-checking sites in blue, and low-credibility sources in red. Popularity is measured by total tweet volume (horizontal axis) and median number of tweets per article (circle area). Bot support is gauged by the median bot score of the 100 most active accounts posting links to articles from each source (vertical axis). Low-credibility sources have greater support by bots, as well as greater median and/or total volume in many cases.


3 Discussion

Our analysis provides quantitative empirical evidence of the key role played by social bots in the spread of low-credibility content. Relatively few accounts are responsible for a large share of the traffic that carries misinformation. These accounts are likely bots, and we uncovered two manipulation strategies they use. First, bots are particularly active in amplifying content in the very early spreading moments, before an article goes “viral.” Second, bots target influential users through replies and mentions. People are vulnerable to these kinds of manipulation, retweeting bots who post low-credibility content just as much as they retweet other humans. As a result, bots amplify the reach of low-credibility content, to the point that it is statistically indistinguishable from that of fact-checking articles. Successful low-credibility sources in the U.S., including those on both ends of the political spectrum, are heavily supported by social bots. Social media platforms are beginning to acknowledge these problems and deploy countermeasures, although their effectiveness is hard to evaluate [60, 35, 27].

The present findings are not in contradiction with the recent work by Vosoughi et al. [57]. Their analysis is based on a small subset of articles that are fact-checked, whereas the present work considers a much broader set of articles from low-credibility sources, most of which are not fact-checked.


In addition, the analysis of Vosoughi et al. did not consider an important mechanism by which bots can amplify the spread of an article, namely, by resharing links originally posted by human accounts. Because of these two methodological differences, the present analysis provides new evidence about the role played by bots.

Our results are robust with respect to various choices. First, using a more restrictive criterion for selecting low-credibility sources, based on a consensus among several news and fact-checking organizations (see Methods), yields qualitatively similar results, leading to the same conclusions. Second, our finding that active spreaders of low-credibility content are likely bots is robust with respect to the activity threshold used to identify the most active spreaders. Furthermore, bot scores are uncorrelated with account activity volume. Third, the conclusions are not affected by the use of different bot-score thresholds to separate social bots and human accounts. Details about all of these robustness analyses can be found in Supplementary Materials.

Our findings demonstrate that social bots are an effective tool to manipulate social media. While the present study focuses on the spread of low-credibility content, such as false news, conspiracy theories, and junk science, similar bot strategies may be used to spread other types of malicious content, such as malware. And although our spreading data is collected from Twitter, there is no reason to believe that the same kind of abuse is not taking place on other digital platforms as well. In fact, viral conspiracy theories spread on Facebook [12] among the followers of pages that, like social bots, can easily be managed automatically and anonymously. Furthermore, just like on Twitter, false claims on Facebook are as likely to go viral as reliable news [41]. While the difficulty of accessing spreading data on platforms like Facebook is a concern, the growing popularity of ephemeral social media like Snapchat may make future studies of this type of abuse all but impossible.

The results presented here suggest that curbing social bots may be an effective strategy for mitigating the spread of low-credibility content, and that the bot score might provide a useful signal to prioritize accounts for further review. Progress in this direction may be accelerated through partnerships between social media platforms and academic research [27]. For example, our lab and others are developing machine learning algorithms to detect social bots [14, 51, 54]. The deployment of such tools is fraught with peril, however. While platforms have the right to enforce their terms of service, which forbid impersonation and deception, algorithms do make mistakes. Even a single false-positive error leading to the suspension of a legitimate account may foster valid concerns about censorship. This justifies current human-in-the-loop solutions, which unfortunately do not scale with the volume of abuse that is enabled by software. It is therefore imperative to support research both on improved abuse detection algorithms and on countermeasures that take into account the complex interplay between the cognitive and technological factors that favor the spread of misinformation [30].

An alternative strategy would be to employ CAPTCHAs [56], challenge-response tests to determine whether a user is human. CAPTCHAs have been deployed widely and successfully to combat email spam and other types of online abuse.


Their use to limit automatic posting or resharing of news links could help stem bot abuse by increasing its cost, but it would also add undesirable friction to benign applications of automation by legitimate entities, such as news media and emergency response coordinators. These are hard trade-offs that must be studied carefully as we contemplate ways to address the fake news epidemics.

4 Methods

Data about articles shared on Twitter was collected through Hoaxy, an open platform developed at Indiana University to track the spread of claims and fact checking [47]. A search engine, interactive visualizations, and open-source software are freely available (hoaxy.iuni.iu.edu). The data is accessible through a public API. Further details are presented in Supplementary Materials.

We started the collection in mid-May 2016 with 70 low-credibility sites and added 50 more in mid-December 2016. The collection period for the present analysis extends until the end of March 2017. During this time, we collected 389,569 articles from these 120 sites. We also tracked 15,053 stories published by independent fact-checking organizations, such as snopes.com, politifact.com, and factcheck.org. We did not exclude satire because many low-credibility sources label their content as satirical, and satire that goes “viral” is sometimes mistaken for real news. The Onion is the satirical source with the highest total volume of shares. We repeated our analyses of the most viral articles (e.g., Fig. 3(a)) with articles from theonion.com excluded, and the results were not affected.

The full list of 120 sources is reported in Supplementary Materials. We also repeated the analysis using a more restrictive criterion for selecting low-credibility sources, based on a consensus among three or more news and fact-checking organizations. This yields 327,840 articles (86% of the total) from 65 low-credibility sources, also listed in Supplementary Materials, where we show that the results are robust with respect to these different source selection criteria.

Our analysis does not require a complete list of low-credibility sources, but it does rely on the assumption that many articles published by these sources can be classified as some kind of misinformation or unsubstantiated information. To validate this assumption, we checked the content of a random sample of articles. For the purpose of this verification, we adopted a definition of “misinformation” that follows industry convention and includes the following classes: fabricated content, manipulated content, imposter content, false context, misleading content, false connection, and satire [58]. To these seven categories we also added articles whose claims could not be verified. We found that fewer than 15% of articles could be verified. More details are available in Supplementary Materials.

Using the filtering endpoint of Twitter’s public streaming API, we collected 13,617,425 public posts that included links to articles from low-credibility sources and 1,133,674 public posts linking to fact checks. This is the complete set of tweets linking to these articles in the study period, and not a sample (see Supplementary Materials for details).
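
As an illustration of this collection step, the sketch below uses the tweepy library (version 3.x interface) rather than Hoaxy’s actual back end; the credentials and domain list are placeholders. Twitter’s track parameter matches tokenized URLs, so domain fragments can serve as filter terms.

    import tweepy

    class LinkListener(tweepy.StreamListener):
        def on_status(self, status):
            # Each matching tweet carries the expanded URLs it links to.
            for url in status.entities.get('urls', []):
                print(status.id, url['expanded_url'])  # store in a DB in practice

    auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')   # placeholders
    auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')    # placeholders
    stream = tweepy.Stream(auth=auth, listener=LinkListener())
    # Illustrative domains only; the study tracked 120 low-credibility
    # sites and 7 fact-checking sites.
    stream.filter(track=['infowars com', 'snopes com'])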


We extracted metadata about the source of each link, the account that shared it, the original poster in case of a retweet or quoted tweet, and any users mentioned or replied to in the tweet.

We transformed links to canonical URLs to merge different links referring to the same article. Such duplication arises mainly from shortening services (44% of links are redirected) and extra parameters (34% of URLs contain analytics tracking parameters), but we also found websites that use duplicate domains and snapshot services. Canonical URLs were obtained by resolving redirections and removing analytics parameters.
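
A simplified Python sketch of this canonicalization, assuming an illustrative list of tracking-parameter prefixes (the production pipeline handles more cases, such as duplicate domains and snapshot services):

    import requests
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    TRACKING_PREFIXES = ('utm_', 'fb_', 'ga_')  # illustrative, not exhaustive

    def canonical_url(url, timeout=10):
        # Resolve shortener redirects by following the Location chain.
        resolved = requests.head(url, allow_redirects=True, timeout=timeout).url
        parts = urlsplit(resolved)
        # Drop analytics parameters; keep everything else in order.
        query = [(k, v) for k, v in parse_qsl(parts.query)
                 if not k.lower().startswith(TRACKING_PREFIXES)]
        return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                           urlencode(query), ''))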

The bot score of Twitter accounts is computed using the Botometer classifier, which evaluates the extent to which an account exhibits similarity to the characteristics of social bots [54]. The system is based on a supervised machine learning algorithm leveraging more than a thousand features extracted from public data and metadata about Twitter accounts. These features include various descriptors of information diffusion networks, user metadata, friend statistics, temporal patterns of activity, and part-of-speech and sentiment analysis. The classifier is trained using publicly available datasets of tens of thousands of Twitter users that include both humans and bots of varying sophistication. The Botometer system is available through a public API (botometer.iuni.iu.edu). It has also been employed in other studies [57, 61] and is widely adopted, serving hundreds of thousands of requests daily.

For the present analysis, we use the Twitter Search API to collect up to 200 of an account’s most recent tweets and up to 100 of the most recent tweets mentioning the account. From this data we extract the features used by the Botometer classifier. We use logistic calibration to make the bot scores calculated by the classifier easier to interpret as confidence levels (see Supplementary Materials).

There is no crisp, binary classification of accounts as human or bot, as there are many types of bots, and humans use different levels of automation. Accordingly, Botometer provides a score rather than a binary classification. Nevertheless, the model can effectively discriminate between human and bot accounts of different natures; five-fold cross-validation yields an area under the ROC curve of 94% [54]. A value of 50% indicates random accuracy and 100% means perfect accuracy. When a binary classification is needed, we use a threshold of 0.5, which maximizes accuracy [54]. See Supplementary Materials for further details about bot classification and its robustness.
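
For reference, the public Botometer API can be queried through its Python client roughly as follows. This is a hedged usage sketch: key names and response fields have changed across API versions, and all credentials shown are placeholders.

    import botometer

    twitter_app_auth = {
        'consumer_key': 'CONSUMER_KEY',            # placeholders
        'consumer_secret': 'CONSUMER_SECRET',
        'access_token': 'ACCESS_TOKEN',
        'access_token_secret': 'ACCESS_TOKEN_SECRET',
    }
    bom = botometer.Botometer(wait_on_ratelimit=True,
                              rapidapi_key='RAPIDAPI_KEY',  # placeholder
                              **twitter_app_auth)

    result = bom.check_account('@example_account')  # hypothetical handle
    score = result['scores']['universal']  # field names vary by API version
    is_bot = score >= 0.5                  # threshold used in this paper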

In the targeting analysis (Fig. 3(b)), we exclude mentions of sources that use the pattern “via @screen_name.”

The network studied in the dismantling analysis (Fig. 4(d,e)) is based on retweets with links to articles from low-credibility sources, posted before the election (May 16 – Nov 7, 2016). The network has 630,368 nodes (accounts) and 2,236,041 directed edges. Each edge is weighted by the number of retweets between the same pair of accounts. When an account is disconnected, all of its incoming and outgoing edges are removed. When we disconnect a retweeting node i that was in turn retweeted by some node j, only i is removed, because in the Twitter metadata each retweet connects directly to the account that originated the tweet.


Given the directionality of edges, retweeting activity is measured by node in-strength (weighted in-degree) and influence by out-strength (weighted out-degree).
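
The dismantling procedure can be sketched with networkx as follows, under stated assumptions: G is a directed, weighted retweet graph whose edges also carry a set of article identifiers in an 'articles' attribute (an assumed layout), and ranking is any scoring function over nodes. This is an illustration, not the original implementation.

    import networkx as nx

    def dismantle(G, ranking, steps):
        """Remove the top-ranked nodes one by one; report what remains."""
        H = G.copy()
        total_volume = G.size(weight='weight')  # total number of retweets
        for node in sorted(G.nodes, key=ranking, reverse=True)[:steps]:
            H.remove_node(node)  # drops all incoming and outgoing edges
            articles = set()
            for _, _, data in H.edges(data=True):
                articles |= data['articles']
            yield node, H.size(weight='weight') / total_volume, len(articles)

    # Example rankings (influence = weighted out-degree, per the Methods):
    # by_influence = lambda n: G.out_degree(n, weight='weight')
    # by_activity  = lambda n: G.in_degree(n, weight='weight')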

Acknowledgments. We are grateful to Ben Serrette and Valentin Pentchev of the Indiana University Network Science Institute (iuni.iu.edu), as well as Lei Wang, for supporting the development of the Hoaxy platform. Pik-Mai Hui assisted with the dismantling analysis. Clayton A. Davis developed the Botometer API. Nic Dias provided assistance with claim verification. We are also indebted to Twitter for providing data through their API. C.S. thanks the Center for Complex Networks and Systems Research (cnets.indiana.edu) for the hospitality during his visit at the Indiana University School of Informatics, Computing, and Engineering. He was supported by the China Scholarship Council. G.L.C. was supported by IUNI. The development of the Botometer platform was supported in part by DARPA (grant W911NF-12-1-0037). The development of the Hoaxy platform was supported in part by the Democracy Fund. A.F. and F.M. were supported in part by the James S. McDonnell Foundation (grant 220020274) and the National Science Foundation (award CCF-1101743). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Appendix: Supplementary Background

Tracking abuse of social media has been a topic of intense research in recent years. The analysis in the main text leverages Hoaxy, a system focused on tracking the spread of links to articles from low-credibility and fact-checking sources [46]. Here we give a brief overview of other systems designed to monitor the spread of misinformation on social media. This is related to the problems of mining and detecting misinformation and fake news, which are the subjects of recent surveys [64, 48].

Beginning with the detection of simple instances of political abuse like astroturfing [43], researchers noted the need for automated tools for monitoring social media streams and detecting manipulation or misinformation. Several such systems have been proposed in recent years, each with a particular focus or a different approach. The Truthy system [43] relied on network analysis techniques to classify memes, such as hashtags and links. TraceMiner [63] also models the propagation of messages, but by inferring embeddings of social media users with social network structures. The TweetCred system [6, 18] focuses on content-based features and other kinds of metadata, and distills a measure of overall information credibility. The Hierarchical Credibility Network [24] considers credibility as propagating through a three-layer network consisting of events, sub-events, and messages classified based on their features.

Specific systems have been proposed to detect rumors [66]. These include RumorLens [44], TwitterTrails [34], FactWatcher [19], and News Tracer [32]. The news verification capabilities of these systems range from completely automatic (TweetCred) to semi-automatic (TwitterTrails, RumorLens, News Tracer).


In addition, some of them let the user explore the propagation of a rumor with an interactive dashboard (TwitterTrails, RumorLens). These systems vary in their capability to monitor the social media stream automatically, but in all cases the user is required to enter a seed rumor or keyword to operate them.

Our analysis is based on the spread of content from low-credibility sources rather than focusing on individual stories that are labeled as misinformation. Given the impossibility of fact-checking millions of articles, this approach of using sources as proxies for misinformation labels is increasingly adopted in the literature cited in the main text and elsewhere [53, 49, 65, 3, 17].

Since misinformation can be propagated by coordinated online campaigns, it is important to detect whether a meme is being artificially promoted. Machine learning has been applied successfully to the task of discriminating early between trending memes that are organic and those promoted by means of advertisement [55].

Finally, there is a growing body of research on social bot detection. The level of sophistication of bot-based manipulation can vary greatly [5]. As discussed in the main text, there is a large gray area between human and completely automated accounts. So-called cyborgs are accounts used to amplify content generated by humans [7]. It is possible that a significant portion of the manipulation discussed in this paper, aimed at amplifying the spread of low-credibility content, is carried out by this kind of bot. The Botometer system used in this paper has been publicly available for a few years [10]. Its earliest version was trained on simple spam bots, detected through a social honeypot system [59, 28]. The version used here was trained on public datasets that also included more sophisticated bots.

Appendix: Supplementary Methods

List of Sources

Our list of low-credibility sources was obtained by merging several lists compiled by third-party news and fact-checking organizations or experts. It should be noted that these lists were compiled independently of each other, and as a result they have uneven coverage. However, there is overlap between them. The full list of sources is shown in Table 1. Some lists annotate sources in different categories. In the case of OpenSources (www.opensources.co), we only considered sources tagged with any of the following labels: fake, satire, bias, conspiracy, rumor, state, junksci, clickbait, hate. In the case of Starbird’s list of alternative domains [49], we considered those with primary orientation coded as one of: conspiracy theorists, political agenda, tabloid/clickbait news.

For the robustness analysis (below), we also consider a “consensus” subset of sites that are each listed among low-credibility sources by at least three organizations or experts. This subset includes 65 sources, also shown in Table 1. We track 10,663,818 tweets (79% of the total) with links to 327,840 articles (86% of the total) from consensus low-credibility sources, generated by 1,135,167 accounts (84% of the total).



We additionally tracked the websites of seven independent fact-checking organizations: badsatiretoday.com, factcheck.org, hoax-slayer.com (which includes its older version, hoax-slayer.net), opensecrets.org, politifact.com, snopes.com, and truthorfiction.com. In April 2017 we added climatefeedback.org, which does not affect the present analysis.

Table 1: Low-credibility sources. For each source, we indicate which lists include it. The lists are: Fake News Watch (FNW), OpenSources (OS), Daily Dot (DD), US News & World Report (US), New Republic (NR), CBS, Urban Legends (UL), NPR, Snopes Field Guide (Sn), Starbird Alternative Domains (KS), BuzzFeed News (BF), and PolitiFact (PF). Table headers link to the original lists. The date indicates when Hoaxy started following a source: June 29 or December 20, 2016. Consensus sources are those that appear in three or more lists.

Source FNW OS DD US NR CBS UL NPR Sn KS BF PF Date
21stcenturywire.com X X X X Jun
70news.wordpress.com X X X Dec
abcnews.com.co X X X X X Dec
activistpost.com X X X X X Jun
addictinginfo.org X X X Dec
americannews.com X X X X X Jun
americannewsx.com X Dec
amplifyingglass.com X Jun
anonews.co X Dec
beforeitsnews.com X X X X X Jun
bigamericannews.com X X Jun
bipartisanreport.com X X Dec
bluenationreview.com X X Dec
breitbart.com X X X Dec
burrardstreetjournal.com X X X Dec
callthecops.net X X X Dec
christiantimes.com X Dec
christwire.org X X X Jun
chronicle.su X X Jun
civictribune.com X X X X X X Jun
clickhole.com X X X X Jun
coasttocoastam.com X X Jun
collective-evolution.com X Dec
consciouslifenews.com X X X X Jun
conservativeoutfitters.com X X X Dec
countdowntozerotime.com X X X Jun
counterpsyops.com X X Jun
creambmp.com X X X Jun
dailybuzzlive.com X X X X Jun
dailycurrant.com X X X X Jun
dailynewsbin.com X Dec
dcclothesline.com X X X Jun
demyx.com X Dec
denverguardian.com X X X Dec
derfmagazine.com X X Jun
disclose.tv X X X X Jun




Source FNW OS DD US NR CBS UL NPR Sn KS BF PF Date (continued)
duffelblog.com X X X X Jun
duhprogressive.com X X Jun
empireherald.com X X X X Dec
empirenews.net X X X X X X X X Jun
empiresports.co X X X X X X X Jun
en.mediamass.net X X X X Jun
endingthefed.com X Dec
enduringvision.com X X X Jun
flyheight.com X Dec
fprnradio.com X X Jun
freewoodpost.com X X X X Dec
geoengineeringwatch.org X X Jun
globalassociatednews.com X X X Dec
globalresearch.ca X X X Jun
gomerblog.com X Jun
govtslaves.info X X X Jun
gulagbound.com X X Jun
hangthebankers.com X X Jun
humansarefree.com X X Jun
huzlers.com X X X X X X X X Jun
ifyouonlynews.com X X Dec
infowars.com X X X X X X Jun
intellihub.com X X X Jun
itaglive.com X Jun
jonesreport.com X X Jun
lewrockwell.com X X X Jun
liberalamerica.org X Dec
libertymovementradio.com X X Jun
libertytalk.fm X X Jun
libertyvideos.org X X Jun
lightlybraisedturnip.com X Dec
nationalreport.net X X X X X X X X X X Jun
naturalnews.com X X X X Jun
ncscooper.com X X X Dec
newsbiscuit.com X X X X Jun
newslo.com [a] X X X X X X Jun
newsmutiny.com X X X Jun
newswire-24.com X X Jun
nodisinfo.com X X X Jun
now8news.com X X X X X Dec
nowtheendbegins.com X X Jun
occupydemocrats.com X X X Dec
other98.com X X Dec
pakalertpress.com X X Jun
politicalblindspot.com X X Jun
politicalears.com X X Jun
politicops.com [a] X X X X X Jun
politicususa.com X Dec
prisonplanet.com X X Jun
react365.com X X X X X Dec
realfarmacy.com X X Jun
realnewsrightnow.com X X X X X X Jun
redflagnews.com X X X X Jun
redstate.com X X Dec
rilenews.com X X X X X X Jun



Source FNW OS DD US NR CBS UL NPR Sn KS BF PF Date (continued)
rockcitytimes.com X Jun
satiratribune.com X X X X Dec
stuppid.com X X X Dec
theblaze.com X Dec
thebostontribune.com X X X Dec
thedailysheeple.com X X X Jun
thedcgazette.com [b] X X X X Jun
thefreethoughtproject.com X X X Dec
thelapine.ca X X Jun
thenewsnerd.com X X X X Jun
theonion.com X X X X X X X Jun
theracketreport.com X X X X Dec
therundownlive.com X X Jun
thespoof.com X X X Jun
theuspatriot.com X X Jun
truthfrequencyradio.com X X Jun
twitchy.com X Dec
unconfirmedsources.com X X Jun
USAToday.com.co X X Dec
usuncut.com X X Dec
veteranstoday.com X X X Jun
wakingupwisconsin.com X X Jun
weeklyworldnews.com X X X X Jun
wideawakeamerica.com X Jun
winningdemocrats.com X Dec
witscience.org X X X Jun
wnd.com X Dec
worldnewsdailyreport.com X X X X X X X Jun
worldtruth.tv X X X X X Jun
yournewswire.com X X X X X Dec

[a] newslo.com and politicops.com are mirrors of politicot.com.
[b] thedcgazette.com is a mirror of dcgazette.com.

Hoaxy Data

The back-end component of Hoaxy collects public tweets that link to a predefined list of websites. The use of the “POST statuses/filter” endpoint of the Twitter streaming API, together with the total volume of tweets collected, which is well below 1% of all public tweets, guarantees that we obtain all tweets linking to the sites in our list, and not just a sample of the tweets with these links. In addition, Hoaxy crawls all tracked websites and indexes all their articles, supporting a full-text search engine that allows users to find articles matching a given query. Furthermore, users can select subsets of these articles to visualize their spread on Twitter. To this end, Hoaxy matches the indexed articles with the tweets in our database and constructs networks based on retweets, mentions, replies, and quoted tweets. The front end visualizes these networks interactively, allowing users to explore the accounts (nodes) and the tweets (edges) that make up these networks. The system makes all the data accessible to the public through a website (hoaxy.iuni.iu.edu) and an API.

Our analysis focuses on the period from mid-May 2016 to the end of March 2017.



Figure 6: Weekly tweeted low-credibility articles, tweets/article ratio, and articles/site ratio. The collection was briefly interrupted in October 2016. In December 2016 we expanded the set of low-credibility sources from 70 to 120 websites.

During this time, we collected 15,053 and 389,569 articles from fact-checking and low-credibility sources, respectively. The Hoaxy system collected 1,133,674 public posts that included links to fact checks and 13,617,425 public posts linking to low-credibility articles. As shown in Fig. 6, low-credibility websites each produced approximately 100 articles per week, on average. Toward the end of the study period, this content was shared by approximately 30 tweets per article per week, on average. However, as discussed in the main text, success is extremely heterogeneous across articles. This is the case irrespective of whether we measure success through the number of tweets (Fig. 7(a)) or accounts (Fig. 7(b)) sharing an article. For both popularity measures, the distributions are very broad and basically indistinguishable across articles from low-credibility vs. fact-checking sources.

Content Analysis

Our analysis considers content published by a set of websites flagged as sources of misinformation by third-party journalistic and fact-checking organizations (Table 1). This source-based approach relies on the assumption that most of the articles published by our compilation of sources are some type of misinformation, as we cannot fact-check each individual article. We validated this assumption by estimating the rate of false positives, i.e., verified articles, in the corpus. We manually evaluated a random sample of articles (N = 50) drawn from our corpus, stratified by source. We considered only those sources whose articles were tweeted at least once in the period of interest. To draw an article, we first selected a source at random with replacement, and then chose one of the articles it published, again at random but without replacement.



Figure 7: Probability distributions of popularity of articles from low-credibility and fact-checking sources, measured by (a) the number of tweets and (b) the number of accounts sharing links to an article.


We repeated our analysis on an additional sample (N = 50) in which the chances of drawing an article are proportional to the number of times it was tweeted. This ‘sample by tweet’ is thus biased toward more popular sources.
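
The two sampling schemes can be sketched in Python as follows, assuming a hypothetical mapping from each source to its list of (article_id, tweet_count) pairs and a corpus holding at least n distinct articles:

    import random

    def sample_by_source(articles, n=50):
        """Source drawn uniformly with replacement; article without."""
        drawn, seen = [], set()
        sources = list(articles)
        while len(drawn) < n:
            src = random.choice(sources)
            candidates = [a for a in articles[src] if a[0] not in seen]
            if candidates:
                art = random.choice(candidates)
                seen.add(art[0])
                drawn.append(art)
        return drawn

    def sample_by_tweet(articles, n=50):
        """Article probability proportional to its number of tweets."""
        pool = [a for arts in articles.values() for a in arts]
        weights = [tweets for _, tweets in pool]
        return random.choices(pool, weights=weights, k=n)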

It is important to note that articles with unverified claims are sometimes updated after being debunked. This usually happens late, after the article has spread, and could lead to overestimating the rate of false positives. To mitigate this phenomenon, the earliest snapshot of each article was retrieved from the Wayback Machine at the Internet Archive (archive.org). If no snapshot was available, we retrieved the version of the page current at verification time. If the page was missing from the website or the website was down, we reviewed the title and body of the article crawled by Hoaxy. We gave priority to the current version over the possibly more accurate crawled version because, in deciding whether a piece of content is misinformation, we want to consider any form of visual evidence included with it, such as images or videos.

After retrieving all articles in the two samples, each article was evaluated independently by two reviewers (two of the authors), using the rubric summarized in Fig. 8. Each article was then labeled with the majority label, with ties broken by a third reviewer (another author). Fig. 9 shows the results of the analysis. We report the fractions of articles that were verified and that could not be verified (inconclusive), out of the total number of articles that contain any factual claim. The rate of false positives is below 15% in both samples.

Concentration

In the main text we use the Gini coefficient to calculate the concentration of posting activity for an article, based on the accounts that post links to the article. For each article, the Lorenz curve plots the cumulative share of tweets versus the cumulative share of accounts generating these tweets. The Gini coefficient is the ratio of the area that lies between the line of equality (the diagonal) and the Lorenz curve to the total area under the line of equality. A high coefficient indicates that a small subset of accounts was responsible for a large portion of the posts.
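
For concreteness, here is a compact Python sketch of the Gini computation, where tweet_counts holds the number of tweets posted by each account sharing a given article. This uses a standard closed-form expression, not the authors’ exact code.

    import numpy as np

    def gini(tweet_counts):
        """Gini coefficient of per-account tweet counts for one article."""
        x = np.sort(np.asarray(tweet_counts, dtype=float))
        n = len(x)
        cum = np.cumsum(x)
        # Equivalent to the ratio of the area between the line of equality
        # and the Lorenz curve to the area under the line of equality.
        return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

For example, gini([1, 1, 1, 1]) returns 0 (perfect equality), while gini([0, 0, 0, 4]) returns 0.75, reflecting a single account responsible for all posts.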

Bot Score Calibration

Calibration methods are applicable when a machine learning classifier outputs probabilistic scores. Well-calibrated classifiers are probabilistic models for which the estimates can be directly interpreted as confidence levels. We use Platt's scaling [38], a logistic regression model trained on classifier outputs, to calibrate the bot score computed by the Botometer classifier.
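A minimal sketch of this calibration step, assuming a labeled training set of raw scores (scikit-learn's LogisticRegression stands in for the actual training code, and the toy data are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data: raw classifier scores in [0, 1] and
    # labels (1 = bot, 0 = human).
    raw = np.array([0.10, 0.20, 0.35, 0.55, 0.70, 0.90]).reshape(-1, 1)
    labels = np.array([0, 0, 0, 1, 1, 1])

    # Platt's scaling: a one-feature logistic regression fit on the raw
    # scores, so that calibrated = sigmoid(a * raw + b).
    platt = LogisticRegression().fit(raw, labels)

    def calibrate(scores):
        # The sigmoid is monotonic, so the ranking of accounts is preserved.
        return platt.predict_proba(np.reshape(scores, (-1, 1)))[:, 1]

    print(calibrate([0.3, 0.6]))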

We present the mapping between raw and calibrated scores in Fig. 10. The calibration only changes scores within the unit interval, but retains the ranking among accounts. The figure also shows reliability diagrams for raw and calibrated scores [11]. We split the unit interval into 20 bins. Each instance in the training data set is assigned to a bin based on its predicted (raw) score. For each bin, the mean predicted score is computed and compared against the fraction of true-positive cases. In a well-calibrated model, the points align with the diagonal.

Figure 8: Flowchart summarizing the annotation rubric employed in the content analysis. (The rubric first asks whether the article makes any factual claim; articles without one are excluded. It then checks whether the source is a satire or parody site, whether it is an imposter or fantasy site, whether the claim has already been fact-checked and debunked or verified, whether there is a false connection between the article and the claim in its headline, title photo, or title photo caption, whether photo or video evidence is manipulated or presented in a false context, and whether the content is fabricated, misleading, or echoed by reputable sources. Possible outcomes are satire, verified, misinformation, and inconclusive.)

Figure 9: Content analysis based on two samples of articles. Sampling by source gives each source equal representation, while sampling by tweets biases the analysis toward more popular sources. We excluded from the sample by source three articles that did not contain any factual claims. Satire articles are grouped with misinformation, as explained in the main text. (Sample by source: misinformation 76.6%, inconclusive 8.5%, verified 14.9%. Sample by tweet: misinformation 82.0%, inconclusive 4.0%, verified 14.0%.)

Figure 10: Bot score calibration curves. Left: the calibration mapping function projects raw classifier output to calibrated scores. Right: reliability curves plot the true-positive rate against the mean predicted scores. The calibrated curve indicates higher reliability because it is closer to the unbiased diagonal line.

Table 2: Analysis of likely bots and their content-spreading activity, based on a random sample of Twitter accounts sharing at least one article from a low-credibility source.

                                          Total   Likely bots   Percentage
    Accounts                                915            54           6%
    Tweets with low-credibility articles 11,656         3,587          31%
    Unique low-credibility articles       7,726         2,608          34%
    Tweets with fact-checks                 598             4         0.7%
    Unique fact-checks                      395             3         0.8%

Bot Classification

To show that a few social bots are disproportionately responsible for the spread of low-credibility content, we considered a random sample of accounts that shared at least one article from a low-credibility source, and evaluated these accounts using the bot classification system Botometer. Out of 1,000 sampled accounts, 85 could not be inspected because they had been suspended, deleted, or turned private. For each of the remaining 915, Botometer returned a bot score estimating the level of automation of the account. To quantify how many accounts are likely bots, we transform bot scores into binary assessments using a threshold of 0.5. This is a conservative choice to minimize false negatives and especially false positives, as shown in prior work (cit. in main text). Table 2 shows the fraction of accounts with scores above the threshold. To give a sense of their overall impact on the spreading of low-credibility content, Table 2 also shows the fraction of tweets with articles from low-credibility sources posted by accounts that are likely bots, and the number of unique articles included in those tweets. As a comparison, we also tally the fact-checks shared by these accounts, showing that likely bots focused on sharing low-credibility content and ignored fact-checking content.
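The tallies in Table 2 amount to thresholding scores and counting. A sketch under the same assumptions, with hypothetical records in which each tweet carries the poster's account, calibrated bot score, and the linked article URL:

    # Hypothetical tweet records: (account_id, bot_score, article_url).
    tweets = [
        ("a1", 0.82, "http://example.site/story-1"),
        ("a2", 0.12, "http://example.site/story-1"),
        ("a1", 0.82, "http://example.site/story-2"),
    ]

    THRESHOLD = 0.5  # conservative cutoff separating likely bots from humans

    bot_tweets = [t for t in tweets if t[1] > THRESHOLD]
    share_of_tweets = len(bot_tweets) / len(tweets)
    share_of_articles = len({t[2] for t in bot_tweets}) / len({t[2] for t in tweets})
    print(share_of_tweets, share_of_articles)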

In the main text we show the distributions of bot scores for this sample of accounts, as well as for a sample of accounts that have been most active in spreading low-credibility content (super-spreaders). To select the super-spreaders, we ranked all accounts by how many tweets they posted with links to low-credibility sources, and considered the top 1,000 accounts. We then performed the same classification steps discussed above. For the same reasons mentioned above, we could not obtain scores for 39 of these accounts, leaving us with a sample of 961 scored accounts.
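Selecting super-spreaders is a straightforward ranking; a sketch with hypothetical records:

    from collections import Counter

    # Hypothetical records: one (account_id, article_url) pair per tweet
    # linking to a low-credibility source.
    tweets = [("a1", "u1"), ("a2", "u1"), ("a1", "u2"), ("a3", "u1"), ("a1", "u1")]

    # Count such tweets per account, then keep the 1,000 most active accounts.
    activity = Counter(account for account, _ in tweets)
    super_spreaders = [account for account, _ in activity.most_common(1000)]
    print(super_spreaders[:3])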


Figure 11: Complementary cumulative distribution (CCDF) of repetitions, i.e., the number of times a single account tweets the same link to an article from a low-credibility source.

Appendix: Supplementary Text

Super-Spreaders of Low-Credibility Content

In the main text we show that the more popular a low-credibility article, the more its posting activity is concentrated around a relatively small number of active accounts. We also find that the most active spreaders of content from low-credibility sources are more likely to be social bots. To further illustrate the anomalous activity patterns of these "super-spreaders," Fig. 11 plots the distribution of repeated tweets by individual accounts sharing the same low-credibility article. While it is normal behavior for a person to share an article once, the long tail of the distribution highlights inorganic, automated support. A single account posting the same article over and over, hundreds or thousands of times in some cases, is likely controlled by software.
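A sketch of how the repetition counts behind Fig. 11 could be tabulated, with hypothetical records (the Counter keys are (account, link) pairs):

    from collections import Counter
    import numpy as np

    # Hypothetical records: one (account_id, article_url) pair per tweet.
    tweets = [("a1", "u1"), ("a1", "u1"), ("a1", "u1"), ("a2", "u1"), ("a3", "u2")]

    # Repetitions: times each account tweeted the same article link.
    reps = np.array(list(Counter(tweets).values()))

    # Empirical CCDF: fraction of (account, link) pairs tweeted at least k times.
    for k in range(1, reps.max() + 1):
        print(k, (reps >= k).mean())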

Bots Targeting Influentials

The main text discusses a strategy used by bots, by which influential users are mentioned in tweets that link to low-credibility content. Bots seem to employ this targeting strategy repetitively. Fig. 12 offers an illustration: in this example, a single (now suspended) account produced 19 tweets linking to the article shown in the figure and mentioning @realDonaldTrump.

Amplification by Bots

The analysis in the main text focuses on the role of bots in the spread of articles from low-credibility sources, assuming that bots do not equally support the spread of articles from fact-checking sources. In fact, we show in the main text that articles from low-credibility and fact-checking sources spread through different mixes of original tweets, retweets, and replies. We also find that low-credibility sources have greater support from bots than fact-checking and satire sources. To further confirm the assumption that bots do not play an equal role in the spread of fact-checking articles, we observe in Fig. 13 that the fraction of tweets posted by likely bots is much higher for articles from low-credibility sources. Further, the fraction depends on popularity in the case of articles from low-credibility sources (it gets diluted for more viral ones), whereas it is flatter for articles from fact-checking sources. Here, bots and humans are separated based on a threshold in the bot score. These findings are robust to the choice of threshold, and point to selective amplification of articles from low-credibility sources by bots.

Figure 12: Example of targeting for the article "Report: three million votes in presidential election cast by illegal aliens," published by Infowars.com on November 14, 2016 and shared over 18 thousand times on Twitter. Only a portion of the diffusion network is shown. Nodes stand for Twitter accounts, with size representing number of followers. Links illustrate how the article spreads: by retweets and quoted tweets (blue), or by replies and mentions (red).

Figure 13: Fraction of tweets linking to news articles that are posted by accounts with bot score above a threshold, as a function of the popularity of the linked articles. We see different bot activity for articles from low-credibility (left) versus fact-checking (right) sources.

Figure 14: For links to articles from low-credibility (left) and fact-checking (right) sources, the number of tweets by accounts with bot score above a threshold is plotted versus the number of tweets by accounts with bot score below the threshold. The dashed lines are guides to the eye, showing linear growth. A super-linear relationship is a signature of amplification by bots.

To focus on amplification more directly, let us consider how exposure to humans varies with activity by bots. Fig. 14 estimates the numbers of tweets by likely humans and bots, using a threshold on bot scores to separate them. Results are robust with respect to the choice of threshold. For articles from low-credibility sources, the estimated number of human tweets per article grows faster than the estimated number of bot tweets per article. For fact-checking articles, in contrast, we find a linear relationship. In other words, bots seem to amplify the reach of articles from low-credibility sources, but not the reach of articles from fact-checking sources.
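One way to quantify the super-linear relationship in Fig. 14 is the slope of a log-log fit of human tweets versus bot tweets per article; a sketch with made-up counts:

    import numpy as np

    # Hypothetical per-article tweet counts, split by a bot-score threshold.
    bot_tweets = np.array([1, 2, 5, 10, 50, 100])
    human_tweets = np.array([3, 8, 30, 90, 700, 2000])

    # Fit log(human) = alpha * log(bot) + c. A slope alpha > 1 means human
    # exposure grows faster than bot activity: a signature of amplification.
    alpha, c = np.polyfit(np.log(bot_tweets), np.log(human_tweets), 1)
    print(alpha)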


Figure 15: Map of location targeting by bots, relative to a baseline. To gauge sharing activity by likely bots, we considered 793,909 tweets in the period between August and October 2016 posting links to low-credibility articles by accounts with bot score above 0.5 that reported a U.S. state location in their profile. We compared the tweet frequencies by state with those expected from a sample of 1,393,592,062 tweets in the same period, obtained from a 10% random sample of public posts from the Twitter streaming API. Colors indicate log ratios after normalization: positive values indicate states with higher than expected bot activity. Using different thresholds yields similar maps.

Geographic Targeting

We examined whether bots (or rather their programmers) tended to target voters in certain states by creating the appearance of users posting from those locations. To this end, we considered accounts with bot scores above a threshold that shared articles from low-credibility sources in the three months before the election, and focused on those accounts with a state location in their profile. The location is self-reported and thus trivial to fake. We compared the distribution of bot account locations across states with a baseline obtained from a large sample of tweets in the same period. A χ² test indicates that the location patterns produced by bots are inconsistent with the geographic distribution of conversations on Twitter (p < 10^-4). This suggests that, as part of their disguise, social bots are more likely to report certain locations than others. For example, Fig. 15 shows higher than expected bot activity appearing to originate from the states of Montana, Maine, and Wisconsin. However, we did not find evidence that bots used this particular strategy for geographic targeting of swing states (Table 3).
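A sketch of the comparison, assuming per-state tweet counts for likely bots and for the baseline sample (three made-up states shown; SciPy provides the χ² test):

    import numpy as np
    from scipy.stats import chisquare

    # Hypothetical per-state counts: tweets with low-credibility links by
    # likely bots, and tweets in the 10% baseline sample.
    bot_counts = np.array([120, 300, 80])
    baseline_counts = np.array([200_000, 900_000, 150_000])

    # Expected bot counts if bot locations followed the baseline distribution.
    expected = baseline_counts / baseline_counts.sum() * bot_counts.sum()
    chi2, p = chisquare(bot_counts, f_exp=expected)

    # Normalized log ratios, as mapped in Fig. 15: positive values flag
    # states with more bot activity than the baseline predicts.
    log_ratio = np.log10(bot_counts / expected)
    print(p, log_ratio)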


Table 3: Spearman rank correlation ρ between U.S. state absolute expected vote margin and relative bot activity. Low margins indicate 2016 tight races (swing states). Data from FiveThirtyEight (projects.fivethirtyeight.com/2016-election-forecast/). Bot activity is measured by numbers of tweets posting links to low-credibility articles by accounts with bot score above a threshold that reported a U.S. state location in their profile, in the period between August and October 2016. The counts by state are normalized by those expected from a sample of 1,393,592,062 tweets in the same period, obtained from a 10% random sample of public posts from the Twitter streaming API. The positive correlation indicates that bot activity was not associated with swing states. The p-values are based on two-tailed t-tests.

    Bot score threshold   Tweets by likely bots      ρ   p-value
    0.4                               1,135,816   0.28      0.04
    0.5                                 793,909   0.29      0.04
    0.6                                 588,751   0.30      0.03
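The correlation in Table 3 can be reproduced in one call, given per-state margins and normalized bot activity (made-up values below):

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical per-state data: absolute expected vote margin (low =
    # swing state) and normalized bot activity.
    margin = np.array([2.1, 15.3, 8.7, 22.4, 4.9])
    bot_activity = np.array([0.8, 1.4, 1.0, 1.6, 0.9])

    rho, p = spearmanr(margin, bot_activity)
    # A positive rho means bot activity concentrates in safer states,
    # i.e., no evidence of swing-state targeting.
    print(rho, p)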

Robustness Analyses

The results in the main text are robust with respect to various choices and assumptions, presented next.

Criteria for selection of sources

We repeated the analyses in the main text using the more restrictive criterion for selecting low-credibility sources, based on a consensus among three or more news and fact-checking organizations. The 65 consensus sources are listed in Table 1. To carry out these analyses, we inspected 33,115 accounts and could obtain bot scores for 32,250 of them; the rest had been suspended or gone private. The results are qualitatively similar to those in the main text and support the robustness of the findings, namely: super-spreaders of articles from low-credibility sources are likely bots (Fig. 16), bots amplify the spread of information from low-credibility sources in the early phases (Fig. 17), bots target influential users (Fig. 18), and humans retweet low-credibility content posted by bots (Fig. 19).

Absence of correlation between activity and bot score

Our notion of super-spreader is based upon ranking accounts by activity and taking those above a threshold. The analysis showing that super-spreaders of low-credibility content are likely bots assumes that this finding is not explained by a correlation between activity and bot score. In fact, although the bot classification model does consider volume of tweets as one among over a thousand features, it is not trained in such a way that there is an obvious monotonic relation between activity and bot score. A simple monotonic relation between overall volume and bot score would lead to many false positives, because many bots produce very few tweets or appear to produce none (they delete their tweets); these accounts still get high bot scores. Figure 20 confirms that account activity volume and bot scores are uncorrelated.

Figure 16: Bot score distributions for super-spreaders vs. randomly selected sharers of links to low-credibility sources selected by the consensus criterion. The random sample includes 992 accounts that posted at least one link to an article from a low-credibility source. Their bot scores are compared to those of the 997 accounts that most actively share such links. The two groups have significantly different scores (p < 10^-4 according to a Mann-Whitney U test).

Figure 17: Temporal evolution of bot support after the first share of a viral story from a consensus low-credibility source. We consider a random sample of 20,000 accounts out of the 163,563 accounts that participate in the spread of the 1,000 most viral articles. After articles from The Onion are excluded, we are left with 42,202 tweets from 13,926 accounts. We align the times when each link first appears. We focus on a one-hour early spreading phase following each of these events, and divide it into logarithmic lag intervals. The plot shows the bot score distribution for accounts sharing the links during each of these lag intervals.

Figure 18: Average number of followers for Twitter users who are mentioned (or replied to) by a sample of 20,000 accounts that link to the 1,000 most viral articles from consensus low-credibility sources. We obtained bot scores for 4,006 unique mentioning accounts and 4,965 unique mentioned accounts, participating in 33,112 mention/reply pairs. We excluded 13,817 of these pairs using the "via @screen_name" mentioning pattern. The mentioning accounts are aggregated into three groups by bot score percentile. Error bars indicate standard errors. Inset: distributions of follower counts for users mentioned by accounts in each percentile group.

Figure 19: Joint distribution of the bot scores of accounts that retweeted links to articles from consensus low-credibility sources and accounts that had originally posted the links. We considered retweets by a sample of 20,000 accounts that posted the 1,000 most viral articles. We obtained bot scores for 12,792 tweeting accounts and 17,664 retweeting accounts, participating in 229,725 retweet pairs. Color represents the number of retweeted messages in each bin, on a log scale. Projections show the distributions of bot scores for retweeters (top) and for accounts retweeted by likely humans (left).

Figure 20: Distributions of bot scores versus account activity. For this analysis we randomly selected 48,517 distinct Twitter accounts evaluated by Botometer. Of these, 11,190 were available for crawling their profiles and measuring their activity (number of tweets). Bins correspond to deciles in the activity rate. We show the average and 95% confidence interval for the bot score distribution of the accounts in each activity bin. There is no correlation between activity and bot score (Pearson's ρ = −0.007).
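The check behind Fig. 20 reduces to a correlation between two per-account quantities; a sketch with hypothetical values:

    import numpy as np

    # Hypothetical per-account measurements: activity rate (tweets per day)
    # and bot score.
    activity = np.array([0.5, 3.0, 12.0, 45.0, 120.0, 7.5])
    bot_score = np.array([0.22, 0.75, 0.18, 0.40, 0.31, 0.55])

    # Pearson correlation; a value near zero supports the claim that
    # activity volume and bot score are uncorrelated.
    r = np.corrcoef(activity, bot_score)[0, 1]
    print(r)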

Bot-score threshold values

The results are also not affected by the use of different bot-score thresholds to separate social bots from human accounts. For example, we experimented with different thresholds and found that they do not change our conclusion that super-spreaders are more likely to be social bots. Figs. 13 and 14 and Table 3 show that other findings are also robust with respect to the bot-score threshold, even though the estimated percentages of likely humans/bots, and the estimated numbers of tweets posted by them, are naturally sensitive to the threshold.

References

[1] R. Albert, H. Jeong, and A.-L. Barabasi. Error and attack tolerance of complex networks. Nature, 406(6794):378–382, 2000.

[2] H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–236, 2017.

[3] A. Bessi, M. Coletto, G. A. Davidescu, A. Scala, G. Caldarelli, and W. Quattrociocchi. Science vs conspiracy: Collective narratives in the age of misinformation. PLoS ONE, 10(2):1–17, 2015.

[4] A. Bessi and E. Ferrara. Social bots distort the 2016 US presidential election online discussion. First Monday, 21(11), 2016.

[5] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network: when bots socialize for fame and money. In Proceedings of the 27th Annual ACM Computer Security Applications Conference, pages 93–102, 2011.

[6] C. Castillo, M. Mendoza, and B. Poblete. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web, page 675, 2011.

[7] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter: human, bot, or cyborg? In Proceedings of the 26th Annual ACM Computer Security Applications Conference, pages 21–30, 2010.

[8] M. Conover, J. Ratkiewicz, M. Francisco, B. Goncalves, A. Flammini, and F. Menczer. Political polarization on Twitter. In Proc. 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

[9] M. D. Conover, B. Goncalves, A. Flammini, and F. Menczer. Partisan asymmetries in online political activity. EPJ Data Science, 1:6, 2012.


[10] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, and F. Menczer. BotOrNot: A system to evaluate social bots. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 273–274, 2016.

[11] M. H. DeGroot and S. E. Fienberg. The comparison and evaluation of forecasters. The Statistician, pages 12–22, 1983.

[12] M. Del Vicario, A. Bessi, F. Zollo, F. Petroni, A. Scala, G. Caldarelli, H. E. Stanley, and W. Quattrociocchi. The spreading of misinformation online. Proc. National Academy of Sciences, 113(3):554–559, 2016.

[13] E. Ferrara. Disinformation and social bot operations in the run up to the 2017 French presidential election. First Monday, 22(8), 2017.

[14] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini. The rise of social bots. Comm. ACM, 59(7):96–104, 2016.

[15] J. Gottfried and E. Shearer. News use across social media platforms 2016. White paper, Pew Research Center, May 2016.

[16] L. Gu, V. Kropotov, and F. Yarochkin. The fake news machine: How propagandists abuse the internet and manipulate the public. TrendLabs research paper, Trend Micro, 2017.

[17] A. Guess, B. Nyhan, and J. Reifler. Selective exposure to misinformation: Evidence from the consumption of fake news during the 2016 US presidential campaign. Unpublished manuscript, 2018.

[18] A. Gupta, P. Kumaraguru, C. Castillo, and P. Meier. TweetCred: Real-time credibility assessment of content on Twitter. In International Conference on Social Informatics, pages 228–243, 2014.

[19] N. Hassan, A. Sultana, Y. Wu, G. Zhang, C. Li, J. Yang, and C. Yu. Data in, fact out: Automated monitoring of facts by FactWatcher. Proc. VLDB Endow., 7(13):1557–1560, Aug. 2014.

[20] N. O. Hodas and K. Lerman. How limited visibility and divided attention constrain social contagion. In Proc. ASE/IEEE International Conference on Social Computing, 2012.

[21] P. J. Hotez. Texas and its measles epidemics. PLOS Medicine, 13(10):1–5, 2016.

[22] L. Howell et al. Digital wildfires in a hyperconnected world. In Global Risks. World Economic Forum, 2013.

[23] T. Jagatic, N. Johnson, M. Jakobsson, and F. Menczer. Social phishing. Communications of the ACM, 50(10):94–100, October 2007.


[24] Z. Jin, J. Cao, Y.-G. Jiang, and Y. Zhang. News credibility evaluation on microblog with a hierarchical propagation model. In Proc. IEEE International Conference on Data Mining (ICDM), pages 230–239, 2014.

[25] Y. Jun, R. Meng, and G. V. Johar. Perceived social presence reduces fact-checking. Proceedings of the National Academy of Sciences, 114(23):5976–5981, 2017.

[26] D. M. Kahan. Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, 8(4):407–424, 2013.

[27] D. Lazer, M. Baum, Y. Benkler, A. Berinsky, K. Greenhill, F. Menczer, M. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, M. Schudson, S. Sloman, C. Sunstein, E. Thorson, D. Watts, and J. Zittrain. The science of fake news. Science, 359(6380):1094–1096, 2018.

[28] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: social honeypots + machine learning. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 435–442, 2010.

[29] M. S. Levendusky. Why do partisan media polarize viewers? American Journal of Political Science, 57(3):611–623, 2013.

[30] S. Lewandowsky, U. K. Ecker, and J. Cook. Beyond misinformation: Understanding and coping with the "post-truth" era. Journal of Applied Research in Memory and Cognition, 6(4):353–369, 2017.

[31] W. Lippmann. Public Opinion. Harcourt, Brace and Company, 1922.

[32] X. Liu, A. Nourbakhsh, Q. Li, R. Fang, and S. Shah. Real-time rumor debunking on Twitter. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, CIKM '15, pages 1867–1870, New York, NY, USA, 2015. ACM.

[33] B. Markines, C. Cattuto, and F. Menczer. Social spam detection. In Proc. 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2009.

[34] P. T. Metaxas, S. Finn, and E. Mustafaraj. Using twittertrails.com to investigate rumor propagation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing, CSCW '15 Companion, pages 69–72, 2015.

[35] A. Mosseri. News feed FYI: Showing more informative links in news feed. Press release, Facebook, June 2017.

[36] E. Mustafaraj and P. T. Metaxas. From obscurity to prominence in minutes: Political speech and real-time search. In Proc. Web Science Conference: Extending the Frontiers of Society On-Line, 2010.


[37] A. Nematzadeh, G. L. Ciampaglia, F. Menczer, and A. Flammini. How algorithmic popularity bias hinders or promotes quality. Preprint 1707.00574, arXiv, 2017.

[38] A. Niculescu-Mizil and R. Caruana. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 625–632, 2005.

[39] D. Nikolov, D. F. M. Oliveira, A. Flammini, and F. Menczer. Measuring online social bubbles. PeerJ Computer Science, 1(e38), 2015.

[40] E. Pariser. The filter bubble: How the new personalized Web is changing what we read and how we think. Penguin, 2011.

[41] X. Qiu, D. F. M. Oliveira, A. S. Shirazi, A. Flammini, and F. Menczer. Limited individual attention and online virality of low-quality information. Nature Human Behavior, 1:0132, 2017.

[42] J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, A. Flammini, and F. Menczer. Detecting and tracking political abuse in social media. In Proc. 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

[43] J. Ratkiewicz, M. Conover, M. Meiss, B. Goncalves, S. Patil, A. Flammini, and F. Menczer. Truthy: Mapping the spread of astroturf in microblog streams. In Proceedings of the 20th International Conference Companion on World Wide Web, WWW '11, pages 249–252, 2011.

[44] P. Resnick, S. Carton, S. Park, Y. Shen, and N. Zeffer. RumorLens: A system for analyzing the impact of rumors and corrections in social media. In Proc. Computational Journalism Conference, 2014.

[45] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762):854–856, 2006.

[46] C. Shao, G. L. Ciampaglia, A. Flammini, and F. Menczer. Hoaxy: A platform for tracking online misinformation. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 745–750, 2016.

[47] C. Shao, P.-M. Hui, L. Wang, X. Jiang, A. Flammini, F. Menczer, and G. L. Ciampaglia. Anatomy of an online misinformation network. PLoS ONE, 13(4):e0196087, 2018.

[48] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22–36, 2017.


[49] K. Starbird. Examining the alternative media ecosystem through the production of alternative narratives of mass shooting events on Twitter. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM), pages 230–239, 2017.

[50] N. Stroud. Niche News: The Politics of News Choice. Oxford University Press, 2011.

[51] V. Subrahmanian, A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini, F. Menczer, A. Stevens, A. Dekhtyar, S. Gao, T. Hogg, F. Kooti, Y. Liu, O. Varol, P. Shiralkar, V. Vydiswaran, Q. Mei, and T. Hwang. The DARPA Twitter bot challenge. IEEE Computer, 49(6):38–46, 2016.

[52] C. R. Sunstein. Going to Extremes: How Like Minds Unite and Divide. Oxford University Press, 2009.

[53] C. J. Vargo, L. Guo, and M. A. Amazeen. The agenda-setting power of fake news: A big data analysis of the online media landscape from 2014 to 2016. New Media & Society, 20(5):2028–2049, 2018.

[54] O. Varol, E. Ferrara, C. A. Davis, F. Menczer, and A. Flammini. Online human-bot interactions: Detection, estimation, and characterization. In Proc. Intl. AAAI Conf. on Web and Social Media (ICWSM), 2017.

[55] O. Varol, E. Ferrara, F. Menczer, and A. Flammini. Early detection of promoted campaigns on social media. EPJ Data Science, 6(1):13, 2017.

[56] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. In E. Biham, editor, Advances in Cryptology: Proceedings of EUROCRYPT 2003, International Conference on the Theory and Applications of Cryptographic Techniques, pages 294–311. Springer, 2003.

[57] S. Vosoughi, D. Roy, and S. Aral. The spread of true and false news online. Science, 359(6380):1146–1151, 2018.

[58] C. Wardle. Fake news. It's complicated. White paper, First Draft News, February 2017.

[59] S. Webb, J. Caverlee, and C. Pu. Social honeypots: Making friends with a spammer near you. In Proc. CEAS, 2008.

[60] J. Weedon, W. Nuland, and A. Stamos. Information operations and Facebook. White paper, Facebook, April 2017.

[61] S. Wojcik, S. Messing, A. Smith, L. Rainie, and P. Hitlin. Bots in the Twittersphere. White paper, Pew Research Center, April 2018.


[62] S. C. Woolley and P. N. Howard. Computational propaganda worldwide: Executive summary. Working Paper 2017.11, Oxford Internet Institute, 2017.

[63] L. Wu and H. Liu. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In Proc. 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018.

[64] L. Wu, F. Morstatter, X. Hu, and H. Liu. Mining misinformation in social media. In Big Data in Complex and Social Networks, pages 123–152. CRC Press, 2016.

[65] S. Zannettou, T. Caulfield, E. De Cristofaro, N. Kourtelris, I. Leontiadis, M. Sirivianos, G. Stringhini, and J. Blackburn. The web centipede: Understanding how web communities influence each other through the lens of mainstream and alternative news sources. In Proceedings of the 2017 Internet Measurement Conference, IMC '17, pages 405–417, 2017.

[66] A. Zubiaga, A. Aker, K. Bontcheva, M. Liakata, and R. Procter. Detection and resolution of rumours in social media: A survey. ACM Computing Surveys, 50, 2018. Forthcoming.
