How does Facebook Trends Affect
News Exposure?
Shin Lee
Advised by Professor Nicholas Diakopoulos
Northwestern University
Mathematical Methods and Social Sciences Thesis
Acknowledgement
I would like to greatly thank my advisor, Professor Nicholas Diakopoulos, for providing
helpful feedback and guidance during my journey producing and writing this thesis. This thesis
would not be where it is today without his wisdom and weekly guidance. I would also like to
thank Professor Joseph Ferrie and Nicole Schneider for being great resources whenever I had
questions or concerns. Lastly, I would like to thank Professor Jeffry Ely for his leadership
overseeing the MMSS program and its students.
Abstract
Facebook has recently been the subject of heated discussion regarding fake news and algorithms
that present biased posts. I wanted to discover whether Facebook is truly posting biased news trends
and whether this may heavily influence the exposure that news sources receive. Using programming and
analytical methods, I investigated personalization by geographic location and demographics in
Facebook’s Trending Topics. I further analyzed the degree to which Facebook personalizes the
news trends per person. I also examined whether Facebook gives priority to certain news sources or
gives all news sources equal opportunity. Lastly, I investigated how often Facebook’s
algorithm updates its news trends and whether its behavior differs across the days
of the week. The data collection and analysis were conducted with Python scripts I wrote. My
results show that Facebook does not personalize by geo-location and personalizes very slightly by
demographics. Furthermore, my analysis showed that Facebook gives more news exposure to liberal
than to conservative news sources. Lastly, Facebook’s Trending Topics does not behave
consistently and tends to accumulate more new news trends when there is a lot of
news in the world at the time.
I would like to note that this paper goes into detail on basic programming concepts because my
primary audience’s expertise is not in computer science or programming. In addition,
my advisor plans to continue my research in the future, so I include some tips for future
development of my thesis throughout this paper.
I. Introduction
Facebook has received high levels of attention since its founder and CEO sat in
front of Congress to answer questions about Facebook and the safety of its users following the
Cambridge Analytica scandal1. The Cambridge Analytica scandal began when Facebook allowed
a professor from the University of Cambridge to collect information about its users for research
purposes. However, the data, which consisted of over 50 million profiles including answers to a
personality test, location, friends list, and “liked” content, was handed over to Cambridge
Analytica, a UK-based political data company that was working on Donald Trump’s campaign2.
It is clear that Facebook holds its individual users’ personal data, but does it use these data to
present certain types of news trends to its users? Cass Sunstein believes Facebook feeds are “echo
chambers” that only show posts by people who are like us and think like us3. He claims this hurts our
democracy because Facebook users only read articles that align with their beliefs. As a result,
people become more extreme, and when they see people who do not share the same beliefs, they
regard them as enemies who are “crazy”4. Sunstein predicted that Facebook would begin to experiment
with the algorithm that determines which news and posts are presented to its users, and he was
correct. In 2018, Facebook announced it would present “less public content, including videos
and other posts from publishers or businesses” and increase the visibility of local news. Mark
Zuckerberg stated that local news helps us understand issues that affect our lives.5 Thus, it is apparent
that Facebook’s algorithms play a significant role in what its users see, and Facebook is currently
making an effort to increase news that is relevant to its users. However, if Facebook
increases news that is relevant to its users, will that also increase biased news sources, as
Sunstein warned?
Algorithm auditing is relatively simple in theory: it examines the inputs, outputs, and
outcomes of some problem6. In practice, however, it is much harder to achieve. An algorithm
replaces human involvement in data collection, data analysis, and input, which
may be faster and more efficient since the human brain has limited computational abilities.
However, it comes with its own difficulties, such as consistency, the intention behind the algorithm,
unintentional or intentional biases, verifying that the algorithm behaves the way the
programmer expected, and many more. In this thesis, we dive deep into programming and take a
closer look at how Facebook interacts with automatic scraping of its data, the computational
struggles behind Facebook data collection and analysis, and the process of creating
Python and Selenium scripts to automatically gather large amounts of data.
1 “Your Facebook Data Scandal Questions Answered.” CNNMoney, Cable News Network, 11 Apr. 2018, money.cnn.com/2018/04/11/technology/facebook-questions-data-privacy/index.html.
2 Riley, Charles. “Cambridge Analytica, Facebook and Your Data: Here's What to Know.” CNNMoney, Cable News Network, 20 Mar. 2018, money.cnn.com/2018/03/19/technology/facebook-data-scandal-explainer/index.html?iid=EL
3 “'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017, www.npr.org/2017/02/20/516292286/-republic-author-describes-how-social-media-hurts-democracy.
4 “'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017.
5 Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.” Columbia Journalism Review, 18 Apr. 2018, www.cjr.org/tow_center/facebook-local-news.php.
6 Rosén, Josefin. “What Every Business Manager Should Know about Algorithm Audits.” SAS Learning Post, 16 Oct. 2017, blogs.sas.com/content/hiddeninsights/2017/10/16/algorithm-audits/.
Up to this point, relevant research has investigated algorithms and the role they play in
reporting and analyzing information, and whether an algorithm is accountable and to what
degree.7 Other studies have investigated how social media platforms are becoming echo
chambers because their algorithms decide what to present to users8. However, because
Facebook and online personalization are still relatively new, there is a lack of research exploring
whether Facebook News Trends are personalized by demographic and geo-location, how often trends
update, and whether Facebook favors certain news sources. This paper studies
personalization of Facebook news trends by geo-location and demographic, as well as whether
Facebook favors specific news sources. Additionally, it analyzes how often
Facebook updates its news trends. I hypothesize that Facebook News Trends are personalized by
demographic and geo-location. Furthermore, I hypothesize that Facebook may slightly
favor news sources that are more liberal than conservative because
Facebook is known to be more liberal as a company. Lastly, I
hypothesize that Facebook News Trends will not have a consistent update schedule, as news
changes depending on what happens around the world.
The data used for my thesis were collected by Python and Selenium scripts that I
developed for this project. There are four categories: Trends, Trends and Tabs, Geo-location, and
Personal vs Puppet. Each category has its own set of datasets that were collected in real time from
Facebook. The data are stored in a database and further processed by algorithms to answer the
following questions:
• How often are trends updating?
• Does news trend behavior differ on the weekend versus the weekday? What
about across different weekdays?
• Which news sources does Facebook give more exposure to?
• How many news articles from external news sources does Facebook publish?
• Which news sources does Facebook include in the “Trending” section?
• Do news trends differ depending on geographic location? If so, how?
• Does Facebook personalize news trends by user? More specifically, do news
trends differ depending on the Facebook account?
To the best of my knowledge, no existing research has answered the
questions listed above. This paper investigates and answers these questions and provides
insight into how end users experience Facebook’s Trending Topics news.
This paper is structured as follows: Section II summarizes relevant
research on social media, traditional news and news sources, Facebook news trends, and recent
Facebook events. Section III presents the methodology: how the data were collected, the
technology used, details of the raw datasets, and descriptions of the methods used to analyze the
data. Section IV presents the results; Section V offers the discussion, which includes the
limitations and implications of my thesis as well as areas for future research. Lastly, Section VI
presents the conclusion.
7 Diakopoulos, Nicholas. “Algorithmic Accountability.” Digital Journalism, vol. 3, no. 3, 2014, pp. 398–415, doi:10.1080/21670811.2014.976411.
8 Alvarado, Oscar, and Annika Waern. “Towards Algorithmic Experience.” Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18, 2018, doi:10.1145/3173574.3173860.
II. Literature Review
The objective of this literature review is to present recent events involving Facebook, past
research on Facebook and Twitter news trends, social media and traditional news, and algorithm
auditing.
In May 2016, Gizmodo.com, a design, technology, science, and science fiction website,
published an article in which several former Facebook employees who worked as “news curators”
claimed that they suppressed conservative news from the Trending news page9. They claimed
they were instructed to “artificially inject” specific stories into the Trending news page.
Sometimes these artificially injected stories were not naturally trending but were
used in the Trending section anyway.10 This raised concerns because a process that was
assumed to be purely determined by algorithms involved biased human intervention. A former
Facebook employee also claimed that news covered by conservative news sources that
Facebook’s algorithm selected would not be included in Trending unless more
unbiased news sources had also covered the same story.11 After Gizmodo’s news coverage
gained popularity, on May 9, 2016, the Vice President of Search at Facebook responded on
Facebook, stating the following12:
There are rigorous guidelines in place for the review team to ensure consistency and
neutrality. These guidelines do not permit the suppression of political perspectives. Nor
do they permit the prioritization of one viewpoint over another or one news outlet over
another. These guidelines do not prohibit any news outlet from appearing in Trending
Topics. Trending Topics is designed to showcase the current conversation happening on
Facebook. Popular topics are first surfaced by an algorithm, then audited by review team
members to confirm that the topics are in fact trending news in the real world and not, for
example, similar-sounding topics or misnomers.
On August 26, 2016, Facebook announced it would remove human involvement in
writing the descriptions in the Trending topic list13. This meant Facebook’s algorithm would have full
control, and whatever it wrote would be published before a human looked at it. However, only a few
days after the change, Facebook’s algorithm posted an article about Megyn Kelly on Trending
with a description calling her a “traitor” and claiming that she was kicked out by Fox News for “backing
Hillary”, which she was not14. This fueled the controversy over Facebook displaying fake news on its
platform. There seems to be a conflict regardless of who is behind the decision making
for Trending Topics. When Facebook used human editors for Trending, it was accused
of injecting human biases into its decisions. When it removed the human involvement, it was
accused of spreading fake news.
9 Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.” Gizmodo, Gizmodo.com, 10 May 2016, gizmodo.com/former-facebook-workers-we-routinely-suppressed-conser-1775461006.
10 Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.”
11 Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.”
12 https://www.facebook.com/tstocky/posts/10100853082337958
13 Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake News.” The Washington Post, WP Company, 29 Aug. 2016, www.washingtonpost.com/news/the-intersect/wp/2016/08/29/a-fake-headline-about-megyn-kelly-was-trending-on-facebook/?noredirect=on&utm_term=.2d050b7762f3.
14 Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake News.”
In January 2018, Facebook announced it would change its algorithm to favor
“authentic” connections between people.15 Furthermore, the new algorithm would devalue posts by
publishers and paid brands. For users, the quality of the experience is expected to increase.
Advertisements on the News Feed will be more relevant to the user and fewer in number.
Content from the user’s social connections will get a boost in exposure and is expected
to rank higher on the News Feed16. However, this change will make advertising on Facebook
more competitive and expensive. Additionally, more advertisements are likely to appear on other
Facebook platforms such as Messenger, Instagram, and WhatsApp.
A study by Chakraborty, Messias, and Benevenuto investigated the demographic biases
in crowdsourced recommendations on Twitter’s Trending Topics. News content is selected for
recommendation according to how much popularity and activity it gets on the social media platform.
Thus, it indirectly gives users the power to promote specific news. The study examined which
demographics influenced which content was deemed worthy of recommendation and whether
those demographics were representative of the platform’s overall population17. The results
showed that a significant percentage of the trends were promoted by demographics that were
drastically different from the overall population18. Further concerns were raised when the authors
discovered that some demographic groups were under-represented among the
promoters of the trends. Specifically, Black females were the most under-represented
demographic, followed by Black males, Asian females, and Asian males; white males were the least
under-represented demographic19. Furthermore, middle-aged demographics were more
under-represented than the younger population. That study
discovered which demographics influenced certain news to rank in Trending Topics. I wanted to
extend this research by investigating whether the Trending Topics that are presented differ
across demographics. The results from that
study informed my research design, as I use two different Facebook accounts
when scraping data: an impersonalized sock puppet account and an established account with a
particular demographic.
A 2011 study by Cvijikj and Michahelles investigated trend detection over Facebook
public posts20. They monitored trends through data collection and trend detection. The data collection
process was continuous and in real time, like the algorithm implemented for my paper. Their
study also encountered many difficulties collecting data from Facebook; I explain the
difficulty of scraping data from Facebook in detail in the Discussion section. The study’s results suggested
that Facebook trending topics should be divided into three categories: disruptive events, popular
topics, and daily routines. However, the study lacked results on how Facebook trends are
personalized by location or account. It also did not present results on how often trends changed
on Facebook’s Trending. Additionally, the study collected data for only 4 consecutive days and
did not consider how Facebook Trends may change across weeks and on weekends.
15 Göös, Christine. “Blog.” Facebook Advertising Trends 2018, 15 Feb. 2018, www.smartly.io/blog/facebook-advertising-trends-2018.
16 Göös, Christine. “Blog.” Facebook Advertising Trends 2018.
17 Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. 1 Apr. 2017, arxiv.org/abs/1704.00139.
18 Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations.
19 Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations.
20 Cvijikj, Irena Pletikosa, and Florian Michahelles. “Monitoring Trends on Facebook.” 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, 2011, doi:10.1109/dasc.2011.150.
A 2013 study by Groshek compared the agendas of established traditional news sources
(New York Times and CNN) with the most frequently shared news and topics on popular social
media sites (Facebook and Twitter)21. The study showed that the news agendas were similar
between the two traditional news sources and Facebook, but there were variations in the
ranking of the news items. The study also showed that social media influenced traditional
media’s agenda for cultural topics. However, there was no relationship between political and
cultural coverage within the social media platforms. This literature provided valuable
information on the relationship, or the lack thereof, between traditional news outlets and social
media trending news. However, it did not answer how different locations and/or Facebook
accounts affect the type of news trends a user sees. In fact, it only investigated the news agendas
in terms of categories and lacked a deeper analysis of what types of news sources were
presented.
A 2016 study by Kazai, Yusok, and Clarke developed a prototype mobile application that
gave content recommendations by utilizing the user’s location, Facebook and/or Twitter feed, and
in-app activities22. Their model constructed the user’s personalized feed by mixing content
from multiple sources, some taken directly from the user’s Facebook/Twitter feeds and some
propagated through the user’s in-app social network. This study was the first to provide
personalized feeds by pulling different sources and recommendations from a crowd-curated
content pool. Their algorithm was different from Facebook’s Trending algorithm, but it was based on a
similar idea because it learned from the user’s activity and from what was popular within the user’s
and the platform’s network. Using the data collected, they recommended content
that they believed would interest their users.
Overall, research on Facebook is recent and has only lately gained popularity. As a
result, there is not yet much Facebook research, and only a few past
studies address Facebook Trending Topics and Facebook data scraping. To the best of my
knowledge, there is no existing research that investigates how Facebook Trends differ by
geo-location and by personalization. There is also no research that investigates how often Facebook
Trending topics change and whether their behavior differs depending on the day of
the week or the time of day.
In this paper, I describe the algorithm auditing process used to scrape the data from Facebook
and analyze the collected data in terms of personalization by demographic and
geo-location. I also investigate how often Facebook Trending topics change and whether
Trending behaves differently on different days of the week and at different times of the day.
Lastly, I investigate which news sources receive higher news exposure from Facebook. Based
on the evaluation of the collected results, I found that Facebook Trending Topics are not as
personalized as I hypothesized. However, Facebook gives clear preference to certain news
sources over others.
21 Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive Capacity of Social Networking Sites in Intermedia Agenda Setting across Topics over Time.” 2013, doi:10.12924/mac2013.01010015.
22 Kazai, Gabriella, et al. “Personalised News and Blog Recommendations Based on User Location, Facebook and Twitter User Profiling.” Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '16, 2016, doi:10.1145/2911451.2911464.
III. Methodology
My thesis looks at how Facebook news trends behave and how that influences users’
news sources. Unlike most MMSS theses, I collected my data by writing my own
scripts that scrape data from Facebook. This method was chosen because no data on
Facebook news trends are publicly available. Additionally, Facebook news trends change often
because news in the media is always changing depending on what is happening in the world. To
account for the unpredictability of world events, it is important to have the latest data
for my analysis. Furthermore, Facebook has announced it will make significant changes to its
algorithm after it was criticized for fake news and biased news23. Collecting recent data
therefore also ensured that my analysis reflects the most up-to-date version of the
algorithm.
Scripting Technology
The scraping scripts were written in Python 3.6.4 with Selenium, a user interface (UI)
automation tool that is commonly used for UI testing. Selenium was selected because it can simulate a real
human user operating a web browser. It can distribute and scale scripts across many different
environments and can be used to build robust regression test suites. The purpose of a regression test
is to catch bugs that were accidentally introduced and to make sure previously fixed bugs stay fixed.
Selenium clicked on buttons and scrolled through the webpage as a real human user
would. This is important because Facebook tries to prevent scraping of its data and raises security
checks when it believes an account poses a security threat, which I discuss later in this
paper. Additionally, Selenium can visually show the user what is happening on the webpage.
Even though the user does not physically perform any actions on the page, actions such as clicking and
scrolling are carried out on the webpage. This helped with debugging the code and confirming
that the algorithm was working properly. The user can compare how Facebook reacts when an
algorithm navigates through its pages versus when a real human being clicks through the
pages. Selenium also has the option to run scripts in headless mode, where the browser is
driven without a visible window. This is an important feature because when the browser
window is visible, the run is more prone to human error: the user could accidentally click on
the webpage and interfere with the script. Headless mode was
added later in the data collection since it is most useful for long intervals of data collection
and for scripts that have been fully debugged. Since Selenium is a user interface testing platform,
it works well with HTML and offers options to locate HTML elements by class, id, CSS selector,
XPath, and practically any other property of an HTML element that is available on Facebook.
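As a minimal sketch of the two Selenium features described above (headless mode and locating an element by a CSS selector), the following shows one way such a script could be organized. It is not the code from my scripts: the window size, the function names, and any selector passed in are assumptions for illustration, and the `options=` keyword reflects recent Selenium releases (older releases may expect a differently named keyword).

```python
from typing import List


def chrome_arguments(headless: bool) -> List[str]:
    """Return Chrome command-line switches for a scraping session."""
    args = ["--window-size=1280,1024"]  # assumed window size, for illustration only
    if headless:
        args.append("--headless")  # run Chrome without a visible window
    return args


def open_page(url: str, headless: bool = True):
    """Start Chrome through Selenium WebDriver and load a page."""
    # Imported lazily so chrome_arguments() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)  # requires chromedriver
    driver.get(url)
    return driver


def first_match_text(driver, css_selector: str) -> str:
    """Return the visible text of the first element matching a CSS selector."""
    from selenium.webdriver.common.by import By

    return driver.find_element(By.CSS_SELECTOR, css_selector).text
```

While debugging, calling `open_page(url, headless=False)` would show the browser window; switching to `headless=True` suits long, unattended collection runs.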
23 Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.”
Python 3 was used over Python 2 because Python 3 is the newer major version of the language and
offers more features. Furthermore, the community is shifting from Python 2 to Python 3, and to account for
future research on my thesis topic, I used the newest version of Python so that the version difference
is as small as possible for future researchers. Additionally, Python has many benefits
as a programming language. There are six main reasons why I chose Python over other
programming languages such as Java or C. First, the Python Package Index (PyPI) makes Python
capable of interacting with many different languages and platforms because it hosts a wide range of
third-party modules. These modules made it easier to install
software, for example open source packages, that is developed and shared in the public Python
community24. It also made it easier for me to distribute my software via
PyPI. Additionally, PyPI made it easier for Python to interact with different languages and
platforms. In this data collection, I used two different browsers, Google Chrome and Mozilla
Firefox. Second, Python has an extensive set of support libraries. Support libraries are
standard libraries with predefined functions that can be used throughout the code once the
library has been imported into the script. This significantly reduces the length of the code as well
as the time spent writing tedious, repetitive code. This is extremely important because
the Facebook UI consists of a very large amount of code. It would have been impossible to write code
from scratch for every task my scripts needed to accomplish for this thesis in the given amount of time.
Third, Python is open source and developed by its community25. Thus, it
is free for the public to use, even for commercial purposes. It is also under ongoing development, as
developers contribute to new versions of Python regularly. This is important because even
minor updates to the Python language can fix bugs that were present in previous
versions. Fourth, Python has extensive documentation and community Q&A sites, where
programmers who run into problems post their errors and the Python community
works together to solve them if no answer exists already. This is tremendously helpful
when running into bugs, and I used these resources hundreds of times while writing and
debugging my scripts. Though the command prompt reports errors and gives a brief description of
what caused the error when a script fails, this is usually not enough to figure out how to solve the
bug. The support available online encouraged me to pursue my thesis in Python
and to keep using it. Fifth, Python has great built-in data structures, such
as lists, sets, and dictionaries, that make it not only easy to develop in but also time efficient.
This is critical because large amounts of data were collected and stored in multi-level data
structures. Additionally, Python offers dynamic high-level data typing, which is useful because it
decreases the amount of support code required and ultimately the amount of code written. Lastly,
Python is an object-oriented programming language. It has strong text processing
capabilities and its own unit testing framework, which is very important since a significant
portion of the data collected is text and/or has gone through text processing.26
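To illustrate the fifth point for readers who are new to programming, the sketch below stores a few made-up scrape results in a multi-level structure (a dictionary of lists of dictionaries) and then summarizes them with a set and a counter. The timestamps, topics, and source names are hypothetical placeholders, not data from my collection.

```python
from collections import Counter

# Hypothetical scrape results: timestamp -> list of trends, each trend a dict.
snapshots = {
    "2018-04-20 10:00": [
        {"topic": "Topic A", "source": "source-one.example"},
        {"topic": "Topic B", "source": "source-two.example"},
    ],
    "2018-04-20 10:05": [
        {"topic": "Topic A", "source": "source-one.example"},
        {"topic": "Topic C", "source": "source-one.example"},
    ],
}

# A set keeps each topic once, no matter how many snapshots contain it.
unique_topics = {
    trend["topic"] for trends in snapshots.values() for trend in trends
}

# Counter (a dictionary subclass) tallies how often each source appears.
source_counts = Counter(
    trend["source"] for trends in snapshots.values() for trend in trends
)
```

Because no types have to be declared, the same few lines work whether a snapshot holds two trends or two hundred, which is the reduction in support code mentioned above.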
Virtual Environment Set Up
To set up the environment and safely gather the correct dependencies and packages in one location,
I created a virtual environment. This step is important and recommended for developers
who wish to use my code in the future because how I manage my dependencies may differ
from how another developer manages theirs. I focused on development and testing
for the majority of my programming for this thesis, but I shifted my focus to deployment later
on as I was completing my scripts and publishing my code to GitHub.
24 “PyPI – the Python Package Index.” PyPI, pypi.org/.
25 “Welcome to Python.org.” Python.org, www.python.org/about/.
26 Rongala, Arvind. “Benefits of Python over Other Programming Languages.” Invensis Blog, 6 Apr. 2018, www.invensis.net/blog/it/benefits-of-python-over-other-programming-languages/.
After I downloaded Python 3.6.2, I downloaded pip separately. I developed on a Windows
platform. On macOS, Python comes pre-installed and pip comes with Python;
however, on my operating system (Windows 10), I had to download pip manually. Afterwards, I
used pip to download Pipenv, a dependency manager for Python projects27. Virtualenv is the
virtual environment tool I used to develop in separate Python environments on the
same local computer. Virtualenv created a folder that isolated my thesis development
environment from the environments I was using for other projects. This is
important because I needed different dependencies for my thesis than, for example, for my machine
learning project. After creating a new virtual environment, I downloaded the libraries and
dependencies needed for the data collection script. There are three main dependencies I downloaded into
my virtual environment: Selenium WebDriver 3.9.0, APScheduler 3.5.0, and Pyvirtualdisplay
(PyPl). Selenium was the dependency I used to integrate browser automation with my Python code. By
combining Selenium and Python, I could control the user interface activity with Python code.
APScheduler was used to schedule when and how often my scripts would run. I discuss the
APScheduler intervals further in the data collection section. PyPl was used to run my script on the
Amazon Web Services (AWS) cloud. I discuss cloud computing further in the data
collection and challenges section.
Web Browser
Initially, I collected data using Mozilla Firefox because Selenium IDE, a Firefox add-on,
was only available on Firefox and I had experience with Selenium IDE. However, since I did not use
Selenium IDE for this experiment and instead used Selenium WebDriver, which is available on
multiple browsers, I was not restricted to one browser. After running into issues with Firefox
when running the scripts on the AWS cloud server, I switched from Firefox to Google Chrome.
Another reason I switched was that Chrome is the most popular web
browser, with roughly 78% of browser usage versus 11% for Firefox28. It was important to collect
data from the most commonly used browser since most Facebook
users are likely to use Chrome to access their Facebook accounts. If developers are interested in running my
scripts on Firefox, I have left the Firefox code commented out for future use. For this study, however, all data
analyzed and presented in this paper were collected from Chrome.
To use Chrome as the browser, you must download chromedriver for the script to run
properly on the cloud. Without chromedriver, the script will fail immediately, and you
cannot collect data on an AWS server. Examples of common errors are shown in Figure 1.
However, the script will run and collect data correctly when it is running locally on a Windows
10 operating system.
selenium.common.exceptions.WebDriverException: Message: 'chromedriver.exe' executable may have wrong
permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Figure 1: Common Chromedriver Error
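Both errors in Figure 1 come down to an executable that is either missing or not runnable. A hedged sketch of a pre-flight check that could be run on the server before starting a long collection is shown below. The function name and the command names `chromedriver` and `google-chrome` are assumptions, the check is not part of my scripts, and the fallback scan is written with a Linux server in mind.

```python
import os
import shutil


def diagnose_executable(name, search_path=None):
    """Classify a command as 'ok', 'wrong permissions', or 'not found'."""
    # shutil.which only returns files that exist AND are executable.
    if shutil.which(name, path=search_path) is not None:
        return "ok"
    # Fall back to a plain existence check to tell the two failures apart.
    raw_path = search_path if search_path is not None else os.environ.get("PATH", "")
    for directory in raw_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, name)):
            return "wrong permissions"
    return "not found"


if __name__ == "__main__":
    # Assumed command names; they may differ between installations.
    for command in ("chromedriver", "google-chrome"):
        print(command, "->", diagnose_executable(command))
```

A result of "wrong permissions" corresponds to the first message in Figure 1, while "not found" for the browser itself corresponds to the second.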
27 “Pipenv & Virtual Environments.” Freezing Your Code - The Hitchhiker's Guide to Python, docs.python-guide.org/en/latest/dev/virtualenvs/.
28 “Browser Statistics.” W3Schools Online Web Tutorials, www.w3schools.com/browsers/default.asp.
CSV and MySQL Database
In the initial stages of data collection, the data were stored in CSV files because Python
has a csv support library with useful functions that made it simple to write data to a CSV
file. Figure 2 shows the csv support library functions in use, where I create a new CSV file and
fill in the column names. Figure 3 shows a list of lists containing data scraped from
Facebook being written to a CSV file. The code in Figures 2 and 3 is from fb_trends.py.
Additionally, in the earlier stages of data collection, I only collected data over short intervals, such
as one to two hours, and only collected one subset of data called trends_only, which I discuss
in detail later in the paper. As a result, the CSV file never became unmanageably large, and writing the
data to a CSV file was feasible.
Figure 2: Create a New CSV File using CSV Support Library
Figure 3: Insert List of List into CSV File using CSV Support Library
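As an illustration only, the two steps described above (creating a CSV file with column names,
then writing a list of lists into it) can be sketched with Python's csv library as follows.
This is not the code from fb_trends.py; the column names and the function names create_csv and
append_rows are assumptions made for this example.

```python
import csv

# Hypothetical column names, modeled on the attributes described later in this paper.
COLUMNS = ["type", "title", "description", "trend_link", "rank", "scrape_id", "timestamp"]


def create_csv(path, columns=COLUMNS):
    """Create (or overwrite) a CSV file that contains only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(columns)


def append_rows(path, rows):
    """Append a list of lists to an existing CSV file, one inner list per row."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

Opening the file in append mode for each round of scraping keeps earlier rounds intact, which
is one simple way to accumulate data over a short collection period.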
As I began to collect more data, at a higher frequency, over a longer time range, and of
different types, the CSV files began to have performance issues. As a result, I moved my data
storage from CSV to a MySQL database. MySQL has several advantages that CSV does not have.
First, MySQL is designed to store very large amounts of data. Thus, even if I collect data
every minute for 24 hours, there is no performance issue in retrieving the data. Second, with
MySQL I can run SQL queries that let me filter the data by specific criteria. For example,
Figure 4 shows an SQL query that selects all columns of the data in the PROXYIL table of the
fb_scrape_db database. Figure 4 is an example of a basic select query, but there are also
action queries that insert, update, or delete data inside the database. Furthermore, queries
can calculate or summarize data and automate data management tasks29. To help understand the
concept of an SQL query, you can think of it as a search on Google: you give the Google search
engine a query, and it returns what it finds in its database.
Figure 4: Basic Select SQL Query
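To make the idea of a select query concrete, the sketch below runs a basic select and a
filtered select from Python. It is only an illustration: it uses Python's built-in sqlite3
module with an in-memory database as a stand-in for MySQL so that it is self-contained, and
the column names and rows of the PROXYIL table are made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database fb_scrape_db.
conn = sqlite3.connect(":memory:")

# Hypothetical columns; the real PROXYIL table layout is not reproduced here.
conn.execute(
    "CREATE TABLE PROXYIL (type TEXT, title TEXT, trend_rank INTEGER, scrape_id INTEGER)"
)
conn.executemany(
    "INSERT INTO PROXYIL VALUES (?, ?, ?, ?)",
    [
        ("top trends", "Example trend A", 1, 1),  # made-up rows
        ("politics", "Example trend B", 1, 1),
        ("sports", "Example trend C", 1, 1),
    ],
)

# Basic select query: every column of every row in the table.
all_rows = conn.execute("SELECT * FROM PROXYIL").fetchall()

# Filtered select query: only the rows that match a specific criterion.
politics_rows = conn.execute(
    "SELECT title, trend_rank FROM PROXYIL WHERE type = ?", ("politics",)
).fetchall()
```

The second query shows the kind of filtering that is awkward with a CSV file, where the whole
file would have to be read before any rows could be selected.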
Another benefit of the MySQL database is that it is stored in the cloud. This means that I, or
anyone I want to share my data with, can access the data from any device as long as they have
the login credentials. This was extremely valuable since I was running my scripts on three
different devices during this experiment: my laptop, a desktop, and an AWS cloud server. Thus,
there is complete mobility and flexibility in where and how I can access my data. For example,
a desktop has a more powerful central processing unit (CPU) than a laptop, so the majority of the more
29 Rouse, Margaret. “What Is Query? - Definition from WhatIs.com.” SearchSQLServer,
searchsqlserver.techtarget.com/definition/query.
complicated scripts were run on a desktop30. However, I can only access the desktop when I am
home. With the MySQL database, I can access the data being collected on my desktop directly
from my laptop. Furthermore, the data is inserted into the database as soon as it is collected
and, more importantly, is immediately accessible from anywhere that has access to the cloud.
This meant I could check how my script was running on my desktop from my laptop, which was
critical for collecting the data on time and accurately. Lastly, storing the data in the MySQL
database makes it easier for future developers to work on this project, since the data is
stored in the cloud and can be accessed from anywhere.
AWS Cloud
During the early stages of data collection, the scripts ran locally on my Windows 10 laptop.
However, as the time range and amount of data collected increased, it was no longer feasible
to run the scripts locally. When a script runs locally, the laptop must stay on the whole time
or else the script will stop. Additionally, the laptop must always be connected to the
internet or else the connection to Facebook will fail and the script will fail. Because of
these two constraints, it was impossible to collect data for a whole 24 hours without
interruption, because I needed my laptop for my classes. I would like to note that even though
there is internet on campus at Northwestern University, the connection is not stable outdoors.
Thus, when I tried to walk from class to class with my laptop on, the script still failed
because the internet connection was too poor to stay connected to Facebook. As a result, I
decided to move the scripts that were running locally to the cloud.
I used AWS for cloud computing because originally, I was also going to utilize
Amazon’s Mechanical Turk to crowdsource Facebook data. However, due to the recent events
with Facebook’s data scandal, I decided to not pursue that method. AWS is a cloud platform that
offers both cloud computing and database storage31. Since my database was in AWS, I decided
to utilize its cloud computing platform as well. Additionally, AWS is a very popular cloud
services platform that is used by many large companies such as Spotify, Intuit, Yelp, and more. It
is trustworthy and provides applications with flexibility, reliability, and scalability.
To access the cloud server, I used PuTTY to connect from my local machine to the cloud. Please
note that this cloud server requires a private key to be present when you connect. To transfer
files between my local machine and the cloud server, I used WinSCP. Both applications are
highly recommended for Windows users, and they are available online for free.
When a script runs in the cloud, it runs on a virtual server. It is the same idea as running
on a different computer; the only difference is that this one is in the cloud. You can think
of it as another computer that is not physically next to you. Thus, I can connect to the cloud
and run my scripts as I would on my local machine. However, there are some key differences.
The AWS cloud server I used was located in Ohio, in the Eastern time zone, and its operating
system was Red Hat Linux. Because of the difference in operating system, many command-line
commands differ between the two. For example, in Linux, you must start your script with python3 while
30 “What Is a Central Processing Unit (CPU)? - Definition from Techopedia.” Techopedia.com, www.techopedia.com/definition/2851/central-processing-unit-cpu
31 “What Is AWS? - Amazon Web Services.” Amazon, Amazon, aws.amazon.com/what-is-aws/.
in Windows you start with py. Additionally, AWS cloud servers do not come with Chrome
pre-installed, and since the cloud server has no graphical interface, I had to download Chrome
via the command line. There are many other key differences between Windows and Linux that
users should be aware of for future additions to this study, but I will not include them in
this paper since they are too technical for the average MMSS audience. As with my local setup,
I had to set up a virtual environment and install the three dependencies on the Linux cloud
server.
The purpose of the cloud server was to run my script without any interruptions for 24 hours.
However, the browser's user interface (UI) had to be present but not visible, because there is
no screen on a cloud server. Thus, I used Xvfb through pyvirtualdisplay (available on PyPI), a
library that was imported into the script. This allowed an invisible UI to be created without
being shown on a screen. To understand this concept, you must understand how a computer
display works. A computer renders each visual frame one after the other and shows it on the
screen to the user. This is how we can browse from page to page or watch a video. On the cloud
server, there is no physical screen on which to show the frames. Thus, when I try to launch a
Chrome browser, the browser cannot open, and an error occurs. By using Xvfb, however, the
frames are rendered in the server's memory without being presented on a screen. In doing so,
the cloud server loads the UI behind the scenes without erroring. This allows us to run UI
scripts on the screenless cloud server.
It is important to note that the cloud server is like a separate invisible computer. When I
disconnect from the cloud server, my session ends, and everything that was started in that
session closes, including the scripts. To keep the scripts running even after I exit the cloud
server, I used nohup. Nohup is a command which prevents the command that follows it from being
aborted automatically when you log out or exit the server32. Figure 5 shows an example of a
nohup command. nohup is followed by the command to run the script, python3 -u, which is
followed by the script name, fb_trends_cloud_tab.py. The part after > redirects the script's
output to a text file, which is used for debugging purposes. Lastly, the & runs the command in
the background, so the command prompt returns to its normal state as soon as the command is
submitted. In Figure 6, the output of the command in Figure 5 is written to a text file
instead of the command prompt. By running the script on the cloud, using Xvfb, and using
nohup, I could collect data without any interruptions for 24 hours.
nohup python3 -u fb_trends_cloud_tab.py > trends_only5.txt &
Figure 5: Nohup Example
32 “Linux Nohup Command Help and Examples.” Computer Hope, 1 Apr. 2018,
www.computerhope.com/unix/unohup.htm.
Figure 6: Nohup Command Prompt Results Sent to Text File
Another benefit of switching from local to cloud is that anyone can access the cloud server if
they have the credentials. This means I can access the cloud server from anywhere in the
world, at any time, from any device. I can also share the account with another person in case
we need to debug a script together. This allows easier collaboration and communication because
everyone who is working on the project has the latest code. For my thesis, I wrote all the
code myself. However, this is important to note for developers who wish to use my code in the
future or to further this research.
Config File
A config file was used by the scripts to keep track of settings such as Facebook login
credentials, data collection intervals, and CSV file and MySQL database names. This file
contains the login information for the different accounts. A config file was used because it
creates a single location for all of these settings. This makes it easier to change minor but
critical variables to collect different types of data. Otherwise, one would need to go into
every Python script and change the variables in every location they occur. A config file keeps
these changes in one place. This file will not be uploaded to GitHub, as it contains private
login information. For future development on my thesis, I highly suggest you create your own
config.ini file and include your own Facebook login credentials. You can also include other
settings such as interval length and filenames.
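For readers who have not used a config file before, the sketch below shows one way such a file
could be laid out and read with Python's configparser library. The section names, keys, and
values are assumptions made for this example and are not the contents of my own config.ini.

```python
import configparser

# Hypothetical config.ini contents; real credentials should never be shared.
EXAMPLE_INI = """
[facebook]
email = puppet@example.com
password = not-a-real-password

[collection]
interval_minutes = 5
csv_filename = trends_only.csv

[mysql]
database = fb_scrape_db
"""

config = configparser.ConfigParser()
# A script would normally call config.read("config.ini"); a string is read here
# so that the example is self-contained.
config.read_string(EXAMPLE_INI)

email = config["facebook"]["email"]
interval_minutes = config.getint("collection", "interval_minutes")
database = config.get("mysql", "database")
```

Because every script reads the same file, changing interval_minutes in one place changes it
for all scripts, and the file can be left out of the GitHub repository.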
Data
There are four main dataset categories that were collected: (1) trends, (2) trends and tabs,
(3) geographic location, and (4) personal vs puppet Facebook accounts. In this paper, I will
refer to the four datasets by the names mentioned above.
(1) Trends
The trends category’s data was collected from the “Trending” section on the right side of
Facebook’s home page, as shown in Figure 7. There are five topics in the Trending section: Top
Trends, Politics, Science and Technology, Sports, and Entertainment. Each topic had a maximum
of 10 trending news items. Each topic usually had 10 trends, but there were instances where
there were fewer than 10. Notice that in Figure 7 only three trends are shown. Once “See More”
was clicked, the rest of the trends appeared.
Figure 7: Facebook Trending Page
For this category, I used a puppet Facebook account that had a bare profile. The only
information provided on the puppet account was a fake first and last name, a phone number, a
fake birthday, and a profile picture. It had no friends and no activity. The data was
collected by first signing into the puppet Facebook account and loading all the HTML elements
on the home page. It is important to note that Facebook does not load all five topics when you
log into your account. It only loads the topic the user is currently on, which by default is
Top Trends. To load the other four topics, I wrote a function that clicked through every topic
before scraping the data. Once the topics were clicked, the HTML elements were loaded and
available to be collected. My script searched for the specific HTML elements I defined and
collected the corresponding data. For this category, I collected the following:
● Type: The topic (top trends, politics, science and technology, sports, or entertainment)
● Title: Title of news trend
● Description: The short description located under the title (Figure 7)
● Trend Link: The link that redirects the user when the trend is clicked. The link redirects
the user to a compilation of news on a Facebook page.
● Rank: Where the trend is ranked in the trending list
● Scrape ID: An integer to keep track of which round of scraping the data is collected from
● Timestamp: The exact time and day the data was collected (YY-MM-DD HH-MM-SS)
Type was used to distinguish the different topics during the analysis. The analysis
investigated the top trends for all five topics in addition to the top trends for each of the
topics. Title and description were collected to provide detail on each news trend. The trend
link was used to uniquely identify each trend. Rank was collected to discover where in the
list certain news trends were placed and whether they moved up or down the list. Scrape ID was
used with trend link to uniquely identify trends for the whole 24-hour dataset. Timestamp was
used to keep track of the time and date the data was collected, which was critical when
analyzing the trend behavior on different days of the week.
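As a rough sketch of how one scraped row could be represented in Python, the attributes listed
above can be grouped into a single record. The class and field names below are assumptions
made for this example, and the timestamp is assumed to follow the YY-MM-DD HH-MM-SS format
given above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TrendRecord:
    """One scraped trend (hypothetical field names)."""
    trend_type: str    # top trends, politics, science and technology, sports, entertainment
    title: str
    description: str
    trend_link: str
    rank: int
    scrape_id: int
    timestamp: str     # assumed format: "YY-MM-DD HH-MM-SS"

    def key(self):
        """Scrape ID plus trend link identifies a trend within the whole dataset."""
        return (self.scrape_id, self.trend_link)

    def weekday(self):
        """Day of the week on which the row was collected."""
        return datetime.strptime(self.timestamp, "%y-%m-%d %H-%M-%S").strftime("%A")
```

Keeping the day of the week derivable from the timestamp means it does not need to be stored
as a separate column for the weekday and weekend comparisons.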
I collected five rounds of 24-hour data on separate days locally. The data collected are
listed in Figure 8. Trends 2 was collected at a high frequency (every minute) to analyze when
trends change and how often. It investigated whether news trends change as often as every
minute or whether a 5-minute, 10-minute, or longer interval was sufficient to catch most of
the new trend updates. Trends 1 and 2 were compared to investigate whether news trend behavior
is consistent between the same weekdays in different weeks. Trends 1 was compared to Trends 3
to investigate whether news trend behavior is consistent between the weekday and the weekend.
Trends 4 and 5 were compared to investigate whether different weekdays have different news
trend behavior. When comparing datasets collected at different intervals, I only include the
data points at the larger interval, so the interval is consistent between the two datasets. In
conclusion, for the Trends category I conducted analysis to answer the following questions:
1. How often are trends updating?
2. Is there different news trend behavior on the weekend versus the weekday? What
about on different weekdays?
Dataset     Start Date/Time              End Date/Time                Interval per Round (min)
Trends 1    Wednesday 3/7/18 4:30PM      Thursday 3/8/18 4:30PM       5
Trends 2    Wednesday 3/14/18 10:30PM    Thursday 3/15/18 10:30PM     1
Trends 3    Saturday 4/21/18 6:00PM      Sunday 4/22/18 6:00PM        5
Trends 4    Thursday 4/26/18 1:00PM      Friday 4/27/18 1:00PM        5
Trends 5    Tuesday 5/15/18 10:00AM      Wednesday 5/16/18 10:00AM    10
Figure 8: Trends Data Information
To answer the questions, I calculated the Jaccard similarity. Jaccard similarity compares two
sets and measures how similar they are33. The result is a number between 0 and 1, where the
higher the number, the more elements the two sets share. It is computed as the size of the
intersection divided by the size of the union of the two sets. The computation was written in
Python and run as a script. The script then output the results to a CSV file, and graphs were
produced with Excel and matplotlib, a Python library used for graphing. The analysis
calculated the average, minimum, maximum, and standard deviation of the computed Jaccard
similarities. The script I wrote also calculated the Jaccard similarities, average, minimum,
maximum, and standard deviation for every interval in 5-minute steps, starting from the
interval at which the data was collected up to 60 minutes. Furthermore, I created a Python
script to calculate the cumulative new trends for each of the same intervals. This analysis
showed how many brand-new trends appeared in the 24 hours and the results are printed into a
33 “Jaccard Index / Similarity Coefficient.” Statistics How To, www.statisticshowto.com/jaccard-index/.
csv file. The graphs for this analysis were also constructed using the same tools as the Jaccard
similarities.
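The calculations described above can be sketched in a few lines of Python. This is a minimal
illustration and not the analysis script itself. It assumes that each round of scraping is
represented as a set of trend links, that each round is compared with the round that follows
it, that the first round serves as the baseline when counting new trends, and that the
population standard deviation is used; the function names are made up for this example.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def subsample(scrapes, base_interval, target_interval):
    """View data collected every base_interval minutes at a larger target_interval."""
    if target_interval % base_interval != 0:
        raise ValueError("target_interval must be a multiple of base_interval")
    return scrapes[:: target_interval // base_interval]


def jaccard_summary(scrapes):
    """Compare each round with the next one and summarize the similarities."""
    sims = [jaccard(x, y) for x, y in zip(scrapes, scrapes[1:])]
    return {
        "average": mean(sims),
        "minimum": min(sims),
        "maximum": max(sims),
        "std": pstdev(sims),
    }


def cumulative_new_trends(scrapes):
    """Count trends that did not appear in any earlier round."""
    seen = set(scrapes[0])  # assumption: the first round is the baseline
    new_count = 0
    for scrape in scrapes[1:]:
        fresh = set(scrape) - seen
        new_count += len(fresh)
        seen |= fresh
    return new_count
```

For example, with four made-up rounds collected every 5 minutes, {a, b, c}, {a, b, d},
{a, d, e}, and {d, e, f}, every consecutive pair shares two of four distinct trends, so each
Jaccard similarity is 0.5 and three new trends (d, e, and f) appear. Viewed at 10-minute
intervals, only the first and third rounds are kept, giving a similarity of 0.2 and two new
trends, which shows how a longer interval can miss trends that appear and disappear quickly.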
(2) Trends and Tabs
The Trends and Tabs category was an extension of (1) Trends. In addition to the Trends data,
Tabs information was also collected. Tabs data was collected from the news trend's page, which
consisted of the top articles for that news topic. Once a user clicked on a news trend in the
Trending section, as shown in Figure 7, she was redirected to a Facebook page with news
articles and the Facebook community's comments and likes, as shown in Figure 9.
Figure 9: Tabs – New Trends Facebook Page
For this category, I used the same puppet account and methodology to collect the Trends data
as in (1) Trends. To collect the Tabs data, I opened a new tab for each news trend’s link and
scraped the data from the first box containing the news sources for that topic. Once on the
tabs page, the news source links did not load unless the user hovered over each news box.
Thus, I wrote a hover function that hovered over each box and waited for each news source’s
direct link to load before scraping the data. My script searched for the specific HTML
elements I defined and collected the corresponding data. For this category, the same
attributes were collected for the Trends portion as in (1) Trends. For the Tabs portion, I
collected the following:
• Timestamp: The exact time and day the data was collected (YY-MM-DD HH-MM-SS)
• Scrape ID: An integer to keep track of which round of scraping the data is collected from
• Type: The topic (top trends, politics, science and technology, sports, or entertainment)
• Rank: Where the news source is ranked in the list of news sources on tabs page
• Title: Title of article for the specific news source
• Source: News source of the article
• Published Date: When the article was published
• Time Since: Time since the article was published on Facebook
• Description: Description of the article
• URL: Direct URL to the article on the original news source page
Timestamp was used to keep track of the time and date the data was collected. The Scrape ID
was used to keep track of which round of scraping the data was from. The Type was used to
distinguish the different topics during analysis. Rank was important for tracking which news
sources Facebook prioritized. Title and description were collected to provide detail on each
news article. Source was used to analyze which news sources Facebook gave more news exposure
to. Published Date and Time Since were collected to track how recent the news articles were.
URL was used to uniquely identify each news article. Facebook only provided Published Date,
Time Since, and Description data for the first news source.
Collecting Tabs data required a lot of interaction with Facebook in a short amount of time.
However, Facebook has been making a serious effort to block data collection. As a result, only
limited Tabs data could be collected. Furthermore, the intervals were longer than in (1)
Trends because there was more data to collect and each scrape required more time. I collected
one 26-hour dataset locally. The data collected are listed in Figure 10. I conducted analysis
to answer the following questions:
1. Which news sources does Facebook give the most news exposure to?
2. How many news articles from external news sources does Facebook publish?
3. Which news sources does Facebook include in the “Trending” section?
Dataset              Start Date/Time               End Date/Time                Interval (min)
Trends and Tabs 1    Wednesday, 5/15/18 10:00AM    Thursday, 5/16/18 12:00PM    30
Figure 10: Trends and Tabs Data Information
To answer the first question, I calculated how often each news source appeared overall and per
topic. To answer the second question, I calculated the number of unique articles overall and
per topic. To answer the third question, I used a set to collect all the unique news sources
Facebook exposed in the data collected. The computation was written in Python and run as a
script. The script I wrote used Counter from the collections library to rank the news sources
by how frequently they appeared. I exported the data from the MySQL database and conducted the
data aggregations in Python. The script then output the results to a CSV file, and graphs were
produced with Excel and matplotlib.
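The three calculations above can be illustrated with a short sketch that uses Counter and
sets. The rows below are made up for this example and use placeholder source names, and each
row is assumed to be a (type, source, URL) tuple taken from the Tabs data.

```python
from collections import Counter

# Made-up (type, source, url) rows standing in for the Tabs data.
rows = [
    ("politics", "Source A", "https://example.com/a1"),
    ("politics", "Source B", "https://example.com/b1"),
    ("sports", "Source A", "https://example.com/a2"),
    ("sports", "Source A", "https://example.com/a2"),  # same article seen in a later scrape
    ("top trends", "Source C", "https://example.com/c1"),
]

# Question 1: how often each news source appears, overall and per topic.
overall_counts = Counter(source for _, source, _ in rows)
ranked_sources = overall_counts.most_common()  # most frequent source first

per_topic_counts = {}
for topic, source, _ in rows:
    per_topic_counts.setdefault(topic, Counter())[source] += 1

# Question 2: number of unique articles, identified by URL.
unique_articles = {url for _, _, url in rows}

# Question 3: the set of unique news sources that were exposed.
unique_sources = {source for _, source, _ in rows}
```

Counting appearances and counting unique articles answer different questions: an article that
stays on the page for several rounds raises the source's exposure count without adding a new
unique article.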
(3) Geographic Location
For the geographic location category, I used the puppet account to gather data from two
locations: Northern California and Chicago, Illinois. Through AWS, I obtained a proxy with an
Internet Protocol (IP) address located in Northern California. Every device has an IP address
associated with it. When a device connects to a website, the online connection gives the
device an address so that the website knows where to send information34. This IP address
indicates where the device is in the world. To make Facebook think I was logging in from a
different location, I used the IP address from Northern California. The purpose of this
experiment was to discover whether Facebook presented different news trends in different parts
of the world. The Facebook account, time, and day were held constant. The variables collected
in this experiment are the same as those collected in (1) Trends, and the same puppet account
was used.
I collected two 32-hour datasets from 12:00AM, Sunday, May 13, 2018 to 3:00PM, Monday, May 14,
2018 locally. The data collected are listed in Figure 11. Geo-location 1 was compared with
Geo-location 2 to analyze whether there were any differences between Facebook news trends
depending on location. Every other variable was held constant. In conclusion, for the
Geo-location category I conducted analysis to answer the following question:
1. Do news trends differ depending on geographic location? If so, how?
Dataset           Start Date/Time            End Date/Time             Interval per Round (min)    Location
Geo-location 1    Sunday, 5/13/18 12:00AM    Monday, 5/14/18 3:00PM    10                          Northern California
Geo-location 2    Sunday, 5/13/18 12:00AM    Monday, 5/14/18 3:00PM    10                          Chicago, IL
Figure 11: Geographic Location Data Information
To answer the question, I analyzed the differences between the two datasets in terms of
similarities and the number of unique trends, overall and per topic. The results consisted of
the number of trends for each location, the number of unique trends for each location, and the
number of unique trends shared by the two locations. This analysis was conducted for both the
overall news trends and per topic. The computation was written in Python and run as a script.
I exported the data from the MySQL database and conducted the data aggregations in Python. The
script then output the results to a CSV file, and tables were produced with Excel.
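A comparison of this kind can be sketched with Python sets. The function below is only an
illustration: its name, the (type, trend link) row layout, and the keys of the result are
assumptions made for this example rather than the script used in the study.

```python
def compare_datasets(rows_a, rows_b):
    """Compare two datasets of (type, trend_link) rows, overall and per topic."""

    def summarize(links_a, links_b):
        unique_a, unique_b = set(links_a), set(links_b)
        return {
            "trends_a": len(links_a),        # every collected row, repeats included
            "trends_b": len(links_b),
            "unique_a": len(unique_a),       # distinct trends in each dataset
            "unique_b": len(unique_b),
            "shared_unique": len(unique_a & unique_b),  # distinct trends in both
        }

    results = {
        "overall": summarize(
            [link for _, link in rows_a], [link for _, link in rows_b]
        )
    }
    topics = sorted({t for t, _ in rows_a} | {t for t, _ in rows_b})
    for topic in topics:
        results[topic] = summarize(
            [link for t, link in rows_a if t == topic],
            [link for t, link in rows_b if t == topic],
        )
    return results
```

If the two locations show the same trends, the shared count equals the unique count of each
location; any gap between those numbers is where a location-specific trend would appear.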
(4) Personal vs Puppet
For the Personal vs Puppet category, I used the same puppet account as in the previous
categories along with my personal Facebook account. My personal Facebook account was created
in 2008 and included a great deal of personal information. It had numerous pictures, statuses,
likes, comments, and overall activity. For the privacy and safety of my account, I will not
discuss my personal account in detail. The config file was very helpful for this category’s
data collection because it kept private information separate from the rest of the scripts.
This was especially critical when I uploaded my code to GitHub, where other people have access
to it. The purpose of this experiment was to discover whether Facebook presented different
news trends to different Facebook accounts. More specifically, I investigated whether Facebook
personalized its news trends based on what it believes a specific user would be interested in.
This would be linked to whether Facebook used the data it collected from its users to present specific
34 “What Is a Proxy Server and Should You Risk Using One?” WhatIsMyIPAddress.com,
whatismyipaddress.com/proxy-server
types of news trends on their profiles. The bare puppet account did not have any data or
activity, so it was hypothesized that its news trends would be more general. My personal
account, however, had 10 years of data and activity. Thus, it was hypothesized to have news
trends personalized to me. As a result, I hypothesized that the news trends of the two
profiles would be different. The only variable that changed was the Facebook account, because
I collected data on both the puppet and personal accounts. All other variables were held
constant. The variables collected in this experiment are the same as those collected in (1)
Trends, and the same puppet account was used. I collected two 40-hour datasets locally, and
the data collected are listed in Figure 12. The Personal 1 and Puppet 1 datasets were analyzed
to investigate whether there was personalization of news trends. In conclusion, I conducted
analysis to answer the following question:
1. Does Facebook personalize news trends by user? More specifically, do news
trends differ depending on the Facebook account?
Dataset       Start Date/Time             End Date/Time               Interval per Round (min)
Personal 1    Sunday, 5/13/2018 2:00PM    Tuesday, 5/15/18 6:30PM     10
Puppet 1      Sunday, 5/13/2018 2:00PM    Tuesday, 5/15/18 6:30PM     10
Figure 12: Personal vs Puppet Data Information
To answer the question, I analyzed the differences between the pair of datasets in terms of
similarities and the number of unique trends per topic. The results consisted of the number of
trends for each account, the number of unique trends for each account, and the number of
unique trends shared by the two accounts. The analysis for this category was similar to the
analysis for (3) Geographic Location but with a focus on per-topic data. The computation was
written in Python and run as a script. I exported the data from the MySQL database and
conducted the data aggregations in Python. The script then output the results to a CSV file,
and tables were produced with Excel.
IV. Results
The purpose of this research was to answer whether Facebook Trends affects news exposure.
There are four categories: trends, trends and tabs, geo-location, and personal vs puppet
accounts. This section will present the results of the different categories. When I refer to
all intervals, it means 5-minute intervals from 5 to 60 minutes. When I refer to new
cumulative news trends, it means brand new, unique news trends.
(1) Trends
There were five datasets used and analyzed to answer the two questions listed in the
methodology section. The first dataset showed that Top Trends had the highest number of
cumulative new trends on average. The other four topics had similar numbers of cumulative new
trends for all intervals, as shown in Figure 13. In a span of 24 hours, at 5-minute intervals
there were 43 new trends in Top Trends and between 22 and 28 new trends in the other topics.
Furthermore, at 60-minute intervals, there were only 16 new trends for Top Trends, while the
other four topics had between 16 and 18 new trends.
Figure 13: Trends 1 – Cumulative New Trends
On average, Top Trends had the lowest Jaccard similarity average for all intervals. Figure 14
shows that Top Trends’ Jaccard similarity at the 5-minute interval was about the same as that
of the other topics at higher intervals. Additionally, Top Trends’ Jaccard similarity for
5-minute intervals had the biggest difference between the minimum (0.182) and maximum (1)
compared to the other four topics: Politics (minimum 0.667, maximum 1), Science/Tech (minimum
0.538, maximum 1), Sports (minimum 0.667, maximum 1), and Entertainment (minimum 0.538,
maximum 1), as shown in Appendix 1.
Cumulative New Trends - Trends 1, 3/7/18 - 3/8/18 | 4:30PM
Interval (min)    5   10   15   20   25   30   35   40   45   50   55   60
Top Trends       43   38   37   28   27   28   24   17   23   22   25   16
Politics         28   26   25   23   19   20   18   20   19   18   20   18
Science/Tech     26   24   22   21   20   19   17   18   18   19   19   18
Sports           22   21   21   19   19   19   17   17   17   17   16   16
Entertainment    25   25   24   25   18   20   18   21   17   17   20   16
Figure 14: Trends 1 – Jaccard Similarity Average
The second dataset was used to investigate how often news trends change. Figure 15 shows that
for one-minute intervals, Top Trends had 74 new cumulative trends, Politics 41, Science/Tech
33, Sports 39, and Entertainment 42 in 24 hours. In Figure 16, for 5- and 10-minute intervals
respectively, Top Trends had 73 and 73 new cumulative trends, Politics 41 and 41, Science/Tech
33 and 32, Sports 38 and 37, and Entertainment 39 and 38. The biggest difference between
consecutive intervals is between the 10- and 15-minute intervals for Top Trends (a difference
of 3), the 30- and 35-minute intervals for Politics (3), the 40- and 45-minute intervals for
Science/Tech (2), and the 15- and 20-minute intervals for Sports (2), while Entertainment
consistently has a difference of about one between intervals. This dataset had a much higher
number of cumulative news trends than Trends 1 even though both were collected from Wednesday
to Thursday.
Type Interval New Trends
Top Trends 1 74
Politics 1 41
Science/Tech 1 33
Sports 1 39
Entertainment 1 42
Figure 15: Trends 2 – Cumulative New Trends: One Minute Intervals
Jaccard Similarity Average - Trends 1, 3/7/18 - 3/8/18 | 4:30PM
Interval (min)      5    10    15    20    25    30    35    40    45    50    55    60
Top Trends       0.58  0.36  0.47  0.31  0.44  0.26  0.44  0.20  0.29  0.21  0.31  0.22
Politics         0.87  0.81  0.74  0.69  0.69  0.60  0.61  0.54  0.46  0.49  0.42  0.42
Science/Tech     0.89  0.81  0.80  0.75  0.72  0.68  0.72  0.68  0.58  0.56  0.54  0.39
Sports           0.87  0.81  0.74  0.75  0.73  0.65  0.65  0.65  0.58  0.60  0.58  0.54
Entertainment    0.85  0.77  0.73  0.68  0.73  0.65  0.67  0.54  0.58  0.54  0.44  0.54
Figure 16: Trends 2 – Cumulative New Trends: 5 to 60-Minute Intervals
For one-minute intervals, Entertainment had the lowest average Jaccard similarity at 0.800,
followed by Top Trends at 0.803, as shown in Figure 17. The other three topics were between
0.93 and 0.97. For 5 to 60-minute intervals, the results showed that Top Trends had the lowest
Jaccard similarity on average, followed by Entertainment, with no clear ranking among the
other three topics, as shown in Figure 18. The analyzed data table in Appendix 1 showed that
Entertainment’s Jaccard similarity is, on average, 0.10 higher than Top Trends’.
Figure 17: Trends 2 – Jaccard Similarity Average: One-Minute Intervals
Cumulative New Trends - Trends 2, 3/14/18 - 3/15/18 | 10:30PM
Interval (min)    5   10   15   20   25   30   35   40   45   50   55   60
Top Trends       73   73   70   68   65   65   63   60   60   58   57   58
Politics         41   41   40   39   40   40   37   37   36   37   36   37
Science/Tech     33   32   33   32   33   32   33   32   30   30   30   30
Sports           38   37   37   35   37   35   33   35   34   35   33   34
Entertainment    39   38   38   38   37   37   38   37   36   37   35   36
Jaccard Similarity Average - Trends 2, 3/14/18 - 3/15/18 | 10:30PM (one-minute intervals)
Top Trends       0.803088578
Politics         0.969030969
Science/Tech     0.937562438
Sports           0.926469364
Entertainment    0.800491175
Figure 18: Trend 2 - Jaccard Similarity Average: 5 to 60-Minute Intervals
Dataset Trends 3 was used to investigate whether there were different levels of news
activity on weekends versus weekdays. As shown in Figure 19, the most apparent result
was that Science/Tech had only six cumulative new trends for all intervals. In other words,
there were only six Science/Tech news trends over the 24-hour period. Top Trends had the
highest cumulative new trends for every interval, followed by Sports, then Politics, then
Entertainment. Between the 5 and 60-minute intervals, Politics, Sports, and Entertainment
differed by only 4 or 5 cumulative new trends, while Top Trends differed by 17.
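The effect of the sampling interval on the cumulative count can be illustrated with a short Python sketch. It is not the script used for this study. It assumes that "cumulative new trends" means the running count of topics that had not appeared in any earlier sampled snapshot, which is consistent with Science/Tech showing the same count at every interval when only six topics appeared in total. The function name and toy data are hypothetical.

```python
def cumulative_new_trends(snapshots, step):
    """Count topics seen for the first time among snapshots sampled every `step` scrapes."""
    seen = set()
    total = 0
    for snapshot in snapshots[::step]:
        new_topics = set(snapshot) - seen  # topics not seen in any earlier sample
        total += len(new_topics)
        seen |= new_topics
    return total


# Toy one-minute snapshots: topics "C" and "D" are short-lived.
busy_tab = [{"A", "B"}, {"A", "C"}, {"A", "B"}, {"A", "D"}, {"A", "B"}]
# Toy tab whose topics never change during the collection period.
quiet_tab = [{"X", "Y"}] * 5

busy_every_scrape = cumulative_new_trends(busy_tab, step=1)
busy_every_other = cumulative_new_trends(busy_tab, step=2)
quiet_every_scrape = cumulative_new_trends(quiet_tab, step=1)
quiet_every_other = cumulative_new_trends(quiet_tab, step=2)
print(busy_every_scrape, busy_every_other, quiet_every_scrape, quiet_every_other)
```

Under this assumption, a coarser interval can only miss short-lived topics, so the count stays flat for a tab that rarely changes and drops for a tab that changes often.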
Figure 19: Trend 3 – Cumulative New Trends: 5 to 60-Minute Intervals
Jaccard Similarity Average - Trends 2 | 3/14/18 - 3/15/18 | 10:30PM
Interval (minutes)       5    10    15    20    25    30    35    40    45    50    55    60
Top Trends            0.77  0.68  0.62  0.57  0.54  0.55  0.50  0.48  0.48  0.48  0.56  0.38
Politics              0.93  0.90  0.89  0.87  0.85  0.83  0.78  0.78  0.77  0.78  0.75  0.73
Science/Tech          0.93  0.90  0.88  0.88  0.82  0.84  0.80  0.80  0.83  0.77  0.82  0.79
Sports                0.94  0.92  0.86  0.88  0.85  0.82  0.88  0.77  0.86  0.79  0.83  0.74
Entertainment         0.81  0.76  0.62  0.69  0.65  0.65  0.71  0.57  0.64  0.48  0.76  0.60
Cumulative New Trends - Trends 3 | 4/21/18 - 4/22/18 | 6:00PM
Interval (minutes)     5  10  15  20  25  30  35  40  45  50  55  60
Top Trends            46  44  40  42  36  36  35  34  34  31  31  29
Politics              25  25  24  25  23  24  25  24  22  23  23  20
Science/Tech           6   6   6   6   6   6   6   6   6   6   6   6
Sports                29  29  27  28  26  27  27  26  26  23  25  24
Entertainment         24  24  23  24  23  22  23  24  21  21  22  20
Figure 20 shows that Top Trends’ Jaccard Similarity was lower than that of the other four
topics, while Politics, Sports, and Entertainment were relatively close to each other.
Science/Tech consistently had a higher average Jaccard Similarity than the other topics.
Jaccard Similarity trended downward as the time interval increased. At the 50-minute interval,
Politics had a sharp decrease but returned to the usual trend at the 55-minute interval. The
analyzed data for all intervals are presented in Appendix 1.
Figure 20: Trend 3 - Jaccard Similarity Average: 5 to 60-Minute Intervals
Dataset Trends 4 was used to investigate whether different weekdays have different news
trend behavior. Figure 21 shows that Top Trends had the highest cumulative new trends,
followed by Sports, with no clear ranking among Politics, Science/Tech, and Entertainment, for
most intervals. The exceptions were the 30, 40, 55, and 60-minute intervals, where Sports had
more cumulative new trends than Top Trends. Between the 5 and 60-minute intervals, Top Trends
had a difference of 28 cumulative new trends, Sports 16, Politics 13, Science/Tech 10, and
Entertainment 11.
Jaccard Similarity Average - Trends 3 | 4/21/18 - 4/22/18 | 6:00PM
Interval (minutes)       5    10    15    20    25    30    35    40    45    50    55    60
Top Trends            0.83  0.73  0.70  0.61  0.57  0.52  0.53  0.46  0.40  0.35  0.37  0.34
Politics              0.92  0.86  0.82  0.78  0.72  0.73  0.68  0.67  0.66  0.52  0.59  0.61
Science/Tech          0.97  0.94  0.90  0.89  0.84  0.81  0.80  0.77  0.74  0.73  0.68  0.69
Sports                0.91  0.83  0.82  0.76  0.72  0.67  0.65  0.62  0.58  0.59  0.54  0.47
Entertainment         0.93  0.88  0.84  0.81  0.76  0.76  0.73  0.67  0.67  0.65  0.61  0.61
Figure 21: Trend 4 – Cumulative New Trends: 5 to 60-Minute Intervals
Figure 22 shows that Top Trends and Sports had the lowest Jaccard Similarity averages, while
the other three topics had higher averages that were similar to each other. On average, as
the interval increased, Jaccard Similarity decreased. There was a sharp increase at the
15-minute interval for Science/Tech, Entertainment, and Sports and a sharp decrease at the
25-minute interval for Science/Tech and Politics. After most sharp changes, the values
returned to the usual trend at the next interval.
Cumulative New Trends - Trends 4 | 4/26/18 - 4/27/18 | 1:00PM
Interval (minutes)     5  10  15  20  25  30  35  40  45  50  55  60
Top Trends            60  55  51  47  43  40  36  39  39  31  32  32
Politics              37  36  34  32  32  28  27  28  27  23  25  24
Science/Tech          35  35  29  30  31  29  27  26  25  26  24  25
Sports                49  47  46  45  43  42  36  42  38  28  37  33
Entertainment         35  34  31  31  28  30  27  27  28  23  27  24
Jaccard Similarity Average - Trends 4 | 4/26/18 - 4/27/18 | 1:00PM
Interval (minutes)       5    10    15    20    25    30    35    40    45    50    55    60
Top Trends            0.79  0.67  0.58  0.54  0.48  0.44  0.46  0.37  0.32  0.35  0.33  0.24
Politics              0.85  0.74  0.74  0.69  0.51  0.64  0.59  0.55  0.48  0.42  0.49  0.45
Science/Tech          0.82  0.67  0.78  0.60  0.46  0.61  0.63  0.61  0.59  0.35  0.57  0.44
Sports                0.75  0.57  0.62  0.50  0.46  0.42  0.48  0.40  0.39  0.36  0.31  0.29
Entertainment         0.80  0.63  0.71  0.58  0.59  0.51  0.50  0.59  0.55  0.45  0.50  0.48
Figure 22: Trend 4 Jaccard Similarity Averages: 5 to 60-Minute Intervals
Dataset Trends 5 was compared with Trends 4 to investigate whether different weekdays
have different news trend behavior, and with Trends 2 to investigate whether news trend
behavior is consistent between daytime and nighttime. Figure 23 shows that for every interval,
Top Trends had the highest cumulative new trends, followed by Politics, then a tie between
Science/Tech and Entertainment, and then Sports. Top Trends had the biggest difference between
the 5 and 60-minute intervals at 49, then Politics at 22, then Science/Tech and Entertainment
at 12, then Sports at 8.
Figure 23: Trend 5 – Cumulative New Trends: 5 to 60-Minute Intervals
Figure 24 shows that Science/Tech had the highest Jaccard Similarity for all intervals.
Top Trends had the lowest Jaccard Similarity most of the time, except at the 20, 25, and
50-minute intervals, when Entertainment had the lowest. There was a sharp increase at the
15-minute interval and a sharp decrease at the 50-minute interval for Sports and
Entertainment. The other three topics did not have any sharp increases or decreases. In
general, as the interval increased, Jaccard Similarity decreased for all topics.
Cumulative New Trends - Trends 5 | 5/15/18 - 5/16/18 | 10:00AM
Interval (minutes)     5  10  15  20  25  30  35  40  45  50  55  60
Top Trends            83  70  64  55  52  50  47  40  41  42  35  34
Politics              53  49  45  43  40  40  39  36  37  37  31  31
Science/Tech          43  41  40  38  38  37  36  33  35  37  32  31
Sports                34  32  33  30  32  31  29  27  28  30  25  26
Entertainment         42  42  41  40  41  40  37  34  37  36  31  30
Figure 24: Trend 5 – Jaccard Similarity Average: 5 to 60-Minute Intervals
Across the five datasets, there was some similar behavior. Top Trends tended to have the
highest cumulative new trends and the lowest Jaccard similarity averages for all intervals in
all datasets. As expected, cumulative new trends and Jaccard similarity averages decreased as
the interval increased for the five datasets. Trends 1 and 2 showed that the same days of the
week in different weeks do not have similar trend behavior. There were fewer cumulative new
trends on Wednesday and Thursday, March 7-8, than on March 14-15, and the Jaccard similarity
averages were different as well. Trends 2 showed there were roughly double the number of
cumulative new trends for 1-minute intervals compared to 5-minute intervals. As expected, the
Jaccard similarity averages were higher for 1-minute intervals than for 5-minute intervals.
Trends 3 showed that cumulative new trends were lower on the weekend than on weekdays and
Jaccard similarity was slightly higher on weekdays than on the weekend. Trends 2 and 5 showed
that news trends were more active during the daytime than at nighttime. Daytime had slightly
more cumulative new trends but significantly lower Jaccard similarity averages than nighttime.
Trends 4 and 5 showed that different weekdays do not have the same news trend behavior. There
were different numbers of cumulative new trends overall, and the topics ranked differently
relative to each other in cumulative new trends.
(2) Trends and Tabs
One dataset was analyzed to answer the three questions listed in the methodology
section. Figure 25 shows that Top Trends had the highest total of unique news articles,
followed by Politics, Science and Technology, Entertainment, and then Sports. Overall, there
were 6601 total unique news articles across all five topics, the majority of them from Top
Trends and Politics.
Jaccard Similarity Average - Trends 5 | 5/15/18 - 5/16/18 | 10:00AM
Interval (minutes)       5    10    15    20    25    30    35    40    45    50    55    60
Top Trends            0.62  0.49  0.39  0.34  0.30  0.26  0.18  0.24  0.15  0.13  0.17  0.21
Politics              0.76  0.66  0.58  0.48  0.45  0.42  0.37  0.30  0.25  0.25  0.29  0.30
Science/Tech          0.79  0.71  0.68  0.61  0.52  0.52  0.45  0.46  0.37  0.30  0.33  0.41
Sports                0.63  0.53  0.65  0.41  0.31  0.48  0.42  0.35  0.32  0.10  0.25  0.25
Entertainment         0.66  0.55  0.63  0.32  0.23  0.40  0.37  0.29  0.23  0.04  0.20  0.29
Figure 25: Trends and Tabs 1 - Total Unique News Articles
Figure 26 shows the 15 news sources that had the highest exposure overall. MSN had
the highest exposure with 356 articles, and the gap between the first and second-ranked
sources was the largest among the top 15. Figure 26 lists the rest of the ranking.
Ranking of News Source Exposure Overall
News Source          # of Articles
MSN                  356
CNN                  272
The Hill             202
Fox News             199
The New York Times   198
Reuters              190
USA TODAY            163
CBS News             137
NBC News             127
HuffPost             126
Washington Post      124
BBC News             106
Business Insider     104
Yahoo                104
People               103
Figure 26: Trends and Tabs 1 - Ranking of News Sources Overall
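A ranking of this kind can be produced with Python's `collections.Counter`. The sketch below is illustrative only and is not the script used for this study. It assumes each scraped article is recorded as a (news source, article URL) pair and that repeat sightings of the same article should be counted once. The function name, field layout, and `example.com` URLs are hypothetical.

```python
from collections import Counter


def rank_sources(articles, top_n=15):
    """Rank news sources by their number of unique articles.

    `articles` is an iterable of (source, url) pairs; repeat sightings of
    the same pair are counted once.
    """
    unique_articles = set(articles)
    counts = Counter(source for source, _url in unique_articles)
    # Sort by descending count, then by name, so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Toy records (hypothetical URLs); the second MSN row is a repeat sighting.
records = [
    ("MSN", "https://example.com/msn/1"),
    ("MSN", "https://example.com/msn/1"),
    ("MSN", "https://example.com/msn/2"),
    ("MSN", "https://example.com/msn/3"),
    ("CNN", "https://example.com/cnn/1"),
    ("CNN", "https://example.com/cnn/2"),
    ("NPR", "https://example.com/npr/1"),
]

ranking = rank_sources(records)
print(ranking)
```

Sorting explicitly by count and then by name keeps tied sources, such as two outlets with the same number of articles, in a stable order from run to run.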
Figure 27 shows the top 15 news sources Facebook published in each topic. MSN
was ranked 1st for Top Trends, Politics, and Science/Tech, 2nd for Sports, and 3rd for
Entertainment. Some news sources, such as CNN, recur across different topics, while others,
such as NPR, occur in only one topic. The list of all news sources Facebook published on its
“Trending” section can be found in Appendix 2.
Total Unique News Articles
Top Trends              3109
Politics                2242
Science and Technology  509
Sports                  459
Entertainment           493
Overall                 6601
Rank  Top Trends              Politics                Science/Tech            Sports                  Entertainment
1     MSN 163                 MSN 124                 MSN 23                  The New York Times 24   ESPN 27
2     CNN 121                 CNN 122                 CNN 21                  MSN 24                  USA TODAY 25
3     Fox News 108            The Hill 115            Reuters 21              TechCrunch 20           MSN 22
4     The New York Times 104  Reuters 88              ABC News 13             Engadget 16             Bleacher Report 21
5     USA TODAY 80            CBS News 75             Al Jazeera English 13   ESPN 14                 New York Post 12
6     People 75               Fox News 68             Fox News 12             Reuters 13              CBS Sports 11
7     HuffPost 73             The New York Times 56   The Guardian 11         Business Insider 10     E! News 11
8     BBC News 71             Washington Post 55      Washington Post 11      Washington Post 10      BBC News 10
9     Reuters 68              NBC News 54             NBC News 10             The Guardian 10         Daily Mail 8
10    The Hill 68             POLITICO 48             Business Insider 9      Bleacher Report 8       Fox News 8
11    NBC News 56             USA TODAY 46            NPR 9                   USA TODAY 8             Bloomberg 8
12    Yahoo 53                ABC News 44             The Hill 9              CBS Sports 8            Washington Post 8
13    CBS News 52             Yahoo 42                TechCrunch 8            The Hill 8              Deadline Hollywood 8
14    CBS Sports 50           HuffPost 39             The Independent 8       Yahoo Sports 7          Boston.com 8
15    CNBC 48                 Business Insider 37     The Verge 8             CNNMoney 7              The Independent 7
Figure 27: Trends and Tabs 1 - Ranking of News Sources per Topic
In conclusion, Facebook published a large number of news sources on its “Trending”
section; however, it gave more exposure to certain news sources than to others. This
suggested there may have been some favoritism toward certain news sources. In total, Facebook
published 6601 unique news articles on its “Trending” page in 26 hours.
(3) Geo-Location
Two datasets were analyzed to answer the question listed in the methodology
section. Figure 28 shows that the only difference between trends in Chicago and Northern
California was the quantity of news trends in Science/Tech, Sports, and Entertainment. In all
topics, however, there was the same number of unique trends in both locations. This meant that
every news trend that was exposed in Northern California was also exposed in Chicago, but at a
lower frequency. Overall, there were 7917 news trends in Chicago and 7943 in Northern
California but only 129 unique news trends. Thus, on average, a Trending Topic was exposed for
roughly 610 minutes or 10.16 hours.
Topic                   Total Trends  Total Trends  Total Unique  Total Unique  Total Similar Unique Trends
                        in IL         in CA         Trends in IL  Trends in CA  Between IL and CA
Top Trends              1930          1930          52            52            52
Politics                1852          1852          33            33            33
Science and Technology  852           858           15            15            15
Sports                  1640          1650          41            41            41
Entertainment           1643          1653          32            32            32
Total: 173
Overall                 7917          7943          129           129           129
Figure 28: Proxy 1 – Total Trends and Similarity by Topic
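The columns of Figure 28 correspond to simple list and set operations. The Python sketch below is illustrative only, not the script used for this study. It assumes the scraper recorded one trend title per sighting for each location, so a title appears once for every scrape in which it was visible. The function name, dictionary keys, and toy titles are hypothetical.

```python
def compare_locations(sightings_a, sightings_b):
    """Summarize two locations' sightings of one topic tab.

    Each argument is a list of trend titles with one entry per sighting,
    so repeated titles are expected.
    """
    unique_a = set(sightings_a)
    unique_b = set(sightings_b)
    return {
        "total_a": len(sightings_a),
        "total_b": len(sightings_b),
        "unique_a": len(unique_a),
        "unique_b": len(unique_b),
        "shared_unique": len(unique_a & unique_b),
        "only_in_a": sorted(unique_a - unique_b),
        "only_in_b": sorted(unique_b - unique_a),
    }


# Toy data: both locations see the same three topics, but location B sees
# them in more scrapes.
location_a = ["X", "X", "Y", "Z"]
location_b = ["X", "X", "X", "Y", "Z", "Z"]

summary = compare_locations(location_a, location_b)
print(summary)
```

When the shared count equals both unique counts, as in every row of Figure 28, the two locations saw exactly the same set of topics and differed only in how often each topic was sighted.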
(4) Personal vs Puppet
Two datasets were analyzed to answer the question listed in the methodology section.
Figure 29 shows that there was a slight difference in the unique news trends between the
personal and puppet accounts. There were 95 unique trends for both accounts for Top Trends, but
there was one Trending Topic that appeared on the personal and not the puppet account. Thus,
there were only 93 similar unique trends. The news trend “Lewis Hamilton” appeared in Top
Trends for the personal account once but did not appear in Top Trends for the puppet account.
Lewis Hamilton news appeared in Sports for both accounts. This means the topic was promoted
from Sports to Top Trends for the personal account. The news trend “Peru Two” appeared in the
personal account once but not in the puppet account for any of the topics.
Topic                   Total Trends  Total Trends  Total Unique        Total Unique      Total Similar Unique Trends
                        in Personal   in Puppet     Trends in Personal  Trends in Puppet  Between Personal and Puppet
Top Trends              2559          2559          95                  95                93
Politics                2507          2507          52                  52                52
Science and Technology  1854          1858          30                  30                30
Sports                  1937          1938          40                  40                40
Entertainment           2260          2249          42                  42                42
Figure 29: Personal vs Puppet 1 - Total Trends and Similarity by Topic
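The two cases described above (a topic that also exists under another tab of the other account, and a topic that is absent from the other account altogether) can be told apart programmatically. The Python sketch below is illustrative only, not the script used for this study. It assumes each account's data is a dictionary mapping a topic tab to the set of trend titles seen there. The function name and the placeholder title "Shared Topic" are hypothetical; the two named trends are taken from the text.

```python
def explain_differences(personal, puppet):
    """List trends seen in a tab of `personal` but not in the same tab of `puppet`.

    For each such trend, also report the other puppet tabs (if any) where the
    trend did appear, which would indicate a promotion between tabs.
    """
    report = []
    for tab, titles in personal.items():
        missing = titles - puppet.get(tab, set())
        for title in sorted(missing):
            elsewhere = sorted(
                other_tab
                for other_tab, other_titles in puppet.items()
                if other_tab != tab and title in other_titles
            )
            report.append((tab, title, elsewhere))
    return report


# Toy data built around the two trends named in the text.
personal = {
    "Top Trends": {"Lewis Hamilton", "Peru Two", "Shared Topic"},
    "Sports": {"Lewis Hamilton"},
}
puppet = {
    "Top Trends": {"Shared Topic"},
    "Sports": {"Lewis Hamilton"},
}

differences = explain_differences(personal, puppet)
for tab, title, elsewhere in differences:
    print(tab, title, elsewhere)
```

An empty list in the last position marks a trend that the other account never saw under any tab, while a non-empty list names the tab it was promoted from.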
V. Discussion
This paper investigated four questions: how often Facebook Trends change, whether Facebook
personalizes its Trending Topics by demographic, whether it personalizes them by geo-location,
and whether Facebook gave more news exposure to certain news sources.
(1) Trends
I discovered that there was no fixed rate at which Facebook Trends update, which aligned
with my hypothesis. The highest number of cumulative new trends at 5-minute intervals was 83
for Top Trends from Tuesday to Wednesday, 10am (Trends 5), while the lowest was 6 for
Science/Tech from Saturday to Sunday, 6pm (Trends 3). Even the lowest number for Top Trends’
cumulative new trends was 43 (Trends 1), which was almost half of 83. However, Trends 1 had
the lowest Jaccard Similarities, which meant Trending Topics were changing often, but among
news that had already been published before. This suggested that Facebook’s News Trends
updated according to how often news updates in the world. I believe it was no coincidence
that there was a high number of cumulative new trends for Trends 5, because it was collected
only a few days before the Royal Wedding of Prince Harry and Meghan Markle. In fact, Trends
5’s Politics and Entertainment cumulative new trends were the highest among all the datasets,
while its Science/Tech and Sports were not. Again, this was probably no coincidence, since a
royal wedding would fall under the Politics and/or Entertainment topics. Trends 2 showed
that Facebook Trending’s behavior was similar between 1-minute and 5-minute intervals. This
suggested that Trending news was not changing as frequently as every minute. In fact, it
seemed 10-minute intervals were enough to capture most of the cumulative new trends, and the
Jaccard Similarity had its first large difference between the 5 and 10-minute intervals. Top
Trends had the most updates for a large majority of the intervals in all five datasets. This
makes sense because Top Trends can consist of the highest trending news from the other four
topics, which makes it more competitive for a news item to hold its place.
The next question investigated whether different days and times of the week had different
Trending behavior. Trends 1 and 2 were both collected from Wednesday to Thursday and
were compared to investigate whether the same weekdays in different weeks have different
Trending behavior. The data showed that even on the same weekdays, Facebook’s Trending
behavior differed. This aligns with the finding on how often Facebook Trends update. I
hypothesize that more news was produced during the week Trends 2 was collected than during the
week Trends 1 was collected. Trends 3, 2, and 1 showed that there was a difference in Trending
behavior between the weekend and weekdays. However, there was no clear pattern because, on
average, Trends 1 had fewer cumulative new trends than Trends 3, but Trends 2 had more. The
unexpected behavior came from Trends 3, where Science/Tech had only six cumulative new trends
for all intervals. This makes sense because most Science/Tech news comes from big technology
companies, which are usually closed on weekends. Furthermore, the average Jaccard Similarities
of Trends 2 and 3 were more similar to each other than those of Trends 1 and 2. Again, this
aligned with the two findings above and supports my hypothesis that Facebook Trending truly
follows the news industry. Lastly, Trends 4 and 5 were compared to investigate whether
different weekdays have different behavior. On average, Tuesday to Wednesday had a higher
number of cumulative new trends than Thursday to Friday, except for Sports. Additionally,
Thursday to Friday had higher Jaccard Similarity than Tuesday to Wednesday. Thus, it’s clear
there was more news activity on Facebook from Thursday to Friday. However, it was unclear
whether this result was the norm because I discovered that the same weekdays in different
weeks have different trend behaviors. It could be that, for the weeks Trends 4 and 5 were
collected, the news industry had more news one week than the other. On the other hand, if this
pattern is the norm, it suggests that there was more activity on Thursday and Friday than
earlier in the week, which may be because more people are active on social media near the end
of the work week than at the beginning.
The findings from (1) Trends suggest that there may be a relationship between how
often Facebook’s Trending news updates and how often the overall news industry updates. This
suggestion aligned with Groshek’s paper, which stated that Facebook’s news agenda and
traditional news agendas have strong similarities35.
(2) Trends and Tabs
This section investigated whether Facebook gave more news exposure to certain news sources
and how many news articles Facebook exposed on average. The data showed that MSN had the
highest news exposure, with 84 more articles than the 2nd highest, CNN, which in turn had 70
more than the 3rd highest, The Hill. After the 3rd highest, the gaps between consecutive ranks
were not as large. In fact, the 10th-ranked source had about 1/3 the number of articles of the
1st. It’s interesting to note that CNN and Fox News are liberal and conservative-leaning news
outlets, respectively, and although CNN was ranked 2nd and Fox News 4th, there was a gap of 73
articles between them. Over the 26-hour collection period, this meant there were about 2.8
more CNN articles than Fox News articles every hour. Furthermore, CNN, The New York Times,
USA Today, CBS News, NBC News, HuffPost, Washington Post, BBC News, and Yahoo have more
consistently liberal audiences, according to Figure 30, and they all ranked within the top 15
news sources with the highest number of articles posted on Facebook Trending36, whereas Fox
News was the only conservative-leaning source to rank within the top 15. In other words, 9
liberal news sources and 1 conservative news source ranked within the top 15 by number of
articles published on Facebook Trending. Of the news sources in Figure 30, only three
conservative news outlets were published on Facebook Trending at least once in the 26 hours,
while at least 19 liberal news sources were published. In fact, the number was higher than 19
because, for example, Yahoo had Yahoo Sports, Yahoo Finance, Yahoo News, and Yahoo Canada,
which are different news outlets under the same parent company. This aligned with my
hypothesis that Facebook gave more news exposure to certain news sources, more specifically
liberal news sources.
35 Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive Capacity of
Social Networking Sites in Intermedia Agenda Setting across Topics over Time.” 2013
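The gaps and rates quoted in this section follow directly from the article counts in Figure 26 and the 26-hour collection period. The short Python check below reproduces them. The counts are copied from Figure 26 and the leaning labels from the discussion above; the variable names are illustrative and the snippet is not part of the original analysis scripts.

```python
# Article counts copied from Figure 26 (top 15 sources overall).
top15 = {
    "MSN": 356, "CNN": 272, "The Hill": 202, "Fox News": 199,
    "The New York Times": 198, "Reuters": 190, "USA TODAY": 163,
    "CBS News": 137, "NBC News": 127, "HuffPost": 126,
    "Washington Post": 124, "BBC News": 106, "Business Insider": 104,
    "Yahoo": 104, "People": 103,
}
COLLECTION_HOURS = 26  # length of the collection period stated in the text

# Leaning labels as stated in the discussion (attributed there to Figure 30).
liberal = {"CNN", "The New York Times", "USA TODAY", "CBS News", "NBC News",
           "HuffPost", "Washington Post", "BBC News", "Yahoo"}
conservative = {"Fox News"}

# Rank sources from most to fewest articles.
ranked = sorted(top15.items(), key=lambda item: item[1], reverse=True)

gap_first_second = ranked[0][1] - ranked[1][1]    # MSN minus CNN
gap_second_third = ranked[1][1] - ranked[2][1]    # CNN minus The Hill
gap_cnn_fox = top15["CNN"] - top15["Fox News"]
cnn_fox_per_hour = gap_cnn_fox / COLLECTION_HOURS
tenth_to_first = ranked[9][1] / ranked[0][1]      # HuffPost relative to MSN

liberal_in_top15 = sum(1 for name in top15 if name in liberal)
conservative_in_top15 = sum(1 for name in top15 if name in conservative)

print(gap_first_second, gap_second_third, gap_cnn_fox)
print(round(cnn_fox_per_hour, 1), round(tenth_to_first, 2))
print(liberal_in_top15, conservative_in_top15)
```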
Figure 30: Liberal and Conservative Leaning News Source Metric
(3) Geo-location
This category investigated whether Facebook Trending news differed depending on
geographic location. The results from this category were surprising because there was no
difference in the Trending Topics Facebook posted between Chicago, IL and Northern
California. This showed that Facebook did not personalize its news trends based on the
location of the user. Different Trending Topics were presented at different times, but all
Trending Topics were exposed at some point in the data collection period for both locations.
There are different local news stations and local news in Northern California than in Chicago,
IL. It is highly unlikely that trending local news in Chicago would become trending local news
in Northern California and vice versa, or else it would defeat the purpose of local news. At
the beginning of the year, Mark Zuckerberg announced Facebook would try to increase exposure
of local news on its News Feed37. It seems Facebook plans to apply that only to News Feed and
not to Trending Topics, at least as of now.
36 Engel, Pamela. “Here's How Liberal Or Conservative Major News Sources Really Are.” Business Insider,
Business Insider, 21 Oct. 2014, www.businessinsider.com/what-your-preferred-news-outlet-says-about-your-political-ideology-2014-10.
37 Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.”
(4) Personal vs Puppet
This category investigated whether Facebook Trending news differed between different
demographics. The results showed it did, but to an insignificant degree. There was only one
news trend that showed up on the personal account but not the puppet account. A possible
explanation for the minor difference between the puppet and personal accounts was that there
was a 17-second lag between the data collection of the two accounts. This meant the Trending
Topic that appeared on the personal account could have appeared during the 17-second window
and disappeared before the next scrape. The news trend that showed up only on the personal
account was about the Peru Two drug smuggling case and how one of the smugglers gave birth to
twins. In theory, a younger female has a higher chance of clicking on this news than a
middle-aged male because many celebrity media companies (e.g., People) that report on
celebrity pregnancies and babies have a readership that is 70% women38. This study, together
with the results from Chakraborty, Messias, and Benevenuto’s study, showed that even though
certain demographics had a stronger influence on what ended up on Trending Topics, the news
trends placed on Trending Topics were presented to all demographics with very little
personalization. This means topics that certain demographics find interesting or meaningful
may not be meaningful to other demographics. Furthermore, this meant that under-represented
demographics on Facebook are forced to view news that over-represented demographics boost.
This raises concern because news presented on Trending could favor one demographic over
another.
Limitation
There were a few limitations in the methodology. Facebook has been making efforts to
prevent scraping of its data. Although I found workarounds to obtain the data I collected for
this study, I had to scale back the amount of data I originally wanted to collect. Initially,
I planned to collect a week’s worth of data for each category. The limitations with Facebook
included issues with multiple Facebook logins, account safety, trends that do not have full
information, and fake accounts. Additionally, I planned to use Amazon Mechanical Turk (MTurk),
a crowdsourcing tool, to gather Facebook Trending data from other personal accounts from
specific demographics and/or geo-locations. However, after the Cambridge Analytica data
scandal, my adviser and I decided it would be best to stay away from MTurk crowdsourcing
because the data given to Cambridge Analytica was collected through MTurk.
Facebook detected when an account was logged in multiple times in a short period of
time. This raised an issue for my data collection because, when debugging my data scraping
code, I had to log into Facebook multiple times in short periods of time. From what I learned
in my study, I believe that when an account was logged in multiple times, Facebook no longer
allowed the HTML elements to be recognized, so the login would fail.
Facebook tracks where a user’s account logs in from and when there is “unusual
behavior” in an account. When Facebook notices unusual activity, it logs the account out of
all devices and asks the user to change their password. This was an issue when I was
collecting data in the cloud with my personal account. The cloud server was in Columbus, Ohio,
a location I had never logged into Facebook from. When I tried to log into my account from the
cloud, Facebook thought someone was hacking my account and forced me to change my password. As
a result, I was not able to collect data for (4) Personal vs Puppet on my personal account
from the cloud. Furthermore, when I scraped the data too fast, Facebook labeled this as
“unusual behavior” too. The result was similar: Facebook logged the account out of all devices
and asked me to reset my password. As a result, (2) Trends and Tabs data could only be
collected at 30-minute intervals.
38 “35 Eye Opening People Magazine Demographics.” BrandonGaille.com, 14 Jan. 2017,
brandongaille.com/35-eye-opening-people-magazine-demographics/.
Sometimes, Facebook’s Trending Topics would be missing information such as the pop-up
with the news article links or the description of the topic. I noticed in my data collection
that there was no consistency in this seemingly random behavior. Thus, I hypothesized that
Facebook purposely added inconsistencies throughout the Trending section to catch automatic
scraping of its data. This caused issues in data collection because once one of Facebook’s
random tests caught my script, my script would terminate and no longer collect data for that
run. This limited how long I could collect data consistently.
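One way a collection script could tolerate incomplete entries, rather than terminating, is to set them aside and keep going. The Python sketch below is a hypothetical illustration and was not part of the scripts used in this study. It assumes each scraped trend has already been turned into a dictionary, and the field names (`title`, `description`, `article_links`) are invented for the example. It only addresses missing fields in already-collected records; it does not address Facebook's detection of automated access.

```python
# Hypothetical field names for a scraped trend record.
REQUIRED_FIELDS = ("title", "description", "article_links")


def split_complete(records):
    """Separate complete trend records from incomplete ones without raising.

    Returns (complete, incomplete), where `incomplete` is a list of
    (title, missing_fields) pairs that can be logged for later review.
    """
    complete, incomplete = [], []
    for record in records:
        # A field counts as missing if it is absent, None, or empty.
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            incomplete.append((record.get("title"), missing))
        else:
            complete.append(record)
    return complete, incomplete


# Toy records: the second one has no description and no article links.
scraped = [
    {"title": "Topic A", "description": "About A", "article_links": ["https://example.com/a"]},
    {"title": "Topic B", "description": "", "article_links": []},
]

complete, incomplete = split_complete(scraped)
print(len(complete), incomplete)
```

Keeping the incomplete entries in a separate list preserves a record of how often, and for which topics, the missing information occurred.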
Issues with scraping data from Facebook were expected, especially after the Cambridge
Analytica data scandal. However, there were added complications when Facebook announced
it was taking down fake accounts. Mark Zuckerberg posted on May 15, 2018 about Facebook’s
efforts to shut down fake accounts, as shown in Figure 31. I only ran into an issue with the
puppet account once, but I did notice that when Facebook detected “unusual behavior” in my
personal account, it only asked me to change my password. When it noticed “unusual behavior”
in the puppet account, it asked me to verify my phone number by typing in a code it sent to
the phone. In other words, Facebook was checking to make sure that the account was a true
account with a real phone number instead of a fake bot account.
Figure 31: Zuckerberg’s Post
There was a limitation with the AWS cloud server. I experienced a bug where
Chromedriver could not be reached. This error occurred about 2.5 hours after a script was
launched on the cloud. Once the error occurred, the script would no longer scrape data because
the UI was no longer reachable. My adviser and I tried to fix the issue to the best of our
ability. Memory was not the problem, as there was consistently free memory available from when
the script first started until the error occurred. I tried closing and relaunching the Chrome
driver every interval, but that raised red flags with Facebook. I added no-sandbox to my
script, as many internet sources recommended, but that did not fix the issue. All software and
packages were updated to the latest version. Overall, this limited my ability to collect data
for a long, continuous amount of time.
Further Research
One of the insights of my study was that there may be a correlation between news
industry activity and Facebook Trending Topics. Future research could compare news activity in
the news industry, both traditional and online, with Facebook Trending news to investigate
whether Facebook truly follows the news industry’s patterns. The results of such a study would
show whether Facebook filters certain news. If Facebook’s Trending activity did not closely
follow the news industry’s activity, it would suggest that Facebook introduces bias into
Trending by selecting specific types of news topics. Another suggestion is to collect Facebook
Trending data from different countries and study whether different countries have different
Trending Topics. This study found that the two US locations examined did not have different
Trending Topics; the suggested study would extend the comparison to an international scale.
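As a starting point for the first suggestion, the relationship between the two activity levels could be summarized with a Pearson correlation coefficient. The Python sketch below is purely illustrative: the hourly counts are invented toy numbers, not data from this study, and the choice of Pearson correlation over hourly counts is an assumption about how such a comparison might be set up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Toy hourly counts (invented): articles published by news outlets and
# new Trending Topics observed on Facebook in the same hours.
industry_articles = [10, 20, 30, 40]
facebook_new_trends = [1, 2, 3, 4]

r = pearson(industry_articles, facebook_new_trends)
print(round(r, 3))
```

A coefficient near 1 would be consistent with Trending activity rising and falling with industry output, while a value near 0 would suggest the two are unrelated at the chosen time scale.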
VI. Conclusion
This study investigated whether Facebook personalizes its Trending Topics by
demographic and geo-location. It further investigated how often Facebook trends change and
whether Facebook gives more news exposure to certain news sources. From the data collection
and analysis, I found that Facebook does not personalize by geo-location and only slightly
personalizes by demographic. Furthermore, my results show that Facebook gives more news
exposure to liberal news sources than to conservative ones. Lastly, the analysis showed that
Facebook Trending Topics update irregularly. This leads to the hypothesis that Facebook
follows the news industry and publishes news when the news industry publishes news. This is
significant because Facebook has claimed that it does not include any bias in or filter its
Trending News, but my results suggest otherwise. Furthermore, only a small subset of
Facebook’s demographics influences what goes on Trending, but my analysis shows there is very
little personalization by demographic. This raises concern because under-represented
demographics are shown Trending News that does not attract their interest or, even worse,
skews their views toward a different demographic’s views. Lastly, even though Facebook claims
it does not present any bias in its news, my results show it gives more news exposure to
liberal news sources than to conservative ones.
Reference
Alvarado, Oscar, and Annika Waern. “Towards Algorithmic Experience.” Proceedings of the 2018
CHI Conference on Human Factors in Computing Systems - CHI '18, 2018,
doi:10.1145/3173574.3173860.
Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data
Shows.” Columbia Journalism Review, 18 Apr. 2018, www.cjr.org/tow_center/facebook-local-
news.php.
“Browser Statistics.” W3Schools Online Web Tutorials, www.w3schools.com/browsers/default.asp.
Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced
Recommendations. 1 Apr. 2017, arxiv.org/abs/1704.00139.
Cvijikj, Irena Pletikosa, and Florian Michahelles. “Monitoring Trends on Facebook.” 2011 IEEE Ninth
International Conference on Dependable, Autonomic and Secure Computing, 2011,
doi:10.1109/dasc.2011.150.
Diakopoulos, Nicholas. “Algorithmic Accountability.” Digital Journalism, vol. 3, no. 3, 2014, pp. 398–
415., doi:10.1080/21670811.2014.976411.
Engel, Pamela. “Here's How Liberal Or Conservative Major News Sources Really Are.” Business
Insider, Business Insider, 21 Oct. 2014, www.businessinsider.com/what-your-preferred-news-
outlet-says-about-your-political-ideology-2014-10.
Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive
Capacity of Social Networking Sites in Intermedia Agenda Setting across Topics over Time.”
2013, doi:10.12924/mac2013.01010015.
Göös, Christine. “Blog.” Facebook Advertising Trends 2018, 15 Feb. 2018,
www.smartly.io/blog/facebook-advertising-trends-2018.
“Jaccard Index / Similarity Coefficient.” Statistics How To, www.statisticshowto.com/jaccard-index/.
Kazai, Gabriella, et al. “Personalised News and Blog Recommendations Based on User Location,
Facebook and Twitter User Profiling.” Proceedings of the 39th International ACM SIGIR
Conference on Research and Development in Information Retrieval - SIGIR '16, 2016,
doi:10.1145/2911451.2911464.
“Linux Nohup Command Help and Examples.” Computer Hope, 1 Apr. 2018,
www.computerhope.com/unix/unohup.htm.
“Nearly Half of U.S. Adults Get News on Facebook, Pew Says.” Nieman Lab,
www.niemanlab.org/2016/05/pew-report-44-percent-of-u-s-adults-get-news-on-facebook/.
Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative
News.” Gizmodo, Gizmodo.com, 10 May 2016, gizmodo.com/former-facebook-workers-we-
routinely-suppressed-conser-1775461006.
Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake
News.” The Washington Post, WP Company, 29 Aug. 2016,
www.washingtonpost.com/news/the-intersect/wp/2016/08/29/a-fake-headline-about-megyn-
kelly-was-trending-on-facebook/?noredirect=on&utm_term=.2d050b7762f3.
“Pipenv & Virtual Environments.” Freezing Your Code - The Hitchhiker's Guide to Python,
docs.python-guide.org/en/latest/dev/virtualenvs/.
“PyPI – the Python Package Index.” PyPI, pypi.org/.
“'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017,
www.npr.org/2017/02/20/516292286/-republic-author-describes-how-social-media-hurts-
democracy.
Riley, Charles. “Cambridge Analytica, Facebook and Your Data: Here's What to Know.” CNNMoney,
Cable News Network, 20 Mar. 2018, money.cnn.com/2018/03/19/technology/facebook-data-
scandal-explainer/index.html?iid=EL.
Rongala, Arvind. “Benefits of Python over Other Programming Languages.” Invensis Blog, 6 Apr.
2018, www.invensis.net/blog/it/benefits-of-python-over-other-programming-languages/.
Rosén, Josefin. “What Every Business Manager Should Know about Algorithm Audits.” SAS Learning
Post, 16 Oct. 2017, blogs.sas.com/content/hiddeninsights/2017/10/16/algorithm-audits/.
“Your Facebook Data Scandal Questions Answered.” CNNMoney, Cable
News Network, 11 Apr. 2018, money.cnn.com/2018/04/11/technology/facebook-questions-data-
privacy/index.html.
“Welcome to Python.org.” Python.org, www.python.org/about/.
“What Is a Central Processing Unit (CPU)? - Definition from Techopedia.” Techopedia.com,
www.techopedia.com/definition/2851/central-processing-unit-cpu.
“What Is a Proxy Server and Should You Risk Using One?” WhatIsMyIPAddress.com,
whatismyipaddress.com/proxy-server.
“What Is AWS? - Amazon Web Services.” Amazon, Amazon, aws.amazon.com/what-is-aws/.
“What Is Query? - Definition from WhatIs.com.” SearchSQLServer,
searchsqlserver.techtarget.com/definition/query.
www.facebook.com/tstocky/posts/10100853082337958.
Appendix
Appendix 1: Data from Trends
Trend 1: Jaccard Similarity – Wednesday, 3/7/18 to Thursday, 3/8/18
Topic Interval Avg Min Max Standard Deviation
Top Trends 5 0.5839342 0.181818 1 0.764155876
Top Trends 10 0.36485084 0.083333 0.666667 0.604028837
Top Trends 15 0.46557987 0.181818 0.818182 0.682334137
Top Trends 20 0.30984848 0.083333 1 0.556640355
Top Trends 25 0.43504274 0.266667 0.666667 0.659577694
Top Trends 30 0.26260823 0.181818 0.428571 0.512453144
Top Trends 35 0.43818681 0.357143 0.538462 0.661956806
Top Trends 40 0.19772727 0 0.5 0.444665349
Top Trends 45 0.28869048 0.1875 0.428571 0.537299243
Top Trends 50 0.21010101 0.181818 0.266667 0.458367767
Top Trends 55 0.31111111 0.266667 0.333333 0.557773351
Top Trends 60 0.22424242 0.181818 0.266667 0.473542421
Politics 5 0.86720143 0.666667 1 0.931236504
Politics 10 0.80554739 0.428571 1 0.897522921
Politics 15 0.73529501 0.428571 1 0.857493445
Politics 20 0.68851981 0.333333 1 0.82977094
Politics 25 0.69191919 0.333333 1 0.831816802
Politics 30 0.604662 0.333333 0.818182 0.777600157
Politics 35 0.60714286 0.333333 1 0.779193722
Politics 40 0.53627622 0.25 0.818182 0.732308831
Politics 45 0.46053293 0.176471 0.666667 0.678625767
Politics 50 0.48504274 0.25 0.666667 0.696450095
Politics 55 0.41666667 0.25 0.666667 0.645497224
Politics 60 0.42156863 0.176471 0.666667 0.649283164
Science/Tech 5 0.88660359 0.538462 1 0.9415963
Science/Tech 10 0.80707528 0.538462 1 0.898373685
Science/Tech 15 0.80313626 0.538462 1 0.896178697
Science/Tech 20 0.75203963 0.538462 1 0.867202183
Science/Tech 25 0.72105672 0.538462 0.818182 0.849150588
Science/Tech 30 0.67599068 0.538462 0.818182 0.822186521
Science/Tech 35 0.71794872 0.538462 1 0.847318546
Science/Tech 40 0.67832168 0.538462 0.818182 0.823602864
Science/Tech 45 0.58119658 0.538462 0.666667 0.7623625
Science/Tech 50 0.55555556 0.333333 0.666667 0.745355992
Science/Tech 55 0.53554779 0.25 0.818182 0.731811305
Science/Tech 60 0.39423077 0.25 0.538462 0.627877989
Sports 5 0.86987522 0.666667 1 0.932671015
Sports 10 0.81461676 0.666667 1 0.90256122
Sports 15 0.7425302 0.538462 1 0.861701919
Sports 20 0.75203963 0.538462 1 0.867202183
Sports 25 0.73304473 0.428571 1 0.856180316
Sports 30 0.65401265 0.428571 0.818182 0.808710488
Sports 35 0.64502165 0.428571 0.818182 0.803132396
Sports 40 0.65084915 0.428571 0.818182 0.806752224
Sports 45 0.58119658 0.538462 0.666667 0.7623625
Sports 50 0.5950716 0.428571 0.818182 0.771408838
Sports 55 0.58119658 0.538462 0.666667 0.7623625
Sports 60 0.53846154 0.538462 0.538462 0.733799386
Entertainment 5 0.85451803 0.538462 1 0.924401445
Entertainment 10 0.76923077 0.538462 1 0.877058019
Entertainment 15 0.73151091 0.538462 1 0.855284113
Entertainment 20 0.67810315 0.25 1 0.823470186
Entertainment 25 0.72999223 0.538462 1 0.854395827
Entertainment 30 0.64568765 0.538462 0.818182 0.803546916
Entertainment 35 0.67249417 0.538462 0.818182 0.82005742
Entertainment 40 0.53627622 0.25 0.818182 0.732308831
Entertainment 45 0.58119658 0.538462 0.666667 0.7623625
Entertainment 50 0.54456654 0.428571 0.666667 0.737947522
Entertainment 55 0.44230769 0.25 0.538462 0.665062172
Entertainment 60 0.53846154 0.538462 0.538462 0.733799386
Trend 2: Jaccard Similarity – Wednesday, 3/14/18 to Thursday, 3/15/18
Topic Interval Avg Min Max Standard Deviation
Top Trends 5 0.771027 0.181818 1 0.87808125
Top Trends 10 0.67951 0.083333 1 0.824324262
Top Trends 15 0.617197 0.083333 1 0.785618687
Top Trends 20 0.570689 0.181818 1 0.755439935
Top Trends 25 0.540294 0.083333 1 0.735046675
Top Trends 30 0.552632 0.083333 1 0.743392519
Top Trends 35 0.504391 0.083333 1 0.710204997
Top Trends 40 0.482348 0.083333 1 0.694512595
Top Trends 45 0.476563 0.083333 1 0.690335064
Top Trends 50 0.478761 0.083333 0.818182 0.691925498
Top Trends 55 0.559239 0.2 0.818182 0.74782316
Top Trends 60 0.381854 0 1 0.617943571
Politics 5 0.934812 0 1 0.966856783
Politics 10 0.903866 0 1 0.950718806
Politics 15 0.892534 0.666667 1 0.944739955
Politics 20 0.865741 0.666667 1 0.930451901
Politics 25 0.847842 0.538462 1 0.920783261
Politics 30 0.827506 0.538462 1 0.909673473
Politics 35 0.775712 0 1 0.880745192
Politics 40 0.783929 0.538462 1 0.88539767
Politics 45 0.770313 0.428571 1 0.87767478
Politics 50 0.775179 0.538462 1 0.880442414
Politics 55 0.754335 0.428571 1 0.868524846
Politics 60 0.726787 0.538462 1 0.852518095
Science/Tech 5 0.925648 0 1 0.962106145
Science/Tech 10 0.900104 0 1 0.948738369
Science/Tech 15 0.8816 0 1 0.938935301
Science/Tech 20 0.882997 0 1 0.939679005
Science/Tech 25 0.819106 0 1 0.90504485
Science/Tech 30 0.837753 0 1 0.91528822
Science/Tech 35 0.803059 0 1 0.896135442
Science/Tech 40 0.80303 0 1 0.896119581
Science/Tech 45 0.827433 0.538462 1 0.909633434
Science/Tech 50 0.774054 0 1 0.879803122
Science/Tech 55 0.815313 0.538462 1 0.902946783
Science/Tech 60 0.791667 0.666667 1 0.889756521
Sports 5 0.936478 0 1 0.967718029
Sports 10 0.921373 0 1 0.959881701
Sports 15 0.860981 0 1 0.927890588
Sports 20 0.877428 0 1 0.936711336
Sports 25 0.846797 0 1 0.920215671
Sports 30 0.824835 0 1 0.90820421
Sports 35 0.883052 0.538462 1 0.939708416
Sports 40 0.772339 0 1 0.878828068
Sports 45 0.864875 0.538462 1 0.929986402
Sports 50 0.79238 0 1 0.890157308
Sports 55 0.832526 0.538462 1 0.912428873
Sports 60 0.739899 0 1 0.860173814
Entertainment 5 0.812246 0 1 0.90124701
Entertainment 10 0.755799 0 1 0.869367226
Entertainment 15 0.619494 0 1 0.787079353
Entertainment 20 0.686027 0 1 0.82826743
Entertainment 25 0.652299 0 1 0.807650203
Entertainment 30 0.646465 0 1 0.804030252
Entertainment 35 0.708625 0 1 0.841798496
Entertainment 40 0.574722 0 1 0.758103934
Entertainment 45 0.638258 0 1 0.798910243
Entertainment 50 0.483321 0 1 0.695213116
Entertainment 55 0.757576 0 1 0.87038828
Entertainment 60 0.598533 0 1 0.773649411
Trend 3: Jaccard Similarity – Saturday, 4/21/18 to Sunday, 4/22/18
Topic Interval Avg Min Max Standard Deviation
Top Trends 5 0.830378 0.538462 1 0.911250764
Top Trends 10 0.734254 0.428571 1 0.856886131
Top Trends 15 0.695804 0.333333 1 0.834148785
Top Trends 20 0.613648 0.428571 1 0.783357043
Top Trends 25 0.566767 0.428571 0.666667 0.752839005
Top Trends 30 0.51974 0.333333 0.666667 0.720929622
Top Trends 35 0.528846 0.333333 0.666667 0.727218092
Top Trends 40 0.462062 0.333333 0.538462 0.67975124
Top Trends 45 0.403694 0.25 0.538462 0.635368813
Top Trends 50 0.354762 0.25 0.428571 0.595618926
Top Trends 55 0.37381 0.25 0.428571 0.611399643
Top Trends 60 0.342949 0.25 0.538462 0.585618236
Politics 5 0.922619 0.666667 1 0.960530607
Politics 10 0.861472 0.666667 1 0.928155085
Politics 15 0.819477 0.538462 1 0.90524959
Politics 20 0.780886 0.538462 1 0.883677419
Politics 25 0.719856 0.538462 1 0.848443222
Politics 30 0.731121 0.428571 1 0.855055981
Politics 35 0.679196 0.538462 1 0.824133366
Politics 40 0.665668 0.333333 0.818182 0.815884591
Politics 45 0.656122 0.428571 0.818182 0.810013368
Politics 50 0.516484 0.428571 0.538462 0.718667876
Politics 55 0.593407 0.428571 0.666667 0.770328887
Politics 60 0.607143 0.428571 0.666667 0.779193722
Science/Tech 5 0.96875 0.666667 1 0.984250984
Science/Tech 10 0.9375 0.666667 1 0.968245837
Science/Tech 15 0.902778 0.666667 1 0.950146188
Science/Tech 20 0.886905 0.5 1 0.941756212
Science/Tech 25 0.840909 0.666667 1 0.917010955
Science/Tech 30 0.805556 0.666667 1 0.897527468
Science/Tech 35 0.802083 0.5 1 0.895591053
Science/Tech 40 0.77381 0.5 1 0.879664438
Science/Tech 45 0.736111 0.5 1 0.857969178
Science/Tech 50 0.733333 0.5 1 0.856348839
Science/Tech 55 0.683333 0.5 1 0.826639785
Science/Tech 60 0.6875 0.5 1 0.829156198
Sports 5 0.906926 0.666667 1 0.952326838
Sports 10 0.830087 0.666667 1 0.911090874
Sports 15 0.819477 0.538462 1 0.90524959
Sports 20 0.760906 0.538462 1 0.872299124
Sports 25 0.716011 0.428571 0.818182 0.846174486
Sports 30 0.674437 0.538462 0.818182 0.821240936
Sports 35 0.645646 0.428571 0.818182 0.803521014
Sports 40 0.622378 0.333333 0.818182 0.788909134
Sports 45 0.584249 0.428571 0.666667 0.76436188
Sports 50 0.589744 0.538462 0.666667 0.767947648
Sports 55 0.535714 0.25 0.666667 0.731925055
Sports 60 0.470925 0.25 0.666667 0.686239687
Entertainment 5 0.934217 0.8 1 0.966549105
Entertainment 10 0.884668 0.666667 1 0.940567972
Entertainment 15 0.841362 0.538462 1 0.917258056
Entertainment 20 0.81292 0.538462 1 0.901620992
Entertainment 25 0.763479 0.428571 1 0.873772822
Entertainment 30 0.763533 0.538462 1 0.873803618
Entertainment 35 0.726399 0.538462 0.818182 0.85229021
Entertainment 40 0.669997 0.538462 0.818182 0.818533243
Entertainment 45 0.665584 0.357143 0.818182 0.815833571
Entertainment 50 0.649351 0.428571 0.818182 0.805822964
Entertainment 55 0.609424 0.357143 0.818182 0.780656076
Entertainment 60 0.61297 0.428571 0.818182 0.782924238
Trend 4: Jaccard Similarity – Thursday, 4/26/18 to Friday, 4/27/18
Topic Interval Avg Min Max Standard Deviation
Top Trends 5 0.785834 0.538462 1 0.886472813
Top Trends 10 0.674623 0.428571 1 0.821354367
Top Trends 15 0.584166 0.25 1 0.764307421
Top Trends 20 0.54326 0.25 0.818182 0.737061945
Top Trends 25 0.483971 0.333333 0.818182 0.695679937
Top Trends 30 0.435435 0.25 0.818182 0.659874939
Top Trends 35 0.46131 0.25 0.666667 0.679197706
Top Trends 40 0.367139 0.176471 0.666667 0.605920097
Top Trends 45 0.323063 0 0.666667 0.568385924
Top Trends 50 0.351961 0.176471 0.666667 0.593262829
Top Trends 55 0.330159 0.111111 0.666667 0.574594405
Top Trends 60 0.238051 0.052632 0.538462 0.487904762
Politics 5 0.849266 0 1 0.921556259
Politics 10 0.738833 0 1 0.859553719
Politics 15 0.73856 0.333333 1 0.859394953
Politics 20 0.691642 0.333333 1 0.831649981
Politics 25 0.511814 0 1 0.71541173
Politics 30 0.639435 0.333333 0.818182 0.799646572
Politics 35 0.594364 0.333333 0.818182 0.770950043
Politics 40 0.554814 0.25 0.666667 0.744858532
Politics 45 0.484235 0.176471 0.666667 0.695869758
Politics 50 0.415385 0 0.666667 0.644503387
Politics 55 0.488095 0.25 0.666667 0.698638131
Politics 60 0.452543 0.176471 0.666667 0.672712833
Science/Tech 5 0.819245 0 1 0.905121584
Science/Tech 10 0.665085 0 1 0.815527385
Science/Tech 15 0.776837 0.538462 1 0.881383684
Science/Tech 20 0.596023 0 1 0.772025275
Science/Tech 25 0.462326 0 0.818182 0.679945258
Science/Tech 30 0.612684 0.428571 1 0.782741089
Science/Tech 35 0.632784 0.333333 1 0.795477142
Science/Tech 40 0.610473 0.25 1 0.781327627
Science/Tech 45 0.591187 0.333333 1 0.768886592
Science/Tech 50 0.354762 0 0.666667 0.595618926
Science/Tech 55 0.571429 0.333333 1 0.755928946
Science/Tech 60 0.439139 0.176471 0.818182 0.662675857
Sports 5 0.754958 0 1 0.868883474
Sports 10 0.574212 0 1 0.757767446
Sports 15 0.618364 0 1 0.786361299
Sports 20 0.499304 0 0.818182 0.706614653
Sports 25 0.455333 0 1 0.674783333
Sports 30 0.422198 0 0.818182 0.649767783
Sports 35 0.476604 0 1 0.690365322
Sports 40 0.397762 0.111111 0.818182 0.630683863
Sports 45 0.389632 0.176471 0.818182 0.624204782
Sports 50 0.360218 0 0.818182 0.600181273
Sports 55 0.307516 0.111111 0.666667 0.554541558
Sports 60 0.286442 0.052632 0.666667 0.53520296
Entertainment 5 0.797166 0 1 0.892841527
Entertainment 10 0.627964 0 1 0.792441609
Entertainment 15 0.707398 0 1 0.841069477
Entertainment 20 0.578897 0 0.818182 0.760853004
Entertainment 25 0.592438 0 1 0.769699854
Entertainment 30 0.505236 0 1 0.710799202
Entertainment 35 0.502245 0 1 0.708692672
Entertainment 40 0.59431 0.333333 0.818182 0.770915334
Entertainment 45 0.54917 0.176471 0.818182 0.741059906
Entertainment 50 0.452525 0 0.818182 0.672699972
Entertainment 55 0.503692 0.176471 0.818182 0.70971289
Entertainment 60 0.478355 0.333333 0.818182 0.691632112
Trend 5: Jaccard Similarity – Tuesday, 5/15/18 to Wednesday, 5/16/18
Topic Interval Avg Min Max Standard Deviation
Top Trends 5 0.621414 0.25 1 0.788297871
Top Trends 10 0.49392 0.071429 1 0.702794129
Top Trends 15 0.392564 0 0.818182 0.626548963
Top Trends 20 0.34303 0 0.666667 0.58568722
Top Trends 25 0.295228 0 0.538462 0.543348802
Top Trends 30 0.259829 0 0.538462 0.509734303
Top Trends 35 0.179038 0 0.428571 0.423129155
Top Trends 40 0.238311 0 0.538462 0.488170778
Top Trends 45 0.153651 0.034483 0.25 0.391983565
Top Trends 50 0.131865 0.034483 0.25 0.363131689
Top Trends 55 0.166667 0 0.333333 0.40824829
Top Trends 60 0.214286 0 0.428571 0.46291005
Politics 5 0.759806 0.2 1 0.871668392
Politics 10 0.65994 0.153846 1 0.812366949
Politics 15 0.580551 0.153846 1 0.761938857
Politics 20 0.483766 0.071429 0.818182 0.695533057
Politics 25 0.452298 0.071429 0.818182 0.672530819
Politics 30 0.42371 0 0.818182 0.650929815
Politics 35 0.365764 0.034483 0.666667 0.604783884
Politics 40 0.302093 0.034483 0.538462 0.54962946
Politics 45 0.246032 0.071429 0.333333 0.496015873
Politics 50 0.25 0.071429 0.428571 0.5
Politics 55 0.286472 0.034483 0.538462 0.53523093
Politics 60 0.304945 0.071429 0.538462 0.552218304
Science/Tech 5 0.787653 0 1 0.887498286
Science/Tech 10 0.709751 0.25 1 0.842467424
Science/Tech 15 0.678055 0.153846 1 0.82344112
Science/Tech 20 0.607567 0.111111 1 0.779465866
Science/Tech 25 0.524032 0.153846 0.818182 0.723900217
Science/Tech 30 0.515917 0.071429 0.818182 0.718273914
Science/Tech 35 0.454924 0.034483 0.818182 0.674480827
Science/Tech 40 0.463709 0.034483 0.818182 0.680961603
Science/Tech 45 0.370469 0.034483 0.538462 0.608661328
Science/Tech 50 0.302093 0.034483 0.538462 0.54962946
Science/Tech 55 0.333333 0 0.666667 0.577350269
Science/Tech 60 0.409091 0 0.818182 0.639602149
Sports 5 0.626893 0 1 0.791765888
Sports 10 0.526892 0 1 0.72587358
Sports 15 0.654069 0.428571 0.818182 0.808745488
Sports 20 0.414038 0 0.666667 0.643458113
Sports 25 0.305556 0 0.538462 0.552770798
Sports 30 0.478555 0.333333 0.636364 0.691776538
Sports 35 0.420214 0.176471 0.583333 0.64823926
Sports 40 0.353846 0.25 0.461538 0.59484969
Sports 45 0.324405 0.1875 0.5 0.569565415
Sports 50 0.095238 0 0.285714 0.3086067
Sports 55 0.24697 0.227273 0.266667 0.496960458
Sports 60 0.24697 0.227273 0.266667 0.496960458
Entertainment 5 0.656177 0 1 0.810047626
Entertainment 10 0.546753 0 1 0.739427648
Entertainment 15 0.634911 0.26087 1 0.796813319
Entertainment 20 0.322594 0 0.818182 0.567973655
Entertainment 25 0.226018 0 0.428571 0.475413445
Entertainment 30 0.398289 0.176471 0.818182 0.631101779
Entertainment 35 0.369545 0.16 0.818182 0.607902504
Entertainment 40 0.291644 0.16 0.538462 0.540040778
Entertainment 45 0.232906 0.115385 0.333333 0.482603339
Entertainment 50 0.038462 0 0.115385 0.196116135
Entertainment 55 0.203704 0.074074 0.333333 0.451335467
Entertainment 60 0.287088 0.035714 0.538462 0.535805853
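As a minimal sketch of how summary rows like those in Appendix 1 can be produced, the Python snippet below computes the Jaccard similarity between two lists of trend titles and then summarizes the similarities for snapshots a fixed number of steps apart. It assumes each snapshot is a list of titles and that `lag` counts snapshots rather than minutes. The function names and sample data are hypothetical and are not taken from the thesis scripts. Values in the tables such as 0.818182 (9/11) and 0.538462 (7/13) are consistent with comparing two ten-item lists that share nine or seven titles.

```python
from statistics import mean, stdev


def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of trend titles."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty snapshots are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def interval_stats(snapshots, lag):
    """Summarize similarity between each snapshot and the one `lag` steps later."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    sims = [jaccard(snapshots[i], snapshots[i + lag])
            for i in range(len(snapshots) - lag)]
    if not sims:
        raise ValueError("need more than `lag` snapshots")
    return {
        "avg": mean(sims),
        "min": min(sims),
        "max": max(sims),
        # Sample standard deviation; undefined for a single pair, so report 0.0.
        "std": stdev(sims) if len(sims) > 1 else 0.0,
    }


# Hypothetical data: three snapshots of ten trend titles each.
first = [f"topic{i}" for i in range(10)]   # topic0 .. topic9
second = first[1:] + ["topic10"]           # nine titles shared with `first`
third = list(second)                       # unchanged from `second`

print(round(jaccard(first, second), 6))    # 9 shared / 11 distinct -> 0.818182
stats = interval_stats([first, second, third], lag=1)
print(stats)
```

Here the standard deviation is computed from the individual pairwise similarities, so it needs the full list of values and cannot be derived from the average alone.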
Appendix 2: Data from Trends and Tabs
Trends and Tabs 1: Every News Source Facebook Published
CNNMoney; The New York Times; Ars Technica; Liverpool FC; www.gizbot.com; 7News - WHDH Boston; WDSU News; MarketWatch; Idaho State Journal; moneycontrol.com; WESH 2 News
WGN TV; PCWorld; inews.co.uk; Mirror Football; GSMArena.com; Fox 35 WOFL; WFMJ; Tennessean; nativenewsonline.net; www.nationalnewswatch.com; Tulsa's Channel 8 - KTUL
ESPN; Portland Press Herald; lovinmanchester.com; NHL; Guitar World; SB Nation; www.13wmaz.com; WGRZ - Channel 2, Buffalo; Idaho Press-Tribune; www.nationalobserver.com; Bradenton Herald
Reuters; RealClearPolitics; medicalxpress.com; London Evening Standard; www.phonearena.com; The Seattle Times; Anchorage Daily News; atlantablackstar.com; www.kivitv.com; www.palmbeachpost.com; ABC Action News - WFTS - Tampa Bay
6abc Action News; Recode; Autoblog; theScore; SlashGear; Talking Points Memo; Bustle; CBS7 News / KOSA-TV; KTVB; The Raw Story; NBC 6
The Bangor Daily News; Smithsonian Magazine; www.belfastlive.co.uk; talksport.com; SPIN; WDTN-TV; Inverse; Hawaii News Now; Mic; Teen Vogue; TheBlaze
Bleacher Report; The Epoch Times; Digital Trends; Las Vegas Review-Journal; Stars and Stripes; The Columbus Dispatch; The Washington Times; WMC Action News 5; National Post; teleSUR; morungexpress.com
WPMT FOX43; The Guardian; www.kqed.org; IGN; TechRadar; WRCB Channel 3 Eyewitness News; WebMD; news360.com; Fox Carolina News; www.whio.com; Quartz
News 12 Long Island; The Verge; LiveScience; ComicBook.com; www.zerohedge.com; WWLP-22News; WWLTV; Slate.com; East Idaho News.com; 9to5toys.com; WMUR-TV
FOX8; Washington Post; MacRumors; Entertainment Weekly; taskandpurpose.com; www.consumeraffairs.com; www.dailywire.com; Townhall.com; www.wtol.com; BGR; WPSD-TV
NBC Sports; WFAA; Manchester Evening News; New York Post; Android Authority; 13 On Your Side; KMBC 9; www.centralmaine.com; www.dailypost.co.uk; www.smartbrief.com; News 12 NBC 26
TechCrunch; WSOC-TV; www.moneysavingexpert.com; signup.freebies.com; CBC Sports; www.fox25boston.com; Washington Examiner; KVUE; philly.com; The Next Web; www.algemeiner.com
The Hill; Yahoo; www.news-mail.com.au; Variety; CTV News; speedsociety.com; WHO TV Channel 13 News; blogs.edweek.org; Lebanon Daily News; VentureBeat; VICE News
TIME; Daily Telegraph; www.sciencedaily.com; The Baltimore Sun; VICE; www.bollywoodlife.com; KXXV Central Texas News Now; junkee.com; exclaim.ca; www.512tech.com; ABC 7 Chicago
WGNO - News With A Twist; www.edweek.org; www.sciencenews.org; www.catchnews.com; www.comingsoon.net; The Dallas Morning News; NBC15 Madison; WLOS ABC 13; thatgrapejuice.net; The Business Journals; WILX News 10
14 NEWS; Engadget; www.statnews.com; JoBlo.com; Page Six; WMTW-TV; 10TV - WBNS; The Gaston Gazette; www.rap-up.com; Drew Curtis' Fark.com; WOOD TV8
azfamily 3TV CBS 5; NBC Chicago; The Atlantic; WKYT; TVLine; abc3340.com; www.irishexaminer.com; The Times-News; djbooth.net; FOX 47 News - WSYM; wildfiretoday.com
BBC News; NBC News; The Sun; heroichollywood.com; Den Of Geek UK; Pittsburgh Post-Gazette; FOX31 KDVR.com; Boing Boing; www.greaterkashmir.com; KING 5; www.defencenews.in
Business Insider; WIRED; The Mercury News; Gizmodo; Vulture; Wichita Eagle; ABC30 Action News; Hot Air; Herald Sun; Popular Mechanics; Democracy Now!
The Charlotte Observer; Fortune; The Scientist; MovieWeb; www.flickeringmyth.com; WIBW; KTLA 5 News; National Geographic; news.com.au; Space.com; IndyStar
Fox News; Daily Mail; The Daily Beast; The A.V. Club; GameSpot; WIS-TV; The Sacramento Bee; Asheville Citizen Times; The West Australian; www.zmescience.com; The Hamilton Spectator
www.heraldscotland.com; People; Metro; Pitchfork; www.jambase.com; 9NEWS (KUSA); syracuse.com; CommonDreams; www.9news.com.au; ABC6 News; Sauk Valley Media
InForum; WJAC-TV News; FOX Sports; rolltide.com; Rolling Stone; al.com; thefilmstage.com; The World; The FADER; Boston Herald; electronicintifada.net
Region 8 News; Breitbart; Goal Indonesia; www.burnleyexpress.net; The Irish Times; www.campusreform.org; theplaylist.net; WNEP-TV; The Jerusalem Post / JPost.com; GOLF.com; thehustle.co
KCBD NewsChannel 11; Centre Daily Times; NBA; E! News; MLive.com; The Post and Courier; ComicBookMovie.com; PennLive.com; www.middleeastmonitor.com; The News & Observer; www.hookem.com
KSLA News 12; Omaha World-Herald; www.beinsports.com; Daily Express; The Daily Caller; ottawasun.com; Highsnobiety; Reading Eagle; Sky News Australia; NFL; www.technologyreview.com
KOTV - News On 6; ABC News; Chicago Tribune; The Hollywood Reporter; FOX 17; Tampa Bay Times; NBC Bay Area; WFMZ; The Times of Israel; www.thisismoney.co.uk; TigerNet.com
Newsweek; Yahoo Finance; www.espn.co.uk; HuffPost Canada; New York Magazine; The State Newspaper; SFGATE; The Edmonton Sun; consequenceofsound.net; thespun.com; WTVC-TV NewsChannel 9 News
NJ.com; Al Jazeera English; www.football365.com; Life & Style Weekly; Ottawa Citizen; LJWorld.com; The Salt Lake Tribune; Toronto Sun; bust.com; theundefeated.com; www.evertonfc.com
www.thisisinsider.com; azcentral; www.livesoccertv.com; www.popsugar.com; THE WEEK; crimewatchdaily.com; www.thelondoneconomic.com; LNP + LancasterOnline; www.dailyedge.ie; Golf Channel; www.cbr.com
WTHI-TV; The Boston Globe; Sky Sports; www.screendaily.com; ABC 7 News - WJLA; The Times of India; KGAN CBS 2; The Incline; www.flare.com; Golf Digest; Newsarama
Yahoo News; The Denver Post; Sporting News; www.shieldsgazette.com; Adweek; KENS 5 & Kens5.com; KWWL; Public Opinion; FanSided; www.mlbtraderumors.com
Indiatimes; Forbes; www.teamtalk.com; www.snapchat.com; KALB News Channel 5; WPTV; LifeNews.com; The York Dispatch; lithub.com; Philadelphia Eagles
Mashable; NBC Connecticut; www.tribalfootball.com; The Toronto Star; KCTV5 News Kansas City; 10News WTSP; The Gazette; Ledger-Enquirer; NYLON; www.racingpost.com
phys.org; NDTV; WEEI Sports Radio Network; Time Out London; Ocala StarBanner; KOAT; The Des Moines Register; York Daily Record/Sunday News; www.instylemag.com.au; Saturday Down South
Bloomberg; Newsday; 247sports.com; Today Show; www.uppermichiganssource.com; FOX 12 Oregon; LifeSiteNews.com; cleveland.com; Mediaite; www.seccountry.com
www.brisbanetimes.com.au; NPR; clutchpoints.com; Vanity Fair; WAFB Channel 9; Deadspin; www.aclu.org; Boston.com; The New Yorker; Foreign Policy
CBS News; Philadelphia Magazine; english.manoramaonline.com; twinning.popsugar.com; WITN-TV; GeekTyrant; WSMV News 4, Nashville; ABC; Fox News Insider; who…"
CBS Sports; POLITICO; Global News; Eurogamer; NewsOne; Screen Rant; Fox 13 News; The Oregonian; FOX 61; www.dailysabah.com
CNBC; Salon; NESN; bearingarms.com; TheGrio; AJC; KATU News; KOIN 6; 680 NEWS; The Quint
CNET; The Telegraph; Scroll; ABC7; AlterNet; KHOU 11 News; KRNV News 4; StateCollege.com; www.jerusalemonline.com; NBC Charlotte
CNN; USA TODAY; Yahoo Sports; Deadline Hollywood; Detroit Free Press; KSDK News; The Kansas City Star; KGW-TV; The Jewish Press; Firstpost
www.eveningexpress.co.uk; Vox; ABC15 Arizona; Indian Express; The Root; NOLA.com; KTVN Channel 2 News; Popular Science; MassLive.com; www.therichest.com
Fast Company; The Wall Street Journal; BBC Sport; Ultimate Classic Rock; 7 Eyewitness News WKBW; The Advocate (Baton Rouge, LA); KXAN News; Roll Call; The Roanoke Times; Android Central
www.gizmodo.com.au; Fox Business; Daily Record; www.bgr.in; Newschannel 3, CBS News, WWMT, West Michigan; The San Diego Union-Tribune; St. Louis Post-Dispatch; Statesman Journal; South Florida Sun Sentinel; Fox2Now
HuffPost; Yahoo Canada; www.eurosport.co.uk; Complex; KTRE-TV; FOX6 News Milwaukee; WLOX-TV; Us Weekly; constitution.com; WSFA-TV
The Independent; 9to5Mac.com; GiveMeSport; /Film; WKBN 27 Youngstown OH; WVUE FOX 8 News; article.wn.com; The Register-Guard; indiancountrymedianetwork.com; WSBT-TV
www.japantimes.co.jp; I fucking love science; www.joe.co.uk; WIRED UK; AOL; KPLC 7 News; The Virginian-Pilot; Idaho Statesman; ipolitics.ca; News 4 Tucson - KVOA
Daily Mirror; Miami Herald; Milwaukee Journal Sentinel; BuzzFeed; Hindustan Times; Los Angeles Times; Billboard; KHQ Local News; WISC-TV / Channel 3000; www.nationalmemo.com
MSN; appleinsider.com; www.liverpoolecho.co.uk; Financial Times; sports.mynorthwest.com; Radio Times; Corpus Christi Caller-Times; Willamette Week; www.financialexpress.com; yourstory.com
Top Related