Accern whitepaper article volume distribution
Accern API Whitepaper
Article Volume Distribution Analysis
This paper examines the Accern data set, which
contains 14,230,577 records of financial news
articles spanning from August 2012 to August 2016,
and how those records are distributed across metrics
such as sentiment and sector.
1. INTRODUCTION
2. TOTAL ARTICLE VOLUME
A. Volume of Articles Per Year
3. MARKET SECTORS
A. Total Volume of Articles by Sector
B. Annual Distribution Breakdown of Articles by Sector
C. Cumulative Annual Breakdown of Articles by Sector
4. INDUSTRY CATEGORIES
A. 20 Industries with the Highest Volume of Articles
5. EVENT GROUP ANALYSIS
A. Total Number of Articles per Year for Event Group
6. OVERALL SOURCE RANK
A. Number of Articles for Overall Source Ranks
7. ARTICLE TYPE
A. Annual Breakdown of Articles by Article Type
B. Percentage Breakdown of Articles by Article Type
8. ARTICLE SENTIMENT
A. Breakdown of Article Sentiment
B. Annual Breakdown of Total Vol. of Articles by Article Sentiment
9. EVENT IMPACT SCORE ON ENTITY
A. Annual Breakdown of Vol. of Articles by Event Impact Score
1. INTRODUCTION
Accern monitors over 20 million public websites in real time and uses proprietary AI algorithms
to help financial market investors find important stories to act on. The company derives metrics
such as sentiment, impact, source rankings, and more from every relevant article. This study
examines a data set of 14,230,577 records of financial news articles spanning from August
2012 to August 2016. Each record includes information about the article and metrics derived
from technology utilized by Accern. This brief, basic study is limited to the distribution
of the article records with respect to a selected number of fields in the records.
2. TOTAL ARTICLE VOLUME
The graph below shows the distribution of all the articles across the examined time period.
2012 and 2016 are partial years, with 2012 beginning on August 25 (approximately 4
months), and 2016 ending on July 31 (exactly 7 months). For the years in which there is a
full set of data (2013, 2014, and 2015), the totals are relatively consistent.
A. Volume of Articles Per Year
For 2012, the total for the 4-month period is slightly lower, by about 6 percent, relative to the full years in
the data set. For 2016, however, the total for the 7-month period shows a relative increase
of approximately 26 percent. The fact that the data for 2012 reflects the last 4 months of the year,
while the data for 2016 reflects the first 7 months, suggests that there may be a seasonal
component to how much news is generated over the course of a year.
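One way to compare a partial year with the full years is to convert each total into an average monthly rate and then take the relative difference. The paper does not state how its 6 and 26 percent figures were computed, so the sketch below is only an assumed reading of that comparison. The function names and all article totals are hypothetical placeholders, not values from the data set.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # average month length, an assumption of this sketch


def monthly_rate(total_articles: int, start: date, end: date) -> float:
    """Average number of articles per month over an inclusive date range."""
    days = (end - start).days + 1
    return total_articles / (days / DAYS_PER_MONTH)


def relative_difference(rate: float, baseline: float) -> float:
    """Percentage by which `rate` is above (+) or below (-) `baseline`."""
    return (rate - baseline) / baseline * 100.0


# Hypothetical yearly totals, used only to show the arithmetic.
full_years = {2013: 3_400_000, 2014: 3_500_000, 2015: 3_600_000}
baseline = sum(
    monthly_rate(n, date(y, 1, 1), date(y, 12, 31)) for y, n in full_years.items()
) / len(full_years)

# Partial years, using the coverage dates given in the paper.
rate_2012 = monthly_rate(1_150_000, date(2012, 8, 25), date(2012, 12, 31))
rate_2016 = monthly_rate(2_500_000, date(2016, 1, 1), date(2016, 7, 31))

for year, rate in ((2012, rate_2012), (2016, rate_2016)):
    print(year, f"{relative_difference(rate, baseline):+.1f}% vs. full-year pace")
```

Normalizing by days rather than by a rounded month count avoids treating the period from August 25 to December 31 as exactly four months.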
3. MARKET SECTORS
The sector and industry category associated with each record are the primary means of
categorizing the articles in the data. Each record includes information about the
article and metrics derived from technology utilized by Accern.
A. Total Volume of Articles by Sector
The above chart shows the volume distribution of articles among stock
market sectors, and it reveals a clearly discernible pattern. The technology and consumer services
sectors together account for as much as 58 percent of the news articles in the entire data set.
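The combined share of the leading sectors is a simple ratio of counts. The following is a minimal sketch of that calculation. The helper name `top_share` and the per-sector counts are hypothetical, and the sector names other than technology and consumer services are placeholders rather than the paper's actual categories.

```python
from collections import Counter


def top_share(counts: dict[str, int], k: int = 2) -> tuple[list[str], float]:
    """Return the k largest categories and their combined share in percent."""
    total = sum(counts.values())
    top = Counter(counts).most_common(k)
    names = [name for name, _ in top]
    share = 100 * sum(n for _, n in top) / total
    return names, share


# Placeholder counts chosen only to keep the arithmetic easy to follow.
sector_counts = {
    "Technology": 40,
    "Consumer Services": 20,
    "Finance": 15,
    "Health Care": 15,
    "Energy": 10,
}

names, share = top_share(sector_counts, k=2)
print(names, f"{share:.0f}% of all articles")
```

Running the same function on each year's counts separately would test whether the concentration holds year by year.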
B. Annual Distribution Breakdown of Articles by Sector
Furthermore, this pattern is consistent across the entire duration of the data. The chart above
shows each sector across multiple years and clearly indicates that the technology and
consumer services sectors consistently account for a significant share of the articles in the data set in
each year.
C. Cumulative Annual Breakdown of Articles by Sector
Additional confirmation is provided in the above chart, in which the shaded areas for the years in
which there is complete data are very similar, indicating that the total number of articles, and how
they are distributed across sectors, is consistent across all the years included in the data.
This concentration of articles in the technology and consumer services sectors may be the result
of several factors. First, these two sectors are, year after year, among the most volatile and most
heavily traded (in terms of volume) sectors in the stock market. Second, both sectors are associated with
products and services that are ubiquitous in American society and throughout
the world. Third, these same products and services are pervasive in other sectors. Lastly,
news is, in many ways, a profit-generating industry. News outlets and journalists therefore have
a propensity to write stories on popular topics for the sake of increasing readership, hence
the emphasis placed on “popular” sectors like technology.
4. INDUSTRY CATEGORIES
Stock market industry categories are subdivisions of market sectors. The chart below, which shows
the breakdown of the top 20 industry categories in terms of volume of articles, further
reinforces the conclusions drawn from the sector charts. Four of the top five industries are
directly related to the technology sector (the sector with the highest article volume).
A. 20 Industries with the Highest Volume of Articles
Virtually all of the industry categories listed on the chart utilize technology in one way or another
to manufacture, deliver, or provide a product or service. This makes the technology sector
interrelated with most industries. The ample representation of the consumer services sector is
also confirmed, with numbers 4, 6, and 7 on the industry categories list all being directly related
to it. This makes seven of the top 20 industries on the list directly related to the technology or
consumer services sectors. A closer inspection also reveals that a total of 12 of the 20 industries
on the list belong to these same two sectors. This concentration of article volume is consistently
reflected throughout the data set.
5. EVENT GROUP ANALYSIS
News articles are the result of news events that are worth reporting
to the general public. The type of news event helps
determine the impact the news will have on markets. Each record includes a field that
categorizes the news article according to the type of event.
A. Total Number of Articles per Event Group
The events that prompt the news articles to be written are categorized according to the 16
different event groups listed on the horizontal axis of the above chart. Company earnings and general
business actions have the highest concentration of articles among the event groups.
These categories are also very common and periodic in nature. The events that are less
common and unexpected, such as disasters and criminal and legal actions, are the ones that
have a greater impact.
6. OVERALL SOURCE RANK
The source from which the news article is obtained is also ranked. This helps to determine
the validity of the article. The source of each article is ranked on a scale of 1 to 10. The
charts below show the distribution of all of the articles in the data. The chart indicates a very
strong concentration in the range of 4 to 9, in sharp contrast with the rest of the scale. The
lower chart shows the consistency of this pattern across all of the years.
A. Number of Articles for Overall Source Rank
7. ARTICLE TYPE
Another data field related to the source of the article, included in each record,
identifies whether the article was sourced from a news feed or a blog. Along with other
data about the article, this is critical in determining the overall validity and reliability of the
article and its source.
A. Annual Breakdown of Articles by Article Type (News vs. Blog)
B. Percentage Breakdown of Articles by Article Type (News vs. Blog)
Both of the above charts clearly indicate that news feeds are the source of the bulk of the
articles. The upper chart further indicates that this has not changed much over the years.
While the total number of articles has varied, the percentage breakdown from year to
year has remained consistent.
8. ARTICLE SENTIMENT
An important metric derived from Accern technology is article sentiment. This metric
measures the positive, negative, or neutral sentiment of each article, and assigns a number
from -1, indicating the highest degree of negative sentiment, to +1, indicating the highest
degree of positive sentiment.
The chart below breaks down the three sentiment categories across all years.
A. Breakdown of Article Sentiment
Articles that are closer to neutral in sentiment tend towards a measurement of “0” on
the scale, which is the measurement with the highest concentration of articles, as seen in both
charts below. Another noticeable characteristic is the higher volume of articles on the positive
side of the sentiment scale, along with their wider distribution.
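A score on the -1 to +1 scale can be grouped into the three sentiment categories by testing it against a small band around zero. The sketch below illustrates this under stated assumptions: the paper does not say where Accern draws the boundaries, so the `neutral_band` cutoff, the function name, and the sample scores are all hypothetical.

```python
from collections import Counter


def sentiment_label(score: float, neutral_band: float = 0.05) -> str:
    """Map a score in [-1, +1] to 'negative', 'neutral', or 'positive'.

    `neutral_band` is an assumed cutoff, not a value given in the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"sentiment score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Made-up scores for illustration only.
scores = [0.0, 0.0, 0.42, -0.3, 0.8, 0.01, -0.9, 0.15]
counts = Counter(sentiment_label(s) for s in scores)

for label in ("negative", "neutral", "positive"):
    print(label, counts[label])
```

Widening or narrowing `neutral_band` changes how many articles count as neutral, so any comparison across years should hold it fixed.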
B. Annual Breakdown of Total Volume of Articles by Sentiment
Although there are slight variations, this is also consistent year by year. This consistency may be
attributed to the fact that the stock market has been trending upwards over the duration of the
time period covered by the data.
9. EVENT IMPACT SCORE ON ENTITY
Each article record also includes a field that measures the impact of the article on the entity
that the news article is about. The impact is scored on a scale of 1 to 100. The chart below
shows a peak in the volume of articles with an impact score in the 26 to 30 range, followed
by a steady decrease all the way to the top of the scale.
A. Annual Breakdown of Volume of Articles by Event Impact Score on Entity
The distribution pattern is very consistent from year to year, as shown in the lower chart. This
consistency should make it easier to find an optimal threshold on which to base a trading
decision.
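The binning and threshold idea can be sketched as follows: scores on the 1 to 100 scale are grouped into 5-point ranges such as 26 to 30, and a candidate threshold is judged by the share of articles it would keep. This is only an illustrative sketch. It assumes integer scores, and the function names, sample scores, and threshold of 50 are hypothetical.

```python
from collections import Counter


def impact_bin(score: int, width: int = 5) -> str:
    """Return the label of the bin containing `score`, e.g. 27 -> '26-30'."""
    if not 1 <= score <= 100:
        raise ValueError(f"impact score out of range: {score}")
    low = ((score - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def share_at_or_above(scores: list[int], threshold: int) -> float:
    """Fraction of articles whose impact score is at least `threshold`."""
    return sum(s >= threshold for s in scores) / len(scores)


# Made-up scores for illustration only.
scores = [12, 27, 28, 30, 33, 41, 55, 78]

histogram = Counter(impact_bin(s) for s in scores)
print(histogram.most_common(1))        # the most populated bin
print(share_at_or_above(scores, 50))   # share kept by a threshold of 50
```

Because the distribution is reported to be stable from year to year, a threshold chosen on one year's scores would be expected to keep a similar share of articles in the other years.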