Statistical Analysis of users who chatting about beer on Twitter [1]
Análise de Usuários que Conversam sobre Cerveja no Twitter
Rodrigo Otávio de Araújo Ribeiro / Tarsila Gomes Bello Tavares / Daniel de Oliveira Cohen
PMKT - Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Impressa e ISSN 2317-0123 On-line), São Paulo, Brasil, V. 14, pp. 175-195, Abril, 2014 - www.revistapmkt.com.br
Submission: Mar./28/2014 - Approval: Apr./14/2014
Rodrigo Otávio de Araújo Ribeiro
Doctor and Master in Production Engineering from Universidade Federal Fluminense - UFF.
Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE. He has
extensive experience in statistical modeling on large databases. He is currently Director of Marketing
Intelligence at IBOPE DTM.
E-mail: [email protected]
Professional Address: IBOPE DTM - Rua Voluntários da Pátria - nº 89 - sala 803 - 22270-000 -
Botafogo - Rio de Janeiro/RJ - Brasil.
Tarsila Gomes Bello Tavares
Bachelor's degree in Statistics from Escola Nacional de Ciências Estatísticas - ENCE/IBGE.
She is currently Coordinator of Marketing Intelligence at IBOPE DTM.
E-mail: [email protected]
Daniel de Oliveira Cohen
Bachelor's degree in Statistics from the State University of Campinas - UNICAMP. He performs
statistical analyses such as regression, segmentation and social network analysis on data collected
through quantitative surveys. He is currently a Statistician at IBOPE Inteligência.
E-mail: [email protected]
[1] This was one of the papers presented at ABEP's 6th Brazilian Market, Opinion and Media Research Congress (held on March 24 and 25, 2014), winner of the Alfredo Carmo Prize, turned into an article by its author(s), submitted to PMKT, and approved for publication.
ABSTRACT
The identification of influential users in social media is a subject that has attracted great interest
from companies in recent years. This work aims to evaluate this influence through the use of graphs
to understand the relational structure between users, established through their conversations on
Twitter. Exploratory data analysis and text mining techniques were used to draw further conclusions
about the subject. The "conversation environment" chosen was Brazilian beer, and the search words
were related to the major brands active in the domestic market. The evaluation considered a sample
of 25 days between December 2013 and January 2014.
KEYWORDS:
Beer, Twitter, Social network analysis.
RESUMO
A identificação de usuários influentes nas mídias sociais é um assunto que tem gerado grande
interesse por parte das empresas nos últimos anos. Este artigo visa avaliar esta influência por meio
da utilização de grafos para entendimento da estrutura relacional existente entre os usuários,
estabelecida por suas conversas no Twitter. A análise exploratória de dados e as técnicas de
Mineração de Textos foram utilizadas para conclusões complementares acerca do assunto. O
ambiente de conversas escolhido para avaliação foi o das cervejas brasileiras, sendo as buscas
realizadas por palavras relacionadas às principais marcas atuantes no mercado nacional. A avaliação
foi realizada, considerando uma amostra de 25 dias entre os meses de dezembro de 2013 e janeiro
de 2014.
PALAVRAS-CHAVE: Cerveja, Twitter, Análise de Redes Sociais.
1. INTRODUCTION
This article aims to identify the most influential users on Twitter who posted messages about beer.
A sample of 25 days between December 2013 and January 2014 was used, considering only posts
made in Portuguese, in Brazil.
The content of the conversations was also assessed by applying text mining algorithms. A
descriptive analysis of the general behavior of Twitter users who talk about the subject was
performed, to understand the use and impact of different brands and the profile of the users.
The largest number of posts on the subject took place in the afternoon and evening. The distribution
of the number of messages posted per user is strongly asymmetric: the majority posted only a single
message during the period. By observing the peaks in the time series of the total number of
messages posted, it was possible to evaluate the effect of holidays: the behavior of users during the
New Year was very close to what was observed at Christmas.
In the semantic evaluation of posts about beer, many topics (themes) within the main subject were
identified. This kind of information can help companies target their strategies and continuously
monitor consumer behavior. When users post messages about beer, many of them mention where,
with whom, or even when they will consume it. They often mention their preferred brands as well.
The analysis of the influence of users in social networks allows the creation of various marketing
strategies. The most influential users on a particular subject can be contacted by companies to
publicize their brands, acting as links between companies and other end users.
In this work, influence was measured based on the number of connections that the user had during
the study period. On Twitter, users can direct their messages to each other and pass on information
posted by any of their connections (retweets).
One way to assess the degree of influence of users is to verify the number of connections that pass
on their messages or the number of connections to which they direct their posts. This paper
evaluates both points of view.
This study is structured as follows: after the introduction, the second part points out the main
features of the different techniques used in the analysis. The third part gives a brief explanation
of Twitter and its strong growth in Brazil. The fourth part contextualizes the domestic beer market,
its evolution, its trends and key brands. The fifth part details the analytical methodology applied,
clarifying the questions answered by the study. The sixth part explains the data used. The seventh
part shows the results of the data analysis. The eighth part presents the main conclusions and,
finally, the limitations and suggestions for new research.
2. THEORETICAL BACKGROUND
2.1 SOCIAL NETWORK ANALYSIS
A social network is determined by a set of actors (or nodes) and pre-established relationships
between them (WASSERMAN; FAUST, 1994). Actors can take many forms and represent different
groups of individuals, such as users, companies and entities. Because of its great flexibility,
Social Network Analysis (SNA) can be applied in almost any context.
Generally, social networks are visually represented by "graphs". In these graphs, the actors or
nodes are represented by dots and the relationship between a pair of nodes is represented by edges
or connections.
The connections can be directed when it is important to highlight which actor was the source of the
relationship (WASSERMAN; FAUST, 1994). According to the authors, in addition to being
visually displayed, a social network can be described by an n x n matrix, where n is the total
number of nodes in the network.
The existence of a relationship between the pair of nodes u and v, for example, is indicated by the
value 1 in the corresponding cell of the matrix. The matrix is read as follows: the rows represent
the nodes where the relationship starts (actors of origin) and the columns, the nodes where the
relationship ends (actors of destination). Thus, an undirected social network will always give a
symmetric matrix.
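As a minimal sketch of the matrix representation described above, the following Python fragment builds the n x n matrix for a directed and for an undirected network. The user names, the `adjacency_matrix` helper and the `is_symmetric` check are hypothetical and serve only as an illustration; they are not taken from the study.

```python
def adjacency_matrix(nodes, edges, directed=True):
    """Build an n x n 0/1 matrix: rows are actors of origin, columns are actors of destination."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for origin, destination in edges:
        u, v = index[origin], index[destination]
        matrix[u][v] = 1
        if not directed:
            # An undirected relationship is recorded in both directions.
            matrix[v][u] = 1
    return matrix


def is_symmetric(matrix):
    """True when cell (i, j) always equals cell (j, i)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


# Hypothetical users and conversations, for illustration only.
users = ["ana", "bruno", "carla"]
conversations = [("ana", "bruno"), ("carla", "ana")]

directed_matrix = adjacency_matrix(users, conversations, directed=True)
undirected_matrix = adjacency_matrix(users, conversations, directed=False)
```

In this toy case the directed matrix is not symmetric, while the undirected one is, which is the property stated in the paragraph above.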
To assist in understanding the relationships between the actors, there are metrics that describe
either the network as a whole or each node individually. Among them are:
- Degree: number of edges connected to each node.
- PageRank: spectral measure of popularity defined for directed graphs with non-negative connection
weights (PAGE et al., 1998), which can be given by:

$$G = (1 - \alpha)\,\bar{D}^{-1} A + \frac{\alpha}{n}\, J_{n \times n}$$

Where:
n = total number of nodes in the network (users).
$A \in \{-1, 0, +1\}^{n \times n}$ is the adjacency matrix with values $A_{uv} = +1$ when user u marked user v as a friend and $A_{uv} = -1$ when user u marked user v as a foe. A is sparse, square and asymmetric.
$\bar{D}$ = absolute diagonal degree matrix defined by $\bar{D}_{uu} = \sum_{v} |A_{uv}|$.
$J_{n \times n}$ = a matrix full of ones of the specified size, and $0 < \alpha < 1$ is the teleportation parameter.
The matrix G is left-stochastic, each row sums to one (KUNEGIS; LOMMATZSCH;
BAUCKHAGE, 2009).
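To make the PageRank definition more concrete, the sketch below computes the ranking by power iteration on a small unsigned network, using a row-normalized adjacency matrix and a teleportation parameter alpha. It is only an illustration under stated assumptions: the study itself used Gephi, the value alpha = 0.15 and the four users are hypothetical, and nodes without outgoing edges are assumed to spread their rank uniformly.

```python
def pagerank(adjacency, alpha=0.15, iterations=100):
    """Power iteration for rank = rank * G, with G = (1 - alpha) D^-1 A + (alpha / n) J.

    Assumes non-negative edge weights. adjacency[u][v] > 0 means u points to v.
    """
    n = len(adjacency)
    out_degree = [sum(abs(w) for w in row) for row in adjacency]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Teleportation term: every node receives alpha / n of the total rank (which is 1).
        new_rank = [alpha / n] * n
        for u in range(n):
            if out_degree[u] == 0:
                # Assumption: a node with no outgoing edges spreads its rank uniformly.
                share = (1 - alpha) * rank[u] / n
                for v in range(n):
                    new_rank[v] += share
            else:
                for v in range(n):
                    if adjacency[u][v]:
                        new_rank[v] += (1 - alpha) * rank[u] * adjacency[u][v] / out_degree[u]
        rank = new_rank
    return rank


# Hypothetical network: users "a", "b" and "c" each retweet "hub".
users = ["hub", "a", "b", "c"]
adjacency = [
    [0, 0, 0, 0],  # hub retweets nobody
    [1, 0, 0, 0],  # a -> hub
    [1, 0, 0, 0],  # b -> hub
    [1, 0, 0, 0],  # c -> hub
]
ranks = pagerank(adjacency)
most_influential = users[ranks.index(max(ranks))]
```

The ranks always sum to one, and the user whose messages are passed on by the others ends up with the highest value.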
The software used in this study was Gephi, free software that allows different forms of editing and
customization of the final results. It can be used to create graphs and to calculate the analysis
metrics.
2.2 TEXT MINING
Text Mining is the process of extracting useful information or knowledge from unstructured text
documents (BARION; LAGO, 2008). In the context of this study, this technique is applied to
identify patterns of comments and opinions expressed by Twitter users about the Brazilian beer
market.
Information Retrieval or Information Extraction techniques are applied to a set of texts with the
aim of giving it structure. Data mining techniques are then applied to these structured data to
obtain relevant information, as shown in Figure 1.
Source: BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia, 2008.
FIGURE 1 - Text Mining Process
The first step of Text Mining is the indexing process, which builds an index structure from the
words of the text and makes it possible to search for documents by any of the terms they contain
(SALTON; MCGILL, 1983). Some steps of a Text Mining analysis (BARION; LAGO, 2008):
- Lexical Analysis: converts a string into a sequence of words that are candidates for index terms.
- Removal of Stop-words: removes a set of words that appear frequently in texts but have no
semantic value, such as prepositions, articles and conjunctions. This phase is extremely
important, because it reduces the base to be indexed and facilitates mining.
- Stemming: removes all variations of words, leaving only the root of each; for example, the word
"dreaming" is reduced to the root "dream".
- Selection of index terms: determines which words or stems will be used as index terms. These
words are selected according to the weight assigned to them.
- Bag of Words - BOW: a matrix in which each different term in the collection of documents is
indexed. From this indexing, each document can be represented by a 1 x n vector, where n is
the total number of terms; each entry of this vector is the number of times the term appears in
the document (SIVIC, 2009).
- Determination of weights: the BOW matrix is filled based on metrics that weigh the frequency of
occurrence of terms in documents and in the total collection (set of all documents). The
metric most commonly used for this purpose is called tf-idf (term frequency - inverse document
frequency).
- Correlation (similarity) between terms: based on the BOW matrix, one can calculate the Pearson
correlation between different words, in order to measure how they are related, by the
formula (HUANG, 2008):

$$\mathrm{corr}(\vec{t}_a, \vec{t}_b) = \frac{m \sum_{t=1}^{m} w_{t,a}\, w_{t,b} - \sum_{t=1}^{m} w_{t,a} \sum_{t=1}^{m} w_{t,b}}{\sqrt{\left[ m \sum_{t=1}^{m} w_{t,a}^{2} - \left( \sum_{t=1}^{m} w_{t,a} \right)^{2} \right] \left[ m \sum_{t=1}^{m} w_{t,b}^{2} - \left( \sum_{t=1}^{m} w_{t,b} \right)^{2} \right]}}$$
Where:
$\vec{t}_a$ = vector created by the BOW
m = total number of distinct terms in the entire collection of documents
$w_{t,a}$ = weight (tf-idf) of term t in the document a.
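The weighting and correlation steps can be sketched in a few lines of Python. This is an illustrative example only: the tf-idf variant used here (raw term count times the logarithm of the number of documents over the document frequency) is one common choice and not necessarily the one used in the study, the tokenized documents are hypothetical, and the correlation is applied to term columns of the matrix, since the text measures the association between words.

```python
import math


def tf_idf(documents):
    """Return (vocabulary, matrix); matrix[d][t] is the tf-idf weight of term t in document d."""
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    doc_freq = {term: sum(1 for doc in documents if term in doc) for term in vocabulary}
    matrix = []
    for doc in documents:
        # Assumed variant: raw count * log(N / document frequency).
        matrix.append([doc.count(term) * math.log(n_docs / doc_freq[term]) for term in vocabulary])
    return vocabulary, matrix


def term_column(vocabulary, matrix, term):
    """Weights of one term across all documents."""
    t = vocabulary.index(term)
    return [row[t] for row in matrix]


def pearson(x, y):
    """Pearson correlation between two equal-length weight vectors."""
    m = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = m * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    denominator = math.sqrt(
        (m * sum(a * a for a in x) - sum_x ** 2) * (m * sum(b * b for b in y) - sum_y ** 2)
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical tokenized tweets, for illustration only.
documents = [
    ["skol", "redondo"],
    ["skol", "redondo", "praia"],
    ["brahma", "praia"],
    ["brahma", "festa"],
]
vocabulary, weights = tf_idf(documents)
skol = term_column(vocabulary, weights, "skol")
redondo = term_column(vocabulary, weights, "redondo")
brahma = term_column(vocabulary, weights, "brahma")
skol_redondo = pearson(skol, redondo)
skol_brahma = pearson(skol, brahma)
```

In this toy collection the two words that always occur together are perfectly correlated, while the two brand names, which never share a tweet, are perfectly anti-correlated.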
3. TWITTER
Twitter was founded in 2006 by partners Jack Dorsey, Evan Williams, Biz Stone and Noah Glass,
in San Francisco, USA. The service is a social network that allows users to post and read tweets,
which are messages of up to 140 characters. It can be accessed directly through any internet
browser or through mobile applications. In some countries, posts can be made by SMS as
well. The idea quickly spread and gained popularity throughout the world: in 2012, there were more
than 500 million registered users, who posted 340 million tweets per day (LUNDEN, 2012).
According to a website that reports traffic to web pages (), Twitter was one of the
ten most accessed pages in the world that year.
Once registered, the user chooses an address on the site that is not already in use. From then on,
other users will always know him by that address preceded by the @ symbol.
With the address set and the account registered, the user can "follow" or "be followed" by other
users. This means that when a user posts something, the message appears directly for the users who
follow him. By default, tweets are publicly visible. However, users can restrict the viewing of
their messages to their followers only. Another possibility is to repost a message that has already
been posted by someone else, a practice known as a retweet and identified by the abbreviation RT.
In this case, the goal is to spread the message (STRACHAN, 2009).
When a post is about a specific topic, users can apply hashtags to their messages - phrases or
words that begin with the # symbol (STRACHAN, 2009). Likewise, it is possible to display only
the messages on that specific topic.
When a word, a phrase or an expression is mentioned simultaneously by a large number of
different users, it can be considered a trending topic (CHOWDHURY, 2009). Trending generally
occurs when a group of users with a common interest joins efforts toward some goal or when large
and popular events are happening.
4. BEER MARKET IN BRAZIL
Currently, Brazil has a highly competitive beer market in which companies such as AmBev,
Brasil Kirin and Grupo Petrópolis stand out. With a turnover of R$ 63 billion in 2012, the country
is the third largest beer producer and ranks 26th in international consumption (VALOR ECONÔMICO, 2013).
Market share in Brazil is concentrated in the breweries AmBev, Kirin Group and
Grupo Petrópolis, which together hold 90% of the market. Another important piece of information is
per capita consumption in liters per year. In 2012, consumption reached 66.7 liters per capita (Chart 1).
CHART 1
Brazilian consumption of beer (liters per capita)
Since 2008, beer consumption in Brazil has increased significantly (Chart 2).
CHART 2
Market share of the Brazilian beer market
Due to the relevance of the beer market in the Brazilian economy and its continued growth, we
decided to perform this study, in which the following brands were monitored: Antarctica,
Baden Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella
Artois and Skol, besides the word cerveja (beer) and two of its regional variations: breja and cerva.
5. ANALYTICAL METHODOLOGY
The analytical methodology consists of three steps: the first is the analysis of the general
behavior and profile of users who use Twitter to post about beer; the second is the semantic
analysis, based on text mining techniques and multivariate statistics, to identify the most
relevant topics of discussion within the beer environment; and the third is the evaluation of the
influence of users.
5.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER
In the first analytical step, we sought to assess the main aggregate metrics used in this work,
grouped by time interval. The most important were the following:
- Number of posts: measures the total number of posts made per time interval.
- Number of distinct users: measures the total number of distinct users who posted per
time interval.
- Average posts per user: calculated by dividing the number of posts by the number of distinct users.
- Percentage of posts: proportion of posts classified in each of the existing categories.
The analysis of the total number of posts makes it possible to evaluate the total intensity of the
impacts that occurred during the observed period. Through the average number of posts per user we
can verify, in general terms, how intensely the subject is spread among users: the closer the
average is to 1, the lower the intensity. The percentage of posts evaluates the weight of each
category of a given categorical variable in the total number of posts considered.
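The aggregate metrics above can be computed with a simple grouping, as in the sketch below. The `metrics_by_interval` helper, the field names and the sample posts are hypothetical and only illustrate the definitions; they do not reproduce the program used in the study.

```python
from collections import defaultdict


def metrics_by_interval(posts):
    """posts: iterable of (interval, user) pairs, one pair per post."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for interval, user in posts:
        counts[interval] += 1
        users[interval].add(user)
    return {
        interval: {
            "posts": counts[interval],
            "distinct_users": len(users[interval]),
            # Number of posts divided by number of distinct users.
            "avg_posts_per_user": counts[interval] / len(users[interval]),
        }
        for interval in counts
    }


# Hypothetical posts (day, user), for illustration only.
sample_posts = [
    ("2013-12-24", "ana"),
    ("2013-12-24", "ana"),
    ("2013-12-24", "bruno"),
    ("2013-12-25", "carla"),
]
daily = metrics_by_interval(sample_posts)
```

An average of exactly 1, as on the second day of this example, means that every user posted a single message in that interval.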
The evaluation of these metrics is aimed at understanding the general behavior of Twitter users
when posting about beer. Peaks were identified by visualizing the time series of the number of
posts. The same procedure must be performed to evaluate the time curve.
Analyzing the average number of posts per user makes it possible to evaluate changes in the
behavior of individual users. There are often large variations in this metric across time
intervals, due to specific users who tend to post more about specific topics or events.
Twitter allows the use of specific metrics that denote the different types of behavior of its users;
among them is penetration (the proportion of posts with a certain characteristic).
These characteristics were the following:
- RTs: tweets that passed on a message that had already been posted by another user.
- @: messages directed to another person.
- Http: tweets containing links to websites.
- Hashtag (#): group discussion on a specific topic.
- Other: tweets that do not contain any of the aforementioned characteristics.
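A minimal sketch of how penetration by post type could be computed is shown below. The detection rules are assumptions made for illustration (a retweet starts with "RT", a directed message contains an @ mention and is not a retweet, a link starts with http:// or https://); the study does not describe its exact rules. A post may show more than one characteristic, so the shares do not need to add up to 100%.

```python
import re

RETWEET = re.compile(r"^\s*RT\b")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://", re.IGNORECASE)
HASHTAG = re.compile(r"#\w+")


def characteristics(text):
    """Return the set of characteristics found in one post (assumed rules)."""
    found = set()
    if RETWEET.search(text):
        found.add("RT")
    elif MENTION.search(text):
        # Assumption: a mention counts as a directed message only outside retweets.
        found.add("@")
    if LINK.search(text):
        found.add("HTTP")
    if HASHTAG.search(text):
        found.add("#")
    return found or {"OTHER"}


def penetration(texts):
    """Proportion of posts showing each characteristic."""
    totals = {"RT": 0, "@": 0, "HTTP": 0, "#": 0, "OTHER": 0}
    for text in texts:
        for name in characteristics(text):
            totals[name] += 1
    return {name: count / len(texts) for name, count in totals.items()}


# Hypothetical tweets, for illustration only.
tweets = [
    "RT @ana: cerveja gelada",
    "@bruno bora tomar uma breja?",
    "promo de cerveja http://example.com #skol",
    "hoje tem cerveja",
]
shares = penetration(tweets)
```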
5.2 INFLUENCE ANALYSIS
The analysis of influence is based on a network of conversations in which two distinct cases of
influence were observed. The first case considers retweets. The other case considers the tweets
sent directly to other users.
In the first case, a user's influence was measured by checking how many other users retweeted
that user's posts. In the second, influence was measured by the number of directed conversations
between users.
In this article, we considered both cases and all sorts of connections between users. However, in
practical terms, the effect of retweets always has more impact, because it happens more
frequently.
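The two kinds of connection described in this section can be turned into the edges of a conversation network, as sketched below. The regular expressions, the `conversation_edges` and `degree` helpers and the sample posts are hypothetical; they assume that a retweet has the form "RT @user ..." and that a directed message names its target with an @ mention.

```python
import re
from collections import defaultdict

RETWEET = re.compile(r"^\s*RT\s+@(\w+)")
MENTION = re.compile(r"@(\w+)")


def conversation_edges(posts):
    """posts: iterable of (author, text). Returns a set of directed (source, target) edges."""
    edges = set()
    for author, text in posts:
        retweet = RETWEET.match(text)
        if retweet:
            # Case 1: the author passed on a message of the retweeted user.
            edges.add((author.lower(), retweet.group(1).lower()))
        else:
            # Case 2: the author directed the message to the mentioned users.
            for target in MENTION.findall(text):
                edges.add((author.lower(), target.lower()))
    return edges


def degree(edges):
    """Number of connections of each user, counting both directions."""
    counts = defaultdict(int)
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


# Hypothetical posts, for illustration only.
sample = [
    ("ana", "RT @frases: cerveja boa e cerveja gelada"),
    ("bruno", "RT @frases: cerveja boa e cerveja gelada"),
    ("carla", "@ana vamos tomar uma breja?"),
]
edges = conversation_edges(sample)
degrees = degree(edges)
```

The resulting edge list is the kind of input that a graph tool such as Gephi can use to compute degree and PageRank.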
5.3 SEMANTIC ANALYSIS
Correlation analysis between topics followed this process: first, the lexical analysis was
performed. In a second step, stop-words (words without semantic value) were removed and the
stemming algorithm (extraction of word stems) was then run. After these steps, the BOW matrix
was calculated. In this matrix, each column corresponds to a term and each row to a
document (tweet).
The measure used to weight the matrix was tf-idf (term frequency - inverse document frequency).
Based on this matrix, it was possible to obtain the words most associated with a particular
word. This similarity was assessed by Pearson correlation.
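The preprocessing steps up to the BOW matrix can be sketched as follows. The stop-word list and the suffix-stripping rule are deliberately tiny stand-ins chosen for this example, not the ones used in the study; a real analysis of Portuguese text would rely on a complete stop-word list and a proper stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption, not the study's list).
STOP_WORDS = {"a", "as", "o", "os", "de", "com", "e", "na", "no", "um", "uma"}
# Naive suffix stripping used as a stand-in for a real stemming algorithm.
SUFFIXES = ("ando", "endo", "indo", "s")


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into alphabetic words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]


def bag_of_words(texts):
    """Rows are documents (tweets), columns are terms, cells are term counts."""
    docs = [Counter(preprocess(text)) for text in texts]
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = [[doc[term] for term in vocabulary] for doc in docs]
    return vocabulary, matrix


# Hypothetical tweets, for illustration only.
tweets = ["Tomando cerveja com os amigos", "Cervejas geladas na praia"]
vocabulary, bow = bag_of_words(tweets)
```

Note how "cerveja" and "Cervejas" end up in the same column after stemming, which is the purpose of that step.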
Posts were classified into themes through a heuristic based on the selection of keywords defined
by experts. The process for choosing the words to be considered is:
- Step 1: definition of keywords that characterize a certain theme.
- Step 2: development of an algorithm to count the keywords defined in step 1.
- Step 3: repeat steps 1 and 2 until the proportion of posts classified into any theme can be
considered satisfactory.
Generally, the minimum proportion of posts classified into themes for obtaining consistent results is
50 percent.
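The keyword heuristic and its stopping criterion can be illustrated with the sketch below. The themes, keywords and posts are hypothetical examples, not the expert-defined lists used in the study; only the 50 percent threshold comes from the text.

```python
MIN_COVERAGE = 0.5  # minimum proportion of classified posts, as stated in the text

# Step 1 (hypothetical): keywords that characterize each theme.
THEMES = {
    "place": {"bar", "praia", "casa"},
    "company": {"amigos", "galera", "familia"},
    "time": {"hoje", "amanha", "sexta"},
}


def classify(text, themes):
    """Step 2: return every theme that has at least one keyword in the post."""
    words = set(text.lower().split())
    return {theme for theme, keywords in themes.items() if words & keywords}


def coverage(texts, themes):
    """Proportion of posts classified into at least one theme."""
    classified = sum(1 for text in texts if classify(text, themes))
    return classified / len(texts)


# Hypothetical posts, for illustration only.
posts = [
    "cerveja na praia com amigos",
    "hoje tem breja",
    "skol gelada",
    "cerva no bar",
]
share = coverage(posts, THEMES)
# Step 3: if this is False, the keyword lists are revised and the count is repeated.
satisfactory = share >= MIN_COVERAGE
```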
6. AVAILABLE INFORMATION
The information was extracted through a program developed by IBOPE DTM that
connects directly to the Twitter API.
Based on the distribution of market share in the Brazilian beer market, it was decided to study only
the brands of the most significant companies in the segment: AmBev, Kirin Group and Grupo
Petrópolis. Therefore, we monitored the following brands: Antarctica, Baden
Baden, Bohemia, Brahma, Budweiser, Eisenbahn, Itaipava, Nova Schin, Serramalte, Stella Artois
and Skol, besides the word cerveja (beer) and two of its regional variations: cerva and breja. The data refer to all messages posted during the study period containing the specified words.
After 25 days of monitoring, 438,507 tweets (posts) related to beer were obtained. However, as the
study focused on Brazil, we only considered posts in Portuguese, and the work
started with 291,043 posts (66.4%).
The monitoring period, from December 10, 2013 to January 3, 2014, was chosen based on the
assumption that the end-of-year holidays, Christmas and New Year, influence the number of posts on
Twitter about beer.
7. DATA ANALYSIS
The analysis followed the same structure as the methodology presented. First, the general
distribution of the posts was evaluated.
7.1 GENERAL BEHAVIOR AND PROFILE OF USERS WHEN THE SUBJECT IS BEER
The holidays caused considerable variation in the daily number of tweets posted.
In Chart 3, we can see that the days with peaks in posts were December 24, 25 and 31, that is,
Christmas Eve, Christmas and New Year's Eve, when there were over 4,000 more posts
than the average for the whole period.
CHART 3
Distribution of posts about beer in Twitter
As for the timing of posts (Chart 4), there was a sharp increase from 10 a.m. onward,
with the volume stabilizing between 3 p.m. and 9 p.m.
CHART 4
Number of posts per hour
By analyzing the average number of posts per user, it can be seen that there was a peak in the
average at 9 a.m. (Chart 5). But this peak was not large enough to consider the behavior very
different from other hours of the day.
CHART 5
Average posts per user (total)
However, when the Christmas and New Year holidays were examined in detail, it was seen that, at
Christmas, the highest average number of posts per user occurred between 8 and 9 o'clock, while at
New Year it occurred from 11 p.m. onward, as shown in Chart 6.
CHART 6
Average posts per user at Christmas and New Year
In Chart 7, we note that 85.5% of the posts about beer do not mention a specific brand.
Among the 14.5% of posts that mention some brand, Skol has the highest
share on Twitter with 4.3% of the posts, followed by Brahma with 3.5% and Itaipava with 2.2%.
CHART 7
Percentage of posts of search words
As to individual metrics (Table 1), the only brand with a significant share of posts with
hashtags (#) was Eisenbahn, with 17.9% of its posts. The brands with the highest shares of posts
containing links to sites (http) were Baden Baden with 41.4%, Eisenbahn with 36.5%, Antarctica
with 32.4% and Stella Artois with 29.4% of the posts. For directed messages (@), the
Serramalte brand stood out with 32.5%, followed by Nova Schin with 23.4% of the posts. Finally,
retweets of previously posted messages (RT) were highest for Budweiser, with 26.9%, and
Antarctica, with 21.1% of the posts.
TABLE 1
Search Words Metrics
                                               % PENETRATION BY POST TYPE
GROUP       SEARCH WORD        POSTS       %      RT       @    HTTP  HASHTAG  OTHERS
-           CERVEJA (beer)   215,229   74.0%   21.1%   13.2%    8.0%     3.8%   56.3%
-           BREJA             19,537    6.7%   11.0%   16.9%    6.1%     3.5%   64.4%
-           CERVA             14,112    4.8%   11.2%   15.8%    5.7%     3.8%   65.7%
AMBEV       ANTARCTICA         5,781    2.0%   21.1%   11.3%   32.4%     4.5%   33.7%
AMBEV       BOHEMIA            1,943    0.7%    6.2%   13.2%   17.4%     9.0%   59.1%
AMBEV       BRAHMA            10,234    3.5%   15.7%   13.9%   16.1%     6.7%   52.0%
AMBEV       BUDWEISER          3,232    1.1%   26.9%    8.7%   12.1%     9.5%   50.5%
AMBEV       SERRAMALTE           114    0.0%    4.4%   32.5%   13.2%    12.3%   45.6%
AMBEV       SKOL              12,632    4.3%   15.4%   13.5%   12.9%     9.0%   55.7%
AMBEV       STELLA ARTOIS        574    0.2%    9.2%    7.0%   29.4%     6.3%   52.4%
KIRIN       BADEN BADEN          331    0.1%    3.0%   16.6%   41.4%     6.9%   38.1%
KIRIN       EISENBAHN            263    0.1%    6.1%   11.8%   36.5%    17.9%   44.9%
KIRIN       NOVA SCHIN           538    0.2%   15.2%   23.4%    9.3%     3.2%   50.2%
PETRÓPOLIS  ITAIPAVA           6,523    2.2%   10.2%   14.2%   19.5%     3.7%   54.9%
TOTAL                        291,043  100.0%   19.1%   13.6%    9.2%     4.2%   56.5%
Focusing on users who commented about beer, it was possible to see that a single user
was responsible for 1,461 posts about beer (Table 2), but this user has only 153 followers; in
other words, only those 153 followers directly viewed the information posted.
TABLE 2
Top ten users with large number of posts
RANK  USERS (TWITTER)   POSTS  FOLLOWERS ON TWITTER
1     BEEINNDEX         1,461                   153
2     SKOL_               443                   107
3     DJ_RICARDOO         348                   512
4     CERVEJA_DUFF        208                   155
5     RENATORDM           188                   514
6     ITAIPAVA_           185                   415
7     PREDRERO            162                28,107
8     MARCIO_SKOL         157                   171
9     SERRALHERO          107                 2,181
10    GORONAH             105                   769
TOTAL                   3,364
Following this line of reasoning, the singer Claudia Leitte sent only one post about beer, but this
information was seen by her 7,869,106 followers (Table 3).
TABLE 3
Top ten users with the highest number of followers on Twitter
RANK  USERS (TWITTER)   POSTS  FOLLOWERS ON TWITTER
1     CLAUDIALEITTE         1             7,869,106
2     DANILOGENTILI         1             5,324,329
3     SPIDERANDERSON        1             4,226,383
4     CLARORONALDO          1             3,625,623
5     PRETAGIL              2             3,450,693
6     PORTALR7              5             2,835,528
7     VEJA                  2             2,825,215
8     BGAGLIASSO            1             2,735,376
9     G1                   11             2,220,615
10    SIGNOSFODAS           1             1,432,674
TOTAL                      26
To analyze the influence of users, the 20 users with the highest PageRank were ranked. The user
"frasesdebebada", with a PageRank of 0.007 and 365 connections (Table 4), had the greatest
influence on the network. Table 4 also shows the presence of two users who talked about
beer on Twitter and who are also among the top users by number of followers (Table 3).
TABLE 4
Ranking of 20 users with higher PageRank
RANK  USERS             COMPANY?  DEGREE  PAGERANK
1     FRASESDEBEBADA    NO           365    0.0070
2     IRMA_ZULEIDE      NO            51    0.0033
3     SPIDERANDERSON    NO            40    0.0029
4     ASTROSLUMINOSOS   YES           73    0.0024
5     SIGNOSFODAS       YES           48    0.0021
6     FACTBR            YES          160    0.0020
7     SOUVODKA          NO            60    0.0018
8     SENTOAVARAEMVCS   NO            32    0.0017
9     EDUTESTOSTERONA   NO            98    0.0016
10    EVERTOUS          NO           108    0.0016
11    PIADAMALIGNA      NO            19    0.0015
12    G1                YES           89    0.0014
13    RELAXEI           NO            96    0.0013
14    MATEUSALIANO      NO            93    0.0012
15    LUCASPFVR         NO            49    0.0011
16    FELIXPASSIVA      NO            22    0.0010
17    B1TCH_MALVADA     NO            15    0.0010
18    EUZOERO           NO            24    0.0010
19    PREDRERO          YES           25    0.0009
20    UMVINGADOR        NO            12    0.0009
Among the influential users are Anderson Silva, a famous MMA fighter with more than 4
million followers, and the site G1 (from the Globo organizations), which has only about 2 million
followers. Claudia Leitte is the person with the most followers who chatted about beer, but her
posts were retweeted by people who do not usually chat about beer and, because of
this, her position in the ranking of influencers was not higher.
Anderson Silva posted a message to thank his sponsor, a famous American beer brand, before
his fateful fight: "... equipando já pra sair... Aproveito para agradecer a todos os meus parceiros: Budweiser, Burger King..." (). In English: "... getting ready to head out... I take this opportunity to thank all my partners: Budweiser, Burger King...". The ability to determine the real influence of distinguished users reinforces the importance of this type of analysis.
Users who represent companies are also present among the influential: even when a tweet is
not directed to a particular person, its information resonates with various groups within the network.
In Figure 2, you can see the full network of users who talk about beer on Twitter. Figures 3, 4 and 5
show the networks of the users "frasesdebebada", "Irma_Zuleide" and "SpiderAnderson",
respectively. The user "frasesdebebada", being the most influential in the network in terms of
number of connections, spread its messages furthest, as indicated by the intensity of the red
color in Figure 3.
FIGURE 2
Full network
FIGURE 3
Network of user frasesdebebada
FIGURE 4
Network of user Irma_Zuleide
FIGURE 5
Network of user SpiderAnderson
In the semantic analysis, no single word is strongly associated with more than one brand.
Therefore, to facilitate visualization, we selected only the ten words most associated with
each brand. The brands were chosen according to their volume of posts.
Skol accounted for 4.3% of the posts related to beer, Brahma for 3.5%, Itaipava for 2.2% and
Antarctica for 2.0%. The word most often associated with Skol was "redondo", with a Pearson
correlation of 0.21, followed by the words "beats" and "vire", each with a correlation of 0.16 (Chart 8). These words are related to the brand's marketing campaign.
A distinguishing feature of Brahma relative to the other brands was that its poster girl,
Claudia Leitte, appeared in 6th position among the most associated words, with a correlation of 0.14 (Chart 8).
The Antarctica brand is more strongly correlated with a soft drink (guaraná) than with beer
specifically (Chart 8). This happens because the brand name is the same for both products.
CHART 8
Top 10 words with highest correlation with brands
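The word-brand associations are reported as Pearson correlations. A minimal sketch of one way to obtain such a value is given here, assuming that each post is encoded as a pair of 0/1 indicators (brand mentioned, word mentioned). The article does not describe its exact encoding, and the posts below are invented for illustration. For two binary indicators the Pearson correlation coincides with the phi coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denominator = sqrt(var_x * var_y)
    # A constant indicator has no variance; report no association.
    return cov / denominator if denominator else 0.0


def indicator(posts, term):
    """1 if the term appears as a whole word in the post, else 0."""
    return [1 if term in post.lower().split() else 0 for post in posts]


# Hypothetical posts, not taken from the study's data.
posts = [
    "skol redondo no bar",
    "brahma com os amigos",
    "skol gelada redondo",
    "cerveja na praia",
    "brahma no churrasco",
    "skol na praia",
]
skol = indicator(posts, "skol")
r_redondo = pearson(skol, indicator(posts, "redondo"))
r_praia = pearson(skol, indicator(posts, "praia"))
```

Here "redondo" only occurs in posts that mention Skol, so its correlation with the brand is clearly positive, while "praia" is spread evenly across posts with and without the brand and its correlation is zero.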
A group of experts in semantics selected keywords and grouped them into the major themes
that arise in conversations about beer. A total of 39.2% of the posts received no
classification; these posts generally mention beer but carry no relevant content. Chart 9
shows the distribution of the roughly 60% of posts that were classified. Posts were
concentrated on the PLACE where the drink was consumed (19.8%), WITH
WHOM the person was drinking (13.8%) and specifically the BRANDS (13.0%).
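The keyword-based classification described above can be sketched as follows. The theme names come from the article, but the keyword lists and the sample posts are hypothetical, and the choice to let one post count toward several themes is an assumption, since the article does not say whether themes are mutually exclusive.

```python
from collections import Counter

# Hypothetical keyword lists; the experts' actual lists are not published.
THEME_KEYWORDS = {
    "PLACE": {"bar", "praia", "casa"},
    "WITH WHOM": {"amigos", "galera", "namorada"},
    "BRANDS": {"skol", "brahma", "itaipava", "antarctica"},
}


def classify(post):
    """Return every theme whose keyword list shares a word with the post."""
    tokens = set(post.lower().split())
    return [theme for theme, words in THEME_KEYWORDS.items() if tokens & words]


def theme_shares(posts):
    """Percentage of posts per theme, plus the unclassified share."""
    counts = Counter()
    for post in posts:
        themes = classify(post)
        if not themes:
            counts["UNCLASSIFIED"] += 1
        for theme in themes:
            counts[theme] += 1
    return {theme: 100.0 * count / len(posts) for theme, count in counts.items()}


posts = [
    "cerveja no bar com os amigos",
    "skol gelada",
    "que sede de cerveja",
    "brahma na praia",
    "cerveja",
]
shares = theme_shares(posts)
```

Because a post may match more than one theme under this assumption, the theme percentages can add up to more than 100% minus the unclassified share.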
CHART 9
Proportion of posts by theme
An analysis of the most discussed themes for each brand studied (Chart 10) shows that, among
the beers produced by AmBev, the Stella Artois brand has 35% of its posts on the theme
COMMEMORATIVE DATES, unlike the other brands of the same company, whose posts concentrate on
the theme PLACE. The beers of the Kirin Group were split across three themes: Baden Baden with 44%
of its posts in COMMEMORATIVE DATES, Eisenbahn with 32% in the theme PLACE and
Nova Schin with 23% of its posts in the theme WITH WHOM. Itaipava, from Grupo Petrópolis, had
31% of its posts in PLACE against 25% in the theme WHEN.
CHART 10
Percentage of posts by theme by beer brands
7. CONCLUSION
This article showed that holidays have a strong influence on the number of posts related to
beer, with increases of more than 35% over the average number of daily posts.
Over the course of a day, the number of posts generally increases in the afternoon and evening.
The period with the most intense posting was between 23:00 and 02:00.
The social network analysis efficiently identified influential users based on the quantity and
quality of their connections during the period. Several influencers were identified; notable
among them are Anderson Silva, who sent a tweet thanking his sponsors before the fight, and G1, a communications company.
The semantic analysis of posts, carried out to identify themes related to beer, showed a
concentration of posts about the PLACE where the drink was consumed, WITH WHOM it was
consumed and WHICH brands were consumed.
In the Kirin Group, each brand had its highest incidence in a different theme: Baden Baden had a
large number of posts associated with COMMEMORATIVE DATES, Eisenbahn with
PLACE and Nova Schin with the theme WITH WHOM. Itaipava, from Grupo
Petrópolis, had its highest incidence in posts with the theme PLACE.
8. LIMITATIONS AND FUTURE WORK
There was no sudden break in the time series of the total number of posts. This suggests that
there was no disconnection from the Twitter API, so we can rely on the consistency and
quality of the information used in this study.
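One simple way to screen a daily series for the kind of break mentioned above is to flag days whose volume falls far below the typical level. The sketch below is only an illustration: the threshold of 20% of the median, the day labels and the counts are assumptions and do not come from the study.

```python
from statistics import median


def find_breaks(daily_counts, min_ratio=0.2):
    """Return the days whose post count is below min_ratio times the median.

    daily_counts maps a sortable day label to the number of posts collected.
    """
    typical = median(daily_counts.values())
    return [
        day
        for day, count in sorted(daily_counts.items())
        if count < min_ratio * typical
    ]


# Hypothetical daily totals; "day-03" simulates a collection outage.
daily_counts = {
    "day-01": 980,
    "day-02": 1010,
    "day-03": 40,
    "day-04": 995,
    "day-05": 1200,
}
suspect_days = find_breaks(daily_counts)
```

An empty result is consistent with an uninterrupted collection, although a short outage within a single day could still go unnoticed at daily resolution.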
In future studies, it would be useful to repeat the analysis with a longer history in order
to understand whether the theme shows seasonal behavior.
Another hypothesis under study is the evaluation of the difference between the time of
consumption and the time of posting.
9. REFERENCES
ALEXA. Available at: . Accessed on: 6 Jan. 2014.
BAEZA-YATES, R.; RIBEIRO NETO, B. Modern information retrieval. Addison-Wesley, 1999.
BARION, E. C. N.; LAGO, D. Mineração de textos. Revista de Ciências Exatas e Tecnologia,
2008.
BAVELAS, Alex. A mathematical model for group structure. Applied Anthropology 7, 1948.
CERVBRASIL. A Cerveja - Contribuição econômica, n. d. Available at: . Accessed on: 6 Jan. 2014.
CERVEJAS DO MUNDO. História da cerveja, 2009. Available at:
. Accessed on: 6 Jan. 2014.
CHOWDHURY, A. Top Twitter Trends of 2009. Twitter Blog, 15 Dec. 2009. Available at:
. Accessed on: 3 Feb. 2014.
CORRÊA, A. C. G. Recuperação de documentos baseada em Informação Semântica no Ambiente
AMMO. UFSCAR, 2003.
COUTINHO, C. A. T.; QUINTELLA, C. A. S.; PANZANI, M. M. História da Cerveja no Brasil.
Portal São Francisco, n. d. Available at: . Accessed on: 6 Jan. 2014.
HUANG, A. Similarity Measures for Text Document Clustering. Department of Computer Science,
The University of Waikato, 2008.
KUNEGIS, J.; LOMMATZSCH, A.; BAUCKHAGE, C. The Slashdot zoo: mining a social network
with negative edges. Track: Social Networks and Web 2.0 / Session: Interactions in Social
Communities, 2009.
LIU, Bing. Web Data Mining: exploring hyperlinks, contents, and usage data. Springer, 2011.
LUNDEN, I. Analyst: Twitter Passed 500M Users In June 2012, 140M Of Them In US; Jakarta
Biggest Tweeting City. TechCrunch, 30 Jul. 2012. Available at: . Accessed on: 3 Feb. 2014.
MANNING, C. D.; RAGHAVAN, P.; SCHUTZE, H. Scoring, term weighting, and the vector
space model: introduction to information retrieval. Stanford, 2008.
MELO, I. D. et al. Análise de Redes Sociais. Universidade Federal da Paraíba, 2013.
MOURA, M. F. Proposta de utilização de mineração de textos para seleção, classificação e
qualificação de documentos. Campinas: Embrapa Informática Agropecuária, 2004.
NÚCLEO EDUCACIONAL DE BROGLIE. Produção e consumo de cerveja no Brasil e no mundo,
2013. Available at: . Accessed on: 6 Jan. 2014.
PAGE, L. et al. The PageRank citation ranking: bringing order to the web. Technical report,
Stanford Digital Library Technologies Project, 1998.
QUEIROZ, D. F. Análise estrutural do setor cervejeiro. FAEC Departamento de Economia, 2010. Available at: . Accessed on: 6 Jan. 2014.
SALTON, G.; MCGILL, M. J. Introduction to modern information retrieval. Computer Science
Series, USA: McGraw-Hill, 1983.
SILVA, Anderson. (SpiderAnderson) tweets. Available at: .
Accessed on: 15 Apr. 2014.
SANTOS, M. A. M. R. Extraindo regras de associação a partir de textos. PUC, 2002.
SINDICATO NACIONAL DA INDÚSTRIA DA CERVEJA - SINDICERV. Mercado, n. d. Available at: . Accessed on: 6 Jan. 2014.
SIVIC, J. Efficient visual search of videos cast as text retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 31, n. 4, IEEE, 2009.
STRACHAN, D. Twitter: how to set up your account. Telegraph, 19 Feb. 2009. Available at:
. Accessed
on: 3 Feb. 2014.
TWITTER. Finding your Twitter short or long code. Available at:
. Accessed on: 3
Feb. 2014.
VALOR ECONÔMICO. Ritmo de produção de cerveja cai em 2013. 2013. Available at:
.
Accessed on: 6 Jan. 2014.
WASSERMAN, Stanley; FAUST, Katherine. Social network analysis: methods and applications.
Cambridge: Cambridge University Press, 1994.