STAKEHOLDER INTELLIGENCE ON SOCIAL MEDIA
by
Kasper Brødbæk Christensen
Advisor: Lars Haahr
Cand. ITIT, Kommunikation og Organisation
Aarhus School of Business
01/08-2012
Attached to the thesis is a dataset, which can be downloaded through the following link: https://rapidshare.com/files/3479799533/Data.zip
Table of contents
0. Abstract
1. Introduction
1.1. Problem Statement
2. Problem Area
2.1. Discussion: Community or Influence?
3. More ideas lead to a better end-product: A Collective Intelligence perspective
3.1. Discussion: Recapitulation
4. Method and Discussion
5. Stakeholder Engagement as a Collective Intelligence system
6. Text Mining to extract information from social media
6.1. Text Mining Basics
6.2. Preparing data for Text Mining
6.3. Categorization of documents
6.4. Clustering of documents
6.5. Text Mining for stakeholder opinion
7. Enter Twitter, “Instantly connect to what’s most important to you.”
7.1. Twitter as a Collective Intelligence System
8. Stakeholder Intelligence on Twitter
9. The case of a Communications Manager at Novo Nordisk
9.1. CSR-Communication on Twitter
9.2. Establishing a business case (domain)
9.3. Selection of stakeholders (balance diversity and expertise)
9.4. Initial analysis of information quality
10. Evaluation of results and model
11. Conclusion
12. Bibliography
12.1. Articles (Order of Appearance)
12.2. Books (Order of Appearance)
12.3. Links (Order of Appearance)
12.4. Programs Used (Order of Appearance)
0. Abstract
In this thesis we have, through a review of theoretical perspectives, analyzed the possibilities
of a model for stakeholder engagement on social media. We took our outset in stakeholder
theory and brought into discussion two logics of engagement, the logic of influence and
community respectively. We found further inspiration for the proposed model in collective
intelligence, business intelligence and text mining theory, which we discussed in relation to
the two logics of engagement. Our analysis resulted in a model, where stakeholder
engagements on social media could be conceptualized as the establishment of a collective
intelligence-system. With this we found support for the argument that stakeholder
engagement, as a discipline which seeks to listen to and learn from stakeholders, can be taken
to social media. When taking this view, communications from stakeholders on social media
become information that may aid a company in daily decision-making processes. In order to
obtain this information we look to the text mining discipline and here we found, given the
nature of text mining and social media respectively, that it may be necessary to narrow down
the purpose before applying the model. We find that properties of Twitter as a social
technology may support such information-gathering activities especially well. Our case relates
to the position of a Communications Manager at Novo Nordisk and the dataset applies to his
position alone. Upon applying the model to the case of his work and the data, we found a
collective of stakeholders communicating largely about the same issues. However, we found
only indications of such activity and were not able to derive from our dataset information of a
quality with which we could qualify decisions. We suggest that this is attributed to the nature
of the dataset and the scope of the text mining capability in this thesis more than a failure of
the proposed model.
We end the thesis with a discussion of the inherent challenges related to social media and
therefore also the model. We find that when engaging in social media to find information
there are challenges relating to the nature of online identities, as well as the information
disseminated by those online identities. Furthermore, we discuss the consequences of
gathering information in such a way in relation to stakeholder engagement. We end by
concluding that there are challenges left to overcome but that the model may yet be
applicable.
1. Introduction
In 1979 Michael E. Porter presented a framework for analysis of factors affecting the
capabilities of a company’s strategic development. Porter’s Five Forces have become a
mainstay of business theory, and are thought by Porter to arise from the inherent
competitiveness in a company’s industrial environment. (Porter, 1979) The relevance of such
an analysis of forces impacting a company and its strategic development persists to this day,
and is perhaps more salient than ever as we have unequivocally entered into the millennium
of globalization. Whether Porter had envisioned the intensely competitive nature of the
twenty-first century is difficult to assess, but while the concept of analyzing the forces of your
environment remains relevant today, the question is whether industry competition still suffices to
describe what affects companies.
The threat of external forces impacting on a company’s activities is now more than ever a
reality. Globalization has brought with it an intensified threat of new entrants and substitute
products (Porter, p. 141, 1979), but perhaps the most significant change to the external
influences has come about with the increased focus on the ethical and moral responsibilities
of companies. Corporate Social Responsibility, while not exactly a new concept, has in
recent years surely seen increased focus in the minds of both political leaders and common
people.
R. Edward Freeman (1984), the father of stakeholder theory, included the concept as a way to
describe that companies carry responsibilities beyond accountability to shareholders.
(Freeman, p. 38-40, 1984) As mentioned, today there is a much-increased focus on the
responsibilities of companies, and as such especially Porter’s “Bargaining Power of Customers”
has increased dramatically. Perhaps one might rightly suggest that today a more fitting
description of this force would be the “bargaining power of stakeholders”, encompassing any
and all who are affected by or take interest in the activities of a given company.
“Corporate social responsibility is often looked at as an "add on" to "business as usual," and
the phrase often heard from executives is "corporate social responsibility is fine, if you can
afford it"… Given the turbulence that business organizations are currently facing and the
very nature of the external environment, as consisting of economic and socio-political
forces, there is a need for conceptual schemata which analyze these forces in an integrative
fashion.” (Freeman, p. 40, 1984)
It should be no surprise that a company’s activity generally leans towards making a
profit, and by that logic it seems somewhat rational to narrow our attention to e.g. the opinions of
shareholders. This concept of broadening the spectrum of a company’s responsibilities goes
back decades, but today the attention to and perceived importance of such dealings have no
doubt increased dramatically. So what changed? The contention of many authors (Li &
Bernoff, 2008; Benett, 2003; Castelló et al., Forthcoming; Scherer & Palazzo, 2011) is that the
modern era of digitalization has brought about changes in the power relations between
stakeholders and companies. The proliferation of digital communication and information
enabled especially by the coming of Web 2.0 technologies has been seen as a threat to
companies across the globe. This is not the prediction of some fortune cookie; it is the reality
that surrounds us. Stakeholders today are able to gather information, analyze it, form an
opinion (sometimes well-founded, sometimes not) and disseminate it in a digital space, where
potentially millions of stakeholders sit in wait to consume it. Some present such opinions in
the form of an opinionated blog, some as a status update on either Twitter or Facebook and
some as an informative video. Facebook now has over 900 million users and Twitter over 500
million, and this is exactly what has changed: the proliferation of social media use. In
2012 communication about anything and everything is running rampant, and where
companies in the past may have had a say in what the newspaper printed, management of
such content is today at best an illusory concept.
Perhaps this serves as an explanation of why companies slowly but surely have adopted the
use of social media. In November 2011 the McKinsey Global Institute carried out a survey
asking 4,261 global executives about their adoption of social technologies and the perceived
benefits gained from the adoption. The survey showed that 72% of the corporations involved had
adopted at least one social technology into their efforts. However, only 1,949 of the
respondents reported at least one measurable benefit, which may indicate that
obtaining benefits from efforts on social media is a difficult task. (Bughin, Byers & Chui, 2011)
The motives for companies engaging in social media are no doubt many. Some might be
engaging to manage the threat of having no presence, and thereby no chance of exerting any
control whatsoever, while others may be engaging to exploit the opportunities presented by
the technology. In this thesis we treat the developments within the last ten years partially as a
threat and partially as an opportunity. Part threat because the brand nature of companies is
sensitive to information that gives them a bad name and, as we have outlined, this is now
difficult to control. Part opportunity because we believe that the correct approach to engaging
in social media provides unprecedented potential for connecting with stakeholders in a way
that may strengthen relationships. This is the crux of the discussions and perspectives
presented in this thesis: engaging in social media with the express purpose of connecting with
stakeholders, listening to what they have to say, and from that deriving which areas a
company may focus on to increase value.
We take our starting point in contemporary discussions within the field of stakeholder
engagement, highlighting two competing logics, the logic of influence and the logic of
community. From this we derive the concepts we believe may fit when the prospect is to take
stakeholder engagement to social media. We analyze the practical applicability of these
perspectives and find that social media is of such nature that another supporting perspective
is needed. This takes us into the field of theory related to collective intelligence, which might
aid us in deriving value from social media by conceptualizing it as a place where ideas and
solutions are generated each and every day. We couple these perspectives with those of
stakeholder theory in order to find support for a model that in practice may leverage the use
of social media as a source for information. To harvest such information we include
methodology from the increasingly recognized area of text mining. To attempt to apply
the model we derive, we focus on a single social technology, namely
Twitter, and case material provided to us by a Communications Manager at the department
for Corporate Sustainability at Novo Nordisk. In the case we analyze the capability of our
model in relation to the position of this manager by applying text mining methods to 7763
tweets from 58 accounts on Twitter. We conclude the thesis by evaluating our results and
our model in order to assess whether the unified perspectives from theory may be brought
into business practice. We summarize the project of our thesis in the following statement.
1.1. Problem Statement
This thesis should be seen as an attempt to unify theoretical aspirations and capabilities of
stakeholder theory and collective intelligence respectively in order to conceptualize a model
for bringing stakeholder engagement to social media.
2. Problem Area
While we touch on and allow inspiration to flow from many fields of study throughout the
thesis (e.g. collective intelligence, social media, business intelligence, text mining and
stakeholder theory), relations between people and corporations seem inseparable from the
field of stakeholder engagement. As such moving toward the betterment of corporate efforts
within this business discipline becomes our primary focus and the locus of our analysis.
The rising attendance on social media seems a self-perpetuating effect as people tell their
friends, who tell their friends, and so on. A five-wave study published in
2011 polled 37,600 people globally in its last wave and showed considerable
attendance on social media: 61% answered that within the last six months they had managed a profile on
a social media site, 64% read blogs and, perhaps just as exciting, 75% answered that they
visited company/brand websites. In sharp contrast, in the first wave of the study four
years earlier only 27% of respondents had created a profile on a social media site. (Hutton &
Fosdick, 2011) This is interesting because it provides some evidence of the proliferation of social
media use among stakeholders, while at the same time establishing that companies are within
the realm of stakeholder interests online.
As such it might even seem a natural development that stakeholder engagement is moving
toward initiatives on social media. However, as with any initiative the road toward
implementation is paved with challenges. As presented by Castelló, Etter and Morsing
(Forthcoming) in a study of a company’s assessment of the possibilities of taking stakeholder
engagement to social media, two competing logics of engagement are highlighted. In this they
focus heavily on the managerial and institutional challenges of communication on social
media as a part of stakeholder engagement. (Castelló et al., p. 1, Forthcoming) In the context of
this thesis we derive from this article perspectives in these two logics and treat them as a
foundation for analysis and discussions, which may aid us in highlighting the difficulty as well
as the perceived value in moving from a traditional view of engagements to one where
engagements happen on social media. This will help us assess what the challenges are from a
corporate perspective, and the discussion has served as great inspiration for our perspective.
The following descriptions of the logics as they present them are a derivation of their study of
a single company and we will seek to further qualify that these fit the contemporary
perceptions within the field.
The logic of influence (Castelló et al., p. 15-17, Forthcoming)
- Influence: The company seeks to influence stakeholder opinion through its
engagements with the purpose of preventing conflict, reducing risks and gaining
knowledge from key stakeholders. E.g. a company wanting to erect a wind turbine to
decrease its energy consumption costs may meet resistance from locals in the area
where the turbine is to be placed. It may then attempt to engage in dialogue with the
locals with the purpose of reaching a compromise or solution agreeable to both parties.
- Firm centered: The company decides what is and what is not a good topic for
engagement, and whom to include. Not that stakeholders have no say in
this, but different topics are analyzed and prioritized in accordance with internal
perceptions of importance.
- Contract based: The engagements are organized around hierarchical processes and
rules. What this means is that the engagements are subject to internal regulation of
employees, and while this is somewhat of a broad description, it stands to reason that
some companies regulate at least part of what their employees can and cannot discuss
in public.
- Face-to-face: The ideal and largely preferred method for engagement is described as
face-to-face. The reasoning behind this is not explicitly defined but a reasonable
suggestion seems to be as Ikujiro Nonaka describes it, that tacit knowledge may only
be made explicit through a process of externalization.1
1 In his 1997 article “Organizational Knowledge Creation”, Nonaka describes externalization as the process by which one person transfers her tacit knowledge to another person. He stresses dialogue through face-to-face interaction as a means to this end.
We include in our considerations the best-practice descriptions delivered by the
organization AccountAbility: “Since 1995, AccountAbility has been focused on “mainstreaming”
sustainability into business thinking and practice. Our widely-used AA1000 standards, leading-
edge research, and strategic advisory services help organisations become more accountable,
responsible, and sustainable.”2. What is interesting about these standards is that many of the
descriptions correlate directly to the concepts of the logic of influence, while at the same time
presenting descriptions that seem to support arguments for the logic of community.
(AccountAbility, 2011) We start by correlating AccountAbility standards to the logic of
influence and then do the same when we have presented the concepts of the logic of
community.
“Stakeholder engagement then is the process used by an organisation to engage relevant
stakeholders for a clear purpose to achieve accepted outcomes.” (AccountAbility, p. 6,
2011)
The above citation alone may lead one to think that they argue for the logic of influence. It is
about including relevant stakeholders with a clear purpose in mind to achieve only accepted
outcomes. It seems plausible to suggest that this reiterates the statement that engagements
are firm-centered as well as contract-based. While it is stressed that the owners of the
engagement must include stakeholders in the definition of the purpose, they go on to describe
the importance of carefully considering who needs to be involved. (AccountAbility, p. 22-24,
2011) Describing this as a paradox may be over the top, but it seems easily imaginable that if
the company decides on whom to include, it is at least in part also in control of the
purpose and the outcome.
The problem in relation to integrating social media into the engagements may, with these
descriptions in mind, be, as Dellarocas (2003) describes it, that the volatility and
unpredictability of the communication make it very difficult to assess the outcome of the
engagement. This seems a given due to the sheer volume of communication happening daily.
A company might then have a very clear purpose when engaging on social media, but how
would one predict the outcome when anyone can join the conversation? We return to this
discussion in section 2.1.
2 Taken from www.AccountAbility.org, the official website of the organisation.
The logic of community
(Castelló et al., p. 17-18, Forthcoming)
- Collective interest: The company seeks to engage in dialogue on social media,
encouraging a broader spectrum of stakeholders to participate in conversations and
thereby enhancing inclusivity.
- Topic centered: As inclusivity and dialogue increase and more stakeholders join the
conversation, controlling the topic of each engagement becomes an arduous task. As
such the logic of community represents an engagement logic which allows the topic
for discussion to be spawned by stakeholder interest and not company prioritization.
- Participation: The logic encourages increased participation not only among stakeholders
but also among employees. The authors argue that means should be established for each
member of a company to participate to increase visibility.
- Network: Including more and more stakeholders in engagement efforts means
recognizing that this enables multiple conversations across space and time boundaries.
Most interesting to us here will be that it seems the perception of an engagement now focuses
on including as many stakeholders as possible, and letting them decide what is an interesting
topic of discussion. Li and Bernoff (2008) speak to the same issues and although the angle is
different the message is seemingly the same and quite clear:
“…So work on both fronts in your company – muster up the humility to listen and tap into
the skill to take what you’ve heard and make improvements. That’s embracing the
groundswell, and it pays by shortening the distance between you and your next successful
innovation.” (Li & Bernoff, chap. 10, 2008, n.p.)
To clarify, the groundswell is a broad definition encompassing any and all members present
on the sum total of all social technologies on the web. (Li & Bernoff, chap. 1, 2008, n.p.) They
continuously stress the fact that it is the stakeholders in the groundswell, and not the
companies, who are in control and encourage companies to alleviate this threat through e.g.
acts of listening in on and talking to the groundswell. (Li & Bernoff, chap. 5-6, 2008, n.p.)
These concepts are fairly self-explanatory and we do not wish to dwell on these. But if the
purpose, as it seems to be, is to see stakeholders and communications on social media as
valuable resources that disseminate information usable in both innovation and relationship-
building (Li & Bernoff, chap. 4, 2008, n.p.), then the view seems to correlate strongly to the
logic of community. As mentioned, the angle is different: their focus lies in their contention
that if you have a brand that you wish to maintain or develop, you are under threat from the
groundswell.
“If you have a brand, you’re under threat. Your customers have always had an idea about
what your brand signifies, an idea that may vary from the image you are projecting. Now
they’re talking to each other about that idea. They are redefining for themselves the brand
you spent millions of dollars, or even hundreds of millions of dollars, creating.” (Li &
Bernoff, chap. 1, 2008, n.p.)
If this is indeed the case, as a myriad of examples in their book demonstrate, and
stakeholder engagement is at least in part about building brand trust and value, then it may
suggest that engagements on social media in today’s world are an absolute necessity. In the
section to come we discuss the elements of the two perspectives with the purpose of
uncovering why, in the case of engagements on social media, the logic of influence is not
suitable and what challenges remain in regards to the application of the logic of community in
the same respect.
2.1. Discussion: Community or Influence?
Taking into consideration the descriptions presented in the previous section it should be clear
that there are contradictions between the two logics. First, it seems reasonable to suggest that
there is a considerable shift in perspective when going from one where the purpose of the
engagement is to influence stakeholder opinion, to one where the essential question is: “What
is the opinion of the stakeholder?”. Second, another considerable shift occurs when the issue
to be addressed by an engagement removes itself from company control and ends in
stakeholder control. If we process these shifts in an idealized way one might conclude that a
company uncritically must listen to opinions, and move to engage itself in the issues
expressed by those opinions with little regard for relevance. This is most probably an
exaggeration of the intentions behind this perspective. However, if we take engagements to
social media and encourage anyone to join and speak to issues important to them, all the
while knowing that the technology is molded in such a way that we cannot assess who they
are and what they stand for (Dellarocas, p. 1410, 2003), then assessing the value of an opinion
relative to the company seems very difficult.
Today’s companies are without a doubt highly professionalized; competition demands it.
Information drives decisions, and as such good decisions derive from good sources of
information. This may demonstrate, as the logic of influence seems to argue, that carefully
assessing which stakeholders to include is a rational choice. E.g. a patient suffering from high
blood pressure may be of very little value in evaluating the relative efficiency of a medicine’s
chemical synthesis; conversely, she may be able to deliver valuable insight into the effects that
synthesis has on a human body. As such one might rightly suggest that she would be a
valuable resource in an engagement where the topic is the latter, but not the former.
This leaves us in somewhat of a dilemma at least if the project of stakeholder engagement
remains as described by AccountAbility:
“They then discover that it (stakeholder engagement) can contribute just as much to
strategic as to operational improvement. Engagement can be a tremendous source of
innovation and new partnerships. Leading companies are discovering that a growing
percentage of innovation is coming from outside the organisation and not from within.
They realise that stakeholders are a resource and not simply an irritant to be ‘managed’.”
(AccountAbility, p. 8, 2011)
It might seem now that the influence logic prevails in its considerations and as such might be
the best for social media as well. However, social media does not facilitate face-to-face
interaction, and it seems highly unlikely that engagements there could ever be contract-based if we cannot
predict who joins the discussion. We might have a stakeholder of malicious intent joining the
discussion and purposefully providing false or misleading information, which may lead to an
unintended outcome. It seems that from this discussion we might rightly ask the question:
“What is the purpose of taking stakeholder engagement to social media?”
If it is purely supposed to be about communicating with more stakeholders, and this is seen as
a good in itself, then it seems we may allow ourselves to be less critical of who joins the
conversation and who does not. But what sort of value does this bring into the company? How
do you measure the effects of a perceived positive interaction on social media? As a report by
Hypatia Research (2011) suggests, these questions remain under scrutiny by professionals
within the companies. They report that challenges to investing in social media among others
are lack of standard ROI (return on investment) metrics, which is under heavy debate3, and
lack of business case goals. (Hypatia Research LLC, p. 4-5, 2011) This lust for predictability is
certainly understandable but the question remains if it is attainable on social media. Lastly, if
stakeholder engagement is about solving issues in cooperation, then simple judgments of
efficiency should lead one to think that social media is not suited for stakeholder
engagement. However, in such a view it seems that stakeholder engagement is at least in part
about treating stakeholders as a source of knowledge and if this is so then we should treat it
as such. Doing this we believe may demand a different, although conceptually similar,
perspective.
In the coming section we lay out a theoretical framework which we believe provides strong
support for the argument that communications on social media, when approached correctly,
may deliver value and help guide decisions within the company. As our area of interest is the
stakeholder engagement discipline, collective intelligence theory might be just the perspective
we need to discover the information-potential of communications on social media.
3 Debate over whether to use financial ROI metrics or non-monetary metrics. (Hypatia Research LLC, p. 4-5, 2011)
3. More ideas lead to a better end-product: A Collective Intelligence perspective
One might find the prospect of basing decisions on information gathered from
communications on social media a strange one, not least since we have just stated that we believe
companies to be highly professionalized. How could we conceive that it might at all be
valuable to listen in on social media when the company most likely already has its internal
experts, or information-workers, tackling whatever information-demand might arise? Can it
be valuable at all?
Generally speaking, collective intelligence is about people in cooperation or perhaps even in
contest to solve a problem or come up with an idea, which lines up fairly well with general
presumptions of the benefits of stakeholder engagement. It promotes a perspective where if
we wish to solve a problem or come up with an idea, we may harness the power of a given
collective to find better solutions and better ideas. Such descriptions might provide us with
some insight into why stakeholder engagement is generally a good idea, but they also outline an
incipient foundation for channeling engagements into social media. We will spend some time
assessing the theoretical notions and then through discussion address the perceived
compatibility with stakeholder engagement as a business discipline.
The concept of Collective Intelligence stems from years of research in different fields of study,
e.g. biology and computer science (Leimeister, p. 245, 2010). It seems to describe a reality in
which inclusivity, the guiding principle promoted by the community logic, might actually be a
reasonable one.
“A group of average people can – under certain conditions – achieve better results than any
individual of the group. This seems to hold even if one member of the group is more
intelligent than the rest of the group.” (Leimeister, p. 245, 2010)
First off, the term collective describes a group of individuals who may be but are not
necessarily of the same opinion on a given subject. Leimeister (2010) describes that this gives
rise to the possibility of revealing different opinions leading to the establishment of a more
nuanced perspective on a given task, which will then lead to better solutions. The term
intelligence refers to the ability to use one’s existing knowledge to learn, understand and adapt
to an environment so as to be able to handle a wide variety of situations. (Leimeister, p. 245,
2010) In other words an intelligent collective in this view may describe many different
compositions of people put together deliberately or by coincidence to solve a task which has
their interest. Such a composition would then be referred to as a collective intelligence system
(CI-system).
At the MIT Center for Collective Intelligence a study in 2009 sought to uncover the building
blocks of CI-systems from examples of what they called Web enabled collective intelligence.
(Malone, Laubacher & Dellarocas, p. 2, 2009) They asked the following questions in the
process (Malone et al., p. 3, 2009):
- Who is performing the task? Why are they doing it?
- What is being accomplished? How is it being done?
From these they found that they were able to describe such a system through a set of genes,
which enabled them to answer the above questions. The purpose, of course, was to be
able to recreate such a system in any setting fit for such activity and to outline what they
believe to be happening in such systems. (Malone et al., p. 3, 2009)
Our point of departure is social media and as such we look for descriptions that fit a platform,
where no direct hierarchical structures exist. That is to say if we imagine social media as a
place where daily ideas are generated and problems are solved, then it is an important
distinction that this is an autonomous effect and not one which arises because someone is in
control saying, “Solve this problem” or “Come up with an idea for this”. (Li & Bernoff, chap. 1,
2008, n.p.) This of course is not a claim that no such structures exist on social media at all;
they might, but in its essence it is an open environment where users are free to focus on what
they want. With this in mind, if we are to see social media as a CI-system, then the MIT study
describes this as the crowd gene, which serves as an answer to who is performing the task. As
they say, in the crowd gene, activities can be undertaken by anyone in a large group who
chooses to do so, without being assigned by someone in a position of authority. (Malone et al., p.
4, 2009)
Now one might wonder what would motivate people to join in on such activities. People
would probably be quick to say that if asked to spend time solving a problem, some sort of
compensation would be required. However, the study found that the motivation for engaging
in such a system might be more than just financial gain. They found three genes, which
describe motivations for engagement: money, love and glory. (Malone et al., p. 5, 2009) Money
is self-explanatory, and the fact that it motivates people should not be a surprise to anyone. It may very
well be less obvious how the two remaining genes can describe motivation. Love as they
define it is the intrinsic enjoyment of an activity, the joy of socializing with others, or
the feeling of contributing to a cause, and as such people who engage with such motivations
may not necessarily require financial compensation. (Malone et al., p. 5, 2009) E.g. a person
might work as a teacher but in her free time be totally absorbed by gourmet cooking. It seems
very reasonable to suggest that such a person would engage in activities, be it online
discussions or cooking for friends, without a prospect of financial gain. Glory as a motivating
gene is also an important one. It describes that people will engage in activities if they believe
they may be recognized for doing so. (Malone et al., p. 5, 2009) This also correlates to the
previous example. She might cook for her friends for the love of cooking but might also do so
because she knows they will think highly of her cooking skills if she does it well.
So we seem to have established that CI-systems might arise anywhere, resting on
a range of different motivations. We then move on to the question of what is
being done in such systems. Researchers at MIT found two genes pertaining to this, what they
call the create and the decide gene. The create gene describes an actor in the system who
brings something new to the table, and the decide gene describes an actor who evaluates and
selects alternatives. (Malone et al., p. 5-6, 2009) Many examples of such activities already exist
e.g. when Google went looking for a new logo for their Google Chrome page they asked users
of the community to create their suggestion of what this logo should look like with the
message “Let your creative juices flow, #chromies!”.4 One could conceive of such a request as
the formation of a CI-system, where perhaps a graphical designer who finds great enjoyment
in such an activity might submit her proposal not for financial gain but for the recognition of
having her logo on a page owned by Google. In this the create gene would be activated by the
designer and the decide gene by those evaluating which design is the best fit for the page.
The answer to the last question of how it is being done relates to the create and decide genes.
They describe two ways in which the create gene is activated, either through collection or
collaboration. The two differ in the way in which
contributions are generated. Collection describes a process where an actor creates an item for
4 https://plus.google.com/u/0/100585555255542998765/posts/h7LNZ8zUAdF
contribution independently from other items. They found that a property of this process is
that contributions may be viewed as being in contest with each other. Collaboration then
describes a process where actors work together to create an item for contribution. (Malone et
al., p. 6-7, 2009) Our previous example from Google could describe creation through the
collection gene, where users submit their proposals in a contest for recognition. The
activities leading to the conception of different articles on Wikipedia, where users work on the
same articles to ensure their quality, would be an example of the collaboration gene.
“But also for companies there are various new potentials for improving their creativity and
innovation capabilities. The challenge is to understand how to unleash the vastly unused
knowledge or experience of their employees, customers, or partners, and thus leveraging
their inherent collective intelligence.” (Leimeister, p. 246, 2010)
This all sounds quite wonderful, but before plunging into the formation of such a
system we recall that Leimeister (2010) points out that such a system will be better only under
certain conditions. Earlier we spoke briefly about the fact that good decisions demand good
information and as such if we were to tap into a CI-system with the purpose of generating
support for decisions within the company, it would seem wise to attempt to assess the quality
of the information generated by the collective. Bonabeau (2009) speaks to this issue and
states that a detriment is that our basic human nature can lead us astray when we’re making
important decisions. (Bonabeau, p. 47, 2009) We tend to favor information that fits our
current beliefs, distrust information which speaks to the contrary and let ourselves be
influenced by how the information is presented. (Bonabeau, p. 46, 2009) While this may serve
as a good argument for why companies should seek information in their external environment
to strengthen their decisions, it also sets the requirement for ways to escape these bias-
distortions.
Bonabeau (2009) suggests a framework that may help diminish these tendencies and thereby
strengthen decisions. Outreach is one aspect of this framework and simply states that value
can be obtained through increasing the number of contributing individuals in the system.
Another aspect is that of additive aggregation, the concept of which is to collect information
from a large number of sources, and perform some kind of averaging to make the information
collected from the system more reliable. (Bonabeau, p. 47, 2009) Lastly, he states that self-
organization is an important aspect. Mechanisms must be in place that allow for actors in the
system to interact, which he states is what allows the whole to be more than the sum of its
parts. (Bonabeau, p. 48, 2009)
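The averaging idea behind additive aggregation can be illustrated with a minimal Python sketch. The stakeholder guesses and the true value below are hypothetical, invented purely for illustration, and the arithmetic mean is only one possible kind of averaging; Bonabeau (2009) does not prescribe a specific one.

```python
from statistics import mean


def aggregate_estimates(estimates):
    """Combine independent estimates by taking their arithmetic mean."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    return mean(estimates)


# Hypothetical data: five contributors guess a quantity whose true value is 100.
true_value = 100
guesses = [80, 130, 95, 110, 90]

collective = aggregate_estimates(guesses)
collective_error = abs(collective - true_value)
individual_errors = [abs(g - true_value) for g in guesses]

print("collective estimate:", collective)
print("collective error:", collective_error)
print("smallest individual error:", min(individual_errors))
```

In this constructed example the averaged estimate lands closer to the true value than any single guess. That only happens when individual errors point in different directions; if all contributors share the same bias, averaging will not remove it.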
Bonabeau (2009) then goes on to outline important considerations when attempting to
establish a CI-system. We will outline a few of these considerations most important to our
continuing project. First off, there must be a balance between diversity and in-depth expertise
when considering who to include in such a system. Diversity may lead to different
perspectives and a lot of different solutions, which is indeed seen as a good thing, but if no one
actually has any knowledge of the area the system is supposed to operate within, then it
seems unlikely that it will deliver additional value. Furthermore, the company must prepare
itself to lose control which poses predictability and liability issues. (Bonabeau, p. 48-49, 2009)
We comment on this later in section 3.1.
To sum up, CI-systems when founded correctly, with the proper precautions and
considerations taken into account, are able to deliver great value to a company seeking
decision-support in its external environment. In fact, as we have seen this may help a
company make better decisions by diminishing bias-distortion. As such, this seems to
correlate quite well with the assumption that tapping into stakeholder opinion with the
purpose of generating better solutions may be very valuable to a company. In the coming
section we discuss how this perspective correlates with the theoretical standpoints of
stakeholder theory and begin to outline our proposal that stakeholder engagement, when
conceived of as a CI-system, may fit social media very well.
3.1. Discussion: Recapitulation
Outreach promotes inclusivity
Both perspectives seem to promote inclusivity as a resource in itself. It seems an overarching
principle is that the more eyes we have on a task, the better the solution generated in
response to this will be. This seems to lend credence to the argument that we should include
more stakeholders in our pursuit of better decision-making. In the perspective of the logic of
community, more people is a benefit because it will allow us to establish stronger
relationships with a broader spectrum of our stakeholders. So should we move to include
everyone who wants to join when we have a topic for discussion? This brings us to our next
discussion.
Diversity is great but expertise is a must
The influence logic states that only key stakeholders should be included in the engagement.
Whether this is due to the fact that proponents of this logic find the inclusion of more diverse
groups of stakeholders irrelevant or too costly, or it is the contention that topics of
engagement are complex topics and demand professional insight, is hard to say. We can
however, with collective intelligence in mind, say that there needs to be some oversight of whom we
include in our engagement efforts. Let us assume for a moment that social media as a whole
is a CI-system. Undoubtedly, with what we know of social media this would most probably
end in a case of too much diversity, but does this fact disqualify it or does it perhaps, as
Hypatia Research suggests, stress the fact that there needs to be a business case? (Hypatia
Research LLC, p. 4-5, 2011) Having a business case here means zeroing in on a specific
process of decision-making. E.g. if we are looking to establish a CI-system for enhanced
innovation capability in relation to a specific product-line, then the system should hold
stakeholders who discuss this topic. Exactly what constitutes the correct balance is not
explicitly defined and we return to this discussion continuously throughout the thesis.
It seems that the well-known concept of segmentation is prevalent in this way of thinking. Li
and Bernoff (2008) also argue for this when they suggest that a company’s successful
entrance into social media hinges on their ability to develop what they call a social
technographics profile (Li & Bernoff, chap. 3, 2008, n.p.):
“To truly understand the groundswell, you need to dissect and quantify the dynamics that
separate different participants. Why? Because a strategy that treats everyone alike will
spell failure – people aren’t alike and won’t respond in the same way.” (Li & Bernoff, chap.
3, 2008, n.p.)
Another argument for segmentation can be found within a philosophical debate which has
been going on in recent years. We include here two considerations expressed in this debate.
The first is the concept of ubiquitous expertise, which speaks to the fact that any person may
be considered an expert by merit of having lived a life in a given context. E.g. almost all of us
can be seen as experts by being able to speak our native language and this applies in other
aspects of life as well. As such, people who we would not normally consider to possess
valuable insight into a given field of knowledge might yet carry a perspective we ourselves
have not considered. (Collins & Evans, p. 16, 2007) However, they also point out that this does
not merit the idea that we should consider everyone’s judgments to be equally valuable. When
looking for insights we should be mindful of the fact that there are different levels of
expertise.5 (Collins & Evans, p. 14, 2007) We include these considerations because they seem to
stress the fact that in any case where we wish to include stakeholder insight into our decision-making
processes, we must make an attempt at assessing what we actually can learn from
those involved.
Social media is self-organization
Another aspect of securing a successful CI-system is to allow for it to be self-organized. The
argument here seems to be that when interaction among stakeholders in the system is
enabled it will create additional value. This concept may seem somewhat vague and as such
we allow ourselves an attempt at an interpretation. In the context of social media, interaction
may refer to mechanisms that allow stakeholders to evaluate, learn from and comment on
each other’s contributions. Through this interaction actors may receive comments on their
contributions, which may help them visualize the strengths and weaknesses of their own
standpoints and aid them in the further development of these.
This also seems to correlate with the company’s release of control, which as shown by
Bonabeau (2009) they must prepare for. Social media may very well be perceived as a self-
organizing whole since there are no explicit hierarchical structures. When you create an
account on e.g. Twitter you will have the same options as when a company creates an account.
Even though a company is most probably more acknowledged and easily recognizable,
5 Please refer to Appendix-1 for The Periodic Table of Expertises as presented by Collins and Evans.
principally you will be able to participate in the same discussions as they do. With this in mind
it would seem that social media is able to live up to this criterion as well.
Additive aggregation may find topics for engagement
One of the ideals of the logic of community is for the topic of the engagement to be spawned
by stakeholder interest. We note that this is not a claim that the concept of additive
aggregation is the same but it seems to provide a perspective on how a company might find
those topics. It seems unlikely that we could find it rational to base topics for engagement on
what a few stakeholders feel is important. However, if we instead employ some form of
averaging by analyzing what a larger number of stakeholders have to say about a given topic,
then this might provide strength to the decision that it would be a good topic for engagement.
To conclude this recapitulation we find that there are arguments speaking to the benefit of
taking stakeholder engagement to social media. Social media will by its nature provide us
with the possibility of expanding our concept of inclusion, but as we have hopefully made
clear, this should be done with regard for relevance. It may be an ideal that we would be able to
respond and assess any issue brought forward by any stakeholder, but the sheer manpower
required to do so coupled with increasing odds of ending up tackling irrelevancies seems to
speak against this. As the theory of collective intelligence suggests, we must aim to find the
right balance between diversity and expertise. However, this should not excuse the choice to
continue keeping everything within company walls. With what we have shown it seems that
the logic of influence in its ideal may suffer from bias-distortions due to the fact that nearly
everything is decided internally. Taking a level-headed look at this, with the many diverse
groups of stakeholders today interested in company activities how could we possibly justify
only considering a few? Whether the motivation is to tackle the threat of not having a
presence online or it springs from genuine interest in cooperation, it seems the arguments
predominantly speak in favor of tapping into the groundswell. In the coming section we take a
step back and clarify how we got here, what merits our contentions so far and throughout the
rest of the thesis.
4. Method and Discussion
The considerations and proposals presented in this thesis are largely based on a theoretical
review of contemporary discussions regarding social media, stakeholder engagement and
collective intelligence. The mass proliferation of social media use has fundamentally changed
the way in which we communicate, who we can communicate with and also what information
is available to us on a daily basis. Scrolling down your Facebook stream, how much
information will you find that you would not otherwise have found? We are now more
connected with the world than ever before. There may be numerous reasons why
companies have chosen to suit up and participate. It may be that it provides a generous
platform for low-cost marketing initiatives or, as Li and Bernoff (2008) claim, that brands
are under serious threat from stakeholders interacting online.
Our interest in stakeholder engagement sprang mainly from a study by Castello, Morsing and
Etter (Forthcoming), which showed that the discipline is moving toward social media,
although they showed that the goal was mainly to communicate with more stakeholders,
the argument for which seems to be increased visibility. (Castello et al., p. 22, Forthcoming)
Visibility is of course beneficial in so far as it enables more people to get familiar with a brand,
and the fact that a stakeholder is able to have a conversation with a representative of that
brand may help in building trust. However, this “communication-alone” approach has led
some professionals to scrutinize the perceived ROI in social media. Some have proposed
annual customer satisfaction and retention as measures of ROI (Hypatia Research LLC, p. 4-5,
2011), and while we do not question that these measures may be relevant, it does seem hard
to objectively assess the causal relationship between these and interactions on social media. If
we have more satisfied customers how would we know whether this is due to a great product
or a positive interaction on social media? The same goes for sales. An employee’s great
handling of a stakeholder on social media may very well lead to a sale but to be sure of this,
we would need to assess if that stakeholder goes on to buy something online. Not to mention
that the stakeholder might just as well go to a physical store and buy a product, in which case
tracking would be even harder. There may very well be companies who have succeeded in
figuring this out but it does speak to the challenges of investing in social media.
When we in the coming section move toward our own proposal of a way to engage in social
media it becomes important to clarify that the proposals are based primarily on our review of
literature. Not much, at least to our knowledge, has been written on the relationship between
social media and stakeholder engagement, or the practicality of engagements on social media
in general.
“However, the foundational literature in stakeholder engagement ill prepares us for a
world of networked societies with “geographically distributed cognition” and globalized
relations.” (Castello et al., p. 2, Forthcoming)
The consequence of this in the context of this thesis is that we have attempted through our
theoretical review to find support for the argument that stakeholder engagement as a
discipline which seeks to learn from stakeholders to find solutions to prevalent issues, can be
taken to social media. There may be a myriad of challenges pertaining to this and we do not
claim to possess the insight to cover all of them (to name a few: legal, organizational and
financial constraints may be barriers to entrance). As such we will not propose that our model
will fit everyone. It has instead become our project to derive the best possible model which
complies with the collection of theory presented in the thesis. In other words, this thesis is in
its essence an attempt to convert theoretical aspirations to practical applications. This also
serves to note that we remove ourselves from any scientific claims and focus solely on a
model, which may allow companies to engage now. However, when we are proposing the
application of a CI-system, where in essence social interaction is producing an outcome, it may
be relevant to portray our view of how such outcomes might be founded. Here we apply a
social constructionist view that anything constructed in such a system will be contingent on
the social reality in which it was created. (Hacking, p. 11, 1999) This is not to say that the
ideas a CI-system produces cannot be relevant or usable elsewhere but it underlines a view
that when asking humans to produce knowledge, we need to consider the context. E.g. if we have
two groups of 20 people discussing the construction of the perfect car, in the one group they
may agree that the color red is the perfect choice, while in the other the color chosen is blue.
As such claims of generalizability and objectivity of the knowledge produced in such a system
would most likely demand supporting information. The need for generalizability and
objectivity would most likely hinge on the company in question and which decisions the
knowledge derived is meant to support.
As we move forward we start by presenting core principles which we believe may assist
companies in conceptualizing what a model for stakeholder engagement on social media
might look like. We then present what we believe to be a method fit for gathering information
from stakeholders on social media. We then go on to discuss how these principles might be
applied in practice in so far as we make a specific choice of a social technology, namely
Twitter, which will allow us to discuss in detail whether such principles can be supported by
functionality. Furthermore, we analyze claims from the field of Business Intelligence in order
to bring the model into implementation before applying it on a specific business case: The
case of how the model might aid a Communications Manager at Novo Nordisk. In this we will
relate our model specifically to his position in the company and the company in general,
which we find poses interesting challenges in relation to regulatory constraints. We end the
thesis by discussing the constraints of our model in general and in relation to the case, where
we bring up relevant debates in relation to social media in general.
5. Stakeholder Engagement as a Collective Intelligence system
When we cross-reference the guiding principles presented in stakeholder engagement and
collective intelligence theory respectively, we seem to have found some stable common ground. First off, we
seem to have established that there are benefits to including more stakeholders in our
engagements, while at the same time noting that this does not mean that anyone and
everyone will be relevant candidates. We believe that viewing the stakeholder engagement
discipline as one which is constantly trying to tap into the opinions and proposed solutions
produced in a CI-system may be highly beneficial. However, as we have also noted there is a
demand for some skepticism when doing so. With this in mind we derive the first guiding
principle of our model.
Include many but do so with care
Li and Bernoff (2008), AccountAbility (2011) and Bonabeau (2009) all propose a sort of
segmentation as a necessary means to ensure effectiveness. We therefore find it reasonable to
adhere to this principle and propose that when engaging in social media to “listen in” we
should include as many stakeholders as we can find whom we perceive to be
relevant candidates. We remember here that a balance is needed between the level of
diversity and expertise in the system. How we might secure such a balance is not explicitly
defined and as such we allow for conceptualization through the following example. If we
imagine that we are a pharmaceutical company, this might be done by making sure that in the
system we include common people, patients, NGOs, advocacy groups, competitors, news
sources and perhaps even employees alike. Even more categories might exist relative to the
company in question but the overarching principle is to allow for different perspectives on the
same topics, which then may provide a more complete picture of what the prevalent issues
are that we need to respond to.
One might be tempted to question whether these stakeholders are at all present on social
media. It is a difficult question to answer, but if we attempt an assessment according to the
above example, common people might be a description encapsulating a category of users too
broad to say something general about. There may be a myriad of motives for people to create
an account on a social media site and as such the perceived usefulness of such a stakeholder in
the system would most likely demand knowledge of the person behind the screen. In turn, it
seems reasonable to suggest that we may benefit from listening to what non-governmental
organizations are saying as they work professionally with areas like securing the
environment, human rights and the like. We note here that it is of course hard to say with any
certainty that the NGOs relevant to your company will be on social media. However, there is
some indication that NGOs have a presence: www.wefollow.com, a site
listing top Twitter accounts by number of followers, lists many NGO accounts,6 and through this site the same
indication can be found for advocacy groups.7 The belief that competitors are on social media
might be qualified partly through the McKinsey study which we presented in the introduction,
where they showed that 72% of respondents had employed at least one technology.
Furthermore, in a study of the Fortune 500 Barnes and Andonian (2011) concluded the
following:
6 http://wefollow.com/twitter/ngo will show that there are many pages of accounts affiliated with various forms of NGOs. This of course is no objective claim but merely an indication that may lead us to believe that they are present on social media.
7 http://wefollow.com/twitter/advocacy
“Three hundred eight (62%) of the 2011 F500 have corporate Twitter accounts with a
tweet in the past thirty days.” (Barnes & Andonian, p. 6, 2011)
All of this of course does not cement the fact that those relevant to establishing your specific
CI-system are out there. However, it does at least confirm that for some, they will be. Finding
the right candidates and the right balance between diversity and expertise will no doubt
demand some research to be done before a choice is made.
Now let them decide
Additive aggregation is as mentioned in section 3.1 one of the properties of the framework,
which may enable a CI-system to deliver value. In relation to stakeholder engagement it might
speak to a perspective on how we may rightly decide what is important. We take in the sum
total of opinions and solutions proposed by stakeholders in the system and perform some
kind of averaging on this to qualify what is important. If we put this in the context of the
inclusivity principle as proposed by the logic of community it seems that even though we do
not include everyone we may be moving toward a more inclusive engagement strategy, at
least if this is, as it seems, about connecting to more stakeholders with the purpose of getting a
broader picture of what is important for us as a company to focus on. In other words we allow
for topic-centered instead of firm-centered engagements. It is the contention of the collective
intelligence theory that more proposals will lead to what one might call a better end-product.
As such basing what issues we choose to take seriously on the opinions of diverse groups of
stakeholders might allow our engagements to yield better results overall.
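One way to picture how additive aggregation could qualify what is important is sketched below in Python. It is an illustration under our own assumptions, not a method prescribed by the cited sources: the stakeholder names and topics are hypothetical, and "averaging" is interpreted here as the share of stakeholders who raise a topic, so that one very vocal stakeholder cannot dominate the result.

```python
from collections import Counter


def topic_support(mentions):
    """Return, for each topic, the share of stakeholders who raised it.

    mentions maps a stakeholder name to the topics found in that
    stakeholder's posts (repeats allowed).
    """
    if not mentions:
        return {}
    counts = Counter()
    for topics in mentions.values():
        # set() counts each stakeholder at most once per topic
        counts.update(set(topics))
    total = len(mentions)
    return {topic: n / total for topic, n in counts.items()}


# Hypothetical stakeholders and topics, for illustration only.
mentions = {
    "patient_a": ["pricing", "pricing", "pricing"],
    "ngo_b": ["access", "pricing"],
    "journalist_c": ["access"],
    "advocacy_d": ["access", "side effects"],
}

support = topic_support(mentions)
ranked = sorted(support, key=support.get, reverse=True)
print(ranked)
```

In this constructed data "pricing" is mentioned most often in raw terms, yet "access" is raised by more distinct stakeholders and therefore ranks first. Whether to weight by stakeholder or by message is a design choice the company would have to make for its own business case.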
This means that stakeholder opinion now must decide relevant topics and as such relates to
what Bonabeau (2009) terms the loss of control. However, as we discussed in section 3.1, if
the system is to be self-organized to create additional value it seems a necessary evil. But if we
imagine having picked out who we find relevant to listen to and allow these stakeholders to
carry out business as usual by not interfering, then we have at least in part made sure that the
issues we find through additive aggregation are genuine, in so far as they are untouched by the
company’s bias. This is not to say that we claim to completely escape bias-distortions. As we
noted when presenting the logic of influence, if we decide who joins we also in part decide
what comes out of it. However, this may be as far as we can go to resolve bias-distortions
while still keeping relevance in sight.
Decide on a business case before engaging
We have talked a lot about how securing a balance between diversity and expertise is a
bearing principle of CI-systems. It seems reasonable to suggest that we then will have to
decide beforehand what it is we want to gain more insight into. E.g. how would we define an
expert if we did not have a domain within which this is expressed? As stakeholder
engagement is related to the concept of corporate social responsibility, and this is as Freeman
(1984) noted related to economic and socio-political forces it seems that it must tackle many
different areas. These are broad descriptions of course and as such one might imagine that
establishing a CI-system that cares for all the intricacies included in these forces would be a
difficult task, especially if we are to make sure that the actors we include in the system have
knowledge of the area we want to look into.
Establishing a business case also relates to the general difficulties revolving around a
company’s investment in social media. We mentioned in section 1 that only about half of the
respondents in the McKinsey study reported at least one benefit from engaging in social
media, while Hypatia Research proposed the lack thereof as one of the main challenges
companies face when evaluating return on investment. (Hypatia
Research LLC, p. 4-5, 2011) Naturally, we do not assume with any certainty that the
establishment of a business case will ease this burden but perhaps it may be easier to evaluate
the return when we draw out information instead of assuming that dialogue alone is the goal.
E.g. the business professionals using the information might be able to tell us whether or not
the information is helpful. We provide further argumentation for the establishment of a
business case in section 6 and section 8.
As we said in section 4 the purpose of this was to outline the core principles of our
perspective on what may be a helpful model for taking engagements to social media. There
are some intricacies left to cover before we make an attempt at applying the model to our
case. One of these we will look to in the coming section, namely, the method by which we
would be able to perform some form of additive aggregation on the opinions of the
stakeholders in our system. We look here to a discipline which has received a lot of well-
earned attention within recent years. Text mining cannot be said to be new in general, but it is
most probably a new concept to many companies. We start by providing an introduction to
the discipline relating it to the field Business Intelligence and the tool data mining. We present
the basic terminology of text mining and two methods, categorization and clustering
respectively before we, through discussion, analyze and qualify why this fits our model.
6. Text Mining to extract information from social media
H. P. Luhn in 1958 was one of the first to use the term Business Intelligence in his article A
Business Intelligence System. (Luhn, p. 314, 1958) However, it was not before the rise of
information-technology that the discipline truly took off. Since then the discipline has
received ever-increasing recognition for its capabilities of delivering value to a company. The
ability of business intelligence and data mining to analyze raw data, and from that derive
information to be used within the company has been praised time and time again. Data mining
concerns itself with what we would call structured data, or, data which lends itself to
tabularization. This is because when we think of data in this context we are typically referring
to data which exists in a database where single data elements are stored in tuples, which
represent a specific fact. A sale of a shirt would when stored in a database be set in a pre-
existing structure designed for a sale i.e. the structure could contain elements which refer to
the quantity, ID, place and time of a sale. These types of data elements are of course in
themselves valuable as they e.g. may help us keep track of how many shirts we have sold.
However, storing data in structures like these has become popular not so much because it is
a convenient way of storing it, but because data mining, through its methods, is able to inform
us about unknown patterns in data set in such structures, thereby
allowing us to get more information out of it than would previously have been possible.
(Berry & Linoff, 2004)
Text mining, text data mining and text analytics refer to roughly the same process. The main
contrast between data mining and text mining, which actually employ many of the same methodologies,
lies in the fact that text mining concerns itself with unstructured data. This is termed as
such because in text mining we look to analyze textual data from a given language. You might
already be able to imagine how intensely difficult it would be to set up a pre-defined structure
to encapsulate the elements of a natural language, which is most likely the main reasoning
behind the term unstructured. The term is of course not a claim that a language has no
structure at all. (Feldman & Sanger, p. 1, 2007)
Text mining as a discipline stems from a variety of fields which have been in the grasp of
scientists (it has especially been employed to analyze massive amounts of biomedical
literature) for quite some time. Natural language processing (NLP) is an important part of text
mining and is a discipline which in very broad terms can be said to revolve around handling
language through a computer (or allowing computers to understand and process natural
language). (Feldman & Sanger, 2007, n.p.) As should be commonly known anything a
computer is able to handle, from the very complex low-level programs that allow it to start up
to high-level applications such as Microsoft Word, needs some kind of structure. This is a
given because it is our instructions which allow it to function and as such if we do not hold the
knowledge, neither does the computer. We mentioned a contrast between data mining and
text mining in so far as the data, when we collect it, is quite different. However, just as it is the
purpose of data mining to uncover previously unknown patterns in sets of data it is the
purpose of text mining as well. And in doing so it derives many of its methods from the data
mining discipline and as such this field could also be said to be part of text mining. (Feldman &
Sanger, p. 1, 2007) In the context of this thesis, text mining is especially interesting, which we
attribute to the fact that almost every form of communication on social media presents itself
in the form of text.
“Even so, the very volume of comments out there is a vast source of information. And that’s
the second problem. Volume. There’s so much information flowing out of the groundswell,
it’s like watching a thousand television channels at once. To make sense of it, you need to
apply some technology, boiling down the chatter to a manageable stream of insights.” (Li &
Bernoff, chap. 10, 2008, n.p.)
Luckily for us, when applied with care and thought, this is exactly what text mining is capable
of doing. In an effort to qualify this statement we will spend some time taking you through a
basic terminology for text mining. We do this in an attempt to introduce the reader to the
domain and hopefully in the process reveal how information might be derived from analysis
of textual data. This we believe may clarify how text mining may serve as our vehicle for
performing additive aggregation on stakeholder communications on social media, and as such
stays congruent with our project.
6.1. Text Mining Basics
There are a lot of different presentations of what text mining is and what it is capable of. As
such attempting to derive an explanation of text mining from different sources may lead to
some confusion. Therefore, in an effort to stay consistent and avoid misinterpretation we
derive our taxonomy from Feldman and Sanger8 (2007) alone. What we present throughout
this section is but a small part of the total number of the methods text mining has to offer. We
have chosen to include the most important distinctions and will later, in sections 6.3 and 6.4,
present the methods by which we will attempt to analyze communications in our case.
Common text mining terms and their definitions
We have a bit of ground to cover so in an effort to provide the reader with a manageable
overview we list each term and its definition consecutively along with examples, which serve
to conceptualize text mining as a crucial part of our project. As we mentioned, when we work
with data in text mining we typically refer to the data as being unstructured. Much of what we
will present in the following serves to alleviate this by treating textual data conceptually not
as language but as numbers that we then may deliver to statistical methods to uncover
information.
A document (Feldman & Sanger, p. 3, 2007): When we refer to a document in the context of
text mining it can be defined simply as a sequence of text. In this interpretation it may
obviously represent a lot of different items e.g. an e-mail, a book, a blog-entry, a status-update
on Facebook and a tweet on Twitter. It does not necessarily have to be a meaningful text and
as such it rings true that…
8 “The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data”, Ronen Feldman & James Sanger, 2007.
“Just tried out my new insulin-pen. Much better than its previous models. No more needlestick injuries for me!”
…would qualify as a document and
“Yellow red drove big little home bus station in train.”
…would also qualify as a document.
A document collection (Feldman & Sanger, p. 2, 2007): Our document collection then can be
defined as the sum total of all individual documents in our possession. However, an important
distinction is that we may easily have more than one collection of documents in our
overarching document collection.
Document features (Feldman & Sanger, p. 4-8, 2007): The concept of a document having
features demands a bit more of an in-depth explanation. It relates to the fact that any natural
language contains a vast number of different words, everyday expressions, commas and full stops,
and in many cases two words are even interpreted and perceived as being one word. E.g. does the
adjective “everyday” consist of one or two words? If questions like these may confuse us
humans then you might imagine how much trouble they would cause a computer. Because of
this we make a distinction between the following document features:
- Characters: Any letter, numeral, special character and perhaps surprising to some, a
white space is also a character in a document.
- Words: Any single token that makes up a word in the context of some natural language.
Interestingly, in this interpretation “everyday” would have two tokens “every” and
“day”.
- Terms: Terms are referred to as words and/or expressions taken from a document and
which we (or a computer) could take to represent the individual document as a whole.
We wish here to refer to our previous example found in our description “A document”.
Here the terms representing the document might be “insulin-pen”, “better”, “models”,
“No” and “injuries”. Without complete knowledge of words and sentence structure in
the document we might still be able to infer that it is about some improvement of a
model of an insulin-pen, which causes no injuries.
- Concepts: The definition of concepts may be a bit more abstract since it does not
necessarily relate to words or terms found in the document. Concepts can be said to
characterize a document with a word without the need for reference to a word in the
document. I.e. imagine a short story (very short) about a girl’s tragic loss of her mother
in a car accident. It is entirely possible that the word “love” is not mentioned once in
the document but a concept of the story may still be found to be exactly that. Concepts
are typically found through methods of categorization within the domain of text
mining. We return to and expound upon categorization later in section 6.3.
This process of describing an individual document by its features relates to the
previously described notion that natural languages contain a vast number of complex
relationships between words9 and the like. Therefore in an effort to focus in on what is most
important to us as well as the task being initiated with text mining we analyze and decide on
the most important features of a document or a document collection. E.g. returning to our
insulin-pen example we found that we could infer meaning from the document without
including certain words and characters. Having done that we would then name this our
representational model for that document and doing so is a crucial part of preparing our
textual data for analytical processing. (Feldman & Sanger, p. 4, 2007)
The domain (Feldman & Sanger, p. 8, 2007): The domain in relation to text mining can
broadly be defined as our area of interest. The area which we through our efforts wish to
know more about or add knowledge to. It seems reasonable to argue that any specific area of
knowledge has a wide range of concepts describing important distinctions, relationships and
terms important to that area. Because of this, background knowledge of the domain is often
found to be helpful in discovering new concepts. E.g. if you performed a text mining study on
the total of the documents that deal with text mining you would find that concepts are also
sometimes referred to as keywords, although this is not included in what we have presented
here. Furthermore it seems quite obvious that we would not be able to make the judgment
that concepts and keywords represent the same thing if we did not know of concepts.
9 Currently, the English language holds more than a million words and as such a document in English can potentially hold more than a million different features. Source: http://www.languagemonitor.com/global-english/number-of-words-in-the-english-language-1008879/
When we in section 5 claimed the importance of having a business case in mind before we
start to establish a collective intelligence system we presented a couple of arguments.
However, as is hopefully now shown this is also highly related to text mining methodology. It
seems that in order to get the most out of a text mining initiative we require some amount of
knowledge about what it is we are looking for. Because of the many different dimensions of
meaning language can contain, text mining is not a magical device that we may employ to find
every hidden intention contained in the written word. To ensure that the methods we employ
deliver information, and perhaps even information that we could conceivably make use of, they need a fair
amount of guidance. In the coming section we look to methods of preparing the data for
analytical processing. We present methods that serve to reduce the aforementioned
dimensionality and convert the individual documents to their representational models.
6.2. Preparing data for Text Mining
There are a myriad of ways to help prepare a document collection for analytical processing.
Some focus on the semantic properties of the documents to find e.g. nouns, verbs and
adjectives. Others focus on breaking up the document into smaller pieces by a process called
tokenization, which treats the document as a continuous stream of characters and then
(dependent on our choice) displays it as e.g. sentences, words or even syllables. Most
commonly we would be most interested in finding out which sentences or words are in the
document by the mere rationale that it must be easier to then decipher meaning. It does
depend on the task at hand though. You could also, if it fits the task, stem the words in the
document. Stemming means that we return each word in the document to its simplest form
e.g. by removing ‘ing’ and ‘un’ from words. (Feldman & Sanger, p. 60, 2007) However,
undoubtedly the most important of any of these processes (or more commonly preprocessing
techniques) is the conversion of the textual data into numerical representations. We do this by
interpreting each document as a vector containing features, whereby each unique feature
represents a single dimension in the vector. (Feldman & Sanger, p. 89, 2007) It may be
difficult to see how this understanding applies so we will attempt to show this through the
following example. The sentences: “A dog is an animal. A dog is the best friend of man. A man is
an owner of a dog.” when interpreted as a vector of features looks like this:
a   an  of  the  is  dog  man  best  friend  owner  animal  .
4   2   2   1    3   3    2    1     1       1      1       3
or simply
(4,2,2,1,3,3,2,1,1,1,1,3)
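The counting behind this vector can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the function names, the lower-casing of the text and the regular expression used for tokenization are ours and are not taken from Feldman and Sanger.

```python
import re
from collections import Counter

TEXT = ("A dog is an animal. A dog is the best friend of man. "
        "A man is an owner of a dog.")

# Feature order as listed in the example above.
FEATURES = ["a", "an", "of", "the", "is", "dog", "man",
            "best", "friend", "owner", "animal", "."]


def tokenize(text):
    """Lower-case the text and split it into word tokens and full stops."""
    return re.findall(r"[a-z]+|\.", text.lower())


def feature_vector(text, feature_order):
    """Count every feature and return the counts in the given order."""
    counts = Counter(tokenize(text))
    return tuple(counts[feature] for feature in feature_order)


vector = feature_vector(TEXT, FEATURES)
print(vector)  # (4, 2, 2, 1, 3, 3, 2, 1, 1, 1, 1, 3)
```

Each position in the tuple corresponds to one unique feature, so the length of the tuple equals the dimensionality of the vector.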
In other words we take each feature of the document, count their frequency and from that
form a vector with a dimensionality corresponding to the number of unique features. Most
would probably agree that when we break the sentences apart and depict them as such the
punctuation marks seem less meaningful relative to the words in the document. Some might
even agree that the words a, an, of, the and is are less meaningful. Just as with our insulin-pen
example we might be able to derive the meaning of the document from “dog”, “man”, “best”,
“friend”, “owner” and “animal”. E.g. that a dog is an animal and that a man is an owner, since it
seems hard to conceptualize a dog as being the owner of something. When we have this
interpretation in hand we might even be able to infer that there is some “best friend”-
relationship between the man and the dog or the owner and the animal. Since it is our wish to
reduce the document to its simplest meaningful interpretation we commonly remove those
words (called stop words) we deem to be less meaningful. (Feldman & Sanger, p. 68-69, 2007)
When we have gone through these aforementioned motions pertaining to the preprocessing
of our textual data we can move on to where the fun starts, namely, the application of
methods that will draw out information.
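As an illustration of the preprocessing steps described in this section, the following Python sketch tokenizes the example sentences, removes stop words and applies a very crude form of stemming. The stop word list contains only the five words named above, and the stemming rule (stripping a trailing ‘ing’ from longer words) is our own simplification for the sake of the example, not a real stemming algorithm.

```python
import re

# Assumption: a tiny stop word list holding only the words named in the text.
STOP_WORDS = {"a", "an", "of", "the", "is"}


def tokenize(text):
    """Lower-case the text and keep only the word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens we deem meaningful."""
    return [token for token in tokens if token not in stop_words]


def naive_stem(token):
    """Strip a trailing 'ing' from longer words (a deliberately crude rule)."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    return token


text = ("A dog is an animal. A dog is the best friend of man. "
        "A man is an owner of a dog.")
content_words = remove_stop_words(tokenize(text))
print(sorted(set(content_words)))
# ['animal', 'best', 'dog', 'friend', 'man', 'owner']
print(naive_stem("walking"))  # walk
```

What remains after the stop words are removed is exactly the set of words from which we inferred the meaning of the document above.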
6.3. Categorization of documents
We start with categorization, which can broadly be defined as a process of assigning each
document to one or more specific categories based on a judgment of what category a given
document best fits. Categories can be decided on either by reference to the content of the sum
total of the documents within a collection or by reference to some background knowledge
about the area of interest. (Feldman & Sanger, p. 64, 2007) There are a lot of different
proposals for how this can be done, so we outline here some of the basic distinctions in
regards to categorization and then move on to show the method by which we will be
categorizing documents in our case.
As we mentioned, dependent on the task at hand, we can decide whether our documents can
belong to one or more categories. If a document can belong to one category only we call this
single-label categorization. In turn, if a document can belong to one or more categories it is
called multilabel categorization. (Feldman & Sanger, p. 67, 2007) E.g. if we are only interested
in finding out how many of our documents are about patients of diabetes then a single-label
approach may be appropriate. If however our interest lies only in documents about patients of
diabetes and their relationship with some specific medicine, then a multilabel approach will
most likely be best. Categorization can also be either document- or category-pivoted, which
merely serves to say whether we are trying to find documents that fit a certain category or all
categories that fit a document. In other words whether we take our outset in the categories or
the documents and this speaks primarily to how the given method of categorization functions.
(Feldman & Sanger, p. 67, 2007) Furthermore, methods may perform either hard or soft
categorization. When hard the division into categories is fully automated, which leads to a
result where a document either belongs or does not belong to a category. Here there is no
ambiguity, no in between. When soft the method will deliver to us a list of possible categories
for each document to which they might belong. The final choice of categories is then in the end
for us to decide. Lastly, this relates to what is termed the categorization status value which in
short is a number between zero and one that in its essence serves to tell us “how much” a
document belongs to a certain category. (Feldman & Sanger, p. 67-68, 2007)
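The distinctions between hard, soft and single-label categorization can be illustrated with a small Python sketch. The categorization status values below are invented for the example, and the threshold of 0.5 used for the hard decision is our own assumption; how the values are actually produced depends on the method of categorization applied.

```python
def hard_categorize(status_values, threshold=0.5):
    """Hard categorization: a document either belongs to a category or it does not."""
    return {category for category, value in status_values.items()
            if value >= threshold}


def soft_categorize(status_values):
    """Soft categorization: return a ranked list and leave the final choice to us."""
    return sorted(status_values.items(), key=lambda item: item[1], reverse=True)


def single_label(status_values):
    """Single-label categorization: keep only the best-fitting category."""
    return max(status_values, key=status_values.get)


# Hypothetical categorization status values for one document.
scores = {"diabetes patients": 0.82, "medicine": 0.55, "sports": 0.03}

print(hard_categorize(scores))   # the two categories at or above the threshold
print(soft_categorize(scores))   # all categories, most likely first
print(single_label(scores))      # diabetes patients
```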
Term Frequency – Inverse Document Frequency (Feldman & Sanger, p. 68, 2007)
Most commonly referred to as the TF-IDF weighting scheme this method seeks to deliver to us
a representation of a given feature’s relevance based on analysis of the document collection as
a whole. In other words TF-IDF calculates a weight for each feature in a document relative to
the sum of that feature’s presence in the collection. The mathematical description of the
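method looks like this:
\[ \mathrm{weight}(w, d) = \mathrm{tf}(w, d) \cdot \log_{10} \frac{N}{\mathrm{df}(w)} \]
where tf(w, d) is the frequency of the word w in the document d, N is the total number of
documents in the collection and df(w) is the number of documents containing the word w.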
It states that we calculate the weight of the word w in the document d by taking the frequency
of that word in the document and multiplying it with the logarithm of the total number of
documents divided by the number of documents containing that word. In other words if we
have a total of 100 documents in our collection and 50 of these contain the word diabetes, and
a single document contains the word diabetes 5 times, we would find that the weight of the
word diabetes in that document would be 1.50515. This number would then be our indication
of the relevance of the word diabetes in relation to our collection of documents. TF-IDF does
this for all features in a document collection and ranks these features according to their
weight. We can then use this as a way to find the most relevant features and from that derive
categories fitting for the document collection, and furthermore it will allow us to detect
documents which are outside the scope of our interest. I.e. if there is no mention of the
‘category’ we are looking for we may decide that the document is not of relevance to us.
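The calculation in the diabetes example can be reproduced with a short Python sketch. The function and parameter names are ours, and we assume a base-10 logarithm since this is what yields the weight of 1.50515 stated above.

```python
import math


def tf_idf(term_frequency, total_documents, documents_with_term):
    """Weight of a word in a document: its frequency in the document times the
    base-10 logarithm of (total documents / documents containing the word)."""
    return term_frequency * math.log10(total_documents / documents_with_term)


# 100 documents in total, 50 contain 'diabetes', one document mentions it 5 times.
weight = tf_idf(term_frequency=5, total_documents=100, documents_with_term=50)
print(round(weight, 5))  # 1.50515

# A word that occurs in every document receives a weight of zero.
print(tf_idf(term_frequency=5, total_documents=100, documents_with_term=100))  # 0.0
```

The second call shows why the scheme is useful for finding relevant features: a word that appears in every document does not help us tell the documents apart and is therefore weighted down to zero.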
6.4. Clustering of documents
The second method we wish to focus on is the method of clustering and when defined broadly it
is a process by which we group documents together in so-called clusters based on a
calculation of their similarity. Similarity in the context of text mining is assessed with regard
to the feature-content of the documents in the collection. (Feldman & Sanger, p. 82, 2007)
In order to conceptualize how this is done we refer back to section 6.2 where we noted that
when applying text mining we typically transform documents into a vector in which each
unique feature represents a given dimension. This is highly relevant in the context of methods
of clustering because in order to calculate the relative similarity between each document we
perceive each document as a vector in a given space. There are a couple of things to note
about the method of clustering. To detect similarity between documents we use what is called
the cosine similarity measure and the way in which this calculates similarity is by referring to
the weights calculated by the TF-IDF method and from these finding the relative angle or distance
from a document to another based on those weights. In other words based on the relative
composition of word-level features in the documents. (Feldman & Sanger, p. 85, 2007) Based
on such a measure of similarity the k-means clustering algorithm then seeks to group similar
documents together and separate those that are not similar. Just as with categorization we
can apply either hard or soft clustering, where again hard clustering means that a document
can belong only to one cluster and soft means they may belong to more than one. (Feldman &
Sanger, p. 85-86, 2007)
“Irrespective of the problem variant, the clustering optimization problems are
computationally very hard.” (Feldman & Sanger, p. 85, 2007)
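Although we stay at a conceptual level, the idea behind the cosine similarity measure can be hinted at with a small Python sketch. It follows the standard textbook definition (the dot product of two vectors divided by the product of their lengths); the three weight vectors are invented for the example and are not taken from our case.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a document without any weighted features is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical TF-IDF weights for three documents over the same three features.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same mix of features as doc_a, only more of each
doc_c = [0.0, 0.0, 3.0]  # shares no features with doc_a

print(cosine_similarity(doc_a, doc_b))  # very close to 1: same direction
print(cosine_similarity(doc_a, doc_c))  # 0.0: nothing in common
```

Note that the measure compares the relative composition of the features rather than their absolute frequency, which is why doc_a and doc_b are deemed practically identical.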
We had the good fortune of being able to explain in short how the TF-IDF algorithm works
and delivers results, however, the same cannot be said for the cosine similarity measure nor
the k-means clustering algorithm we will be applying later. In any case, as you shall see, we
will not manually be involved in the execution of these algorithms. As such we find it sufficient
that we understand how to make use of it, and possess an understanding of the type of
information it reveals about our document collection. In other words we stick to a level of
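conceptual understanding in regards to the algorithms applied in our clustering activity. As
such the following example serves to enhance that understanding:
[Figure: documents plotted as vectors in a space, grouped into a red, a blue and a green cluster, with five documents left outside the clusters]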
This is of course a simplification of what happens when we perform the k-means clustering
activity but if we perceive of this as the space in which our documents are represented as
vectors, we can see that three clusters have been formed. Here we have 19 documents in total
and based on TF-IDF weights and subsequent similarity measure of the distance between
them within this space, we find that 14 documents are deemed to be similar enough to be put
into some cluster, while five documents are not and as such are separated from the clusters.
The red cluster is frequently using the word ‘happy’, the blue uses a mix of ‘happy’ and ‘sad’
and the green is a bit less ‘happy’ than the blue. Such could be the results of a clustering
activity performed on a document collection and it would tell us that some of our documents
have been deemed to have similar content.
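For readers who wish to see the two alternating steps of k-means, the following Python sketch may help. It is a bare-bones version resting on several assumptions of our own: each document is reduced to two invented weights (for ‘happy’ and ‘sad’), distance is measured as plain Euclidean distance rather than cosine similarity, and the first k documents serve as starting points. Note also that plain k-means places every document in some cluster, whereas the illustration above leaves five documents outside.

```python
import math


def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical (happy, sad) weights for six documents.
documents = [(5.0, 0.0), (0.0, 5.0), (3.0, 3.0),
             (4.5, 0.5), (0.5, 4.5), (2.5, 3.5)]

assignment, centroids = k_means(documents, k=3)
print(assignment)  # [0, 1, 2, 0, 1, 2]
```

The output tells us which cluster each document has been placed in: the first and fourth documents form a ‘happy’ cluster, the second and fifth a ‘sad’ cluster, and the third and sixth a mixed one.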
6.5. Text Mining for stakeholder opinion
We have spent all this time explaining, in more or less detail, concepts and methods
derived from the field of text mining because, to our knowledge, this is the best
(perhaps even the only) way to effectively access and decipher communications on social
media. When a user is actively involved in maintaining an account on e.g. Facebook or Twitter
they, in this interpretation, disseminate into the digital space a number of unique documents.
Whether they do so in a commercial respect or just to communicate with friends is in this
context irrelevant. We will be able to apply text mining to find out information about or derive
information from these communications. When we move into the case material we propose
that we may use categorization as a way of making sure that the document collection in
question is actually about our domain of interest. Furthermore, we will propose that we look
to clustering activities as a way of performing additive aggregation on the opinions and
proposed solutions by stakeholders in our system. In other words when we, and this is exactly
our purpose, take the sum total of the communications of stakeholders we have chosen to
include in our system, we may be able to apply these methods to perform the kind of
averaging needed to gain insight into the relative significance of these opinions and solutions.
In the coming section we take a close look at the social technology which we will be focusing
on throughout the rest of the thesis. Until now we have focused mostly on social media in
general terms and we note here that the selection of a specific technology is not a rejection of
our belief that the proposed model for engaging in social media can be applied on multiple
technologies. In this we will be outlining the argument for our choice of Twitter and then
move on to analyze the capacity of Twitter as a social technology to meet the demands laid out
in section 5.
7. Enter Twitter, “Instantly connect to what’s most important to you.”
Twitter can be described as a technology for communication and information sharing. A
popular term within literature describes it as a microblogging service, which serves to relate
it to the well-known concept of blogging. The term micro figures in this description because of
a restriction the service puts on the number of characters that can be included in any single
post (or as it is called, a tweet). If you want to make use of the service you have to grow
accustomed to formulating your thoughts and opinions in 140 characters or less. (Lovejoy,
Waters & Saxton, p. 313, 2012) Even so it has become one of the largest online social
networks and continues its very rapid growth each and every day. At the time of writing the
service holds more than 628 million unique accounts and every second 12 more accounts are
registered.10
“Social media sites allow for the rapid dissemination of information as well as the rapid
exchange of information. Twitter amplifies the rapidity of the information exchange by
limiting the size of the messages to easily digestible information pieces.” (Lovejoy et al., p.
313, 2012)
This is so because Twitter, like most social technologies, works in real-time. As soon as you
press the “Tweet”-button the information contained in your message is sent out into the
Twittersphere to millions of potential readers. A common criticism of Twitter as a social
technology is that no meaningful information can be contained in 140 characters (Lovejoy et
al., p. 313, 2012) but as we will see users seem to have found ways to circumvent this, and
quite obviously it has not hindered the adoption of the technology. In order to understand in
detail how Twitter works and how it, as is our wish, may facilitate the establishment of a CI-
system we refer to a study published a few years back by Kwak, Lee, Park and Moon (2010).
The study is by these authors proclaimed to be the first ever to study Twitter in its entirety
10 http://twopcharts.com/twitter500million
and as such conclusions are derived from analysis of 41.7 million user profiles, 1.47 billion
social relations and 106 million tweets. (Kwak et al., p. 1, 2010) In the following we will
highlight some of the conclusions presented in this study along with some presented by Finin,
Java, Song and Tseng (2007) in order to thoroughly understand what Twitter is and how
stakeholders are using it. First, however, we cover the most basic functionalities of Twitter as
a tool for communication and social interaction.
7.1. Twitter as a Collective Intelligence System
When you create an account on Twitter you may provide the service with your full name,
location, web page along with a short biography. Nothing more is needed before you are able
to start sending out your tweets into the Twittersphere. In order for your tweets to actually
reach people there are a few options. You can spend some time connecting to other
accounts by using Twitter’s internal search engine to find accounts which speak of topics
important to you. Or, if you have a somewhat clear image of which topics you are interested in,
you may use the hashtag function to find these. Let’s pretend you were quite interested in
anything having to do with sports; you might then type this in the search field:
[Figure: the Twitter search field with both sports and #sports typed in]
If you are wondering why both sports and #sports have been typed in this is because Twitter
has the capacity to search on keywords such as ‘sports’, which will return to you all the tweets
containing the word ‘sports’. The so-called hashtag #sports is then another functionality of
Twitter that allows users to tag their tweets so as to pre-emptively categorize their tweets.
(Lovejoy et al., p. 314, 2012) Generally you could interpret a hashtag as a user’s declaration
that this is the topic her tweet is about. Not that this is strictly upheld; you might compose a
tweet solely of words with a hashtag in front of them. However, a quick search for any
hashtag-topic should return a picture, which makes it seem plausible to suggest that this is
the general consensus.
Having done your topic-specific reconnaissance you might then start to delve deeper into the
heart of Twitter by beginning to ‘follow’ other accounts. When you decide to follow an account
you subscribe to a feed from that account, which means that in
the future you will be receiving every new tweet that account sends out. (Kwak et al., p. 1,
2010) From then on these tweets and the tweets from any other account you choose to follow
will be shown on your ‘home’-tab.
Depending on each unique situation an account might choose to reciprocate the act by
following your account as well. This would then constitute a friend-relationship but in its
essence it simply means that this account will be receiving your tweets as well. Another basic
element we will cover before moving on is that of the ‘retweet’, which is a core functionality of
Twitter that establishes the foundation for interaction between accounts. You can direct
messages at specific accounts (using @account), however, when you decide to retweet a tweet
you are effectively copying another account’s tweet and posting it again for your followers to
read. (Kwak et al., p. 1, 2010) E.g. if an account with 10 followers sends out a tweet this would
only be sure to reach 10 other accounts, however, if one of the followers has 50 additional
followers and chooses to retweet it then it is sure to reach 60 accounts. When we see Lovejoy
et al. (2012) in section 7 claiming that information disseminates rapidly on Twitter the
retweet has a huge part in this. We will return to this later in the section.
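The arithmetic of the retweet example can be made explicit with a small Python sketch. The account names are invented, and we assume, as in the example, that the retweeting follower's 50 followers are all additional, i.e. not already following the original account.

```python
def guaranteed_reach(author_followers, retweeter_follower_sets):
    """Number of accounts sure to receive a tweet: the author's followers plus
    the followers of every account that retweets it, counted only once each."""
    reached = set(author_followers)
    for followers in retweeter_follower_sets:
        reached |= set(followers)
    return len(reached)


# Hypothetical accounts: 10 followers of the author, 50 additional followers
# of the one follower who chooses to retweet.
author_followers = {f"follower_{i}" for i in range(10)}
additional_followers = {f"additional_{i}" for i in range(50)}

print(guaranteed_reach(author_followers, []))                      # 10
print(guaranteed_reach(author_followers, [additional_followers]))  # 60
```

Because the followers are held in sets, an account following both the author and the retweeter would only be counted once, which is why the word additional matters in the example.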
There are of course many ways of getting into Twitter when you first start out but this short
account will be sufficient for our purpose. Kwak et al. (2010) take a closer look at this
functionality and how users in the Twittersphere are actually employing it. Looking at the
follower-followed relationships on Twitter they found that 77.9% of the time when a user
decides to follow another account it remains a one-way connection, and as such only 22.1% of
relationships on Twitter can be said to be reciprocal. Furthermore, they showed that 67.6% of
users are not being followed by any of those who they have chosen to follow. (Kwak et al., p. 3,
2010) In addition, they studied the network properties of Twitter and found that to get
from any one given account to another given account there was an average separation of only
4.12 jumps, which in network topology speaks to how many people you would have to get
through to get to a complete stranger. This deviates from classic real-world networks, where
on average six jumps would yield the same result. (Kwak et al., p. 3-4, 2010) They note that for
93.5% of users information needs to travel less than five jumps to go from any one account to
another. What this means is that Twitter is a compact network and because of this
information on Twitter may spread more easily to users outside ‘your own network’. (Kwak et
al., p. 4, 2010) These observations led the researchers to conjecture that for many users
Twitter might be a source of information more than a site for social interaction. (Kwak et al.,
p. 3-4, 2010) Although this is only conjecture it presents an interesting indication in relation
to our project since it is essentially our goal to do just that, namely, find information.
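To make the notion of an average separation concrete, the following is a minimal sketch of how such a figure could be computed on a toy network with a breadth-first search. It is not the procedure used by Kwak et al.; the graph, the function names and the choice to average only over pairs that can actually reach each other are our own assumptions.

```python
from collections import deque


def hop_counts(graph, source):
    """Breadth-first search: fewest jumps from source to every reachable account."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_separation(graph):
    """Mean number of jumps over all ordered pairs where a path exists."""
    total, pairs = 0, 0
    for source in graph:
        for target, jumps in hop_counts(graph, source).items():
            if target != source:
                total += jumps
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: each key lists the accounts reachable in one jump.
toy = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(round(average_separation(toy), 2))  # 1.67
```

On a network of Twitter's size an exhaustive search like this would be impractical, so in practice such figures are estimated from samples; the sketch is only meant to show what is being measured.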
The last observation we wish to include presents itself in the researchers’ study of the retweet.
Here they found that when any given user sends out a tweet, and another user decides to
retweet it, the original tweet will on average reach 1,000 users irrespective of how many
followers the original author of the tweet had at the time of writing. (Kwak et al., p. 8, 2010)
“Individual users have the power to dictate which information is important and should
spread by the form of retweet, which collectively determines the importance of the original
tweet. In a way we are witnessing the emergence of collective intelligence.” (Kwak et al., p.
8, 2010)
This again may speak to the strength of Twitter’s functionality in regard to information
dissemination and collection. These observations at the very least grant us an indication that
people may be able to use, and may be using, Twitter as a source of information. We can be
content that information spreads easily and quickly, and this might even indicate that Twitter
is especially suited for stakeholder engagement initiatives seeking to learn from
stakeholders. Theoretically the demands are met, since the core functionality means we will
be exposed to more opinions and more information. Potentially we might even reach a larger
number of stakeholders with our communications, owing to the compact network structure and
the properties of the
retweet. However, to stay congruent with our project we cannot uncritically indulge ourselves
in this perception. We need a way to assess the relative relevance of each stakeholder (or
account) included in our system. Undoubtedly, as we discussed in section 5, this will demand
some research to be carried out since effectively all the information we have about a given
user is the description she chose to include when the account was created. In order for us
to have some starting point in this we refer to the following four categories of user intentions
(Finin et al., p. 7-8, 2007):
Daily chatter: Tweets containing information about e.g. daily routines and what a given user is
doing at the time of writing.
Conversations: Tweets containing a given conversation between two accounts, characterized
by the presence of @account in the tweet.
Sharing information/URLs: They characterize a tweet as one that has the purpose of sharing
information if that given tweet contains a URL linking to some source of information.
Reporting news: Tweets containing information about some form of latest news. Could either
be a reiteration of news from an external source or a reference to some news pertaining to
Twitter.
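As a starting point, the surface cues named in these four categories can be turned into a rough rule-based sketch. This is our own illustration and not the method of Finin et al.: the cue words for news, the order in which the rules are tried and the function name guess_intention are all assumptions, and any real assessment would still require the interpretation discussed in this section.

```python
import re

MENTION = re.compile(r"@\w+")        # the @account cue of a conversation
URL = re.compile(r"https?://\S+")    # the URL cue of information sharing
# Hypothetical cue words for news; the source does not list any.
NEWS_CUES = ("breaking", "news", "reports", "announces")


def guess_intention(tweet):
    """Return one of the four intention categories using surface cues only."""
    text = tweet.lower()
    if URL.search(text):
        return "sharing information/URLs"
    if MENTION.search(text):
        return "conversations"
    if any(cue in text for cue in NEWS_CUES):
        return "reporting news"
    # Anything without a recognisable cue falls back to daily chatter.
    return "daily chatter"


print(guess_intention("@MyGlu thanks for the tip!"))  # conversations
```

A tweet can of course carry several cues at once, e.g. a reply that also contains a link, which is one reason why the categories can only serve as a reference point.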
These are obviously quite broad categories; they stem from research carried out in 2007, and
we know that Twitter has evolved greatly since then. However, it is to our knowledge the
only study which has attempted to tackle the challenge of deriving categories of user
intentions on Twitter. As we mentioned previously in the section they are meant to serve as a
starting and reference point, which hopefully may aid us in the interpretation inevitably
required to assess whether an account can be deemed suitable for inclusion in a collective
intelligence system. However, what is perhaps a more interesting observation presented by
Finin et al. (2007) is that they were able to find multiple community structures within the
Twitter network, e.g. a community in which the talk revolved around gaming.
“Based on our study of the communities in Twitter dataset, we observed that this is a
representative community in Twitter network: people in one community have certain
common interests and they also share with each other about their personal feeling and
daily experience.” (Finin et al., p. 6, 2007)
It may not immediately be clear why this is interesting. However, if we think back to
section 5, where we spoke of self-organization as a means to a successful CI-system, it may
become clearer. Self-organization speaks to the demand that interaction be possible between
the stakeholders we include in the system. This study then reveals that the foundation for
such interaction might well exist on Twitter, that is, if we can find it. Unfortunately it will not
be possible for us in the context of this thesis to carry out the link and network analysis which
could reveal such a community structure. As such we leave it at this mention.
The purpose of this section was to delve deep into the functionalities of Twitter in order to
reveal how the technology works, how people are using it and more importantly how it might
be able to serve as the foundation for the employment of a CI-system. Referring to section 3,
you might even say that the necessary genes are present on Twitter. The crowd-gene is
activated in so far as anyone can join Twitter, the create-gene when a tweet is posted, and we
might construe the functionality behind the retweet as one which supports the decide-gene.
A tweet is a unique contribution and as such the collective-gene is activated; whether the
collaboration-gene is also present through the retweet we leave up to interpretation. We
believe that Twitter has capabilities that other social networks do not. The mere fact that you
can follow someone without that act being reciprocated provides possibilities for gathering
information that other services, where relation is built on a mutual agreement, do not. As we
said at the end of section 6.5 this does not mean that we reject the possibilities of
establishment on other social technologies. They may each hold benefits as well as drawbacks;
e.g. it will probably be easier to assess the relevance of an account on Facebook than on
Twitter owing to the sheer accessibility of personal information, which might grant credence to the
information disseminated from the account in question. This question of whether the
information disseminated from an account can be said to be credible is a topic we discuss
later in section 10. From what we have presented in this section, however, it does seem
reasonable to suggest that if it is information we are looking for Twitter might just be an
effective vehicle.
In the coming section we take a step back and assess what we now know about text mining
and Twitter respectively in order to correlate these considerations to the core principles
presented in section 5, and lay the finishing touch on the proposed model for taking
stakeholder engagement to social media.
8. Stakeholder Intelligence on Twitter
We have now wandered across multiple fields of theory in an effort to find aspirations and
perspectives that may aid us in taking stakeholder engagement to social media. We looked at
stakeholder theory and uncovered its own perspective on what the prospect might entail. We
drew on perspectives such as those proposed by the logic of community and found that in
essence the aspiration is to allow for more inclusivity and for engagement-topics to be
spawned in the stakeholder community. Given that these perspectives provide little insight
into how we might bring them into business practice we found further support in collective
intelligence. With this we found support for the benefits that might be derived from including
more stakeholders and allowing them to contribute to solutions of their own accord.
Furthermore, collective intelligence taught us that we would have to somehow assess the
stakeholders we choose to include if we are to produce value from such efforts. We then
moved on to text mining in order to propose a method for extraction of the information
spawned by stakeholders on social media. Lastly, we provided a detailed description of the
activities on the social technology Twitter to propose a place, where we could establish a CI-
system. However, as we described in section 5 we have yet to cover a few aspects before we
can be satisfied that we have something we can bring into business practice.
In section 3 we spoke in short of how the purpose of all of this is to strengthen decision-
support and thereby decision-making in the company. In saying so we are however also
aware that decision-making in a company is far from a static concept. Broadly speaking
everyone can be said to constantly make decisions with immeasurably different types of
information lending support to their choices. This might shed some light on the importance of
zoning in on which decisions we are trying to support with a CI-system, as one decision needs
some information, while another needs some other information. To further support this we
refer to section 6 where we have shown that text mining, to be an effective medium for
gathering information, needs a guiding domain. Furthermore, in section 7 we have shown that
Twitter holds more than 500 million different accounts and that tweets posted may contain
everything from daily chatter to news reports. All of this coupled with the points emphasized
in section 5 may bespeak the necessity of having a clear-cut business case (or domain) before
engaging and before we relinquish control.
What we are trying to establish in this thesis might carry many different connotations. Some
might call it a CI-system, some an enhanced focus group. However, if we try to conceptualize
who might be put to the task of establishing it in the context of an actual company it would
most probably be handed to the business intelligence department. As such the field of
business intelligence seems the right place to look for inspiration for how to ‘zone in’. From
this we derive another core principle of the model, which relates to the model as a whole and
how we might best bring the information into the company.
Include the business professional
Imhoff and White (2011) present a TDWI11 Best Practices Report pertaining to the
implementation and execution of business intelligence initiatives in a company. In this report
they introduce the concept of self-service business intelligence, which broadly speaking can
be defined as establishing the institutional and technological capabilities that allow
information workers to decide for themselves what information needs their work entails.
(Imhoff & White, p. 4, 2011) It is an extensive report which goes into great detail in relation to
technological requirements. Here we will focus on the benefits described from providing
information workers with the opportunity to enter into the process of deciding what
information they need.
“To create a sustainable and appropriate self-service BI environment, the implementers
must thoroughly understand the information workers who will be using the environment.
They must understand their motivations, mode of working (e.g., mobile, geographically
dispersed, virtual) and, of course, their technological skill sets.” (Imhoff & White, p. 11,
2011)
11 The Data Warehousing Institute, www.tdwi.org
The benefits described from applying such a perspective when bringing information into the
business include a declining demand for involvement from the IT department in the daily
workings, more satisfaction with the services (and the IT department) among information
workers, and the IT department becoming a partner to them instead of a nuisance. The
first and the last benefit described, one might suggest, could relate to the thinking that if we
allow those with information needs to enter into the process of deciding what information is
sent their way, then we might be able to bring about a more sustainable solution. The second
benefit may speak to a perceived satisfaction of being involved in the creation of the processes
that shape one’s daily routines. (Imhoff & White, p. 11, 2011) We will not claim this to be an
exact reiteration of the motives behind such an initiative but if we take a closer look it seems
to promote a perspective where we as a company take the stand that our employees might
be quite knowledgeable about what they need for their work to be carried out. If we perceive
an employee as a stakeholder then one could suggest, according to our previous
argumentation, that they too could be involved in decision-making with the promise of a
better solution.
Accordingly, we propose to include the business professional in the establishment of a CI-
system. We believe this will provide us with the best possible way of zoning in on what
information we need from the system, and if this does indeed increase employee satisfaction
then we shall gladly reap that benefit as well. Most of all, however, we relate this to the
establishment of a business case and the selection of which stakeholders to include. Imagine
an employee whose daily work revolves around relating to stakeholders to communicate the
company’s position on a given topic to them and learn from their perspectives. Would it not
be reasonable to suggest that this employee would be an invaluable resource in deciding what
information is important, and who might be able to deliver to us valuable information? We
note here that this of course must not lead us to a too narrow selection of stakeholders,
because as Bonabeau (2009) mentioned in section 3 outreach is needed. However, we believe
there is support for the rationale of taking such suggestions and insight into our
considerations.
In order to provide an overview of the steps or processes laid out in this section as well as in
section 5 we include here a graphical representation of our proposed model:
Much of what is shown in the model has already been discussed and as such should not need
further explanation. However, when we state that research should be done to find one or
more social technologies this may demand a bit of explanation. This relates to the increasing
capabilities of text mining systems to encompass more than one domain. (Cohen & Hunter,
p. 2, 2008) In this thesis we stick to one. When the model presents a step involving an
initial analysis of the information extracted from the stakeholders selected this relates to an
attempt at ensuring that we do not deploy a system, which delivers information that the
employee in question cannot make use of. Generally, it seems common sense to suggest that
we should first make sure that the stakeholders we have chosen to include also are able to
deliver value before financing the deployment of a long-term system.
In the following section we move into the case material related to this thesis. We had the good
fortune of corresponding with Scott Dille, a Communications Manager from Novo Nordisk
who sits in their department for Corporate Sustainability. We describe Novo Nordisk as a
company in relation to stakeholder engagement along with descriptions of the daily work
carried out by the manager. Here we include descriptions of what, in essence, the goal of his
work is and we also describe some of the challenges he is faced with being an employee
communicating on behalf of a large pharmaceutical company. Lastly, we attempt to apply
our own model to this position in this specific context so that, once we have been through
the case, we are better able to evaluate the model.
9. The case of a Communications Manager at Novo Nordisk
Novo Nordisk A/S is a large European pharmaceutical company with headquarters based in
Denmark and departments in 75 countries in total. The company as it is today came to be
through a corporate merger in 1989 and has since then worked to ensure the progress of the
capabilities within the area of diabetes care and other diseases. (Novo Nordisk 1, 2012)12 In
later years, rising demands for treatment of haemophilia drove the company to establish The
Novo Nordisk Haemophilia Foundation, which as they describe was to underline the
company’s social responsibility within haemophilia care. (Novo Nordisk 2, 2012) This citation is
included because a look at Novo Nordisk’s history tells the story of a company that, from the
get-go, has had a firm focus on its societal, scientific and environmental context. Before the
merger Nordisk, as one of the companies involved was called, in 1926 established the Nordisk
Insulin Foundation to support research, and people with diabetes in Scandinavia. The second
company involved, named Novo, in 1951 established the Novo Foundation to support
scientific, social and humanitarian causes. Fast forward to 2006, when the company signed an
agreement with the World Wide Fund for Nature to become part of the WWF Climate Savers
program, committing to a 10% reduction of its carbon emissions by 2014. (Novo Nordisk 2,
2012) The list could continue on, but we will stop here, and as this information stems from
the company’s own website we take this picture-perfect account with a grain of salt. However,
looking at the facts and the actions carried out through the years most would probably agree
that we are here dealing with a company which stays aware of how it affects its external
environment. This may be supported by the fact that it has been thoroughly recognized
through the years, winning several awards for its performance related to Corporate Social
Responsibility. (Novo Nordisk 2, 2012)
12 The totality of the information presented about Novo Nordisk is taken from their website. Specific links and dates of viewing can be found in the Bibliography.
The reason for this short account naturally relates to the fact that we are occupied with the
field of stakeholder engagement, which as described in section 5 invariably relates to
maintaining a responsible nature in accordance with the Triple Bottom Line. The
company is also highly focused on cooperation with its stakeholders, who in its own words
include:
“Novo Nordisk's key stakeholders include people with diabetes and others who rely on our
medicines, customers (ie public healthcare providers and payers), employees, investors,
suppliers and other business partners, neighbours, and key publics. For us, the patient is at
the centre – and hence the ultimate stakeholder to which the company must hold itself
accountable.” (Novo Nordisk 3, 2012)
They recognize the benefits of building trust-based relationships and including stakeholders
in the conception of well-founded decisions. However, they also reveal that such relationships
are built on membership and partnership, which a closer look reveals are seemingly
exclusively available to organizations i.e. business associations, advocacy groups and think
tanks. (Novo Nordisk 4, 2012) Naturally, such established affiliations should be held in high
regard but they also have the aspiration to take stakeholder engagement to social media. This
is shown by their presence on Twitter. Here, however, things become a bit ambiguous, at least
if we ask the question of why they have a presence on social media. They stress that there are
quite a few subjects that they cannot discuss on social media, and that they intend to read
tweets directed at them but will not be able to reply to them all. They also state quite
clearly that their purpose on Twitter is to tweet about Corporate Sustainability (Novo Nordisk
5, 2012), which might indicate unidirectionality in their purpose. However, our
correspondence with the communications manager responsible for one of these accounts on
Twitter yielded a deeper understanding of the premise that builds constraints on their
engagements on Twitter.
9.1. CSR-Communication on Twitter
As a communications manager at Novo Nordisk, Scott Dille is responsible in his daily work for
communicating about issues and news in relation to corporate sustainability. He manages the
Twitter account ‘@NovoNordiskTBL’, which with a total of 2,544 tweets, 1,970 followers and
1,908 followings seems to be an account with a solidified presence.
Being responsible for the daily dissemination of information on Twitter for a well-renowned
pharmaceutical company like Novo Nordisk of course brings about some challenges. First off,
in our correspondence it quickly became clear that they strive for a very high standard in the
content they send out into the Twittersphere. The background for this seems to be that Novo
Nordisk is a world leader in the area of diabetes care and has been for a long time. One might
easily imagine the demand for standards in the content of the information disseminated if, as
we assume is their wish, they are to maintain this status. In this position he makes daily
decisions on what to communicate about and which statements qualify for reiteration (retweet),
and from this follows a great demand for sources of information. The interest in gathering
information from sources on social media most probably sprang from this demand. As the
dataset provided was a list of accounts on Twitter this may indicate the recognition that these
can be treated as sources of information, which can be said to be congruent with our
perspective. Therefore we will attempt to apply the model in the context of Novo Nordisk
and the manager’s work in this company. However, before we move into this we need to cover
a challenge related to his work with communication on Twitter.
Since Novo Nordisk is a large pharmaceutical company dealing with products that directly
affect the health of individuals around the globe, they are of course subject to quite a few
regulatory restraints. There is actually a very reasonable explanation for their claim that
there are some topics they cannot discuss on social media. There may in fact be
numerous explanations, but the one we will include here quite clearly shows that it is
not possible to have an open dialogue on social media. It relates to industry guidance put forth
in recommendations by the Food and Drug Administration in the United States, where Novo
Nordisk also has products on the market. It relates to restrictions put on responses to
unsolicited requests for so-called off-label information about products.
“Statements that promote a drug or medical device for uses other than those approved or
cleared by FDA may be used as evidence of a new intended use. Introducing a product into
commerce for such a new intended use without FDA approval or clearance would, under
these requirements, generally violate the law.” (FDA, p. 2, 2011)
It should be fairly well-known that any drug or medical device has a clearly defined intended
use. Now this is of course rhetorically presented as recommendations, but in essence it states
that any comments on a product may be perceived by the FDA as the promotion of a new
intended use. A new intended use would then most likely need new approval from the FDA,
which if nothing else presents a stern warning. However, there is also the recommendation
that when an individual puts forth an unsolicited request for information the company must
grant such information only to the individual who asked for it. (FDA, p. 7, 2011)
“A firm should ensure that all pertinent background data are obtained to be able to
determine what information is being requested before providing a response.” (FDA, p. 7,
2011)
From this it seems very reasonable that a company like Novo Nordisk would adopt a non-
disclosure policy on social media in regard to some areas of business. First off, if they may
only disclose information to the individual who asked for it that can be said to hinder the
possibility of more perspectives on the same issue. Secondly, the fact that the company must
gather all pertinent background data before providing a response further supports such a
policy, given the inefficient nature of having to carefully evaluate every question directed at
them on Twitter.
Because of this we dare to suggest that our proposal might be well-suited for a company like
Novo Nordisk, since in essence we have armed ourselves with a different perspective on how
to interact with stakeholders on social media. Through our model a company like Novo
Nordisk might be able to attentively listen to the opinions and proposed solutions
disseminated by stakeholders on social media and from that gain insight into topics of
interest, and then present their view on those topics in a manner congruent with the given
regulatory constraints. In the following we will go through the steps proposed stopping at the
initial analysis of information, since we in the context of this thesis have no way of actually
deploying a CI-system into a company. As such we will focus on whether the information
disseminated would qualify to guide decisions.
9.2. Establishing a business case (domain)
As we have discussed in the two previous sections Novo Nordisk is a world leader in diabetes
care and as such the domain driving us forward is exactly that, the subject of diabetes. We
have also been through the importance of zoning in on a specific subject with regard to text
mining capabilities. As such, even though Novo Nordisk has a stake in different areas, we focus
solely on finding information about diabetes: information that may aid the manager in
maintaining Novo Nordisk’s image as a world leader in diabetes.
We have already established that the preferred social technology in the context of this thesis
is Twitter and if we needed to do further research on this we would quite likely end up with
Twitter regardless. We can relate this to the capabilities of Twitter in disseminating
information, at least in so far as it rings true that part of his job is to spread an image of Novo
Nordisk as a world leader in diabetes. We have already shown in section 7 that the likelihood
of such information reaching people it would not normally have reached is drastically
increased on Twitter (as per the compact network and retweets). As such we move directly to
explain how the stakeholders contained in the case material were selected.
9.3. Selection of stakeholders (balance diversity and expertise)
The communications from stakeholders that we will be analyzing in this case stem primarily
from a recommended list of Twitter accounts provided to us by Scott Dille. We left this choice
to him as diabetes as a subject is not exactly within the scope of our competencies, and he is
no doubt a business professional in this respect, which as we argued in section 8 makes it
reasonable to allow him to decide who to include. We did however analyze the participants to
ensure that we have both diversity and expertise in our stakeholder intelligence system. Here
we wish to draw out descriptions of a few accounts to portray our findings:
Pan American Health Organization
Twitter: @NCDs_PAHO (https://twitter.com/ncds_paho)
Twitter biography: “Learn what NCDs are, know the risk factors, and support the UN High-level
Meeting on NCDs and Wellness Week in NYC this September. PAHO tweets.”
Website: http://new.paho.org/hq/
Glu / T1D Exchange
Twitter: @MyGlu (https://twitter.com/myglu)
Twitter biography: “Glu is a new online community for people touched by type 1 diabetes. Glu is
part of the T1D Exchange - www.t1dexchange.org”
Website: http://t1dexchange.org
Peg Abernathy Group
Twitter: @PegAbernathy (https://twitter.com/pegabernathy)
Twitter biography: “Diabetes Advocate with 18 years experience and 22 years Type 1.”
Website: http://pegabernathygroup.com
AmandaMichelleManait
Twitter: @sweetliferunner (https://twitter.com/sweetliferunner)
Twitter biography: “I write my experiences as a Diabetic Runner to inspire people. If I can, you
can too! I run not to win over other runners, I run to win over Diabetes!”
Website: http://thesweetliferunner.blogspot.com
These are only 4 of the total of 58 accounts provided to us, and the full list holds the
potential for many diverse perspectives, all somehow coupled to diabetes. We note
here that in the list we have found a bias toward diabetes patients and non-professionals
otherwise affected by diabetes. To justify why we stuck with this list we refer
to a correspondence we had with a co-author of the book Rethinking Expertise. We asked Dr.
Robert Evans of Cardiff School of Social Sciences for his assessment on the perceived
expertise of diabetes patients. The following is an excerpt of his answer:
“From our perspective, T1 patients, providing they have been patients for a long time, will
be experts in the process of living with diabetes. In this case, time since diagnosis provides a
way of placing them on the ladder and puts them in the category of contributory experts. Of
course, you might want newly diagnosed patients too because what they don't know or
struggle with might be revealing too (e.g of what current information doesn't say enough
about!).” (Appendix-3)
To clarify, contributory experts are by their definition the last step on the ladder of expertise
under the umbrella of what they categorize as specialist tacit knowledge. (Collins & Evans, p.
14, 2007) With this in mind we felt confident moving forward with the perception that we had
a diverse group of stakeholders with a sufficient amount of expertise on the subject of
diabetes present in the group as well.
9.4. Initial analysis of information quality
In this section we move into the presentation of our methods of data collection, cleansing and
analysis. We will lay out our approach in accordance with the descriptions presented in
section 6, where we described that we will be using text mining to categorize our dataset in
order to make sure that diabetes is the main topic of discussion, as well as perform a k-means
clustering activity on the tweets we collected in an effort to adhere to the concept of additive
aggregation.
Data collection
We collected the data using R, the popular open-source programming language for statistics,
for which a package called twitteR is available. This package has the express
purpose of accessing different functionalities of Twitter’s API13 and we used it to download
tweets from each account. The number to download from each account was set to 150 in
order to keep the dataset at a size we had the computing power to tackle. We took only one sample
based on the fact that not all accounts held a total of 150 tweets, e.g. one account returned
13 https://dev.twitter.com/
only 6 and another returned 86. In actuality the list totaled 59 accounts to begin with, but one
account had never sent a tweet and as such was removed from the dataset. However, most of
the 58 accounts did indeed return 150 tweets and as such we ended up with a total of 7763
tweets. In the following, whenever we refer to a document we are referring to a single unique
tweet, and when we refer to the document collection we are referring to the total of 7763 tweets.
Data cleansing
For our data cleansing activities, as well as the analysis to come, we used another open-source
program called RapidMiner, which is a program for statistics, data mining and text mining
with different types of reporting features. In RapidMiner we performed the required
preprocessing steps related to text mining in order to reduce the dimensionality of our
dataset. We started by performing tokenization by non-letter, which means that RapidMiner
divides each document into word-level features by interpreting each non-letter as either the
start or the end of a word. We then transformed each upper-case letter to lower-case, because
the program would otherwise interpret Diabetes and diabetes as different words. All stop
words in the dataset, e.g. words like and, the, he, she, it, were then removed because, as we
discussed in section 6.2, these words hold little meaning, and to cater to methods of analysis
we need the simplest meaningful representation of a document. We then filtered out tokens with
fewer than two characters, as stray special characters like ‘/’ were otherwise present in the
dataset. Furthermore, we stemmed each word using a built-in method of stemming, which for us
served to make it easier to detect when words like diabetes were present in a document. The
main reasoning behind the choice of stemming was that we found many different ways of
expressing that a tweet was about the subject of diabetes. For example, diabetic, diabetes and
other representations were stemmed to ‘diabet’, and as such we obtained a more complete
picture of the representation of diabetes in the document collection. Lastly, after studying
the dataset and the dimensionality of the document collection, we decided to make use of an
additional feature in RapidMiner called ‘prune method’, which we set to present us only with
words present in at least 50 documents. We note here that some of these actions were necessary
in order for our limited computing power to handle the dataset in analysis. However, we were
also satisfied that the document collection still held 168 different word-level features which
could be put into analysis.
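The preprocessing chain described above can be sketched outside RapidMiner as follows. This is a minimal Python illustration and not the operators actually used in the thesis: the stop word list is a small stand-in, `toy_stem` is a hypothetical suffix-stripper that only imitates what a real stemmer does, and all function names and example tweets are assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stop word list; a real list is much longer.
STOP_WORDS = {"and", "the", "he", "she", "it", "a", "an", "to", "of", "in", "for", "is"}


def tokenize(text):
    """Split on every run of non-letter characters (tokenization by non-letter)."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("es", "ic", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, lower-case, drop stop words and one-character tokens, then stem."""
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [t for t in tokens if len(t) >= 2]
    return [toy_stem(t) for t in tokens]


def prune(docs, min_docs):
    """Keep only terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    keep = {term for term, n in doc_freq.items() if n >= min_docs}
    return [[t for t in doc if t in keep] for doc in docs]


# Two made-up tweets; the thesis collection held 7763 documents.
tweets = [
    "Diabetes gets NIH grant for insulin http://t.co/abc",
    "RT High blood sugar linked to #diabetes #diabetic http://t.co/xyz",
]
docs = prune([preprocess(t) for t in tweets], min_docs=2)
```

With the threshold used in the thesis the last call would read `prune(..., min_docs=50)`; the value 2 is chosen here only because the toy collection holds two documents.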
Data analysis
Before we move into the presentation of the outcome of our work with the dataset we wish to
refer back to section 4, where we described that the purpose of this thesis is solely to explore
the possibilities of bringing theoretical aspirations of stakeholder theory into business
practice. As such our approach to the coming analysis has mainly been to assess the
possibility of deriving information from the stakeholders in our system, which is also
congruent with how we, in section 8, imagined the purpose of an initial analysis would be.
As we discussed in section 6.3 the process of categorization might be used as a way to
evaluate whether the stakeholders we have included are actively involved in discussions
regarding the domain that has our interest, which in our case is diabetes. Categorization is, as
described, about finding one or more categories for each document to belong to. In the case of
another text mining analysis the documents included in the document collection may have a
much-increased dimensionality in comparison to the documents in our collection. Imagine if
we were analyzing books to find an efficient way for categorization in a library. Each book in
this example would constitute a single document and as such you can imagine that the
dimensionality is quite different from tweets, where the maximum number of character-level
features is set to 140. In other words, having text mining decide on a category for each
document in our collection might rightly be described as more of a superficial task. Because
of this we decided to categorize the document collection as a whole, in order to make sure
that the collection was about diabetes and held information on this subject.
After performing the previously described preprocessing tasks we were left with a word list
presenting the most frequently occurring terms in the document collection as a whole:
Furthermore, our TF-IDF analysis of the document collection returned the following result:
The word list shows that the top occurring word-level feature was ‘http’, the second ‘co’ and
the third ‘diabet’. To provide some context, ‘http’ and ‘co’ occur on Twitter each time a link
is contained in a tweet,14 which, if we refer back to section 7.1, the study by Finin et al.
(2007) found to constitute the sharing of information. The third word-level feature sorted by
number of occurrences was ‘diabet’ with a total of 1781 occurrences. If we compare this to the
next feature in line, ‘rt’ (an abbreviation of retweet), we find that the frequency of ‘diabet’
in our collection is three times that of its nearest competitor. Furthermore, ‘http’ and ‘co’
clearly occur much more frequently than any other word-level feature. Naturally, the links
could quite possibly contain anything, and the fact that their occurrence is higher than that
of ‘diabet’ might also show that we encounter linking to other things than information
regarding diabetes. Furthermore, as is shown in the results from the TF-IDF analysis, ‘http’,
‘co’ and ‘diabet’ were also the word-level features with the highest average weight in the
document collection. From this we believe it to be a reasonable conclusion that we have a
collection of stakeholders whose main topic of discussion is diabetes. For now, we hold back
judgment on whether the information disseminated can be said to be about diabetes.
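A word list of this kind is straightforward to reproduce once the documents are tokenized. The sketch below is a Python illustration under the assumption that each document is already a list of stemmed tokens; the function name `word_list` and the toy documents are hypothetical and do not reproduce the thesis data.

```python
from collections import Counter


def word_list(docs):
    """Return (term, total occurrences, number of documents containing the term),
    sorted by total occurrences in descending order."""
    total = Counter()
    in_docs = Counter()
    for doc in docs:
        total.update(doc)         # every occurrence counts
        in_docs.update(set(doc))  # each document counts once per term
    return [(term, count, in_docs[term]) for term, count in total.most_common()]


# Made-up documents in which every tweet carries a link, as links dominate
# the word list in the thesis as well.
docs = [
    ["http", "co", "diabet", "diabet"],
    ["http", "co", "diabet"],
    ["http", "co", "rt"],
    ["http"],
]
for term, occurrences, documents in word_list(docs):
    print(term, occurrences, documents)
```

Keeping the total and the per-document count side by side makes it visible when a term ranks high only because a few documents repeat it.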
We move instead to present our results from the k-means clustering activity, also carried out
in RapidMiner. As mentioned in section 6.4, the clustering activity in our case revolves around
calculating, on the basis of TF-IDF weights, a cosine similarity measure for each document and
then using that measure to assign each document to a cluster as well as possible. In
RapidMiner we can decide how many clusters the documents are to be clustered into, and after
many failed runs resulting in buffer overload we decided on a number of 40 clusters.
14 The reason for the presence of ’co’ instead of e.g. ’com’ is not stemming. It is that the vast majority of links on Twitter are abbreviated to lessen the number of characters in them. In our dataset this abbreviation looks like this: http://t.co/(random string of characters)
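To make the clustering step more concrete, the following Python sketch shows one common way of running k-means with cosine similarity on weighted document vectors. It is an assumption-laden illustration, not RapidMiner's implementation: documents are represented as hypothetical `{term: weight}` dictionaries, the initial centroids are simply sampled at random, and the function names, the fixed iteration count and the toy vectors are all made up for the example.

```python
import math
import random


def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, iterations=20, seed=0):
    """Assign each vector to one of k clusters by highest cosine similarity."""
    rng = random.Random(seed)
    unit = [normalize(v) for v in vectors]
    centroids = rng.sample(unit, k)  # assumed initialisation: k random documents
    labels = [0] * len(unit)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for every document.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in unit]
        # Update step: each centroid becomes the normalized sum of its members.
        for c in range(k):
            members = [unit[i] for i, label in enumerate(labels) if label == c]
            if not members:
                continue  # leave an empty cluster's centroid unchanged
            total = {}
            for v in members:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            centroids[c] = normalize(total)
    return labels


# Four made-up documents: two about insulin, two pieces of daily chatter.
vectors = [
    {"diabet": 1.0, "insulin": 1.0},
    {"diabet": 1.0, "insulin": 0.8},
    {"thanks": 1.0, "happy": 1.0},
    {"thanks": 1.0, "happy": 0.9},
]
labels = kmeans_cosine(vectors, k=2)
```

In the thesis the number of clusters was 40; `k=2` is used here only because the toy collection holds four documents.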
RapidMiner has the functionality to present results from a clustering activity in numerous
ways. We found the most clarity of results in the graphical representations of a scatter plot
graph (this will represent our ‘given space’), which in RapidMiner allows us to triangulate
word-level features to find documents containing these. In other words, we could put in e.g.
‘diabet’, ‘thanks’ and ‘happy’, and it would show us documents containing those words and the
relative distance measure between them. Since we were trying to confirm the presence of
documents with reference to information about diabetes, we found it most effective to keep
two word-level features stable, namely ‘diabet’ and ‘http’. The first scatter plot we include
here shows a collection of documents containing the features ‘diabet’, ‘http’ and ‘co’:
Each dot on this graph represents a single document; the further it is placed along the y-axis
the more weight the feature ‘diabet’ has in the document, the further along the x-axis the
more weight the feature ‘http’ has. The color coding which goes from blue to red then
represents the weight of the feature ‘co’, where the navy-blue color means no weight and the
red high weight. This quite clearly shows that we have a very large number of documents
referring to diabetes as well as containing a link. When we started looking for more
interesting triangulations using this functionality we moved ‘http’ to the color coding and
kept ‘diabet’ along the y-axis. Had we not done this, every result would have looked like this
one, with only some documents colored to represent the newly included word-level feature,
which made it difficult to get an overview of the documents present in the given collection.
In total we found six additional interesting representations in our dataset, two of which we
include and discuss in this section, while the rest can be found in appendix-4. The first was
‘diabet’, ‘http’ and ‘insulin’:
On this graph each dot that is not navy-blue contains the features ‘diabet’, ‘http’ and ‘insulin’.15
We were able to confirm this through manual inspection of the documents on the graph and
we will here bring a few examples of the contents in these documents.
The first is the second rightmost cyan-colored dot, which is a document from the user
‘@sstrumello’ and he tweets:
"Thermalin Diabetes gets $4.5M NIH grant for next-generation insulin
analogue http://t.co/yymFcECb via @MedCityNews" (sstrumelloTweets-
10.txt)16
The link in this tweet took us to a news article explaining, as described by the user, that a
$4.5M grant had been given to the company Thermalin Diabetes in order for them to develop
a new insulin-related product. The next and last example we will include from this scatter
plot is from the user ‘@JoyofDiabetes’, whose tweet contained the following:
"RT @HealthyNews_WR High blood sugar and insulin levels linked to
#heart disease: http://dld.bz/bCD #diabetes #diabetic"
(JoyofDiabetesTweets-128.txt)
This again took us to a news article, this time revealing research that had shown a link
between high blood sugar and risk of developing heart disease. In total we found 22 different
tweets from 9 different users that contained the relevant word-level features. The most
recurring topic was some form of information pertaining to insulin pumps, but the contents of
the links varied too much to constitute a significant pattern of discussion in relation to
insulin pumps. The 22 tweets in their full length can be found in the attached data in the
folder ‘diabet-http-insulin’.
15 Unfortunately, upon importing images of the scatter plots into Word the quality of the images fell drastically, no matter the quality chosen upon exporting them from RapidMiner. We were not able to fix this issue but we hope they provide enough detail to show the color coding as well as the distribution.
16 The text files we refer to here can be found in the zipped folder ’Data.zip’, in the folder ’Finds’, accompanying the thesis.
The second scatter plot we wish to include here is one with the features ‘diabet’, ‘http’ and
‘research’:
Again we bring a couple of examples of the contents in the tweets distributed on this scatter
plot. One user who calls himself ‘@DiabetesRx’ on Twitter had this to say:
"Fasting lowers risk of heart disease, diabetes | The Salt Lake Tribune
http://t.co/JrnpQjY via @AddThis-However, Research was SHORT TERM!"
(DiabetesRxTweets-96.txt)
This link again led us to a news site, but this time the news cited research pertaining to the
perceived benefits of fasting in relation to diabetes. Another user, ‘@DiabetesPower1’, was
also disseminating information regarding research; the tweet indicates this and the link
confirms it:
"CU Researchers Find Cure For Type 1 Diabetes In Mice « CBS Denver
http://t.co/JfktbGcC" (DiabetesPower1Tweets.txt)
We found 13 tweets from 10 different users, which contained relevant features in relation to
this second scatter plot. Again the contents were too diverse to find a pattern displaying a
specific interest in one area of research but again the presence of word-level features in the
tweets at least displayed a shared interest in diabetes research.
We found many different styles of presentation and different types of content in the links
disseminated by the stakeholders in our pretend CI-system. Generally, it was difficult to home
in on patterns in the information disseminated other than the inherent feature-patterns. So
we conclude the presentation of our results here and refer to appendix-4 for the remaining
four scatter plots; the document collections for these can also be found in the attached
dataset in the folder ‘Finds’. The results yielded by the clustering activity showed that
without a doubt the stakeholders in our system are talking and spreading information about
diabetes. We referred to Finin et al. (2007) and focused on links to find information, but
the remainder of the documents in the scatter plots were also communicating with reference to
the stable feature ‘diabet’ and the other non-stable feature. As such, even though they did
not refer to an external source of information, they were tweeting with some reference to
diabetes and a subject within the scope of our interest. Generally, the document collection
contained a healthy mix of the user intentions: daily chatter, conversations, sharing
information and reporting news. We did however find a tendency, also indicated in the
previously portrayed word list, toward diabetes combined with some link to an external source
of information. However, we were not able to conclusively state whether the information
disseminated in the system is of the needed relevance or quality. Ideally we would have liked
to see many accounts tweeting about something more specific pertaining to diabetes. However,
based on the clustering activity we found at most indications of this. In the following
section we discuss our results in relation to the proposal as a whole, and in this we also
refer to the correspondence we had with the manager from Novo Nordisk, whose views on the
needed information quality form part of this evaluation.
10. Evaluation of results and model
In section 6 we discussed the basic foundation of text mining and two methods, categorization
and clustering respectively, which we conceived as an answer to the delivery of valuable
information from stakeholders on social media. We applied these methods in our data analysis
and found that categorization may be able to help us assess whether our preconceived
notions of relevance were consistent with the content of the data we worked on. In our case
relevance was tied to the domain of diabetes and we found the talk and the information
shared to be about diabetes. Clustering delivered indications that the stakeholders in our
system were at some point involved, judging by the presence of word-level features in their
communications, in the same issues. However, the results were not consistent enough for us to
claim with any certainty that the stakeholders were speaking to the exact same issues.
Even upon focusing our analysis on word-level features, e.g. ‘diabet’, ‘http’ and ‘insulin’,
there may yet be many different issues to decipher in communications about these. In this
section we wish to discuss our method of analysis in relation to the prospect of bringing such
information into a company. Having done this, we zoom out and highlight challenges
related to our proposal as a whole.
Categorization and Clustering
Reliance on the results produced by a process of categorization, we feel, demands some insight
into the computational elements of the method. Broadly speaking, categorization holds many
different methods, some of which are more complex than others. We chose TF-IDF, an
automated approach, for ease of access and because, as we explained in the previous section,
each document in our collection could contain a maximum of 140 character-level features. We
saw that ‘http’, ‘diabet’ and ‘co’ held the highest weight in our dataset and while, as stated,
this confirmed our belief that diabetes was the topic of discussion, the way this is calculated
is by presence of ‘diabet’ in the document collection. Take, for example, two tweets composed
of word-level features, one with six, “diabetes patients demand better insulin pumps”, and
another with four, “diabetes tower diabetes squared”. In this example the second tweet will
carry the higher weight for ‘diabet’ even though its content is utterly meaningless. The
clustering task also needs TF-IDF weights in order to apply the cosine similarity measure and
start clustering the documents. This speaks to the difficulty in bringing language into
computation and analysis, but it does not deter us from the belief that text mining will be a
valuable tool in this. As is evident in the results in section 9.4, text mining did actually
lead us to meaningful communications on social media. However, it underlines the importance of
carefully assessing what information is to be derived, what the quality of that information
has to be and how we might couple methods to deliver the best possible solution in the context
of a text mining system. We spoke to the manager at Novo Nordisk about this and asked him
questions in relation to requirements for both the technological aspects and the quality of
information derived. The correspondence can be found in its entirety in appendix-2.
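The two-tweet example above can be worked through numerically. The Python sketch below uses one common textbook variant of TF-IDF (term count divided by document length, multiplied by the logarithm of the number of documents over the document frequency); RapidMiner's exact formula and normalization may differ, so the numbers are illustrative only. The tokens are assumed to be already stemmed, and a third, hypothetical tweet without ‘diabet’ is added because with only the two example tweets the inverse document frequency of ‘diabet’ would be zero.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    tf = count / document length and idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["diabet", "patient", "demand", "better", "insulin", "pump"],  # six features
    ["diabet", "tower", "diabet", "squar"],                        # four features
    ["thank", "happi", "birthday"],  # hypothetical extra tweet without 'diabet'
]
weights = tf_idf(docs)
meaningful = weights[0]["diabet"]   # (1/6) * log(3/2)
meaningless = weights[1]["diabet"]  # (2/4) * log(3/2)
print(meaningful, meaningless)
```

Under these assumptions the weight of ‘diabet’ in the second tweet is three times that in the first, purely because the term makes up half of its tokens rather than a sixth.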
Bringing information into the business
The answers given in regard to information quality hinged heavily on the fact that Twitter
was the medium for communication and information dissemination. In other words quality of
information was set heavily in the context of Twitter functionality. Interestingly, his answer to
what might constitute information of a quality that would allow him to act on it (communicate
about it) deviated from our perceptions of importance, which might again bespeak the
importance of including the business professional in the conception of such a CI-system. The
following is an excerpt from his answers:
“I would say actionable intelligence is information that provides guidance on the likelihood
that a message will be perceived as valuable based on some evidence (i.e. it is a hot topic
among key influencers or the community), the probability that a message will be
ReTweeted if I post it, and the probability that the community I am targeting will spread
my message. What I would like is to make decisions to Tweet or ReTweet with some
evidence and insight into how the message will be received and promoted.” (Appendix-2)
When asked about his view on how many such stakeholders would have to have shown
interest in a subject for it to be classified as a ‘hot topic’, the answer was 1000+
stakeholders (Appendix-2). As such we quite clearly see that the information we were able to
derive from stakeholders in our system cannot be of sufficient quality, for sheer lack of
numbers. Additionally, calculating the probability that a message will be retweeted and spread
through the community could possibly be conceived as some calculation on historical data
collected in the given CI-system on Twitter. We see no reason that this should not be a
possibility in practice, but such a claim is outside the scope of this thesis. This does
however bespeak another issue with our method, namely that we lack a clear timeframe for the
data analyzed. In actuality we have no way of assessing the timeliness of the topics discussed
in our selection of stakeholders. Of course there is no way to ensure that the stakeholders
selected will keep on discussing diabetes; however, some measure of relative activity for each
user would perhaps be a fruitful addition. It would seem that all of the issues presented here
relate to the dataset used in the thesis and not as much to the methods applied to the data.
Therefore, we allow ourselves to conjecture that if we establish a CI-system with a much larger
number of unique stakeholders communicating about the same subject, we might be better placed
to find information by way of additive aggregation that may aid us in driving decisions.
Lastly, we wish to point out that we are aware that the information, when being brought into
the company, would need to be presented in an efficient way. In a position like that of the
communications manager it seems unlikely that he would have time to sit and sift through the
links present in the communications, especially as the number of communications included in
the system increases. As such we end this discussion by noting that for the model to be more
than conceptually applicable, some method for summarizing the link-contents and recognizing
links referring to the same information is most probably needed.
The issue of trust and the difficulty of assessing legitimacy
This discussion relates to the demand of how we can ensure the balance between diversity
and expertise. In actuality what we have based our judgments on is an estimate provided by
the manager in the case. However, this grants little guidance for companies outside the scope
of this thesis. In such cases judgments would most likely have to be based on what
information the account reveals about the owner and an analysis of the information
disseminated by the account. The following portrays some of the difficulties in relation to
assessing the relative legitimacy of an account on social media, and thereby also the
information disseminated by that account. Leimeister (2010) speaks to this issue in general
terms when he states that the actors in the system may act out and in some cases may even be
out to cause sabotage, e.g. by spreading false information or acting rudely (Leimeister, p.
247, 2010). Manovich (2011) claims that we must be careful about reading communications over
social media as authentic:
“Peoples’ posts, tweets, uploaded photographs, comments, and other types of online
participation are not transparent windows into their selves; instead, they are often
carefully curated and systematically managed.” (Manovich, p. 6, 2011)
Dellarocas (2003) reiterates this perspective in his study of online feedback mechanisms,
where he takes issue with what he terms the volatility of online identities and the ability to
precisely design the distributed message (Dellarocas, p. 1422, 2003). These issues of course
also affect the legitimacy of our proposal and put focus on two relevant questions: how we
can trust that stakeholders are who they claim to be, and how we might assess the
trustworthiness of the information disseminated. This discussion dates as far back as
Kozinetz (1998), one of the first studies of the interaction between people online (Kozinetz,
1998, n.p.). He, however, also suggests how we might alleviate the problem when he states:
“Over time, with patient observation of any virtual community, with a few key informants
with whom one has built a strong and trusting relationship, and with a deep understanding
of one’s own inner identification as a culture member, a netnographer is likely to be able to
separate the wheat from the chaff, and construct a representation faithful to the
interpretations of bona fide culture members.” (Kozinetz, 1998, n.p.)
If we conceive of the list provided to us by the communications manager at Novo Nordisk as a
list containing stakeholders he perceives to be worthy of trust, have we then solved this
issue? Most likely not, but with our proposal we are ourselves involved in assessing the
relevance and trustworthiness of an account as well as the information disseminated from that
account. As such, regardless of who we include and what information we dare to trust, it will
be our choice and we carry the fault if things go awry. In addition, we dare to conjecture
that at least in some cases the business professional included will be able, from his
experience, to grant insights into this and as such perhaps alleviate some of the stress
surrounding the issue. We believe that these perspectives are valid representations of
difficulties on the net, but if nothing else, the emergence of social media has at least made
it easier to assess who the person behind the account is. Lastly, in line with this we find it
quite reasonable not to base decisions on e.g. the words in a tweet. Therefore we note again
that for us to be more trusting of the information collected we would most likely need the
ability to effectively assess the content of the links distributed.
Influence and Community
The project of this thesis is as stated previously to provide insight into the possibilities of
taking theoretical aspirations of stakeholder theory and molding them into practice. We
present here an overall evaluation of our model and assess whether we have succeeded in
this. Upon going through contemporary efforts in stakeholder theory and correlating this with
theoretical capabilities of collective intelligence and business intelligence we were able to
conceptualize a model for stakeholder engagement on social media. However, one of these
aspirations was to move from a view of engagements where the firm exclusively decides on
each important aspect of the engagement to one where an, in theory, unbounded number
of participants decide what is important. We realize that upon reaching a model we have
landed on a perspective, where in one aspect we are back to a traditional view of stakeholder
engagement. However, we believe the model to be able to provide a more meaningful
inclusion of stakeholders than what a mere presence on social media provides. We do this by
allowing stakeholders to be included in a system and let their voices be heard. The most
significant difference lies in the suggestion that the company passively observes the system,
which may aid them in escaping bias-distortions in the decision-making processes
surrounding stakeholder engagement. However, how such a system is used and if the core
principles of including many and letting them decide are adhered to we of course cannot say.
The theory behind our model clearly states the benefits of an unintrusive approach but in
implementation and use it will rely on the company’s actions in relation to this. This issue
seems impossible to escape.
Furthermore, our model deliberately stays passive and stakeholder engagement can hardly be
defined as a passive activity. We find two questions to be relevant in relation to this. One of
how stakeholders are to be made aware that they are included in such a system so as to be
able to feel a part of it and feel that they are being heard? The other of how passively listening
in and subsequently communicating about what is heard affects both the outcome of the
interaction and the relationship between the company and the stakeholder?
The first is undoubtedly difficult to answer and will again most likely be handled differently
from one company to another. In section 3 we explained in short Bonabeau’s view that our
human nature tends to lead us to favor information that fits the state of our current beliefs.
This is reiterated by research carried out by Dan Ariely (2010) in which he defined the
presence of the so-called Not-Invented-Here bias that also speaks to the fact that humans have
a clear tendency to favor their own ideas over ideas conceived by others. (Ariely, chap. 4,
2010) If this is so there may be an indication that when we have collected the opinions and
solutions from stakeholders in our system they may be likely to positively perceive the
subsequent communication from a company. However, this will obviously rely on the
individual stakeholder’s ability to recognize her own contribution at the time of subsequent
communication or action we as a company take in relation to that contribution. Another
solution may be to simply ask stakeholders whether they would be interested in being
included in such a system. However, we will refrain from claiming to have an answer to the
best solution in this regard.
The second question is posed because we recognize that in essence our model has the
purpose of bridging the gap between company and stakeholders by allowing them to enter
into decision-making processes. We do so by analyzing communications and from that
deriving the intentions of our stakeholders based on the content of these. However, we also
realize that the position we have put ourselves in could be interpreted as quite the position
of power. We have not discussed this, and as such we must assume that stakeholders have no way
of explaining the intentionality behind their communications. To provide further explanation
of this issue we refer to Habermas (1985) and the following citation:
“…Owing to this linguistic structure, it (communicatively achieved agreement) cannot be
merely induced through outside influence; it has to be accepted or presupposed as valid by
the participants…A communicatively achieved agreement has a rational basis; it cannot be
imposed by either party...Agreement can indeed be objectively obtained by force; but what
comes to pass manifestly through outside influence or the use of violence cannot count
subjectively as agreement. Agreements rests on common convictions.” (Habermas, p. 287,
1985)
What he states here is, in short, that in order for us to truly be able to claim that we have an
understanding of what our stakeholders are saying, we cannot take a position of power from
which we sit and decide upon the intended meaning in a communication. If this is so then we
obviously run into trouble and whether this issue can be alleviated in the context of
stakeholder engagement seems difficult to answer. In short, it would seem to take us back to
face-to-face engagements as the only means to capture the essence of an issue, which is not a
possibility on social media. Additionally, when we are moving in the context of business we
undoubtedly have many pieces that must come together in order for a complete puzzle to
emerge. It may seem that in order for us to bring our model into practice we cannot capture
all of them and as such it will be up to the company in question whether this issue makes the
model unusable.
11. Conclusion
We have come to the end of our attempts to find a model for stakeholder engagement on
social media. We found that social media most likely does not accommodate traditional forms
of engagement and in that difficulty found inspiration to apply other perspectives on
stakeholder engagement. These were partially found in the context of the logic of community
but the primary inspiration emerged from the theory of collective intelligence. This field
provided support for the perspective that stakeholder engagement can be taken to social
media, and that valuable insights can be gained from doing so. We moved forward in an attempt
to consolidate these theoretical perspectives with practical applicability, and in doing so we
found both possibilities and challenges. Much can be said in favor of the use of text mining
in this regard and we strongly believe that as this field evolves, so will the practicality
and value of our proposed model. Additionally, we also believe that as more research provides
insight into the technological foundation of, as well as the users of, social technologies the
model may gain in strength. We admit to the many inherent challenges coupled with deriving any
sort of information from social media but also believe that our model may be applied given
that a company is mindful of these. In the end the challenges do not make it impossible to
take stakeholder engagement to social media. In our view they state simply that there may
be no replacement for the values derived from face-to-face engagements, but this does not
seem to mean that there is no value to gain from taking stakeholder engagement to social
media. It seems to depend highly upon the purpose. If the purpose is to reach more
stakeholders and allow these to form some sort of relation to the company, then engagement
on social media suddenly seems an invaluable tool. Furthermore, if the purpose is the
spreading of a company’s image then the capabilities of a medium like Twitter seem
indisputable. In turn, if the purpose is getting to the bottom of an issue posed by the fact
that a company’s actions affect an external environment, then real-world meetings, conferences
and the like will most likely still be the best approach, since in such a case it may well be
essential to have a complete understanding of the views held by stakeholders. However, we are
satisfied that we have shown the possibility of taking stakeholder engagement to social media.
12. Bibliography
12.1. Articles (Order of Appearance)
1: Porter, M. E. (1979), How Competitive Forces Shape Strategy, Harvard Business Review
2: Bennett, W.L. (2003), New Media Power: The Internet and Global Activism, Oxford: Rowman & Littlefield
3: Castelló, I., Etter, M. & Morsing, M. (Forthcoming), Why Stakeholder Engagement will not be Tweeted: Logic and Conditions of Authority Corset, Paper Presented at the Academy of Management 2012, Boston, USA.
4: Scherer, A. G., Palazzo, G. (2011), The New Political Role of Business in a Globalized World: A Review of a New Perspective on CSR and its Implications for the Firm, Governance, and Democracy, Journal of Management Studies, 48(4): 889-931.
5: Bughin, J., Byers, A. H., Chui, M. (2011), How social technologies are extending the organization, McKinsey Global Institute, http://www.mckinseyquarterly.com/How_social_technologies_are_extending_the_organization_2888 (Seen on 24/7-2012)
6: Hutton, G., Fosdick, M. (2011), The Globalization of Social Media - Consumer Relationships with Brands Evolve in the Digital Space, Journal of Advertising Research
7: Nonaka, I. (1994), A Dynamic Theory of Organizational Knowledge Creation, Organization Science, Vol. 5, No. 1, February 1994
8: AccountAbility (2011), AA1000 Stakeholder Engagement Standard 2011, http://www.accountability.org/about-us/publications/index.html (Seen on 24/7-2012)
9: Dellarocas, C. (2003), The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms, Management Science INFORMS Vol. 49, No. 10, pp. 1407–1424
10: Hypatia Research LLC (2011), Benchmarking Social Community Investments & ROI: Best Practices Vendor Selection Guide, ©Hypatia Research LLC
11: Leimeister, J. M. (2010), Collective Intelligence, Gabler Verlag
12: Malone, T. W., Laubacher, R., Dellarocas, C. (2009), Harnessing Crowds: Mapping the Genome of Collective Intelligence, MIT Center for Collective Intelligence, Working Paper No. 2009-001
13: Bonabeau, E. (2009), Decisions 2.0: The power of collective intelligence, MIT Sloan Management Review, 50(2), 45-52
14: Barnes, N. G., Andonian, J. (2011), The 2011 Fortune 500 and Social Media Adoption: Have America's Largest Companies Reached a Social Media Plateau?, University of Massachusetts Dartmouth, http://www.umassd.edu/cmr/studiesandresearch/2011fortune500/ (Seen on 24/7-2012)
15: Luhn, H. P. (1958), A Business Intelligence System, IBM Journal of Research and Development October 1958 (p. 314-319)
16: Lovejoy, K., Waters, R. D., Saxton, G. D. (2012), Engaging Stakeholders through Twitter: How nonprofit organizations are getting more out of 140 characters or less, Public Relations Review 38 (2012) 313–318
17: Kwak, H., Lee, C., Park, H., Moon, S. (2010), What is Twitter, a Social Network or a News Media?, Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh NC (USA)
18: Finin, T., Java, A., Song, X., Tseng, B. (2007), Why We Twitter: Understanding Microblogging Usage and Communities, Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07, August 12, 2007
19: Imhoff, C., White, C. (2011), Self-Service Business Intelligence - Empowering Users to Generate Insights, TDWI Best Practices Report - Third Quarter 2011
20: Cohen, K. B., Hunter, L. (2008), Getting Started in Text Mining, PLoS Comput Biol 4(1): e20. doi:10.1371/journal.pcbi.0040020
21: Food and Drug Administration (2011), Draft Guidance for Industry on Responding to Unsolicited Requests for Off-Label Information About Prescription Drugs and Medical Devices, Food and Drug Administration, https://www.federalregister.gov/articles/2011/12/30/2011-33550/draft-guidance-for-industry-on-responding-to-unsolicited-requests-for-off-label-information-about (Seen on 24/7-2012)
22: Manovich, L. (2011), Trending: The Promises and Challenges of Big Social Data, http://manovich.net/articles/ (Seen on 24/7-2012)
23: Kozinets, R. V. (1998), On Netnography: Initial Reflections on Consumer Research Investigations of Cyberculture, Advances in Consumer Research Volume 25, http://www.acrwebsite.org/volumes/display.asp?id=8180 (Seen on 24/7-2012)
12.2. Books (Order of Appearance)
1: Freeman, R. E. (1984), Strategic Management: A stakeholder approach, Boston: Pitman, ISBN: 0-273-01913-9
2: Li, C., Bernoff, J. (2008), Groundswell: Winning in a World Transformed by Social Technologies, Harvard Business School Press, ISBN: 1422125009
3: Collins, H., Evans, R. (2007), Rethinking Expertise, The University of Chicago Press, ISBN: 0-226-11360-4
4: Hacking, I. (1999), The Social Construction of What?, Harvard University Press, ISBN 0-674-81200-X
5: Berry, M. J. A., Linoff, G. S. (2004), Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, John Wiley & Sons, New York, ISBN: 0470650931
6: Feldman, R., Sanger, J. (2006), The Text Mining Handbook - Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, ISBN: 0-521-83657-3
7: Ariely, D. (2010), The Upside of Irrationality: The Unexpected Benefits of Defying Logic at Work and at Home, Harper, ISBN: 0061995037
8: Habermas, J. (1985), The Theory of Communicative Action, Beacon Press, ISBN: 0807015075
12.3. Links (Order of Appearance)
1: AccountAbility Website: www.AccountAbility.org (Seen on 24/7-2012)
2: Google Example: https://plus.google.com/u/0/100585555255542998765/posts/h7LNZ8zUAdF (Seen on 24/7-2012)
3: Wefollow Website 1: www.wefollow.com (Seen on 24/7-2012)
4: Wefollow Website 2: http://wefollow.com/twitter/ngo (Seen on 24/7-2012)
5: Wefollow Website 3: http://wefollow.com/twitter/advocacy (Seen on 24/7-2012)
6: Languagemonitor Website: http://www.languagemonitor.com/global-english/number-of-words-in-the-english-language-1008879/ (Seen on 24/7-2012)
7: Twopcharts Website: http://twopcharts.com/twitter500million (Seen on 24/7-2012)
8: TDWI Website: www.tdwi.org (Seen on 24/7-2012)
9: Novo Nordisk 1: http://www.novonordisk.com/about_us/about_novo_nordisk/introduction.asp (Seen on 24/7-2012)
10: Novo Nordisk 2: http://www.novonordisk.com/about_us/history/milestones_in_nn_history.asp (Seen on 24/7-2012)
11: Novo Nordisk 3: http://www.novonordisk.com/sustainability/sustainability-approach/stakeholder-engagement.asp (Seen on 24/7-2012)
12: Novo Nordisk 4: http://annualreport2011.novonordisk.com/stakeholders-and-reporting/stakeholders/memberships.aspx (Seen on 24/7-2012)
13: Novo Nordisk 5: http://annualreport2011.novonordisk.com/stakeholders-and-reporting/stakeholders/partnerships.aspx (Seen on 24/7-2012)
14: Pan American Health Organization: https://twitter.com/ncds_paho, http://new.paho.org/hq/ (Seen on 24/7-2012)
15: Glu / T1D Exchange: https://twitter.com/myglu, http://t1dexchange.org (Seen on 24/7-2012)
16: Peg Abernathy Group: https://twitter.com/pegabernathy, http://pegabernathygroup.com (Seen on 24/7-2012)
17: AmandaMichelleManait: https://twitter.com/sweetliferunner, http://thesweetliferunner.blogspot.com (Seen on 24/7-2012)
18: Twitter Developer Site: https://dev.twitter.com/ (Seen on 24/7-2012)
19: Tweet from ‘@sstrumello’: http://t.co/yymFcECb (Seen on 24/7-2012)
19: Tweet from ‘@JoyofDiabetes’: http://dld.bz/bCD (Seen on 25/7-2012)
20: Tweet from ‘@DiabetesRx’: http://t.co/JrnpQjY (Seen on 25/7-2012)
21: Tweet from ‘@DiabetesPower1’: http://t.co/JfktbGcC (Seen on 25/7-2012)
12.4. Programs Used (Order of Appearance)
1: The R Project for Statistical Computing, http://www.r-project.org/
2: RapidMiner, Rapid-I GmbH, http://rapid-i.com/content/view/181/196/