Production knowledge imass-olhao_24-4-2014_en
Production of new knowledge through automated Big Data extraction from Social Bookmarking Systems and
analyzing of the resulting network: The case of the network of the globalization of agriculture in Delicious
1st IMASS Conference, Methods and Analyses in Social Sciences,
23-24 April 2014, Olhão, Portugal, http://imass.ca/imass/conference
University of Huelva, Spain
Juan D. Borrero
Estrella Gualda
José Carpio
Table of Contents
Introduction
– Web 2.0 and Social Tagging Systems
– Social tagging and folksonomy
– Folksonomy and collective tag structure
Context and Topic of Study
– Delicious
– Tagging on Delicious
– Tag structure, Delicious and social networks
– Globalization of Agriculture
Objectives
Methodology
– Data collection
– Analysis
Results
– Social network statistics from Delicious dataset
– Network centralization
– Top authoritative nodes
– Visualization User–URL net
– Cohesion and substructures
– Tag clouds
Discussion
– Centrality and power
– Central tags
Conclusions
– Further research
– Possible applications
Framework: Web 2.0 and Social Tagging Systems
Many users add metadata in the form of TAGS
Resulting collective tag structure
Source: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/
Source: http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why-Your-Social-Media-Strategy-Isn-t-Working.aspx/
Source: http://bvdt.tuxic.nl/index.php/the-wisdom-of-the-crowds-in-the-audiovisual-archive-domain/
Web 2.0 has made it possible for a wide range of people to produce, share, interact with, and organize data through tagging.
Framework: Social Tagging
Social tagging is the activity, in Web 2.0, of annotating digital resources with keywords, i.e. tags (Golder and Huberman, 2006; Trant, 2009).
Tagging: a user enjoys a resource and, according to his or her mental model, identifies the terms that best describe the information conveyed by that resource.
Framework: Social tagging and folksonomy
Social tags produced by users are usually regarded as high-quality descriptors of web page topics and a good indicator of web users’ interests and preferences.
This process also allows the formation of a socially constructed classification schema called a folksonomy (Vander Wal, 2004)…
Source: http://scot-project.net//
Framework: Folksonomy and collective tag structure
… that emerges via a bottom-up process: the tags of many different users are aggregated, and the resulting collective tag structure, such as a tag cloud, depicts the collective knowledge of Web users (Cress et al., 2012).
Source: http://blog.cimmyt.org/?p=6052
Source: http://scot-project.net//
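The aggregation step described on this slide can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `collective_tag_counts` and the example users and tags are hypothetical, and it assumes each tag is counted once per user.

```python
from collections import Counter

def collective_tag_counts(user_tags):
    """Aggregate per-user tag lists into one collective tag frequency table."""
    counts = Counter()
    for tags in user_tags.values():
        # Count each tag once per user, so a total is the number of users who chose it.
        counts.update(set(tags))
    return counts

# Hypothetical example data, not taken from the Delicious dataset.
example = {
    "user_a": ["food", "trade", "food"],
    "user_b": ["food", "poverty"],
    "user_c": ["trade", "food"],
}
counts = collective_tag_counts(example)
print(counts.most_common(2))  # [('food', 3), ('trade', 2)]
```

A frequency table of this kind is the raw material for a tag cloud, in which more widely shared tags are drawn larger.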
Context and Topic of Study: Context
Delicious is a free social bookmarking web service for storing, sharing and discovering web bookmarks.
• Content is created, annotated and viewed by its users.
• Non-hierarchical classification system: users can tag each of their bookmarks on the Delicious website, which provides knowledge about the bookmarked URL.
• Collective nature: users can
– view bookmarks added or annotated by other users;
– organize existing tags into groups (tag bundles).
Source: www.delicious.com
Context and Topic of Study: Tagging on Delicious
People can classify the huge amount of information at their disposal in the form of tags.
Tags are keywords, freely chosen by users or suggested by Delicious, that are used to annotate various types of digital content.
Source: www.delicious.com
Context and Topic of Study: Tag structure, Delicious and social networks
The structure of social tagging websites can be viewed as a network of three different node types: the users U, the resources R (websites, i.e. URLs) and the tags T that the users deploy to tag the resources.
We can see Delicious as a tripartite network whose representation can be described by two bipartite networks, for user→tag and user→URL relations, and where we can also see indirect links (e.g. between users, drawn as straight lines) that represent a unipartite network.
Figure: a tripartite network made of three users U=(u,u’,u’’), four tags T=(t,t’,t’’,t’’’) and three URLs (url,url’,url’’)
In Delicious, an annotation is mainly composed of three interconnected components (Smith, 2008):
1. Link to the resource (website)
2. One or more tags
3. User who makes the annotation
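As an illustration of the three-component annotation, the following Python sketch stores annotations as (user, URL, tag) triples and derives the two bipartite relations from them. The names `Annotation` and `split_tripartite` and the toy data are assumptions made for this example; the slides do not specify a data model.

```python
from collections import namedtuple

# One Delicious-style annotation: a user, a resource (URL) and a tag.
Annotation = namedtuple("Annotation", ["user", "url", "tag"])

def split_tripartite(annotations):
    """Derive the user->tag and user->URL bipartite edge sets from annotation triples."""
    user_tag = {(a.user, a.tag) for a in annotations}
    user_url = {(a.user, a.url) for a in annotations}
    return user_tag, user_url

# Hypothetical toy data with placeholder names.
annotations = [
    Annotation("u", "url", "t"),
    Annotation("u", "url", "t'"),
    Annotation("u'", "url", "t"),
    Annotation("u''", "url'", "t''"),
]
user_tag, user_url = split_tripartite(annotations)
print(len(user_tag), len(user_url))  # 4 3
```

Storing triples keeps the full tripartite information; each bipartite network is then a projection that drops one of the three components.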
Context and Topic of Study: Topic
Globalization implies a large market as a result of the reduction of the transaction costs of international trade.
Globalization of agriculture:
- trade (foods, goods)
- prices (food, goods)
- food consumption (bulk products versus processed products)
- R&D
- rules and laws (subsidies, WTO related to poverty)
[Diagram labels: implications; asymmetries; effects; Web 2.0; discussion/diffusion]
Objectives
To discover some type of structuration around the issue of the globalization of agriculture on Delicious.
By automatically extracting data from the Delicious social bookmarking website and using Social Network Analysis (SNA), we ask:
1. what types of URLs around our topic have been recommended via collaborative tagging in Delicious,
2. what types of users label URLs around this topic,
3. whether there is some type of structuration and hierarchy to be discovered in the network of the globalization of agriculture (centrality, substructures, etc.), and
4. what types of tags are being used to specifically label (and thus define and qualify) the URLs on the globalization of agriculture that users recommend through Delicious.
Methodology: Data collection / Procedure
(A) Starting point. Identify the search attributes, using an authoritative source as a baseline to find keywords connected to the idea of ‘globalization of agriculture’.
– Wikipedia definition of “critics of globalization” (popular, high reputation)
– Other starting points (future)
– Main concepts selected manually (researcher expertise) from the website homepages, tag clouds or topics
– Identified the 9 seed keywords (globalization + agriculture, development, activism, trade, poverty, food, organic, GMO)
– Other concepts rejected
(B) A Perl web-crawling program was written to gather the sample of users, URLs and tags for
- globalization+agriculture; globalization+development; globalization+activism; globalization+poverty; globalization+food; globalization+organic; globalization+GMO
- From 22 April 2011 to 21 May 2011
(C) Results
- 61,043 taggings that involved 3,668 users on 4,913 URLs and 5,724 tags.
(D) A Haskell program reduced the amount of data by cutting the URLs and consolidating keywords, including the identification of synonyms and the elimination of words with capital letters and of derivatives such as plural forms.
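The reduction step in (D) is only summarized on the slide, so the following Python sketch is one possible reading, not the authors' Haskell program. It assumes that "cutting" a URL means keeping only its host name, that capitalized variants are merged by lower-casing, and that a plural is merged only when its singular also occurs in the data. The synonym table and the example URL path are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical synonym table; the authors' actual list is not given in the slides.
SYNONYMS = {"globalisation": "globalization"}

def cut_url(url):
    """Reduce a bookmarked URL to its host name (assumed meaning of 'cutting')."""
    return urlsplit(url).netloc.lower()

def normalize_tags(tags):
    """Lower-case tags, merge simple plurals into an existing singular, map synonyms."""
    lowered = [t.strip().lower() for t in tags]
    vocabulary = set(lowered)
    normalized = []
    for tag in lowered:
        # Merge 'foods' into 'food' only if 'food' itself occurs in the data,
        # so that words such as 'economics' are left untouched.
        if tag.endswith("s") and tag[:-1] in vocabulary:
            tag = tag[:-1]
        normalized.append(SYNONYMS.get(tag, tag))
    return normalized

host = cut_url("http://www.nytimes.com/some/article.html")
tags = normalize_tags(["Food", "foods", "food", "Globalisation", "economics"])
print(host, tags)
```

Under these assumptions many distinct URLs collapse onto one host and many tag variants onto one tag, which is consistent in direction with the drop from 4,913 to 2,148 URLs and from 5,724 to 4,776 tags reported in the deck.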
Methodology: Data collection / Final dataset
2,148 URLs, 4,776 tags, 3,668 users
Methodology: Analysis
With the help of the software Pajek, we analyzed these social networks,
first studying their properties (quantitative), and
second visualizing the networks (qualitative) through force-directed graph layouts and tag clouds.
Results: Social network statistics from Delicious dataset

Network    Type        Relation    # of nodes  # of links  Density  Av. degree
User–URL   Bipartite   Directed    5,816       7,200       0.09%    2.476
User–User  Unipartite  Undirected  3,668       134,833     1.97%    73.5187
URL–URL    Unipartite  Undirected  2,148       20,558      0.84%    19.141
Tag–Tag    Unipartite  Undirected  4,776       539,105     47.06%   225.756

The tag–tag network is much denser than the others: people usually use common tags.
A bipartite network with a directed relation is a network built from two different types of nodes (in this case “users” and “URLs”) that are directly connected by a relationship or link (in this work: users recommend URLs, or users tag URLs) (2-mode network).
A unipartite network with an undirected relation is a network created by transforming the original matrix into a user–user, tag–tag, or URL–URL matrix. In these cases there is an undirected relation through a vertex (node) that connects both (1-mode network). For instance, a user–user matrix is built here through the URLs that connect users, because different people can tag or recommend the same URL.
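The transformation from the 2-mode user–URL network to a 1-mode user–user network can be sketched in Python as follows. This is an illustrative stand-in for what Pajek does, with hypothetical function names and toy edges. The `average_degree` helper uses the usual 2L/N formula, which agrees with the average degree column of the table to the precision shown.

```python
from collections import defaultdict
from itertools import combinations

def project_users(user_url_edges):
    """Build the undirected user-user network: two users are linked if they share a URL."""
    users_by_url = defaultdict(set)
    for user, url in user_url_edges:
        users_by_url[url].add(user)
    links = set()
    for users in users_by_url.values():
        # Every pair of users who bookmarked the same URL gets one undirected link.
        for a, b in combinations(sorted(users), 2):
            links.add((a, b))
    return links

def average_degree(n_nodes, n_links):
    """Average degree: each link contributes to the degree of two nodes."""
    return 2 * n_links / n_nodes

# Hypothetical toy edges (user, URL).
edges = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u3", "b"), ("u4", "b")]
links = project_users(edges)
print(sorted(links))
print(round(average_degree(5816, 7200), 3))  # 2.476, as in the User–URL row
```

The same projection applied to the URL side (grouping URLs by shared user) would give the URL–URL network.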
Results: Network centralization
Hyperlink network (User–URL). The degree of variability in URL and user centrality scores according to indegree and outdegree.
The network is highly centralized within a few nodes. The power law is a defining characteristic of large-scale networks such as the Web (e.g. Barabási and Albert, 1999), which implies a high degree of network centralization.
How is it that a few users and websites are better connected than the majority?
2,148 URLs arranged in rank order by number of inbound links (URL’s indegree: sum of total inbound links)
3,668 users arranged in rank order by number of outbound links (user’s outdegree: sum of total outbound links)
Only 10 URLs out of 2,148 (0.47%) account for 17.97% of links. 1% of URLs (22 out of 2,148) account for 26.50% of links.
Only 10 users out of 3,668 (0.27%) account for 5.25% of links. 1% of users (37 out of 3,668) account for 12.01% of links.
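The concentration figures above can be checked with a short Python sketch. The degree values are the top-10 indegrees and outdegrees reported in the deck's table of authoritative nodes, and the total of 7,200 links comes from the User–URL row of the statistics table. The function name `top_share` is an assumption made for this example.

```python
def top_share(degrees, k, total_links):
    """Percentage of all links held by the k highest-degree nodes."""
    top = sorted(degrees, reverse=True)[:k]
    return 100 * sum(top) / total_links

# Top-10 values as reported in the deck; only these ten are needed here.
top10_url_indegree = [259, 170, 155, 144, 124, 95, 94, 94, 87, 72]
top10_user_outdegree = [71, 51, 44, 42, 33, 30, 28, 28, 27, 24]
TOTAL_LINKS = 7200

url_share = top_share(top10_url_indegree, 10, TOTAL_LINKS)
user_share = top_share(top10_user_outdegree, 10, TOTAL_LINKS)
print(f"{url_share:.2f}% {user_share:.2f}%")  # 17.97% 5.25%
```

Given the full degree sequences, the same function would give the 1% figures with k=22 for URLs and k=37 for users.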
Results: Top authoritative nodes in the Delicious “Globalization of agriculture” network

Top 10 URLs by indegree
Rank  Indegree  URL                    Description
1     259       www.nytimes.com        Online newspaper
2     170       www.independent.co.uk  Online newspaper
3     155       www.naomiklein.org     Activist media site
4     144       www.news.bbc.co.uk/    Online newspaper
5     124       www.globalresearch.ca  Activist media site
6     95        www.spiegel.de/        Online newspaper
7     94        www.guardian.co.uk/    Online newspaper
8     94        www.economist.com/     Online newspaper
9     87        www.corpwatch.org      Activist media site
10    72        www.theatlantic.com    Online magazine

Top 10 users by outdegree
Rank  Outdegree  User                      Description
1     71         /garrygolden              http://www.garrygolden.net/; professional futurist
2     51         /mritiunjoy               Mritiunjoy Mohanty; Professor, Economics, Indian Institute of Management Calcutta
3     44         /emmarlyb
4     42         /woldpublicopinion        http://www.worldpublicopinion.org/; activist media site
5     33         /criticalspatialpractice  Nicholas Brown; artist
6     30         /pagolnari                Dr. Kathy Ward pagol Nari; Professor, Carbondale, USA; feminist blogger; http://pagolnari.blogspot.com.es/
7     28         /bfunk                    Bryan Finoki; http://subtopia.blogspot.com.es/; author of Subtopia (blog), Senior Editor, Archinect, and Adjunct, Woodbury University School of Architecture, San Diego
8     28         /chris.h.p
9     27         /maitreya11               Carlos Puentes
10    24         /matttbastard             Matthew Elliot; http://bastardlogic.wordpress.com/

The 10 most central websites: six of them were media-based (online newspapers such as The New York Times, The Independent, BBC, Spiegel, The Guardian, and The Economist) and three were activist sites (Naomi Klein, Global Research, and Corpwatch).
Identification of users with a greater degree of centrality:
The user Mritiunjoy plays a very important role in the network.
Mritiunjoy joined Delicious on 12 March 2007.
Mritiunjoy Mohanty is a professor at the Indian Institute of Management Calcutta, and his research interests are the political economy of growth and development.
Results: Visualization of the User–URL network (5,816 nodes)
Energy layout, Fruchterman-Reingold (Pajek) map. Color: cores
Results: Cohesion and substructures

Cluster k=1..5 (subnet)  Nodes  Frequency (%)  CumFreq (nodes)  CumFreq (%)
1                        4,445  76.43%         4,445            76.43%
2                        792    13.62%         5,237            90.04%
3                        387    6.65%          5,624            96.70%
4                        147    2.53%          5,771            99.23%
5                        45     0.77%          5,816            100.00%
Sum                      5,816  100.00%

k-core: A k-core of a graph G is a maximal connected sub-graph of G in which all vertices have a degree of at least k.
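The k-core definition can be made concrete with a small Python sketch of the standard peeling procedure, which assigns each vertex the largest k for which it still belongs to a k-core. This is a simple quadratic-time illustration with a hypothetical function name and toy graph, not the routine Pajek uses. Counting vertices by core number yields a table of the same shape as the one above.

```python
from collections import Counter, defaultdict

def core_numbers(edges):
    """Core number of each vertex of an undirected simple graph, by repeated peeling."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Remove a vertex of minimum remaining degree; k never decreases.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to a.
core = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
print(core["a"], core["d"])             # 2 1
print(sorted(Counter(core.values()).items()))  # [(1, 1), (2, 3)]
```

For graphs of the size discussed here, a bucket-based implementation would be faster, but the result is the same.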
Results: Cohesion and substructures
2-core: 792 vertices, density = 0.26%
3-core: 387 vertices, density = 1.16%
4-core: 147 vertices, density = 5.16%
5-core: 45 vertices, density = 34.77%
We found that the mass media websites belong to the 5-core subgroup, while the main activist websites are included in the 4-core.
Results: Tag cloud: identifying the topical themes in the unipartite tag–tag network
Figure 9. Tag cloud for the globalization of agriculture network identified in Delicious (main tags of the network)
Main themes
Size proportional to the weights: the top 50 highest-weighted tags. Produced with Wordle.
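As a rough illustration of "size proportional to the weights", the Python sketch below keeps the N heaviest tags and scales a font size linearly to the heaviest one. It does not reproduce Wordle's actual layout algorithm. The function name, the point size and the example weights are all hypothetical.

```python
def tag_cloud_sizes(weights, top_n=50, max_pt=48):
    """Font size for the top_n heaviest tags, proportional to each tag's weight."""
    top = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:top_n]
    if not top:
        return {}
    heaviest = top[0][1]
    # The heaviest tag gets max_pt; every other tag is scaled linearly.
    return {tag: max_pt * weight / heaviest for tag, weight in top}

# Hypothetical weights, not the values from the Delicious tag-tag network.
example_weights = {"economics": 100, "politics": 80, "food": 25}
sizes = tag_cloud_sizes(example_weights, top_n=2)
print(sizes)
```

With `top_n=2`, the lightest tag is dropped and the two remaining tags keep the 100:80 ratio of their weights.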
Discussion
• Because tagging is a bottom-up process, the constitution of a global network in this way evokes a very old sociological dilemma concerning the constitution of society.
– Do individuals (or micro entities) come first, or are communities and societies present from the very beginning?
– Does human agency determine social structures, or is an individual's behavior determined by social structures?
• We found that the bottom-up social tagging process is crucial, but it could not exist without Web 2.0 technology.
• What is especially interesting for us here is whether these questions can be transferred to understanding the society that lives around the process of social tagging inside Web 2.0, as exemplified in this work by the social bookmarking site Delicious.
• The approach of this study acknowledges the reciprocity and influence of the social and semantic characteristics. However, it is the user who ultimately decides whether a URL is to be included and whether he or she is going to write new tags. Thus, the constitution of the globalization of agriculture network is probably a mixture, as is society itself.
Discussion: Centrality and Power
• Very unequal distribution of power among the URLs cited by users on the topic of the globalization of agriculture.
– Important accumulation of inbound links.
• Mass media and activists in this network of the globalization of agriculture in Delicious far surpassed the other resources tagged.
• Identification of key collective actors (represented here through URLs as well as unknown users) allows a better comprehension of leadership, influence processes, and power-related structures.
• For social practitioners, this is a good way to identify key informants in a community through whom to disseminate useful and important information.
ADVANTAGES OF THIS TYPE OF KNOWLEDGE FOR RESEARCHING AND INTERVENING
Discussion: Central Tags: Users producing Tags
• Tags: suggested by the website, or new tags added in a creative way.
• Each user could label a URL with an unlimited number of tags.
• Tag cloud: a visual approach to the language used by users and a way to identify discourses.
• From a total of 4,776 tags, two words were the main ones.
• The most frequently used tags were the words ‘economics’ and ‘politics’.
Conclusions: Achieved goals
• A first step towards the development of empirical techniques capable of automatically differentiating actors who occupy a more central position.
• A first building block in the difficult process of understanding and discovering patterns in the processes that characterize users tagging URLs for collaborative reasons.
• Utility for discovering latent patterns, and so for providing effective recommendations to different actors.
• Understanding the community of more than a thousand links.
• Retrieval and analysis of information were complex, but made easier by working in interdisciplinary teams.
Further research
FOCUS ON Users
• Identification of key actors that disseminate and share URLs, such as the previously cited Mritiunjoy.
– Determine where the key elements that structure the network emerge from.
• Why is a given actor so important in the network of the globalization of agriculture?
– Key actors in this type of network could configure and reconfigure the evolution of the network (TIME), and structure and even manipulate the type of interchange of resources in Delicious or in similar bookmarking sites.
• Use of some tags in classifying URLs, and the distinction among users in the way they use some words/tags.
– Distinction between scientists and other professionals or users?
– Identify users with the same tagging patterns, or URLs that were similarly labelled: study structural equivalences.
• Is it by chance? Do the most prominent actors on a website like Delicious correspond to a profile of very active and participative people? Do they usually work in this area (or have it as a hobby), and is this why they accumulate and tag so many URLs in Delicious?
• Go in-depth about users (if possible).
Further research
FOCUS ON Tags
• Reasons for the prominence of the two top tags around the globalization of agriculture.
– Influence of first tags on the following ones.
• Role of innovation and creativity in tagging.
• Are some of the 4,776 tags found used interchangeably?
– Why is the word economics used sometimes, and economy at other times?
– Are they used in the same way when classifying the URLs?
– Evolution and usage of language around an issue over time.
– Ideological and terminological approaches in the national/international arena.
• Other possible studies based on retrieving the pages and performing content analysis.
• Why are some labels present or absent?
• Are there “traditions” or “fashions” in tagging in Web 2.0?
OTHERS
• Compare results from Delicious with those from other social bookmarking sites.
• Longitudinal analysis.
• Other explorations, other starting points, other indicators, etc.
Possible Applications
• Producing and “manipulating” public opinion (when recommending and describing websites) and markets.
– If we know the interests of users belonging to a network, we could also make recommendations.
• Important for researchers interested in formulating strategies for intervention and mobilization; practitioners and firms could also make use of this.
• The discovery of the central elements in a network (users and URLs), together with the tags used by users, could be key to designing future strategies for diffusion (spreading taglines, causes, rumours, etc.).
• Implementation of Information Retrieval and Recommender Systems techniques in social commerce and social media contexts.
• Applications in advertising, e-commerce, mobilizing, security…