EXTRACTING SOCIAL NETWORK DATA AND MULTIMEDIA COMMUNICATIONS
FROM SOCIAL MEDIA PLATFORMS FOR ANALYSIS AND DECISION-MAKING
Shalin Hai-Jew
2014 Big XII Teaching and Learning Conference
Oklahoma State University, Stillwater, Oklahoma
Aug. 4 – 5, 2014
PRESENTATION OVERVIEW
Electronic Commons
Academic Environment Analysis and Decision-making (from E-SNA)
Examples of Social Network Data Graphs
Electronic Social Network Analysis (E-SNA) / Social Physics
Social Media Platform Types
Microblogging: Twitter
Content-Based Social Platforms: YouTube, Flickr
Web Networks
NodeXL (Network Overview, Discovery and Exploration for Excel)
Review
Tools
2
WELCOMES AND SELF-INTROS
Please introduce yourself as your digital alter-ego. What does your electronic alter-ego look like on, say, Twitter? Facebook? Flickr? YouTube? How accurate is your digital doppelganger to your real-world self? Why?
If analyst(s) were to conduct an “inference attack” on your electronic presence, what could they find out? What could they infer in terms of data leakage and unintended communications (latent information)?
If electronic presence is a kind of social performance, how is it best performed, and why?
What are your experiences with social media platforms? Which do you prefer, and why? Have your preferences changed over time?
What would you like to learn about electronic social network analysis?
4
THE CONTEXT
To provide a rationale for the use of electronic social network analysis to benefit the (teaching and learning, and other) work of universities
5
Note: This presentation was designed to introduce some basic electronic social network analysis capabilities, not to teach the audience directly how to do the work, which is beyond the purview of the presentation.
THE ELECTRONIC COMMONS
A “chokepoint” for social issues as a commons
A way to reach many technologically and socially
A way to trigger mass actions (attitudes, beliefs, actions), potentially in a viral or cascading way…as an influence agent
A fantasy space where “egos” may assume audiences (that may be non-existent)
A fantasy space where “egos” may assume non-audiences (the assumption of narrow-casting) when it may be broadcasting (unintended audiences along with the intended ones)
Re-creation of social power structures from the real world into the virtual
In-groups and out-groups
Social performances, posing
Social codes and meanings
Mixed interests and motives
Low cost of indulging curiosities, particularly in an automated and scalable way
6
THE ELECTRONIC COMMONS (CONT.)
Certain individuals (demographics) in certain social media platforms
Limited big data sharing (value to the data and the identities)
Application programming interfaces (APIs) to access shadow databases
Importance of maintaining trust with clients
Private accounts (vs. public ones)
7
AFFORDANCES AND ENABLEMENTS FOR INSTITUTIONS OF HIGHER EDUCATION
What are ways that universities have benefitted from the Web? Social media? How can universities continue building on these affordances? What innovations can people use to build on these effects?
What are some ways that universities can harness electronic social network analysis (e-SNA) for their various professional / formal and professional / informal objectives?
8
ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA)
What is the social media presence of the university?
Who are its closest partners in terms of exchanging messages or sharing social media contents?
What are the contents of the messages? What are the main expressed sentiments?
If the university is considering partnering with an organization, what may be learned about this organization based on its social media presence?
Who are the most active participants in a #hashtag conversation about some aspect of the university? Who is the “mayor of the hashtag” (per Marc A. Smith’s term)? Why?
What conversations are occurring around the events being hosted on or around campus?
9
ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.)
If there is a controversial or trending issue, what are the main sentiments being expressed? Who and which ad hoc groups are expressing what sentiments? How may the university take part constructively?
If a flash mob action is being planned around campus, how can campus administrators and law enforcement personnel know about what is happening?
If there is a university-related issue that may be inspired, organized, and maintained using social media, how can universities harness social media to constructive ends?
Is there mis-use of the university name and brand? Are there fraudulently created social media accounts linked to the university? (After de-aliasing, who is actually behind such accounts?) How can social media platform information be used to geolocate events to physical spaces, and aliases to actual people?
10
ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.)
What sorts of images and video are being shared (that are associated with the university) on microblogging sites? On content sharing sites?
In terms of digital content tagging, what are the most common words linked to the university (or its student groups, colleges, public figures, and other associated groups and individuals)?
If there is a desire to change public perceptions, how may social media platforms be used constructively? What are the ethical rules of engagement?
How may a university maintain relationships with its various constituencies through social media? Its political partners? Its corporate partners? Its alumni? Its donors? Its current learners? Its current learners’ families? And then, further, how can e-SNA be used to maintain understandings of these interchanges and interrelationships?
11
SOME SAMPLES OF SOCIAL NETWORK DATA GRAPHS
To pique your interest
12
GRAPH 1A: A #HASHTAG CONVERSATION ON TWITTER (FLU)
13
Note: Please click on the various graphs to link to them on the NodeXL Graph Gallery. Datasets may be downloaded there for many of these data extractions.
The data structures can be depicted in a variety of ways based on a number of layout algorithms.
GRAPH 1B: A TWITTER #HASHTAG CONVERSATION (#BRAG)
14
GRAPH 2: AN #EVENTGRAPH ON TWITTER (MERLOT)
15
GRAPH 3: KEYWORD SEARCH ON TWITTER (MOSUL)
16
GRAPH 4A: USER NETWORK ON TWITTER (FIFAWORLDCUP)
17
GRAPH 4B: USER NETWORKS OF THOSE WHO TWEETED “ELONMUSK” ON TWITTER
18
GRAPH 4C: USER NETWORK ON TWITTER (OKSTATENEWS)
19
GRAPH 5: LIST NETWORK ON TWITTER (WORLD LEADERS)
20
GRAPH 6: YOUTUBE USER NETWORK (RIHANNA)
21
GRAPH 7: YOUTUBE VIDEO NETWORK (CAT)
22
GRAPH 8: RELATED TAGS NETWORK ON FLICKR (SURVIVAL)
23
GRAPH 9: USER NETWORK ON FLICKR (NERDBOT)
24
GRAPH 10: WEB NETWORKS / WIKIS / BLOGS (NODEXL.CODEPLEX.COM)
25
A NOTE ABOUT WEB NETWORK GRAPHS
Third-party VOSON (Virtual Observatory for the Study of Online Networks) tool out of the Australian National University (with an add-in to NodeXL)
Maltego Tungsten
26
(E-) SOCIAL NETWORK ANALYSIS AND SOCIAL PHYSICS
To summarize some of the basic concepts of social network analysis as applied to electronic spaces
27
28
“SOCIAL PHYSICS”
Identifying the latent “laws” of human interactions with each other at macro and micro levels
Laws of affiliation and association (over time): homophily, heterophily
Laws of attraction and aversion
Laws of human patterning socially (and others)
Laws of human uses of physical spaces
Laws of systemic change
Laws of social frictions and large-scale combat
29
STATISTICAL MEASURES
Global Network Measures
Betweenness centrality: Number of shortest (geodesic) paths between pairs of other nodes that pass through a given node (information tends to move along the shortest paths and closest ties); indicates how much of a bridge a node is for network connectivity
Closeness centrality: Geodesic path distance between a node and every other node (farness as sum of all distances to all other nodes; closeness as inverse of farness)
Node-Level (Local) Measures
Degree centrality: In-degree and out-degree (relative popularity)
Clustering coefficient: Embeddedness of single nodes in cliques or ego neighborhoods with its alters
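As a rough illustration of the degree, closeness, and betweenness measures above, here is a minimal sketch in plain Python. It assumes a small, connected, unweighted graph; the function names and the four-node example graph are hypothetical and are not part of NodeXL. Betweenness uses Brandes' accumulation approach, which is one common way to count shortest paths.

```python
from collections import deque

def degree(edges):
    """In-degree and out-degree per node from directed (source, target) pairs."""
    in_deg, out_deg = {}, {}
    for src, dst in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
        in_deg.setdefault(src, 0)
        out_deg.setdefault(dst, 0)
    return in_deg, out_deg

def bfs_distances(adj, source):
    """Hop counts (geodesic distances) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness(adj, node):
    """Closeness as the inverse of farness (sum of distances to all other nodes)."""
    farness = sum(bfs_distances(adj, node).values())
    return 1.0 / farness if farness else 0.0

def betweenness(adj):
    """Shortest-path betweenness for an undirected, unweighted graph (Brandes-style)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical path graph: a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

In the path graph, the two middle nodes act as the only bridges, so they carry all of the betweenness, while the end nodes score zero.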
30
STATISTICAL MEASURES (CONT.)
Global Network Measures
Eigenvector centrality: A node's score rises with its connections to higher-value or popular nodes, not just with its number of ties (values between 0 and 1), as a measure of relative influence
Clustering coefficient: Aggregation of multiple nodes based on similarity (like co-occurrence) or connectivity, and expressed as proximity or closeness visually; may be a measure of transitivity
Motif Structures
Dyads, triads, and other structured sub-groupings
Local and experiential for the nodes in terms of structured connections
May (as with fractals) or may not be reflective of the overall structure
Global motif censuses (counts of occurrences of various types of motif structures in a whole network)
Structural holes as indicators of potential openings for nodes and links (to build resilience)
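The clustering coefficient and eigenvector centrality described above can be sketched as follows. This is an illustrative approximation, assuming an undirected graph stored as a dictionary of neighbor sets; the function names and example graph are hypothetical. The eigenvector score is estimated by power iteration and rescaled so that the top node scores 1.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def eigenvector_centrality(adj, iterations=100):
    """Power iteration; a node scores highly when its neighbors score highly."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Adding the node's own score (a shift by the identity matrix) keeps the
        # iteration from oscillating without changing the resulting ranking.
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        top = max(new.values()) or 1.0
        score = {v: x / top for v, x in new.items()}  # rescale into [0, 1]
    return score

# Hypothetical graph: a triangle (a, b, c) with a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

Here node a has the highest eigenvector score, and d scores lower than b or c even though all three are tied to a, because b and c also reinforce each other.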
31
STRUCTURE MINING
Structure of social relationships as an indicator of…
Type of social organization
An embedded power structure
An expression of interdependent and intermixed personalities
Network diffusion of information, power, and other transmissible phenomena
Geodesic structures and distances and paths
Static slice-in-time representations of actually dynamic (changing) realities
(“A Brief Overview of Social Network Analysis”)
32
NODES AND LINKS (IN TERMS OF SOCIAL MEDIA PLATFORMS)
Entities
Individuals, organizations, governments, non-profits, political groups, and others
People, robots, and cyborgs
Relationships
Follower, following
Tweets, re-tweets, replies-to, mentions
Comments on videos and response videos
Co-occurrence of related tags networks
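To make the node-and-link idea concrete, the sketch below turns a few microblog messages into a directed edge list of retweets, replies-to, and mentions. The record layout (the "user", "text", and "reply_to" keys) is an assumption made for illustration and does not mirror any platform's actual API response; a message with no target is recorded here as a self-loop.

```python
import re

MENTION = re.compile(r"@(\w+)")

def tweet_edges(tweets):
    """Return (source, target, relationship) tuples from hypothetical tweet records."""
    edges = []
    for t in tweets:
        author = t["user"].lower()
        text = t["text"]
        names = [m.lower() for m in MENTION.findall(text)]
        before = len(edges)
        if text.startswith("RT @") and names:
            # The first name after "RT" is the account being retweeted.
            edges.append((author, names[0], "retweet"))
            names = names[1:]
        elif t.get("reply_to"):
            target = t["reply_to"].lower()
            edges.append((author, target, "replies-to"))
            names = [n for n in names if n != target]
        for name in names:
            edges.append((author, name, "mentions"))
        if len(edges) == before:
            # No other account involved: keep the author in the graph as a self-loop.
            edges.append((author, author, "tweet"))
    return edges

sample = [
    {"user": "Alice", "text": "RT @Bob: flu shots today"},
    {"user": "Carol", "text": "@alice thanks, cc @dave", "reply_to": "alice"},
    {"user": "Dave", "text": "Stay home if sick"},
]
```

Each tuple is one link: the first two items are the entities (nodes), and the third names the type of relationship.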
33
ON TWITTER
To give a sense of the various network graphs possible from the Twitter microblogging site (with multimedia scraping)
34
ABOUT TWITTER
255 million monthly active users
500 million Tweets (140-character microblogging messages) a day
Nearly 80% of active users on mobile
77% of accounts outside U.S.
Support for over 35 languages
Vine (looping video sharing on mobile) with more than 40 million users
Verified accounts
[Twitter created by a four-man team in 2006 and incorporated in 2007 (About Twitter FactSheet)]
35
TYPES OF INFORMATION AVAILABLE
#Hashtag conversations (tagged conversations)
#Hashtag eventgraphs (event-based)
Keyword networks (multi-topic)
User networks (ego-based)
List networks (topic-based)
36
SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM
Word disambiguation
About 1 in 100 Tweets with geolocation data (which is often noisy data)
Rate-limiting
Searches go back a week only (no deep historical searches without paying for a third-party company with access)
Enables extractions of Tweet streams as datasets
Limits for some languages (requiring URL Decoder / Encoder for readability, such as at the following)
37
ON FLICKR
To provide a sense of what network data may be extracted from the Yahoo Flickr imagery and video repository
38
ABOUT FLICKR
Hosts imagery and video
Over 90 million registered members
3.5 million new images uploaded daily
Hosting over 6 billion images as of 2011
Free accounts offering a terabyte of storage per individual
Enables public and private accounts
Enables Creative Commons licensure of contents and CC-Search access
[Created by Ludicorp in 2004 and sold to Yahoo in 2005]
39
TYPES OF INFORMATION AVAILABLE
Related Tags Networks on Flickr
(Multi-lingual) tags as a form of metadata describing the imagery and videos
Related tags (networks of tags that co-occur and may be expressed as clustered text-based graphs)
Graphs may be partitioned for more visual clarity
Scraped imagery may be embedded in the graphs
User Networks / Groups on Flickr
Ego neighborhoods of individual or group contributors to Flickr
“Alters” (nodes with direct ties) to the user network in Flickr
Follower / following
Reply-to
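A related tags network can be approximated by counting how often two tags appear on the same item. The sketch below assumes each photo is simply a list of tag strings (a hypothetical input shape, not the Flickr API format) and returns weighted tag-to-tag edges.

```python
from collections import Counter
from itertools import combinations

def related_tag_edges(photos, min_count=1):
    """Count tag pairs that co-occur on the same photo; keep pairs seen min_count+ times."""
    pair_counts = Counter()
    for tags in photos:
        # Normalize case and whitespace so "Knife" and "knife" are one node.
        unique = sorted({t.strip().lower() for t in tags if t.strip()})
        pair_counts.update(combinations(unique, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

photos = [
    ["survival", "camping", "Knife"],
    ["survival", "camping"],
    ["knife", "bushcraft"],
]
```

Raising `min_count` is one simple way to thin out a dense tag graph before visualizing it.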
40
SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM
Disambiguation of terms
Reliance on informal tagging and folksonomies
Dealing with metadata and not the multimedia directly
Limits for some languages (requiring URL decoder / encoder for some languages, namely Cyrillic and Arabic)
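Where extracted labels arrive percent-encoded, a URL decoder restores the readable text. This short sketch uses Python's standard `urllib.parse` module; the sample string is the Russian word for "flu" and the helper name is hypothetical.

```python
from urllib.parse import quote, unquote

def readable_label(label):
    """Decode a percent-encoded (UTF-8) label back into readable text."""
    return unquote(label)

encoded = "%D0%B3%D1%80%D0%B8%D0%BF%D0%BF"
decoded = readable_label(encoded)   # Cyrillic text, five characters
round_trip = quote(decoded)         # encoding again gives the original string
```

Labels that contain no percent-encoding pass through unchanged, so the helper can be applied to a whole column of tags.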
41
ON YOUTUBE
To give a sense of the content networks available on Google’s YouTube video collection
42
ABOUT YOUTUBE
Over a billion unique users each month on YouTube
Six billion hours of video watched monthly
100 hours of video uploaded each minute
Localized in 61 countries and as many languages
80% of traffic from outside the U.S. (YouTube Statistics)
Adobe Flash video format and HTML 5 format
[Founded in 2005 by a three-man development team and purchased by Google in 2006]
43
TYPES OF INFORMATION AVAILABLE
User networks (user accounts and connections with other user accounts)
Thumbnail screengrabs possible
Video networks (videos about a particular topic)
Thumbnail screengrabs possible
44
SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM
Based on metadata, not the direct videos
Would be richer if drawn from the scripts of the video contents
45
ON THE WEB
To provide a sense of what may be captured in terms of Web networks
46
TYPES OF INFORMATION AVAILABLE
Ties between websites
URLs linked to a geographical location (and vice versa)
Technological understructure of websites
Relatedness ties between various types of electronic information (and the enablement of transforms or the changing of one type of electronic information to another)
Scraping of files (PDF) and imagery (with EXIF data)
Re-identification of aliases
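Ties between websites are, at their simplest, hyperlinks from one host to another. The sketch below is a simplified illustration, not how VOSON or Maltego work internally: it parses one page's HTML with the standard library and returns host-to-host edges. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def site_ties(page_url, html):
    """Return the set of (source_host, target_host) ties to other websites."""
    parser = LinkCollector()
    parser.feed(html)
    source = urlparse(page_url).netloc
    ties = set()
    for href in parser.hrefs:
        # Resolve relative links, then keep only links that leave the source host.
        target = urlparse(urljoin(page_url, href)).netloc
        if target and target != source:
            ties.add((source, target))
    return ties

page = (
    '<a href="/about">About</a> '
    '<a href="https://example.org/x">Partner</a> '
    '<a href="mailto:a@b.c">Mail</a>'
)
```

Internal links and non-web links (such as mailto addresses) are dropped, so only cross-site ties remain.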
47
SOME E-SNA CHALLENGES WITH THIS INFORMATION SOURCE
High levels of ambiguity
Past data leaving trails (even if the information may not be current)
Involves the public web only, not the hidden Web
Requires a commercial tool for efficiency and coherence
48
NODEXL: NETWORK OVERVIEW, DISCOVERY AND EXPLORATION FOR EXCEL
To introduce the freeware and open-source tool that is an add-in to Excel
49
50
GENERAL SEQUENCE
1. Define a research question (that is answerable with this type of data query).
2. Formulate a strategy to use the tool to extract information from a particular social media platform.
3. Start NodeXL. Ensure that there is Internet connectivity. Set up the data extraction parameters. Run the data extraction.
4. Process the data. Create the graph visualization.
5. Analyze the graph metrics. Analyze the graph visualization. Analyze complementary information from other sources.
6. Use the information to make a decision or create a strategy.
51
TOOL CAPABILITIES
Data extraction from a range of social media platforms
Graph visualization using a dozen different grouping (clustering) visualizations and overall graph visualizations
52
LAYOUT ALGORITHMS
Fruchterman-Reingold (force-based)
Harel-Koren Fast Multiscale
Circle (lattice)
Spiral
Horizontal sine wave / vertical sine wave
Grid
Polar / polar absolute
Sugiyama
Random
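A layout algorithm is a rule for assigning screen coordinates to nodes. As the simplest case from the list above, the circle layout can be sketched as evenly spaced points on a circle; this is an illustrative version with hypothetical names, not NodeXL's own code.

```python
import math

def circle_layout(nodes, radius=1.0):
    """Place nodes at equal angles around a circle centered on the origin."""
    n = len(nodes)
    return {
        node: (
            radius * math.cos(2 * math.pi * i / n),
            radius * math.sin(2 * math.pi * i / n),
        )
        for i, node in enumerate(nodes)
    }

positions = circle_layout(["a", "b", "c", "d"])
```

Force-based layouts such as Fruchterman-Reingold start from positions like these and then move nodes iteratively, so that linked nodes pull together and all nodes push apart.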
53
LAYOUT OPTIONS
Affects layout of the groups or connected components
Treemap
Packed rectangles
Force-directed
54
LAYERS OF DEPENDENCIES
From near-to-far
Local computer and its processing
Connectivity speed to the Internet
NodeXL
Access to the social media platform
Whitelisting
Rate limiting (and time-of-day for access)
Particular search terms “forbidden”
Data processing with NodeXL
Data visualization (with NodeXL or another tool)
Data analysis
Re-run? Additional data extractions?
55
56
GRAPH METRICS
Overall graph metrics
Vertex degree / in-degree and out-degree
Betweenness and closeness centralities
Vertex eigenvector centrality
Vertex PageRank
Vertex clustering coefficient
Vertex reciprocated vertex pair ratio
Edge reciprocation
Group metrics
Word and word pairs
Top items
Twitter search network top items
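Two of the metrics listed above are easy to sketch by hand: edge reciprocation and top word pairs. The definitions used here (the share of directed edges that are returned, and counts of adjacent word pairs in message text) are simplifying assumptions and may differ in detail from what NodeXL computes.

```python
import re
from collections import Counter

def reciprocated_edge_ratio(edges):
    """Share of distinct directed edges (ignoring self-loops) that have a return edge."""
    edge_set = {(s, t) for s, t in edges if s != t}
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, t in edge_set if (t, s) in edge_set)
    return mutual / len(edge_set)

def top_word_pairs(texts, n=3):
    """Most frequent adjacent word pairs across a collection of messages."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9#@']+", text.lower())
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)

edges = [("a", "b"), ("b", "a"), ("a", "c")]
texts = ["flu shot clinic", "free flu shot"]
```

In the example, two of the three edges are reciprocated, and "flu shot" is the only word pair that occurs more than once.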
57
GROUPS
Group by vertex attribute
Group by connected component
Group by cluster
Group by motif
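Grouping by connected component means putting nodes in the same group whenever some chain of links joins them. A minimal sketch, treating links as undirected and using hypothetical names:

```python
def connected_components(edges):
    """Return groups of nodes joined by any chain of links, largest group first."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, set()).add(t)
        adj.setdefault(t, set()).add(s)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                      # depth-first walk from the start node
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)

links = [("a", "b"), ("b", "c"), ("d", "e")]
```

Grouping by cluster goes a step further and splits a single connected component into densely linked sub-communities, which requires a clustering algorithm rather than a simple walk.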
58
REVIEW
To highlight some of the main ideas
59
A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA
Surfacing Hidden or Latent Information
Who (which nodes) is most active in an event, conversation, or other phenomenon?
What is he/she/they/it asserting (as an influence agent) via text? via imagery? via video?
Scalability
This scalable approach enables analysis of both small-scale and (relatively) large-scale data, and everything in between. At some point, the human has to come in to analyze what’s found and to advance the work…but computers can do all the heavy lifting.
60
A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA (CONT.)
Machine-Enhanced Sentiment Analysis
Gist of a Tweetstream related to a user account or related accounts, a hashtag conversation, an eventgraph, a photostream, a videostream
Embedded meanings and sentiments (the meaning, the direction and the strength of that emotion, the cultural and social-based valence whether positive or negative)
Fine-tuning the automated analysis of texts
Machine reading of imagery
Human-informed processes (at virtually every step)
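The direction and strength of sentiment can be illustrated with a very small lexicon-based scorer. The word list, weights, and negation rule below are invented for illustration; real tools use far larger lexicons or trained models, and human review is still needed.

```python
import re

# Hypothetical lexicon: positive weights for positive words, negative for negative.
LEXICON = {"great": 2, "good": 1, "love": 2, "bad": -1, "awful": -2, "hate": -2}
NEGATORS = {"not", "no", "never"}

def sentiment_score(text):
    """Sum word weights; a negator directly before a word flips that word's sign."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score
```

The sign of the score gives the direction (positive or negative) and its size gives a rough strength. For example, `sentiment_score("not good, awful parking")` comes out negative because "good" is flipped by the negator and "awful" is already negative.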
61
OTHER DATA EXTRACTION AND GRAPH VISUALIZATION TOOLS
NCapture on Chrome and Internet Explorer (NVivo 10 on Windows)
CEMap on AutoMap with ORA NetScenes
Maltego Tungsten™
* All the above have other purposes and capabilities beyond the limited use cases shown here.
62
REFERENCES
Hansen, D.L., Shneiderman, B., & Smith, M.A. (2011). Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Boston: Elsevier. (available digitally on ScienceDirect)
NodeXL on CodePlex (downloadables)
63
LIVE DEMO? QUESTIONS? COMMENTS?
Audience suggestions for targets?
Any questions about this presentation? About e-social network analysis? The software tools? The social media platforms?
Questions about research you might want to embark on using this methodology and these tools?
64
CONCLUSION AND CONTACT
Dr. Shalin Hai-Jew
Instructional Designer
Information Technology Assistance Center (iTAC)
Kansas State University
212 Hale Library
Manhattan, KS 66506-1200
785-532-5262 (work phone)
65