EXTRACTING SOCIAL NETWORK DATA AND MULTIMEDIA COMMUNICATIONS FROM SOCIAL MEDIA PLATFORMS FOR ANALYSIS AND DECISION-MAKING Shalin Hai-Jew 2014 Big XII Teaching and Learning Conference Oklahoma State University Stillwater, Oklahoma Aug. 4 – 5, 2014

description

This presentation provides an overview of some of the data extractions that may be achieved on social media platforms using their respective APIs and a free open-source tool (NodeXL).

PRESENTATION OVERVIEW

Electronic Commons

Academic Environment Analysis and Decision-making (from E-SNA)

Examples of Social Network Data Graphs

Electronic Social Network Analysis (E-SNA) / Social Physics

Social Media Platform Types

Microblogging: Twitter

Content-Based Social Platforms: YouTube, Flickr

Web Networks

NodeXL (Network Overview, Discovery and Exploration for Excel)

Review

Tools

WELCOMES AND SELF-INTROS

Please introduce yourself as your digital alter-ego. What does your electronic alter-ego look like on, say, Twitter? Facebook? Flickr? YouTube? How accurate is your digital doppelganger to your real-world self? Why?

If analyst(s) were to conduct an “inference attack” on your electronic presence, what could they find out? What could they infer in terms of data leakage and unintended communications (latent information)?

If electronic presence is a kind of social performance, how is it best performed, and why?

What are your experiences with social media platforms? Which do you prefer, and why? Have your preferences changed over time?

What would you like to learn about electronic social network analysis?

THE CONTEXT

To provide a rationale for the use of electronic social network analysis to benefit the (teaching and learning, and other) work of universities

Note: This presentation was designed to introduce some basic electronic social network analysis capabilities, not to teach the audience directly how to do the work, which is beyond the purview of the presentation.

THE ELECTRONIC COMMONS

A “chokepoint” for social issues as a commons

A way to reach many technologically and socially

A way to trigger mass actions (attitudes, beliefs, actions), potentially in a viral or cascading way…as an influence agent

A fantasy space where “egos” may assume audiences (that may be non-existent)

A fantasy space where “egos” may assume non-audiences (the assumption of narrow-casting) when it may be broadcasting (unintended audiences along with the intended ones)

Re-creation of social power structures from the real world into the virtual

In-groups and out-groups

Social performances, posing

Social codes and meanings

Mixed interests and motives

Low cost of indulging curiosities, particularly in an automated and scalable way

THE ELECTRONIC COMMONS (CONT.)

Certain individuals (demographics) in certain social media platforms

Limited big data sharing (value to the data and the identities)

Application programming interfaces (APIs) to access shadow databases

Importance of maintaining trust with clients

Private accounts (vs. public ones)

AFFORDANCES AND ENABLEMENTS FOR INSTITUTIONS OF HIGHER EDUCATION

What are ways that universities have benefitted from the Web? Social media? How can universities continue building on these affordances? What innovations can people use to build on these effects?

What are some ways that universities can harness electronic social network analysis (e-SNA) for their various professional / formal and professional / informal objectives?

ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA)

What is the social media presence of the university?

Who are its closest partners in terms of exchanging messages or sharing social media contents?

What are the contents of the messages? What are the main expressed sentiments?

If the university is considering partnering with an organization, what may be learned about this organization based on its social media presence?

Who are the most active participants in a #hashtag conversation about some aspect of the university? Who is the “mayor of the hashtag” (per Marc A. Smith’s term)? Why?

What conversations are occurring around the events being hosted on or around campus?

ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.)

If there is a controversial or trending issue, what are the main sentiments being expressed? Who and which ad hoc groups are expressing what sentiments? How may the university take part constructively?

If a flash mob action is being planned around campus, how can campus administrators and law enforcement personnel know about what is happening?

If there is a university-related issue that may be inspired, organized, and maintained using social media, how can universities harness social media to constructive ends?

Is there misuse of the university name and brand? Are there fraudulently created social media accounts linked to the university? (After de-aliasing, who is actually behind such accounts?) How can social media platform information be used to geolocate events to physical spaces, and aliases to actual people?

ACADEMIC ENVIRONMENT ANALYSIS AND DECISION-MAKING (FROM E-SNA) (CONT.)

What sorts of images and video are being shared (that are associated with the university) on microblogging sites? On content sharing sites?

In terms of digital content tagging, what are the most common words linked to the university (or its student groups, colleges, public figures, and other associated groups and individuals)?

If there is a desire to change public perceptions, how may social media platforms be used constructively? What are the ethical rules of engagement?

How may a university maintain relationships with its various constituencies through social media? Its political partners? Its corporate partners? Its alumni? Its donors? Its current learners? Its current learners’ families? And then, further, how can e-SNA be used to maintain understandings of these interchanges and interrelationships?

SOME SAMPLES OF SOCIAL NETWORK DATA GRAPHS

To pique your interest

GRAPH 1A: A #HASHTAG CONVERSATION ON TWITTER (FLU)

Note: Please click on the various graphs to link to them on the NodeXL Graph Gallery. Datasets may be downloaded there for many of these data extractions.

The data structures can be depicted in a variety of ways based on a number of layout algorithms.

GRAPH 1B: A TWITTER #HASHTAG CONVERSATION (#BRAG)

GRAPH 2: AN #EVENTGRAPH ON TWITTER (MERLOT)

GRAPH 3: KEYWORD SEARCH ON TWITTER (MOSUL)

GRAPH 4A: USER NETWORK ON TWITTER (FIFAWORLDCUP)

GRAPH 4B: USER NETWORKS OF THOSE WHO TWEETED “ELONMUSK” ON TWITTER

GRAPH 4C: USER NETWORK ON TWITTER (OKSTATENEWS)

GRAPH 5: LIST NETWORK ON TWITTER (WORLD LEADERS)

GRAPH 6: YOUTUBE USER NETWORK (RIHANNA)

GRAPH 7: YOUTUBE VIDEO NETWORK (CAT)

GRAPH 8: RELATED TAGS NETWORK ON FLICKR (SURVIVAL)

GRAPH 9: USER NETWORK ON FLICKR (NERDBOT)

GRAPH 10: WEB NETWORKS / WIKIS / BLOGS (NODEXL.CODEPLEX.COM)

A NOTE ABOUT WEB NETWORK GRAPHS

Third-party VOSON (Virtual Observatory for the Study of Online Networks) tool out of the Australian National University (with an add-in to NodeXL)

Maltego Tungsten

(E-) SOCIAL NETWORK ANALYSIS AND SOCIAL PHYSICS

To summarize some of the basic concepts of social network analysis as applied to electronic spaces

“SOCIAL PHYSICS”

Identifying the latent “laws” of human interactions with each other at macro and micro levels

Laws of affiliation and association (over time): homophily, heterophily

Laws of attraction and aversion

Laws of human patterning socially (and others)

Laws of human uses of physical spaces

Laws of systemic change

Laws of social frictions and large-scale combat

STATISTICAL MEASURES

Global Network Measures

Betweenness centrality: How often a node lies on the shortest paths (geodesics) between pairs of other nodes (information is assumed to move along the shortest paths and closest ties); how much of a bridge a node is for network connectivity

Closeness centrality: Geodesic path distance between a node and every other node (farness as sum of all distances to all other nodes; closeness as inverse of farness)

Node-Level (Local) Measures

Degree centrality: In-degree and out-degree (relative popularity)

Clustering coefficient: Embeddedness of single nodes in cliques or ego neighborhoods with its alters
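
Note: The measures on this slide can be sketched in a few lines of plain Python. This is a minimal illustration on a hypothetical five-node undirected network (the node names and ties are made up), not the way NodeXL computes its metrics. Betweenness uses Brandes' algorithm and is left unnormalized; closeness assumes a connected graph.

```python
from collections import deque

# Hypothetical toy network: undirected ties among five accounts.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (its alters)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree centrality: number of ties per node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


def closeness(adj):
    """Closeness = 1 / farness, where farness sums geodesic distances."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        farness = sum(dist.values())
        result[source] = 1.0 / farness if farness else 0.0
    return result


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (undirected)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {n: [] for n in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: value / 2 for n, value in score.items()}


def local_clustering(adj):
    """Share of a node's neighbor pairs that are themselves tied."""
    result = {}
    for n, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[n] = 0.0
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        result[n] = 2 * links / (k * (k - 1))
    return result


adjacency = build_adjacency(EDGES)
print(degree(adjacency))
print(betweenness(adjacency))
```

In this toy network, node "c" is the bridge between the triangle and the tail, so it carries the highest betweenness and closeness.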

STATISTICAL MEASURES (CONT.)

Global Network Measures

Eigenvector centrality (diversity): A node's score depends on the scores of the nodes it connects to, so ties to higher-value or popular nodes result in a higher value (values between 0 and 1), as a measure of relative influence

Clustering coefficient: Aggregation of multiple nodes based on similarity (like co-occurrence) or connectivity, and expressed as proximity or closeness visually; may be a measure of transitivity
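
Note: Eigenvector centrality can be approximated by power iteration, sketched below under stated assumptions: a hypothetical undirected toy network, a fixed number of iterations, and scores rescaled so the top node equals 1. Tools may normalize differently, so the numbers here are illustrative only.

```python
# Hypothetical toy network: a triangle (a, b, c) with a tail (c-d-e).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, iterations=200):
    """Power iteration: each node repeatedly takes the sum of its
    neighbors' scores; scores are rescaled so the maximum is 1."""
    scores = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        updated = {n: sum(scores[m] for m in adj[n]) for n in adj}
        top = max(updated.values())
        if top == 0:
            return updated  # no ties at all
        scores = {n: value / top for n, value in updated.items()}
    return scores


scores = eigenvector_centrality(build_adjacency(EDGES))
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Nodes tied to the well-connected node "c" score higher than node "e", which is tied only to a peripheral node.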

Motif Structures

Dyads, triads, and other structured sub-groupings

Local and experiential for the nodes in terms of structured connections

May (fractals) / may not be reflective of the overall structure

Global motif censuses (counts of occurrences of various types of motif structures in a whole network)

Structural holes as indicators of potential openings for nodes and links (to build resilience)
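
Note: A global motif census can be sketched as a count of every three-node subset by how many ties it contains. The example below assumes an undirected, hypothetical toy network; directed triad censuses distinguish more types than shown here.

```python
from itertools import combinations

# Hypothetical toy network: a triangle (a, b, c) with a tail (c-d-e).
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]


def triad_census(nodes, edges):
    """Count three-node subsets by number of ties present (0 to 3)."""
    edge_set = {frozenset(edge) for edge in edges}
    census = {0: 0, 1: 0, 2: 0, 3: 0}
    for triple in combinations(sorted(nodes), 3):
        ties = sum(
            1 for pair in combinations(triple, 2) if frozenset(pair) in edge_set
        )
        census[ties] += 1
    return census


def transitivity(census):
    """Closed triplets over all connected triplets (global clustering)."""
    closed = 3 * census[3]  # each triangle holds three closed triplets
    connected = closed + census[2]
    return closed / connected if connected else 0.0


census = triad_census(NODES, EDGES)
print(census)
print(transitivity(census))
```

Triples with two ties (open triads) are one simple way to spot the structural holes mentioned above: a third tie would close them.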

STRUCTURE MINING

Structure of social relationships as an indicator of…

Type of social organization

An embedded power structure

An expression of interdependent and intermixed personalities

Network diffusion of information, power, and other transmissible phenomena

Geodesic structures and distances and paths

Static slice-in-time representations but actual dynamical (changing) realities

(“A Brief Overview of Social Network Analysis”)

NODES AND LINKS (IN TERMS OF SOCIAL MEDIA PLATFORMS)

Entities

Individuals, organizations, governments, non-profits, political groups, and others

People, robots, and cyborgs

Relationships

Follower, following

Tweets, re-tweets, replies-to, mentions

Comments on videos and response videos

Co-occurrence of related tags networks
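
Note: As a rough sketch of how messages become nodes and links, the example below turns a few made-up tweet records into a directed edge list with a relationship label per edge. The record fields ("author", "mentions", "reply_to", "retweet_of") are hypothetical and do not reflect any platform's actual API schema; keeping a plain tweet as a self-loop is one possible convention.

```python
# Hypothetical tweet records; field names are assumptions for illustration.
TWEETS = [
    {"author": "alice", "mentions": ["okstate"], "reply_to": None, "retweet_of": None},
    {"author": "bob", "mentions": ["alice"], "reply_to": "alice", "retweet_of": None},
    {"author": "carol", "mentions": [], "reply_to": None, "retweet_of": "alice"},
    {"author": "dave", "mentions": [], "reply_to": None, "retweet_of": None},
]


def tweets_to_edges(tweets):
    """Return (source, target, relationship) tuples, one per tie."""
    edges = []
    for tweet in tweets:
        author = tweet["author"]
        found = []
        if tweet.get("retweet_of"):
            found.append((author, tweet["retweet_of"], "retweet"))
        if tweet.get("reply_to"):
            found.append((author, tweet["reply_to"], "replies-to"))
        already_linked = {target for _, target, _ in found}
        for name in tweet.get("mentions", []):
            if name not in already_linked:
                found.append((author, name, "mentions"))
        if not found:
            # A tweet that names no one is kept as a self-loop.
            found.append((author, author, "tweet"))
        edges.extend(found)
    return edges


for edge in tweets_to_edges(TWEETS):
    print(edge)
```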

ON TWITTER

To give a sense of the various network graphs possible from the Twitter microblogging site (with multimedia scraping)

ABOUT TWITTER

255 million monthly active users

500 million Tweets (140-character microblogging messages) a day

Nearly 80% of active users on mobile

77% of accounts outside U.S.

Support for over 35 languages

Vine (looping video sharing on mobile) with more than 40 million users

Verified accounts

[Twitter created by a four-man team in 2006 and incorporated in 2007 (About Twitter FactSheet)]

TYPES OF INFORMATION AVAILABLE

#Hashtag conversations (tagged conversations)

#Hashtag eventgraphs (event-based)

Keyword networks (multi-topic)

User networks (ego-based)

List networks (topic-based)

SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM

Word disambiguation

About 1 in 100 Tweets with geolocation data (which is often noisy data)

Rate-limiting

Search goes back only about a week (no deep historical searches without paying a third-party company with access)

Enables extractions of Tweet streams as datasets

Limits for some languages (requiring URL Decoder / Encoder for readability, such as at the following)
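
Note: Text in some languages may come back percent-encoded (URL-encoded) in extracted data. Besides web-based decoders, Python's standard library can decode it. The sample string below is a made-up example, assuming UTF-8 encoding.

```python
from urllib.parse import quote, unquote

# A percent-encoded Cyrillic word, as it might appear in extracted data.
encoded = "%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82"

decoded = unquote(encoded)  # UTF-8 is the default encoding
print(decoded)

# Round trip: encoding the readable text gives back the original string.
print(quote(decoded) == encoded)
```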

ON FLICKR

To provide a sense of what network data may be extracted from the Yahoo Flickr imagery and video repository

ABOUT FLICKR

Hosts imagery and video

Over 90 million registered members

3.5 million new images uploaded daily

Hosting over 6 billion images as of 2011

Free accounts offering a terabyte of storage per individual

Enables public and private accounts

Enables Creative Commons licensure of contents and CC-Search access

[Created by Ludicorp in 2004 and sold to Yahoo in 2005]

TYPES OF INFORMATION AVAILABLE

Related Tags Networks on Flickr

(Multi-lingual) tags as a form of metadata describing the imagery and videos

Related tags (networks of tags that co-occur and may be expressed as clustered text-based graphs)

Graphs may be partitioned for more visual clarity

Scraped imagery may be embedded in the graphs

User Networks / Groups on Flickr

Ego neighborhoods of individual or group contributors to Flickr

“Alters” (nodes with direct ties) to the user network in Flickr

Follower / following

Reply-to
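
Note: A related tags network can be sketched as a co-occurrence count: two tags are linked whenever they appear on the same item, and the link weight is the number of items they share. The tag lists below are invented for illustration and were not pulled from Flickr.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tag lists, one per photo.
PHOTO_TAGS = [
    ["survival", "camping", "forest"],
    ["survival", "camping"],
    ["forest", "hiking"],
    ["survival", "forest", "hiking"],
]


def cooccurrence_edges(tag_lists):
    """Count how many items each unordered pair of tags shares."""
    weights = Counter()
    for tags in tag_lists:
        # Sort and de-duplicate so each pair has one canonical form.
        for first, second in combinations(sorted(set(tags)), 2):
            weights[(first, second)] += 1
    return weights


for (first, second), weight in sorted(cooccurrence_edges(PHOTO_TAGS).items()):
    print(first, second, weight)
```

The weighted pairs form an undirected edge list that can be loaded into a graphing tool and clustered.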

SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM

Disambiguation of terms

Reliance on informal tagging and folksonomies

Dealing with metadata and not the multimedia directly

Limits for some languages (requiring URL decoder / encoder for some languages, namely Cyrillic and Arabic)

ON YOUTUBE

To give a sense of the content networks available on Google’s YouTube video collection

ABOUT YOUTUBE

Over a billion unique users each month on YouTube

Six billion hours of video watched monthly

100 hours of video uploaded each minute

Localized in 61 countries and as many languages

80% of traffic from outside the U.S. (YouTube Statistics)

Adobe Flash video format and HTML 5 format

[Founded in 2005 by a three-man development team and purchased by Google in 2006]

TYPES OF INFORMATION AVAILABLE

User networks (user accounts and connections with other user accounts)

Thumbnail screengrabs possible

Video networks (videos about a particular topic)

Thumbnail screengrabs possible

SOME E-SNA CHALLENGES WITH THIS SOCIAL MEDIA PLATFORM

Based on metadata, not the direct videos

Would be richer if drawn from the scripts of the video contents

ON THE WEB

To provide a sense of what may be captured in terms of Web networks

TYPES OF INFORMATION AVAILABLE

Ties between websites

URLs linked to a geographical location (and vice versa)

Technological understructure of websites

Relatedness ties between various types of electronic information (and the enablement of transforms or the changing of one type of electronic information to another)

Scraping of files (PDF) and imagery (with EXIF data)

Re-identification of aliases

SOME E-SNA CHALLENGES WITH THIS INFORMATION SOURCE

High levels of ambiguity

Past data leaving trails (even if the information may not be current)

Involves the public Web only, not the hidden Web

Requires a commercial tool for efficiency and coherence

NODEXL: NETWORK OVERVIEW, DISCOVERY AND EXPLORATION FOR EXCEL

To introduce the freeware and open-source tool that is an add-in to Excel

GENERAL SEQUENCE

1. Define a research question (that is answerable with this type of data query).

2. Formulate a strategy to use the tool to extract information from a particular social media platform.

3. Start NodeXL. Ensure that there is Internet connectivity. Set up the data extraction parameters. Run the data extraction.

4. Process the data. Create the graph visualization.

5. Analyze the graph metrics. Analyze the graph visualization. Analyze complementary information from other sources.

6. Use the information to make a decision or create a strategy.
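
Note: Steps 4 and 5 above can also be approximated outside the tool once an edge list has been saved. The sketch below assumes a two-column CSV with hypothetical "source" and "target" headers (not a NodeXL export format) and ranks accounts by in-degree as a first look at relative popularity.

```python
import csv
import io
from collections import Counter

# Stand-in for a saved edge list file; the column names are assumptions.
EDGE_CSV = """source,target
alice,okstate
bob,okstate
carol,okstate
okstate,alice
bob,alice
"""


def load_edges(handle):
    """Read (source, target) pairs from an open CSV file or buffer."""
    reader = csv.DictReader(handle)
    return [(row["source"], row["target"]) for row in reader]


def in_degree(edges):
    """Count incoming ties per node."""
    return Counter(target for _, target in edges)


edges = load_edges(io.StringIO(EDGE_CSV))
top_accounts = in_degree(edges).most_common(2)
print(top_accounts)
```

With a real file, `io.StringIO(EDGE_CSV)` would be replaced by `open(path, newline="")`.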

TOOL CAPABILITIES

Data extraction from a range of social media platforms

Graph visualization with a dozen different grouping (clustering) layouts and overall graph layouts

LAYOUT ALGORITHMS

Fruchterman-Reingold (force-based)

Harel-Koren Fast Multiscale

Circle (lattice)

Spiral

Horizontal sine wave / vertical sine wave

Grid

Polar / polar absolute

Sugiyama

Random
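
Note: To show what a layout algorithm does, here is a minimal sketch of the simplest one on the list, a circle layout, which spaces the vertices evenly around a ring. The function name and radius parameter are illustrative and do not mirror NodeXL's implementation; force-based layouts such as Fruchterman-Reingold are iterative and considerably more involved.

```python
import math


def circle_layout(nodes, radius=1.0):
    """Place nodes at equal angles around a circle centered on (0, 0)."""
    count = len(nodes)
    positions = {}
    for index, node in enumerate(nodes):
        angle = 2 * math.pi * index / count
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


for node, (x, y) in circle_layout(["a", "b", "c", "d"]).items():
    print(node, round(x, 3), round(y, 3))
```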

LAYOUT OPTIONS

Affects layout of the groups or connected components

Treemap

Packed rectangles

Force-directed

LAYERS OF DEPENDENCIES

From near-to-far

Local computer and its processing

Connectivity speed to the Internet

NodeXL

Access to the social media platform

Whitelisting

Rate limiting (and time-of-day for access)

Particular search terms “forbidden”

Data processing with NodeXL

Data visualization (with NodeXL or another tool)

Data analysis

Re-run? Additional data extractions?
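
Note: For those scripting their own extractions, rate limiting is commonly handled by waiting and retrying with growing delays. The sketch below is generic: `RateLimited` and the `fetch` callable are hypothetical stand-ins, not part of any platform's client library.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the platform refuses a request."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited, wait 1x, 2x, 4x... the base delay."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide
            sleep(base_delay * 2 ** attempt)


# Usage with a fake request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited()
    return "ok"


waits = []
result = fetch_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; in real use the default `time.sleep` would actually pause.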

GRAPH METRICS

Overall graph metrics

Vertex degree / in-degree and out-degree

Betweenness and closeness centralities

Vertex eigenvector centrality

Vertex PageRank

Vertex clustering coefficient

Vertex reciprocated vertex pair ratio

Edge reciprocation

Group metrics

Words and word pairs

Top items

Twitter search network top items
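
Note: Reciprocation metrics ask how often a directed tie is returned. The sketch below uses one common pair of definitions on a hypothetical directed edge list; it is an assumption that these match what the tool reports, so treat the numbers as illustrative.

```python
# Hypothetical directed ties (for example, "follows" or "mentions").
EDGES = [
    ("a", "b"), ("b", "a"),
    ("a", "c"),
    ("c", "d"), ("d", "c"),
    ("b", "c"),
]


def reciprocity(edges):
    """Return (reciprocated vertex pair ratio, reciprocated edge ratio)."""
    # Ignore self-loops and duplicate edges.
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0, 0.0
    returned_edges = sum(1 for u, v in edge_set if (v, u) in edge_set)
    pairs = {frozenset(edge) for edge in edge_set}
    # Every mutual pair contributes exactly two returned edges.
    mutual_pairs = returned_edges // 2
    return mutual_pairs / len(pairs), returned_edges / len(edge_set)


pair_ratio, edge_ratio = reciprocity(EDGES)
print(pair_ratio, edge_ratio)
```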

GROUPS

Group by vertex attribute

Group by connected component

Group by cluster

Group by motif
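
Note: Of the grouping options above, grouping by connected component is the simplest to sketch: any vertices that can reach each other through ties fall into the same group. The example assumes an undirected, made-up network and uses a breadth-first search.

```python
from collections import deque

# Hypothetical network with two clusters and one isolated vertex.
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b"), ("b", "c"), ("d", "e")]


def connected_components(nodes, edges):
    """Return a list of groups; each group is a sorted list of nodes."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set()
    groups = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(sorted(group))
    return groups


print(connected_components(NODES, EDGES))
```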

REVIEW

To highlight some of the main ideas

A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA

Surfacing Hidden or Latent Information

Who (which nodes) is most active in an event or conversation or other phenomenon?

What is he/she/they/it asserting (as an influence agent) via text? via imagery? via video?

Scalability

This scalable approach enables analysis of both small-scale and (relatively) large-scale data, and everything in between. At some point, the human has to come in to analyze what’s found and to advance the work…but computers can do all the heavy lifting.

A BRIEF REVIEW OF THE AFFORDANCES OF E-SNA (CONT.)

Machine-Enhanced Sentiment Analysis

Gist of a Tweetstream related to a user account or related accounts, a hashtag conversation, an eventgraph, a photostream, a videostream

Embedded meanings and sentiments (the meaning, the direction and the strength of that emotion, the cultural and social-based valence whether positive or negative)

Fine-tuning the automated analysis of texts

Machine reading of imagery

Human-informed processes (at virtually every step)
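
Note: The direction and strength of sentiment can be roughly illustrated with a word-list (lexicon) scorer. The tiny lexicon and its weights below are invented for this sketch; real sentiment tools use far larger, validated lexicons or trained models, and still need human review.

```python
import re

# Hypothetical mini-lexicon: positive and negative weights per word.
LEXICON = {"great": 2, "love": 2, "good": 1, "bad": -1, "awful": -2, "hate": -2}


def sentiment_score(text):
    """Sum the weights of known words; the sign gives the direction."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def sentiment_label(text):
    """Reduce a score to positive, negative, or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in [
    "Love the new library, great hours!",
    "Parking is awful",
    "Game on Saturday",
]:
    print(sentiment_score(message), sentiment_label(message), message)
```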

OTHER DATA EXTRACTION AND GRAPH VISUALIZATION TOOLS

NCapture on Chrome and Internet Explorer (NVivo 10 on Windows)

CEMap on AutoMap with ORA NetScenes

Maltego Tungsten™

* All the above have other purposes and capabilities beyond the limited use cases shown here.

REFERENCES

Hansen, D.L., Shneiderman, B., & Smith, M.A. (2011). Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Boston: Elsevier. (available digitally on ScienceDirect)

NodeXL on CodePlex (downloadables)

LIVE DEMO? QUESTIONS? COMMENTS?

Audience suggestions for targets?

Any questions about this presentation? About e-social network analysis? The software tools? The social media platforms?

Questions about research you might want to embark on using this methodology and these tools?

CONCLUSION AND CONTACT

Dr. Shalin Hai-Jew

Instructional Designer

Information Technology Assistance Center (iTAC)

Kansas State University

212 Hale Library

Manhattan, KS 66506-1200

785-532-5262 (work phone)

[email protected]
