Formations & Deformations of Social Network Graphs

Shalin Hai-Jew

Kansas State University

Aesthesia

March 2, 2017

Marianna Kistler Beach Museum of Art

Kansas State University

(updated)

Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms, which identify intensities of connection. Visually, structural relational data are conveyed with layout algorithms in two-dimensional space. Using these layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effect. This presentation introduces novel datasets and novel data visualizations.


node (vertex) (ego, entity)

link (edge) (relationship)

https://nodexl.codeplex.com/

Network Graph Challenges

Challenge 1:

Can you spot the nodes and the links in the following network graphs (particularly in the deformed ones)?

Challenge 2:

How many network graphs are in this slideshow?

(Of course, some are hidden.)


SETTING THE STAGE:

“NATURAL” FORMS of NETWORK GRAPHS

Part 1: Formations w/ alphanumeric labels to get a sense of what social network graphs look like

Part 2: Formations w/o alphanumeric labels to get a sense of layout algorithms and grouping algorithms

NETWORK GRAPH DEFORMATIONS

Part 3: Deformations to get a sense of what’s possible with the data visualizations


(with alphanumeric labels)

mass_media article network on Wikipedia (1 deg.)

#media hashtag network on Twitter

#food hashtag network on Twitter (lim. 200 Tweets)

#media related tags network on Flickr (1.5 deg.), with subgraph images

“life” keyword search on Twitter, basic network, with subgraph images


(without alphanumeric labels)

(based on common built-in layout algorithms… and clustering representations)

polar graph layout algorithm

polar absolute layout algorithm


treemap (grouping / clustering)

(with Sugiyama layout of groups / clusters)

packed rectangles (grouping / clustering)

(with grid layout of groups / clusters)

packed rectangles (grouping / clustering)

(with Harel-Koren Fast Multiscale layout of groups / clusters)

force-directed (grouping / clustering)

(with Fruchterman-Reingold force-based layout of groups / clusters)

(with Fruchterman-Reingold layout of groups / clusters)

(with Fruchterman-Reingold layout of groups / clusters)

Clauset-Newman-Moore

Wakita-Tsurumi


Girvan-Newman

(for smaller graphs)

Connected Components


Motifs

(subgraph micro structures)

Vertex Attribute: PageRank


Data worksheets: Edges, vertices, groups, group vertices, overall (summary) metrics, and additional worksheets depending on the social media data source. All are expressed as row data in related worksheets.

Basic edge data: Dyadic followership, relational reciprocation, relationship type, dates (UTC) of the relationship, URLs, #hashtags, and others

Basic vertex data: Name, image URLs, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, clustering coefficient, reciprocated vertex pair ratio, and others

Clustering: Group (cluster) partitioning by empirically observed low-dimension distance-based measures (which are variable)

Motifs: Mini-subgraphs that show dyadic, triadic, tetradic, … node relationships in the social network (as a general definition). In NodeXL, the “Group by Motif” visualization shows three types of small-group node relationships: fan motifs (a central node as a connector to otherwise unconnected nodes), D-connector motifs (dyadic nodes connected by multiple intermediary nodes), and clique motifs (with interconnected nodes).
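A few of the basic vertex measures listed above can be sketched outside of NodeXL with a plain edge list. The Python below is only an illustration under stated assumptions: the edge list and the function name `vertex_metrics` are hypothetical, and the reciprocated vertex pair ratio is taken here to mean the share of a vertex's neighbors that are connected to it in both directions.

```python
from collections import defaultdict

# Hypothetical directed edge list: (source, target) means source points to target.
edges = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("ann", "cat"),
    ("dan", "ann"),
    ("cat", "dan"),
]


def vertex_metrics(edge_list):
    """Return {vertex: (in_degree, out_degree, reciprocated_vertex_pair_ratio)}."""
    successors = defaultdict(set)    # vertex -> vertices it points to
    predecessors = defaultdict(set)  # vertex -> vertices pointing to it
    for source, target in edge_list:
        successors[source].add(target)
        predecessors[target].add(source)

    metrics = {}
    for vertex in sorted(set(successors) | set(predecessors)):
        neighbors = successors[vertex] | predecessors[vertex]
        # Neighbors linked to this vertex in both directions.
        mutual = successors[vertex] & predecessors[vertex]
        ratio = len(mutual) / len(neighbors) if neighbors else 0.0
        metrics[vertex] = (len(predecessors[vertex]), len(successors[vertex]), ratio)
    return metrics


for name, (in_deg, out_deg, ratio) in vertex_metrics(edges).items():
    print(f"{name}: in={in_deg} out={out_deg} reciprocated={ratio:.2f}")
```

In this toy data, only the ann and bob pair is reciprocated, so bob's ratio is 1.0 while ann's is one in three.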


anything goes…

except no outright manual manipulation of the image or its elements…

except no placement of an external background image in the graph pane…

except no faux or simulated data…

except no data manipulation…

except no visual editing or post-production outside the tool…

except no graph image rotation…

except no inclusion of words or lettering or numbering…

1. Extracting social network data from a social media platform (via third-party add-ons to NodeXL)

2. Data processing

• Processing graph metrics

• Identifying sub-structures such as groups or clusters, motifs, or connected components (through clustering algorithms)

3. Creating graph visualizations in the graph pane with layout algorithms

4. Analyzing the data visualizations

5. Deforming the visualizations based on the NodeXL tools alone
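As a rough illustration of the sub-structure identification in step 2, connected components can be found with a breadth-first search over an edge list. This is a minimal sketch, not NodeXL's implementation: the function name, the sample data, and the choice to ignore edge direction are all assumptions made for the example.

```python
from collections import defaultdict, deque


def connected_components(edge_list, isolates=()):
    """Group vertices into connected components, ignoring edge direction.

    Components are returned largest first, loosely mirroring a
    "Group 1", "Group 2", ... ordering.
    """
    adjacency = defaultdict(set)
    vertices = set(isolates)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
        vertices.update((a, b))

    seen = set()
    components = []
    for start in sorted(vertices):
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex collects one component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            vertex = queue.popleft()
            component.append(vertex)
            for neighbor in sorted(adjacency[vertex]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))

    components.sort(key=len, reverse=True)
    return components


# Hypothetical toy data: two connected groups and one isolate.
sample_edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_components(sample_edges, isolates=["f"]))
```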


Selection of data for extraction from social media platform

…with rate limiting, built-in tool limits, and other limits, and

…with user-set parameters for the data extraction type and seeding terms

Data limiting

Selection of data processing measures

Selection of grouping algorithm

Layout algorithm

Autofill selections from columns

Dynamic filters

Group effects

Graph options

Scale

Zoom

Layout iteration (with or without updates to the data processing)

Element selection / highlighting

Resizing the graph pane

Fluorescing colors with RGB (red, green, blue) or HSL (hue, saturation, lightness) swap outs

…all within data limits, machine processing limits, and parameter pre-sets in the software (at every step in the sequence)…on glass monitors with light-emitting phosphors
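The RGB and HSL color swap outs can be illustrated with Python's standard `colorsys` module. This sketch only shows the arithmetic of moving a color around the hue wheel while keeping its saturation and lightness; the function name `rotate_hue` is hypothetical, and nothing here reflects how NodeXL handles color internally.

```python
import colorsys


def rotate_hue(rgb, degrees):
    """Shift an (R, G, B) color (0-255 per channel) around the hue wheel.

    Lightness and saturation are left unchanged; only the hue moves.
    """
    r, g, b = (channel / 255 for channel in rgb)
    # colorsys orders the tuple as hue, lightness, saturation (HLS).
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    hue = (hue + degrees / 360) % 1.0
    shifted = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(channel * 255) for channel in shifted)


print(rotate_hue((255, 0, 0), 120))  # pure red moved a third of the way around the wheel
```

Grays have no saturation, so rotating their hue leaves them unchanged, which is one reason neutral edges stay neutral under this kind of swap.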

data structures, clustering algorithms, and layout algorithms account for a majority of the visual differences and effects…

and the decorative elements in the visuals affect only a small portion

it’s the residua that enable some of the cooler visual effects;

however, there are some “anchor points,” too, beyond which changes cannot be made;

in experimentation, it’s easy to end up in a visually irredeemable place, but the “reset all” buttons exist for a reason

sometimes, a glance at network graph metrics is sufficient to let you know what visualization possibilities are available (with enough experience);

sometimes, a glance at network graph metrics can be highly misleading about what visualizations are possible

also, an early graph map of the data (without graph metrics, without grouping) can reveal a lot about the data’s social groupness and connectivity;

the “hard fun” in this endeavor, though, is the pursuit of visual surprise

…but first experiment very broadly to actually learn the tool and its behaviors (don’t lock in to some “go to’s” simply because you know how those work)

…some data extractions take days, and processing some visualizations from large datasets can take days (so schedule time on backup computers)

…all available underlying data should be processed because there may be interesting patterns available for discovery

…graph metrics include vertex degree, in-degree, out-degree, betweenness and closeness centralities, eigenvector centrality, PageRank, edge reciprocation, group metrics, and other details;

…there is also geolocational data, time data, and other scraped data

…there are subgraph images

…and it’s pretty important to visualize data in different ways (and with text labels) to exploit all the meaning that can be found in that data…and to engage with the underlying data, and not the visualization alone per se

…all network graphs have to be read along with the underlying datasets for actual research-quality meaning;

…often without any reference to x- or y-axes, just spatiality in a two-dimensional plane

…where physical proximity sometimes matters

…where sizes of objects sometimes matter

…where colors of objects sometimes matter

…where connecting lines always matter

…where arrows on the ends of lines always matter

But why?

…focusing on the journey, not the destination

…learning the tool, every last function and the practical and theoretical limits

…understanding how data and graph data metrics relate to visualizations (and gaining a sense of how data visualizations are perceived and what they communicate)

…making the tool do things that its maker(s) did not intend (albeit in a friendly sense)

…enjoying graphs that look like signals but are actually just pleasant noise (with a small amount of actual information)

…digital doodling and pretty for pretty’s sake

But why not?

…a cost in time

…a cost in computer processing

…alarming the makers of the tool with the network graph “hairballs”

…When deforming social network graphs, you’re playing to the following:

the social media platform (and how people are using it at that particular slice-in-time)

the extracted social data (and serendipitous aspects of that data)

the software (NodeXL and APIs)

how people perceive visually and their tendency to see patterns

your innate need for play, and

your enjoyment at amusing others

The trick…and the secret

Huh, how did I get here?!

The trick is to remember your way in and your way out of the deformations (which ultimately means you learn the tool and its many functionalities and how to troubleshoot within the tool)

The secret is that the eye candy (deformed network graphs) is to motivate the learning and defuse learning frustrations (while learning network analysis and NodeXL) and to increase learning persistence


The presenter is using a version of NodeXL that is between the free and open-source NodeXL Basic (a limited version) and the function-added commercial NodeXL Pro… on Excel 2016.

NodeXL stands for Network Overview, Discovery and Exploration for Excel. This “template” add-on to Excel was formerly known as NetViz.

There is a server version available.

The NodeXL template / add-on is available on Microsoft’s CodePlex site.

The third-party add-ons to NodeXL enable access to social media application programming interfaces (APIs) and open-source structures like MediaWiki. Data captured include unstructured (image), semi-structured (text), and structured (numerical) data.

How to define relationship and depth of relationship

Frequencies and types of interactions

Conveying relationships with shapes, lines, and placement in 2D space

Emplacing data objects with some likelihood of covering the 2D space and with some balance (but not symmetry)

Using colors strategically and non-offensively (general neutrality for edges, color for vertices)

Using shapes and shape sizes strategically and non-offensively

Requiring data limits to enable visualization in fixed physical space (2D and 3D)

Requiring alignment with human visual capabilities and visual sense-making (and understanding the limits of perception with dense data and visual occlusions)

All data visualizations are original and based on unique social media datasets. The data visualizations here are not from any prior presentation or publication. Sometimes, one dataset was used for multiple data visualizations.

The social network platforms used here include Twitter (microblogging site), Wikipedia (MediaWiki understructure, a crowd-sourced online encyclopedia), and Flickr (video and image-sharing site).

The social network graph types include #hashtag networks, keyword search networks, user networks, related tags networks, and article-article networks.

The social networks are all directional single-mode graphs. The direction of relationships is indicated.

The nodes represent one type of a thing instead of multiple types.

Clusters are created based on inter-relating around topics of shared interest, and such clusters are captured through unsupervised learning. Groups are not pre-labeled with any classification but are just “Group 1,” “Group 2,” and such in descending order.

There are ways to cluster by vertex attribute, connected component, or other methods.

These included sociograms that…

consist of 30 – 100,000+ nodes/vertices each

contain 1 – 8,000+ groups (which vary based on which clustering algorithms are used)

consist of 1, 1.5, or 2 degrees when degree is a definable parameter in the data extraction
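The 1, 1.5, and 2 degree settings can be pictured as progressively wider cuts around a seed node. The sketch below assumes a common ego-network convention that is not spelled out in the slides: degree 1 keeps only edges touching the ego, degree 1.5 adds edges among the ego's direct neighbors (alters), and degree 2 adds the alters' own connections. The function name and sample data are hypothetical.

```python
def ego_network(edge_list, ego, degree):
    """Return the edges in the ego's network at degree 1, 1.5, or 2."""
    if degree not in (1, 1.5, 2):
        raise ValueError("degree must be 1, 1.5, or 2")

    # Alters are the vertices directly connected to the ego in either direction.
    alters = set()
    for source, target in edge_list:
        if source == ego and target != ego:
            alters.add(target)
        elif target == ego and source != ego:
            alters.add(source)

    def keep(source, target):
        if degree == 1:
            # Only edges that touch the ego itself.
            return ego in (source, target)
        if degree == 1.5:
            # Ego edges plus edges among the alters.
            inner = alters | {ego}
            return source in inner and target in inner
        # Degree 2: any edge touching the ego or one of its alters.
        return ego in (source, target) or source in alters or target in alters

    return [(s, t) for s, t in edge_list if keep(s, t)]


# Hypothetical toy data seeded on the vertex "ego".
toy_edges = [("ego", "a"), ("b", "ego"), ("a", "b"), ("a", "c"), ("c", "d")]
for level in (1, 1.5, 2):
    print(level, ego_network(toy_edges, "ego", level))
```

In the toy data, the edge between c and d never appears, because neither endpoint is the ego or one of its direct neighbors.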

So social network graphs represent people’s relationships based on various types of relating; they may be understood

at global network scale (most broad level)

as various mixes of subgroups and motifs

as (egos) (most granular level)

Relationships may be one-to-oneself (isolate, reflexive self-loop), one-to-one (dyadic), one-to-several, one-to-many, several-to-several, several-to-many, many-to-one, many-to-several, many-to-many

Relationships cost, so people are selective when they connect… There is a trust premium in every connection.

Relationships may benefit their members, so people are strategic and tactical when they connect…

Relationships are dynamic and changing over time, with varying levels of speed-of-change (especially on social media platforms).

In a social ecology, the interrelationships often determine power and capabilities; resource distribution; information sharing

“Relating” online includes the following:

Undeclared transient (ad hoc) relationships: replying to, retweeting, commenting, mentioning, collaboration, co-funding, co-authorship, co-editing, co-tagging digital contents, and others

Declared formalized and announced relationships: following, un-following, friending, unfriending, relational status updates, and others

General types of available data on social media include the following:

Content data: text messages, audio, photos, video, shared digital objects, and others

Trace data: who interacts with whom (which enables drawing of the social network graphs), when messages are shared, when accounts are created, when accounts are closed, and others

Metadata: locational information, “folk” tags linked to digital contents, auto-tags linked to digital contents, system information of those contacting social media platforms, and others

In general, people relate and connect around shared interests and similarity (homophily). Human similarity can be a predictor of long-term relating and bonding.

Some relate around heterophily or interpersonal differences.

In terms of online fame, the power law applies, with a few garnering most of the followership and attention. Then the rest of the frequency curve involves a long tail of those with few close friends.

Actual reciprocal relationships are not so common. The followed do not often follow back.

On social media, people often pose (perform socially) and over-share for imagined audiences.

One-to-many virality does not truly exist. It’s often the bigger entities (governments, corporations) that push designed messages one-to-many, and these often create trending topics.

Social influence is concentrated at cores.

Over time, there are predictable evolutionary patterns with online social networks. For one, “isolates” (singletons), “whiskers,” and subgroups either meld with a larger connected component in an online community, or they simply disappear. In other words, nodes move to the core and connect with the social mass or move out of that particular network.

Mainline interests have to converge for people to continue participation in a community.

Virtual relationships are ephemeral, with varying degrees of friending and unfriending. The average length of FB relationships is said to be about three years.

People looking for romance are often entranced by ‘bots, which stand in for actual people. People may be “catfished” into “relationships” by automata (scripts). Also, a majority of people who encounter Twitter ‘bots are unable to tell that they are not people and will accept them as friends (and give them access to their real social networks).

Predictive analytics have been applied to the length of people’s romantic relationships with fairly high accuracy. The time period length of initial interactivity is one indicator of overall relationship longevity.

Individually, people can be fairly accurately profiled psychologically by what they post online. People’s social circles may be used to profile the individual even if he / she does not have a direct online presence.

There are geographical effects on virtual connectivity. The physical real has effects on the virtual. Likewise, language and culture have effects on social media usage.

The cyber-physical confluence exists.

People’s geolocational check-ins have been used to profile individuals as to their lifestyles and behaviors because people tend towards habitual “patterns of life” and times/places where they are comfortable. With a few data points, people’s likelihood of being in a particular place at a certain time may be projected with fairly high accuracy into the future (out to a little over a year).

Dr. Shalin Hai-Jew

Instructional Designer

iTAC, Kansas State University

785-532-5262

[email protected]

For more information about social network graphs and the analytics aspects, please see “Beauty as a Bridge to NodeXL” (on SlideShare).

Thanks to Dr. Brent Chamberlain and the sponsors of “Aesthesia” for including me.

Thanks also to the Social Media Research Foundation (SMRF), which promotes “Open Tools, Open Data, Open Scholarship for Social Media” and enables free and open access to NodeXL.

The presenter has no tie to either SMRF or CodePlex.


http://www.slideshare.net/ShalinHaiJew/beauty-as-a-bridge-to-nodexl

http://www.smrfoundation.org/
