Formations & Deformations of Social Network Graphs
Shalin Hai-Jew
Kansas State University
Aesthesia
March 2, 2017
Marianna Kistler Beach Museum of Art
Kansas State University
(updated)
Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
node (vertex) (ego, entity)
link (edge) (relationship)
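As a minimal sketch of the vocabulary above, a directed network can be held as a list of (source, target) links, from which the node set follows. The account names are made up for illustration and are not data from this presentation.

```python
# Hypothetical directed links (edges): (source, target) pairs.
edges = [("ana", "ben"), ("ben", "ana"), ("cat", "ana"), ("dan", "dan")]

# Nodes (vertices) are every entity that appears at either end of a link.
nodes = sorted({name for edge in edges for name in edge})

# Adjacency list: for each node, the nodes its outgoing links point to.
adjacency = {node: [] for node in nodes}
for source, target in edges:
    adjacency[source].append(target)
```

The ("dan", "dan") pair is a self-loop, that is, a node linked to itself.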
Network Graph Challenges
Challenge 1:
Can you spot the nodes and the links in the following network
graphs (particularly in the deformed ones)?
Challenge 2:
How many network graphs are in this slideshow?
(Of course, some are hidden.)
SETTING THE STAGE:
“NATURAL” FORMS of NETWORK GRAPHS
Part 1: Formations w/ alphanumeric labels to get a sense of what social network
graphs look like
Part 2: Formations w/o alphanumeric labels to get a sense of layout algorithms and
grouping algorithms
NETWORK GRAPH DEFORMATIONS
Part 3: Deformations to get a sense of what’s possible with the
data visualizations
(with alphanumeric labels)
mass_media article network on Wikipedia (1 deg.)
#media hashtag network on Twitter (five graphs)
#food hashtag network on Twitter (lim. 200 Tweets)
#media related tags network on Flickr (1.5 deg.), with subgraph images
“life” keyword search on Twitter, basic network, with subgraph images
(without alphanumeric labels)
(based on common built-in
layout algorithms…
and clustering representations)
polar graph layout algorithm
polar absolute layout algorithm
treemap (grouping / clustering)
(with Sugiyama layout of groups / clusters)
packed rectangles (grouping / clustering)
(with grid layout of groups / clusters)
packed rectangles (grouping / clustering)
(with Harel-Koren Fast Multiscale layout of groups / clusters)
force-directed (grouping / clustering)
(with Fruchterman-Reingold force-based layout of groups / clusters)
force-directed (grouping / clustering)
(with Fruchterman-Reingold layout of groups / clusters)
force-directed (grouping / clustering)
(with Fruchterman-Reingold layout of groups / clusters)
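To give a feel for what a force-based layout does, here is a minimal pure-Python sketch of the Fruchterman-Reingold idea: every pair of nodes repels, linked nodes attract, and movement is capped by a cooling "temperature." The function name, the parameter defaults, and the cooling rate are assumptions for illustration; NodeXL's own implementation may differ in its details.

```python
import math
import random

def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Return {node: (x, y)} for a list of nodes and (u, v) edge pairs."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = math.sqrt(size * size / len(nodes))  # ideal spacing between nodes
    temperature = size / 10.0                # assumed starting step cap
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: magnitude k^2 / distance.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
                disp[u][0] -= dx / d * f
                disp[u][1] -= dy / d * f
        # Attraction along each edge: magnitude distance^2 / k.
        for v, u in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Move each node by at most `temperature`, staying inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[v][0] = min(size, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(size, max(0.0, pos[v][1] + dy / d * step))
        temperature *= 0.95  # assumed cooling schedule
    return {v: tuple(p) for v, p in pos.items()}
```

Calling `fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a")])` returns one (x, y) position per node inside the unit square. Re-running the layout from a different seed gives a different but structurally similar picture, which is part of why layout iteration changes the look of a graph.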
Clauset-Newman-Moore
Wakita-Tsurumi
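Both of the clustering methods named above work by greedily merging groups so as to raise a score called modularity. The sketch below computes only that score for a given partition of an undirected, unweighted graph; it is not either algorithm itself, and the function and argument names are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition; `community_of` maps node -> group label."""
    m = len(edges)
    internal = {}    # group -> number of edges with both ends inside it
    degree_sum = {}  # group -> total degree of the nodes in it
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over groups of (share of edges inside) - (expected share).
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Two triangles joined by a single bridging edge.
triangles = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {name: 1 for name in "abcdef"}
```

Here `modularity(triangles, two_groups)` works out to 5/14 (about 0.357), while lumping everything into one group scores 0, so the two-group split is the better partition by this measure.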
Girvan-Newman
(for smaller graphs)
Connected Components
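Grouping by connected component can be sketched with a breadth-first search that ignores link direction. This is a minimal illustration with assumed names, not NodeXL's code; the largest-first ordering mirrors the way groups are numbered in descending order.

```python
from collections import deque

def connected_components(nodes, edges):
    """Group nodes into weakly connected components (direction ignored)."""
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    # Largest components first.
    return sorted(components, key=len, reverse=True)
```

An isolate (a node with no links) comes back as a component of size one.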
Motifs
(subgraph micro structures)
Vertex Attribute: PageRank
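PageRank scores a vertex by how much "attention" flows into it along directed links. The following is a minimal power-iteration sketch; the damping value of 0.85 is the conventional default and an assumption here, as are the names, and NodeXL's settings may differ.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph; scores sum to 1."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for source, target in edges:
        out_links[source].append(target)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every node passes its rank in equal shares along its outgoing links.
        for v in nodes:
            for target in out_links[v]:
                new_rank[target] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank
```

In a small network where "b", "c", and "d" all point to "a", and "a" points back only to "b", the vertex "a" ends up with the highest score.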
Data worksheets: Edges, vertices, groups, group vertices, overall (summary) metrics, and additional worksheets depending on the social media data source. All are expressed as row data in related worksheets.
Basic edge data: Dyadic followership, relational reciprocation, relationship type, dates (UTC) of the relationship, URLs, #hashtags, and others
Basic vertex data: Name, image URLs, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, PageRank, clustering coefficient, reciprocated vertex pair ratio, and others
Clustering: Group (cluster) partitioning by empirically observed low-dimension distance-based measures (which are variable)
Motifs: Mini-subgraphs that show dyadic, triadic, tetradic, … node relationships in the social network (as a general definition). In NodeXL, the “Group by Motif” visualization shows three types of small-group node relationships: fan motifs (a central node as a connector to otherwise unconnected nodes), D-connector motifs (dyadic nodes connected by multiple intermediary nodes), and clique motifs (with interconnected nodes).
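The fan motif described above can be sketched as a search for a central node with several "leaf" neighbors that connect to nothing else. The threshold of at least two leaves and the function name are assumptions for illustration; this is not NodeXL's motif code.

```python
def fan_motifs(edges, min_leaves=2):
    """Return {head: [leaf, ...]} for nodes that anchor a fan of leaves."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    fans = {}
    for head, linked in neighbors.items():
        # A leaf is a neighbor whose only connection is to this head.
        leaves = sorted(w for w in linked if len(neighbors[w]) == 1)
        if len(leaves) >= min_leaves:
            fans[head] = leaves
    return fans
```

For the links h-x, h-y, h-z, h-k, and k-m, only "h" qualifies as a fan head, with leaves "x", "y", and "z"; "k" has a single leaf, so it does not.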
anything goes…
except no outright manual manipulation of the image or its elements…
except no placement of an external background image in the graph pane…
except no faux or simulated data…
except no data manipulation…
except no visual editing or post-production outside the tool…
except no graph image rotation…
except no inclusion of words or lettering or numbering…
1. Extracting social network data from a social media platform (via third-party add-ons to NodeXL)
2. Data processing
• Processing graph metrics
• Identifying sub-structures such as groups or clusters, motifs, or connected components (through clustering algorithms)
3. Creating graph visualizations in the graph pane with layout algorithms
4. Analyzing the data visualizations
5. Deforming the visualizations based on the NodeXL tools alone
Selection of data for extraction from social media platform
…with rate limiting, built-in tool limits, and other limits, and
…with user-set parameters for the data extraction type and seeding terms
Data limiting
Selection of data processing measures
Selection of grouping algorithm
Layout algorithm
Autofill selections from columns
Dynamic filters
Group effects
Graph options
Scale
Zoom
Layout iteration (with or without updates to the data processing)
Element selection / highlighting
Resizing the graph pane
Fluorescing colors with RGB (red, green, blue) or HSL (hue, saturation, lightness) swap outs
…all within data limits, machine processing limits, and parameter pre-sets in the software (at every step in the sequence)
…on glass monitors with light-emitting phosphors
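The RGB and HSL color swap-outs mentioned above can be illustrated outside the tool with Python's standard `colorsys` module. This is only a sketch of the color arithmetic, with an assumed helper name; it says nothing about how NodeXL applies colors internally.

```python
import colorsys

def rotate_hue(rgb, degrees):
    """Shift the hue of an (r, g, b) color with 0-255 channels by `degrees`."""
    r, g, b = (channel / 255.0 for channel in rgb)
    # Note: colorsys orders the result as hue, lightness, saturation (HLS).
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    h = (h + degrees / 360.0) % 1.0
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return tuple(round(channel * 255) for channel in (r, g, b))
```

Rotating pure red (255, 0, 0) by 120 degrees yields pure green (0, 255, 0), while a neutral gray is unchanged because it has no saturation to rotate.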
data structures, clustering algorithms, and layout
algorithms account for a majority of the visual
differences and effects…
and the decorative elements in the visuals affect only a small
portion
it’s the residua that enable some of the cooler visual
effects;
however, there are some “anchor points,” too, beyond
which changes cannot be made;
in experimentation, it’s easy to end up in a visually
irredeemable place, but the “reset all” buttons exist for a
reason
sometimes,
a glance at network graph metrics is sufficient to let you
know what visualization possibilities are available (with
enough experience);
sometimes,
a glance at network graph
metrics can be highly misleading about what
visualizations are possible
also, an early graph map of the data (without graph metrics,
without grouping) can reveal a lot about the data’s
social groupness and connectivity;
the “hard fun” in this endeavor though is the pursuit of visual
surprise
…but first experiment very broadly to actually learn the tool and its behaviors (don’t lock in to some
“go-tos” simply because you know how those work)
…some data extractions take days, and processing some visualizations
from large datasets can take days (so schedule time on backup
computers)
…all available underlying data should be processed because
there may be interesting patterns available for discovery
…graph metrics include vertex degree, in-degree, out-degree,
betweenness and closeness centralities, eigenvector
centrality, PageRank, edge reciprocation, group metrics,
and other details;
…there is also geolocational data, time data, and other
scraped data
…there are subgraph images
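A few of the vertex-level metrics listed above can be computed directly from an edge list. The sketch below covers in-degree, out-degree, and a per-vertex reciprocation ratio (mutual neighbors divided by all neighbors). Treating duplicate edges as one and ignoring self-loops are simplifying assumptions, and the ratio follows one common reading of the metric rather than a confirmed NodeXL formula.

```python
def vertex_metrics(edges):
    """Return {vertex: (in_degree, out_degree, reciprocated_ratio)}."""
    out_nbrs, in_nbrs = {}, {}
    for source, target in edges:
        for v in (source, target):
            out_nbrs.setdefault(v, set())
            in_nbrs.setdefault(v, set())
        if source != target:  # self-loops are ignored in this sketch
            out_nbrs[source].add(target)
            in_nbrs[target].add(source)
    metrics = {}
    for v in out_nbrs:
        adjacent = out_nbrs[v] | in_nbrs[v]  # linked in either direction
        mutual = out_nbrs[v] & in_nbrs[v]    # linked in both directions
        ratio = len(mutual) / len(adjacent) if adjacent else 0.0
        metrics[v] = (len(in_nbrs[v]), len(out_nbrs[v]), ratio)
    return metrics
```

With the links ana→ben, ben→ana, and cat→ana, "ana" has in-degree 2, out-degree 1, and a ratio of 0.5, since only one of her two neighbors is reciprocated.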
…and it’s pretty important to visualize data in different ways (and with text labels) to exploit all the meaning that can be found in that data…and to engage with the underlying data, and not the visualization alone per se
…all network graphs have to be read along with the underlying datasets for actual research-quality meaning
…often without any reference to x- or y-axes, just spatiality in a
two-dimensional plane
…where physical proximity sometimes matters
…where sizes of objects sometimes matter
…where colors of objects sometimes matter
…where connecting lines always matter
…where arrows on the ends of lines always matter
But why?
…focusing on the journey, not the destination
…learning the tool, every last function and the practical and theoretical limits
…understanding how data and graph data metrics relate to visualizations (and gaining a sense of how data visualizations are perceived and what they communicate)
…making the tool do things that its maker(s) did not intend (albeit in a friendly sense)
…enjoying graphs that look like signals but are actually just pleasant noise (with a small amount of actual information)
…digital doodling and pretty for pretty’s sake
But why not?
…a cost in time
…a cost in computer processing
…alarming the makers of the tool with the network graph “hairballs”
… When deforming social network graphs, you’re
playing to the following:
the social media platform (and how people are using it at that particular slice-in-time)
the extracted social data (and serendipitous aspects of that data)
the software (NodeXL and APIs)
how people perceive visually and their tendency to see patterns
your innate need for play, and
your enjoyment at amusing others
The trick…and the secret
Huh, how did I get here?!
The trick is to remember your way in and your way out of the deformations (which ultimately means you learn the tool and its many functionalities and how to troubleshoot within the tool)
The secret is that the eye candy (deformed network graphs) is to motivate the learning and defuse learning frustrations (while learning network analysis and NodeXL) and to increase learning persistence
The presenter is using a version of NodeXL that is between the free and open-source NodeXL Basic (a limited version) and the function-added commercial NodeXL Pro… on Excel 2016. NodeXL stands for Network Overview,
Discovery and Exploration for Excel. This “template” add-on to Excel was formerly known as NetViz.
There is a server version available.
The NodeXL template / add-on is available on Microsoft’s CodePlex site.
The third-party add-ons to NodeXL enable access to social media application programming interfaces (APIs) and open-source structures like MediaWiki. Data captured include unstructured
(image), semi-structured (text), and structured data (numerical).
How to define relationship and depth of relationship
Frequencies and types of interactions
Conveying relationships with shapes, lines, and placement in 2D space
Emplacing data objects with some likelihood of covering the 2D space and with some balance (but not symmetry)
Using colors strategically and non-offensively (general neutrality for edges, color for vertices)
Using shapes and shape sizes strategically and non-offensively
Requiring data limits to enable visualization in fixed physical space (2D and 3D)
Requiring alignment with human visual capabilities and visual sense-making (and understanding the limits of perception with dense data and visual occlusions)
All data visualizations are original and based on unique social media datasets. The data visualizations here are not from any prior presentation or publication. Sometimes, one dataset was used for
multiple data visualizations.
The social network platforms used here include Twitter (microblogging site), Wikipedia (MediaWiki understructure, a crowd-sourced online encyclopedia), and Flickr (video and image-sharing site).
The social network graph types include #hashtag networks, keyword search networks, user networks, related tags networks, and article-article networks.
The social networks are all directional single-mode graphs. The direction of relationships is indicated.
The nodes represent one type of a thing instead of multiple types.
Clusters are created based on inter-relating around topics of shared interest, and such clusters are captured through unsupervised learning. Groups are not pre-labeled with any
classification but are just “Group 1,” “Group 2,” and such in descending order.
There are ways to cluster by vertex attribute, connected component, or other methods.
These included sociograms that…
consist of 30 – 100,000+ nodes/vertices each
contain 1 – 8,000+ groups (which vary based on which clustering algorithms are used)
consist of 1, 1.5, or 2 degrees when degree is a definable parameter in the data extraction
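The 1, 1.5, and 2 degree settings can be sketched as progressively wider cuts around a focal node. The definitions in the comments follow a common convention for ego networks and are an assumption here, as is the function name; the data importers may define the levels somewhat differently per platform.

```python
def ego_network(edges, ego, level):
    """Return the directed edges in the ego's 1, 1.5, or 2 degree network."""
    # Alters: every node directly linked to the ego in either direction.
    alters = ({t for s, t in edges if s == ego}
              | {s for s, t in edges if t == ego}) - {ego}
    core = alters | {ego}
    if level == 1:
        # Only links between the ego and its alters.
        return [(s, t) for s, t in edges if ego in (s, t)]
    if level == 1.5:
        # Adds links among the alters themselves.
        return [(s, t) for s, t in edges if s in core and t in core]
    if level == 2:
        # Adds links from the alters out to their own neighbors.
        return [(s, t) for s, t in edges if s in core or t in core]
    raise ValueError("level must be 1, 1.5, or 2")

sample = [("ego", "a"), ("b", "ego"), ("a", "b"), ("a", "c"), ("c", "d")]
```

On `sample`, level 1 keeps two links, level 1.5 adds the a→b link between alters, and level 2 adds a→c; the c→d link lies beyond two degrees and is left out at every level.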
So social network graphs represent people’s relationships based on various types of relating; they may be understood
at global network scale (broadest level)
as various mixes of subgroups and motifs
as individual nodes (egos) (most granular level)
Relationships may be one-to-oneself (isolate, reflexive self-loop), one-to-one (dyadic), one-to-several, one-to-many, several-to-several, several-to-many, many-to-one, many-to-several, many-to-many.
Relationships cost, so people are selective when they connect… There is a trust premium in every connection.
Relationships may benefit their members, so people are strategic and tactical when they connect…
Relationships are dynamic and changing over time, with varying levels of speed-of-change (especially on social media platforms).
In a social ecology, the interrelationships often determine power and capabilities, resource distribution, and information sharing.
“Relating” online includes the following:
Undeclared transient (ad hoc) relationships: replying to, retweeting, commenting, mentioning, collaboration, co-funding, co-authorship, co-editing, co-tagging digital contents, and others
Declared formalized and announced relationships: following, un-following, friending, unfriending, relational status updates, and others
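Transient relationships such as replies and mentions become graph edges once each interaction is read as a (source, target, relationship type) triple. The record format below is hypothetical, invented for illustration, and is not any platform's real API schema.

```python
# Hypothetical post records; the field names are illustrative only.
posts = [
    {"author": "ana", "text": "@ben have you seen this? #media", "reply_to": None},
    {"author": "ben", "text": "thanks @ana and @cat", "reply_to": "ana"},
]

def interaction_edges(posts):
    """Turn replies and @-mentions into (source, target, type) edges."""
    edges = []
    for post in posts:
        if post["reply_to"]:
            edges.append((post["author"], post["reply_to"], "replies to"))
        for word in post["text"].split():
            if word.startswith("@") and len(word) > 1:
                # Strip trailing punctuation from the mentioned name.
                name = word[1:].rstrip(".,!?:;")
                edges.append((post["author"], name, "mentions"))
    return edges
```

Here the two posts yield four directed edges: one mention from "ana" to "ben", then a reply and a mention from "ben" to "ana", and a mention from "ben" to "cat".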
General types of available data on social media include the following:
Content data: text messages, audio, photos, video, shared digital objects, and others
Trace data: who interacts with whom (which enables drawing of the social network graphs), when messages are shared, when accounts are created, when accounts are closed, and others
Metadata: locational information, “folk” tags linked to digital contents, auto-tags linked to digital contents, system information of those contacting social media platforms, and others
In general, people relate and connect around shared interests and similarity (homophily). Human similarity can be a predictor of long-term relating and bonding.
Some relate around heterophily or interpersonal differences.
In terms of online fame, the power law applies, with a few garnering most of the followership and attention. The rest of the frequency curve is a long tail of those with few close friends.
Actual reciprocal relationships are not so common. The followed do not often follow-back.
On social media, people often pose (perform socially) and over-share for imagined audiences.
One-to-many virality does not truly exist. It is often the bigger entities (governments, corporations) pushing designed messages one-to-many that create trending topics.
Social influence is concentrated at cores.
Over time, there are predictable evolutionary patterns with online social networks. For one, “isolates” (singletons),
“whiskers,” and subgroups either meld with a larger connected component in an online community, or they simply disappear. In other words, nodes move to the core and connect with the social mass or move out of that particular network.
Mainline interests have to converge for people to continue participation in a community.
Virtual relationships are ephemeral, with varying degrees of friending and unfriending. The average length of FB (Facebook) relationships is said to be about three years.
People looking for romance are often entranced by ‘bots, which stand in for actual people. People may be “catfished” into “relationships” by automata (scripts). Also, a majority of people who encounter Twitter ‘bots are unable to tell that they are not people and will accept them as friends (and give them access to their real social networks).
Predictive analytics have been applied to the length of people’s romantic relationships with fairly high accuracy. The time period length of initial
interactivity is one indicator of overall relationship longevity.
Individually, people can be fairly accurately profiled psychologically by what they post online. People’s social circles may be used to
profile the individual even if he / she does not have a direct online presence.
There are geographical effects on virtual connectivity. The physical real has effects on the virtual. Likewise, language and culture have
effects on social media usage.
The cyber-physical confluence exists.
People’s geolocational check-ins have been used to profile individuals as to their lifestyles and behaviors because people tend towards habitual “patterns of life” and times/places where they are comfortable. With a few data points, people’s likelihood of being in a particular place at a certain time may be projected with fairly high accuracy into the future (out to a little over a year).
Dr. Shalin Hai-Jew Instructional Designer
iTAC, Kansas State University
785-532-5262
For more information about social network graphs and the analytics aspects, please see “Beauty as a Bridge to NodeXL” (on SlideShare).
Thanks to Dr. Brent Chamberlain and the sponsors of “Aesthesia” for including me.
Thanks also to the Social Media Research Foundation (SMRF), which promotes “Open Tools, Open Data, Open Scholarship for Social Media” and enables free and open access to NodeXL.
The presenter has no tie to either SMRF or CodePlex.