DDAY2014 - Edgesense: Social network analysis for everyone

Post on 07-Jul-2015

Speaker: Luca Mearelli. Area: Building, Development. Every conversation has a structure. The network formed by the people who interact in the conversations of an online community can therefore be analyzed with the tools that network science provides, in order to understand its characteristics. A video with a demo of the software that the speaker showed during the presentation is available: https://www.youtube.com/watch?v=HqDRcSSo6bY

Edgesense: Social network analysis for everyone

Luca Mearelli - @lmea

Hi, I’m Luca

Collective Intelligence

Emergence

larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties

Online collaboration

it works!

Online communities

• Exhibit emergence

• Strong design properties

• Hackable

The Blueprint

• Map the community social network

• Measure the structural properties

• Visualize the structure & the metrics

• Tweak the interaction

Edgesense

Edgesense Architecture

• HTML5 / JavaScript

• JSON files

• Python

• JSON source

Edgesense Source Data

• users.json

• nodes.json

• comments.json

users.json

nodes.json

comments.json

Edgesense Backend

• Python

• NetworkX

Edgesense Parsing Pipeline

• Parse source JSON files

• Build network from interactions

• Extract metrics

• Export network + metrics to JSON files
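The four steps above can be sketched end to end with the standard library alone. This is only an illustration, not the Edgesense implementation: the function name `run_pipeline`, the inline sample data, and the exact source fields (`id`, `name`, `author_id`, `recipient_id`, `created_ts`, `length`) are assumptions modelled on the code and the exported format shown later in the deck.

```python
import json

# Hypothetical inline stand-ins for users.json and comments.json.
USERS_JSON = '[{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]'
COMMENTS_JSON = """[
  {"author_id": "2", "recipient_id": "1", "created_ts": 1315491000, "length": 4},
  {"author_id": "1", "recipient_id": "2", "created_ts": 1315492000, "length": 9},
  {"author_id": "2", "recipient_id": null, "created_ts": 1315493000, "length": 2}
]"""


def run_pipeline(users_json, comments_json):
    # 1. Parse the source JSON.
    users = json.loads(users_json)
    comments = json.loads(comments_json)

    # 2. Build the network: people are nodes, and every comment that has
    #    both an author and a recipient becomes a directed edge.
    nodes = [{"id": u["id"], "name": u["name"]} for u in users]
    edges = [
        {"source": c["author_id"], "target": c["recipient_id"],
         "ts": c["created_ts"], "effort": c["length"]}
        for c in comments
        if c.get("author_id") and c.get("recipient_id")
    ]
    edges.sort(key=lambda e: e["ts"])

    # 3. Extract (deliberately trivial) metrics.
    metrics = {"nodes_count": len(nodes), "edges_count": len(edges)}

    # 4. Export network + metrics as a single JSON document.
    return json.dumps({"nodes": nodes, "edges": edges, "metrics": metrics},
                      sort_keys=True)


exported = json.loads(run_pipeline(USERS_JSON, COMMENTS_JSON))
print(exported["metrics"])
```

The comment without a recipient is dropped in step 2, so the sample yields two nodes and two edges.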

Network construction

• Persons are nodes

• Comments make links

• Edges are aggregated

• Metadata is added

Network construction

import logging

# `eu` is a helper module whose import is not shown on the slide.
def extract_edges(nodes_map, comments_map):
    # build the list of edges
    edges_list = []
    # a comment is 'valid' if it has a recipient and an author
    valid_comments = [e for e in comments_map.values()
                      if e.get('recipient_id', None) and e.get('author_id', None)]
    logging.info("%(v)i valid comments on %(t)i total" %
                 {'v': len(valid_comments), 't': len(comments_map.values())})

    # build the whole network to use for metrics
    for comment in valid_comments:
        link = {
            'id': "{0}_{1}_{2}".format(comment['author_id'], comment['recipient_id'], comment['created_ts']),
            'source': comment['author_id'],
            'target': comment['recipient_id'],
            'ts': comment['created_ts'],
            'effort': comment['length'],
            'team': comment['team']
        }
        # mark both endpoints of the link as active
        if comment['author_id'] in nodes_map:
            nodes_map[comment['author_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n': comment['author_id']})
        if comment['recipient_id'] in nodes_map:
            nodes_map[comment['recipient_id']]['active'] = True
        else:
            logging.info("error: node %(n)s was linked but not found in the nodes_map" % {'n': comment['recipient_id']})
        edges_list.append(link)

    return sorted(edges_list, key=eu.sort_by('ts'))

Network construction

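import networkx as nx

# Written against the NetworkX 1.x API, where add_node and add_edge
# accept an attribute dictionary (attr_dict).
def build_network(network):
    # a MultiDiGraph keeps one edge per comment, so parallel edges are preserved
    MDG = nx.MultiDiGraph()

    for node in network['nodes']:
        MDG.add_node(node['id'], node)

    for edge in network['edges']:
        MDG.add_edge(edge['source'], edge['target'], attr_dict=edge)

    # set_isolated is a project helper that is not shown on the slide
    set_isolated(network['nodes'], MDG)
    return MDG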

Network construction

def extract_dpsg(mdg, ts, team=True):
    dg = nx.DiGraph()
    # add all the nodes present at the time ts
    for node in mdg.nodes_iter():
        if mdg.node[node]['created_ts'] <= ts and (team or not mdg.node[node]['team']):
            dg.add_node(node, mdg.node[node])
    # collapse the parallel edges between each pair of nodes into a single
    # edge carrying the number of comments (count) and their total length (effort)
    for node in mdg.nodes_iter():
        for neighbour in mdg[node].keys():
            count = sum(1 for e in mdg[node][neighbour].values()
                        if e['ts'] <= ts and (team or not e['team']))
            effort = sum(e['effort'] for e in mdg[node][neighbour].values()
                         if e['ts'] <= ts and (team or not e['team']))
            team_edge = sum(1 for e in mdg[node][neighbour].values()
                            if e['ts'] <= ts and e['team']) > 0
            if count > 0 and (team or not team_edge):
                dg.add_edge(node, neighbour,
                            {'source': node, 'target': neighbour, 'effort': effort,
                             'count': count, 'team': team_edge})
    return dg
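The aggregation step performed by `extract_dpsg` can also be illustrated without NetworkX. The sketch below is a simplified stand-in, not the project's code: `aggregate_edges` is a hypothetical name, the sample edges are invented, and it covers only the `team=True` case (no filtering of team members).

```python
from collections import defaultdict


def aggregate_edges(edges, ts):
    """Collapse parallel (source, target) edges created at or before ts."""
    totals = defaultdict(lambda: {"count": 0, "effort": 0, "team": False})
    for edge in edges:
        if edge["ts"] > ts:
            continue  # this comment is not yet part of the network at time ts
        pair = totals[(edge["source"], edge["target"])]
        pair["count"] += 1                           # number of comments
        pair["effort"] += edge["effort"]             # total comment length
        pair["team"] = pair["team"] or edge["team"]  # any team comment flags the edge
    return dict(totals)


# Hypothetical comments: user 2 writes to user 1 twice, user 1 replies once.
edges = [
    {"source": "2", "target": "1", "ts": 10, "effort": 4, "team": False},
    {"source": "2", "target": "1", "ts": 20, "effort": 6, "team": True},
    {"source": "1", "target": "2", "ts": 30, "effort": 3, "team": False},
]
aggregated = aggregate_edges(edges, ts=25)
print(aggregated)
```

With `ts=25` the reply at time 30 is excluded, so a single aggregated edge from "2" to "1" remains.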

• Content metrics

• Network metrics

• Number of users (active/inactive)

• Number of connections

• Number of community contributions

• Degree

• Distance

• Centrality

• Modularity

Network Metrics: Degree

• Number of inbound / outbound edges incident on a node
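As a small illustration, in-degree and out-degree can be counted directly from a list of directed edges. The names and edges below are invented for the example.

```python
from collections import Counter

# Hypothetical aggregated edges, written as (source, target) pairs.
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]

out_degree = Counter(source for source, _ in edges)  # edges leaving each node
in_degree = Counter(target for _, target in edges)   # edges arriving at each node

print(in_degree["bob"], out_degree["bob"])  # prints: 2 1
```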

Network Metrics: Distance

• The average number of hops needed to go from a randomly chosen node to another.

• A lower distance implies that information spreads more easily across the network.
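A minimal sketch of this metric, assuming an unweighted directed graph stored as an adjacency dictionary in which every node appears as a key. The average is taken over reachable ordered pairs only, which is one possible convention; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of hops from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_distance(adj):
    """Mean hop count over all ordered pairs (s, t) where t is reachable from s."""
    total = pairs = 0
    for source in adj:
        for target, hops in bfs_distances(adj, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# A directed cycle a -> b -> c -> a: each node reaches one node in one hop
# and the other in two hops.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(average_distance(cycle))  # prints: 1.5
```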

Network Metrics: Centrality

• Refers to indicators which identify the most important vertices within a graph

• Betweenness Centrality: for a given node, the number of shortest paths between all pairs of other vertices that pass through that node.
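A brute-force sketch of betweenness for a small unweighted directed graph. It follows the usual definition, in which each pair of nodes contributes the fraction of its shortest paths that pass through the node, and it is not normalized. The function names and the sample graph are assumptions made for the example; the slides themselves rely on NetworkX for this metric.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: hop distance and number of shortest paths per node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                sigma[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                sigma[neighbour] += sigma[node]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness; every node must be a key of adj."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in adj:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path when the distances add up
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Two parallel routes from a to d (via b or via c), then a single hop to e.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
scores = betweenness(graph)
print(scores)
```

In this graph every path that reaches `e` goes through `d`, so `d` gets the highest score, while `b` and `c` share the traffic that starts at `a`.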

Network Metrics: Modularity

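• Measures how much more densely connected the subcommunities are than they would be in a random network with the same degree distribution: values near 0 mean no more structure than chance, values near 1 mean strongly separated subcommunities.

• Subcommunities are defined such that their members are more connected to each other than to the rest of the network.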
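To make the definition concrete, here is a sketch of Newman's modularity for an undirected, unweighted graph and a partition that is given in advance. Finding a good partition is a separate step that this sketch does not attempt. The function name and the sample graph are assumptions.

```python
def modularity(edges, communities):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # edges with both ends in the community
    degree_sum = [0] * len(communities)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
               for i in range(len(communities)))


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_groups = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
one_group = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(two_groups, one_group)
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0.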

Network Metrics

def extract_network_metrics(mdg, ts, team=True):
    met = {}
    dsg = extract_dpsg(mdg, ts, team)
    if team:
        pre = 'full:'
    else:
        pre = 'user:'
    # avoid trying to compute metrics for
    # the case of empty networks
    if dsg.number_of_nodes() == 0:
        return met
    met[pre+'nodes_count'] = dsg.number_of_nodes()
    met[pre+'edges_count'] = dsg.number_of_edges()
    met[pre+'density'] = nx.density(dsg)
    met[pre+'betweenness'] = nx.betweenness_centrality(dsg)
    met[pre+'avg_betweenness'] = float(sum(met[pre+'betweenness'].values()))/float(len(met[pre+'betweenness'].values()))
    met[pre+'betweenness_count'] = nx.betweenness_centrality(dsg, weight='count')
    met[pre+'avg_betweenness_count'] = float(sum(met[pre+'betweenness_count'].values()))/float(len(met[pre+'betweenness_count'].values()))
    met[pre+'betweenness_effort'] = nx.betweenness_centrality(dsg, weight='effort')
    met[pre+'avg_betweenness_effort'] = float(sum(met[pre+'betweenness_effort'].values()))/float(len(met[pre+'betweenness_effort'].values()))
    met[pre+'in_degree'] = dsg.in_degree()
    met[pre+'avg_in_degree'] = float(sum(met[pre+'in_degree'].values()))/float(len(met[pre+'in_degree'].values()))
    met[pre+'out_degree'] = dsg.out_degree()
    met[pre+'avg_out_degree'] = float(sum(met[pre+'out_degree'].values()))/float(len(met[pre+'out_degree'].values()))
    met[pre+'degree'] = dsg.degree()
    met[pre+'avg_degree'] = float(sum(met[pre+'degree'].values()))/float(len(met[pre+'degree'].values()))
    met[pre+'degree_count'] = dsg.degree(weight='count')
    met[pre+'avg_degree_count'] = float(sum(met[pre+'degree_count'].values()))/float(len(met[pre+'degree_count'].values()))
    met[pre+'degree_effort'] = dsg.degree(weight='effort')
    met[pre+'avg_degree_effort'] = float(sum(met[pre+'degree_effort'].values()))/float(len(met[pre+'degree_effort'].values()))

Exported Format

{
  "edges": [
    {
      "effort": 4,
      "id": "2_1_1315491000",
      "source": "2",
      "target": "1",
      "team": false,
      "ts": 1315491000
    },
    ...
  ],
  "meta": {
    "generated": 1415788633
  },
  "metrics": [
    {
      "ts": 1315491000,
      ...
    }
  ],
  "nodes": [
    {
      "active": true,
      "created_on": "2011-09-08",
      "created_ts": 1315483000,
      "id": "1",
      "isolated": false,
      "name": "Alice",
      "team": true,
      "team_on": "2011-09-08",
      "team_ts": 1315483000
    },
    {...}
  ]
}
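A consumer of this format could read it as sketched below. The document used here is a hypothetical, complete instance shaped like the example above: the second node ("Bob") and the metric key `full:nodes_count` are assumptions, the latter following the key naming in `extract_network_metrics`.

```python
import json

# Hypothetical export with the same top-level keys as the slide's example.
EXPORTED = """
{
  "edges": [
    {"effort": 4, "id": "2_1_1315491000", "source": "2", "target": "1",
     "team": false, "ts": 1315491000}
  ],
  "meta": {"generated": 1415788633},
  "metrics": [
    {"ts": 1315483000, "full:nodes_count": 1},
    {"ts": 1315491000, "full:nodes_count": 2}
  ],
  "nodes": [
    {"id": "1", "name": "Alice", "active": true, "isolated": false, "team": true},
    {"id": "2", "name": "Bob", "active": true, "isolated": false, "team": false}
  ]
}
"""

doc = json.loads(EXPORTED)

# Each metrics entry is a snapshot at time ts; pick the most recent one.
latest = max(doc["metrics"], key=lambda snapshot: snapshot["ts"])

# Resolve edge endpoints to display names through the node list.
names = {node["id"]: node["name"] for node in doc["nodes"]}
for edge in doc["edges"]:
    print("{0} -> {1} (effort {2})".format(
        names[edge["source"]], names[edge["target"]], edge["effort"]))
```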

Edgesense Frontend

• Single page application

• D3.js

• Sigma.js

Demo!

Dashboard: Network

• Uses sigma.js

• ForceAtlas layout *

• Contextual information

Dashboard: Metrics

• Sidebar, Bottom widgets

• Declaratively select metrics to display

<div class="small-box bg-maroon big-metric metric helped"
     data-metric-name="louvain_modularity"
     data-metric-round="3"
     data-help="modularity">
  <div class="inner">
    <h3 class="value"> </h3>
    <p> Modularity </p>
  </div>
  <div class="minichart"> </div>
</div>

Dashboard: Filters

Extras

• Twitter parser

• GEXF exporting

Drupal!

• Module to embed Edgesense

• Configurator for the backend processing

• Configurator for the dashboard

Thank you!

P.S. Edgesense is open source:

github.com/Wikitalia/edgesense