Networkx & Gephi Tutorial #Pydata NYC

Post on 06-May-2015

Slide deck from my presentation at NYC's #Pydata 2012 conference - http://nyc2012.pydata.org/abstracts/#gephi Talk abstract: Are you interested in working with social data to map out communities and connections between friends, fans and followers? In this session I'll show ways in which we use the python networkx library along with the open source gephi visualization tool to make sense of social network data. We'll take a few examples from Twitter, look at how a hashtag spreads through the network, and then analyze the connections between users posting to the hashtag. We'll be constructing graphs, running stats on them and then visualizing the output.

Networkx & Gephi Tutorial #pydata

Gilad Lotan | @gilgul

#gayrights, #lgbt, #jesus, #flipflop, #jobs, #economy

#palestine, #OWS, #immigration, #abortion

#republican, #dems, #economics, #amnesty

#Debates / Ohio

Politicos

OSU Students

Ohio based Media

• Node network properties
  – from immediate connections
    • indegree: how many directed edges (arcs) are incident on a node
    • outdegree: how many directed edges (arcs) originate at a node
    • degree (in or out): number of edges incident on a node
  – from the entire graph
    • centrality (betweenness, closeness)

outdegree=2

indegree=3

degree=5

Source: Lada Adamic (SI508-F08)
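The degree definitions above can be checked on a toy graph. This is a minimal sketch, assuming a hypothetical directed graph (the node names are made up) in which node "a" has the same counts as the labelled figure: indegree 3, outdegree 2, degree 5.

```python
import networkx as nx

# Hypothetical directed graph, built only to illustrate the definitions.
G = nx.DiGraph()
G.add_edges_from([
    ("b", "a"), ("c", "a"), ("d", "a"),  # three arcs arriving at "a"
    ("a", "e"), ("a", "f"),              # two arcs leaving "a"
])

indeg = G.in_degree("a")    # arcs incident on "a"
outdeg = G.out_degree("a")  # arcs originating at "a"
deg = G.degree("a")         # in + out for a directed graph

print(indeg, outdeg, deg)
```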

Example Graph Types

• Complete Graph

• Bipartite Graph
  – Vertices can be divided into two disjoint sets
  – Ex: students & schools
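Both graph types can be built in a few lines of networkx. A minimal sketch; the student and school names are hypothetical and only illustrate the "students & schools" example.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Complete graph: every pair of the 5 nodes is connected.
K5 = nx.complete_graph(5)
k5_edges = K5.number_of_edges()  # n * (n - 1) / 2 edges

# Bipartite graph: edges only run between students and schools.
students = ["ana", "ben", "cy"]   # hypothetical names
schools = ["school_x", "school_y"]  # hypothetical names
B = nx.Graph()
B.add_nodes_from(students, bipartite=0)
B.add_nodes_from(schools, bipartite=1)
B.add_edges_from([
    ("ana", "school_x"), ("ben", "school_x"),
    ("ben", "school_y"), ("cy", "school_y"),
])

is_bip = bipartite.is_bipartite(B)
print(k5_edges, is_bip)
```

A complete graph on three or more nodes contains triangles, so it is not bipartite; `bipartite.is_bipartite(K5)` returns `False`.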

Social Network Attributes
• Scale Free
  – Degree distribution follows a power law
  – Barabasi et al ('99): mapped the topology of a portion of the web
• Small World
  – Most nodes are not neighbors, but can be reached by a small number of hops
  – Watts & Strogatz ('98)
  – Properties: cliques, subnetworks with high clustering coefficient, most pairs of nodes connected by at least one short path
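networkx ships random-graph generators for both attributes. The sketch below uses the Barabási–Albert and Watts–Strogatz models as stand-ins for scale-free and small-world networks; the sizes, parameters and seed are arbitrary assumptions, not values from the talk.

```python
import networkx as nx

n = 500  # arbitrary size, chosen to keep the example fast

# Scale free: each new node attaches 2 edges, preferring high-degree nodes.
scale_free = nx.barabasi_albert_graph(n, 2, seed=42)
degrees = sorted(d for _, d in scale_free.degree())
median_deg = degrees[n // 2]
max_deg = degrees[-1]  # a few hubs sit far above the median

# Small world: ring of nodes, each tied to 6 neighbors, 10% of edges rewired.
small_world = nx.connected_watts_strogatz_graph(n, 6, 0.1, seed=42)
clustering = nx.average_clustering(small_world)
avg_path = nx.average_shortest_path_length(small_world)

print("median / max degree:", median_deg, max_deg)
print("avg clustering:", round(clustering, 3), "avg path length:", round(avg_path, 2))
```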

(Zachary) Karate club graph

Social network of friendships between 34 members of a karate club at a US university in the 1970s.

Standard test network for clustering algorithms: during the observation period the club broke up into two separate clubs over a conflict.
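The karate club graph is bundled with networkx, so it can be loaded directly. A minimal sketch; the `club` node attribute is how networkx records which of the two clubs each member ended up in.

```python
import networkx as nx

G = nx.karate_club_graph()
n_members = G.number_of_nodes()
n_ties = G.number_of_edges()

# Group members by the club they joined after the split.
factions = {}
for node, club in G.nodes(data="club"):
    factions.setdefault(club, []).append(node)

print(n_members, "members,", n_ties, "friendship ties")
for club, members in factions.items():
    print(club, len(members))
```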

Graph Measures
• Centrality
  – Betweenness
  – Closeness
  – Eigenvector
  – Degree
• Clustering Coefficient (clique)
• Modularity
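Each of these measures has a networkx counterpart. The sketch below runs them on the bundled karate club graph. For modularity it uses networkx's greedy modularity maximisation to find communities, which is an assumption for illustration and not necessarily the method Gephi's modularity statistic uses.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Centrality: each call returns a {node: score} dict.
measures = {
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    "degree": nx.degree_centrality(G),
}
top_nodes = {name: max(scores, key=scores.get) for name, scores in measures.items()}

# Clustering coefficient: per-node values and the graph-wide average.
local_clustering = nx.clustering(G)
avg_clustering = nx.average_clustering(G)

# Modularity: score a partition found by greedy modularity maximisation.
parts = community.greedy_modularity_communities(G)
Q = community.modularity(G, parts, weight=None)  # unweighted score

print(top_nodes)
print("avg clustering:", round(avg_clustering, 3))
print("communities:", len(parts), "modularity:", round(Q, 3))
```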

Graph Layout
• OpenOrd
  – Better distinguishes clusters
• Yifan Hu
• Force Atlas
• Fruchterman Reingold
  – Graph as a system of mass particles (nodes: particles, edges: springs)
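The layouts above are chosen in Gephi, but the Fruchterman Reingold idea can also be tried from Python: networkx's `spring_layout` implements that force-directed algorithm. A minimal sketch on the karate club graph, assuming numpy is installed (networkx needs it for this layout); the seed and iteration count are arbitrary.

```python
import networkx as nx

G = nx.karate_club_graph()

# Nodes repel each other like particles, edges pull like springs;
# the seed only makes the result reproducible.
pos = nx.spring_layout(G, iterations=50, seed=7)

# pos maps each node to an (x, y) coordinate pair.
for node in list(G.nodes())[:3]:
    x, y = pos[node]
    print(node, round(float(x), 3), round(float(y), 3))
```

The coordinates can be passed to `nx.draw(G, pos)` when matplotlib is available.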

Networkx

Graph Generators

Generate Twitter Graph

graphml file

nodes

edges
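The slides for this step are screenshots, so here is a minimal sketch of the same idea: build a directed graph from Twitter interactions and write it to a graphml file that Gephi can open. The retweet pairs, user names and file name are hypothetical, and treating a retweet as an edge is an assumption for illustration.

```python
import os
import tempfile

import networkx as nx

# Hypothetical (retweeter, original_author) pairs.
retweets = [("alice", "bob"), ("carol", "bob"), ("alice", "bob"), ("dave", "carol")]

G = nx.DiGraph()
for src, dst in retweets:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1  # repeated interaction: bump the edge weight
    else:
        G.add_edge(src, dst, weight=1)

# Store a stat on each node so it shows up as a column in Gephi.
for node in G.nodes():
    G.nodes[node]["indegree"] = G.in_degree(node)

path = os.path.join(tempfile.mkdtemp(), "twitter.graphml")  # hypothetical file name
nx.write_graphml(G, path)

# Read it back to confirm nodes, edges and attributes survived.
H = nx.read_graphml(path)
print(H.number_of_nodes(), "nodes,", H.number_of_edges(), "edges")
```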

Twitter Users with Python in their Bios

• 2 days of Twitter data (Oct 24th and 25th)
• Total: 4246 users (62k tweets)
• @mikanyan1 tweeted 795 times

Pythonistas on Twitter

English / European

Japanese

Python (the snake)

Chinese

Spanish Speakers

Musicians, Artists

Twitter User Community: Data Science

• Grepped from Twitter bios over 1 week: "data science|data scientist|machine learning|data strateg"

• 1053 Users
• 14k Tweets
• Most tweeting users:
  – @data_nerd (659)
  – @Chantel_Esworth (562)
  – @Da5_12 (253)
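The bio filter and the tweet counts can be sketched with the standard library. The pattern is the one quoted above; the user names, bios and tweet list are hypothetical, and matching case-insensitively is an assumption.

```python
import re
from collections import Counter

# Pattern from the slide; IGNORECASE is an assumption.
BIO_PATTERN = re.compile(
    r"data science|data scientist|machine learning|data strateg",
    re.IGNORECASE,
)

# Hypothetical users and bios.
bios = {
    "user_a": "Data Scientist. Coffee drinker.",
    "user_b": "I love machine learning and hiking",
    "user_c": "Chef and food blogger",
    "user_d": "Data strategy lead",
}
# Hypothetical stream: one entry per tweet, holding the author's name.
tweet_authors = ["user_a", "user_b", "user_a", "user_c", "user_d", "user_a"]

matched = {user for user, bio in bios.items() if BIO_PATTERN.search(bio)}
tweet_counts = Counter(author for author in tweet_authors if author in matched)

print(sorted(matched))
print(tweet_counts.most_common(3))
```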

Dataists on Twitter

Thank You

Gilad Lotan

Twitter: @gilgul

Github: giladlotan