The Structure of Networks

37
The Structure of Networks with emphasis on information and social networks T-214-SINE Summer 2011 Chapter 2 Ýmir Vigfússon

description

The Structure of Networks. with emphasis on information and social networks. T-214-SINE Summer 2011 Chapter 2 Ýmir Vigfússon. Graph theory. We will develop some basics of graph theory This provides a unifying language for network structure We begin with central definitions - PowerPoint PPT Presentation

Transcript of The Structure of Networks

Page 1: The Structure of Networks

The Structure of Networkswith emphasis on information and social

networks

T-214-SINE Summer 2011

Chapter 2

Ýmir Vigfússon

Page 2: The Structure of Networks

Graph theoryWe will develop some basics of

graph theory◦This provides a unifying language for

network structure

We begin with central definitions◦Then consider the implications and

applications of these concepts

Page 3: The Structure of Networks

Graph theoryA graph or a network is a way to

specify relationships amongst a collection of items

Def: A graph consists of◦Set of objects: nodes◦Pairs of objects: edges

Def: Two nodes are neighbors if they are connected by an edge

Page 4: The Structure of Networks

Graph orientationBoth graphs have 4 nodes

(A,B,C,D) and 4 edges

The left graph is undirected◦Edges have no orientation (default

assumption)The right graph is directed

◦Edges have an orientation, e.g. edge from B to C

Page 5: The Structure of Networks

Weighted graphsEdges may also carry additional

information◦Signs (are we friends or enemies?)◦Tie strength (how good are we as friends?)◦Distance (how long is this road?)◦Delay (how long does the transmission

take?)Def: In a weighted graph, every edge

has an associated number called a weight.

Def: In a signed graph, every edge has a + or a – sign associated with it.

Page 6: The Structure of Networks

Graph representationsAbstract graph theory is interesting in itselfBut in network science, items typically

represent real-world entities◦ Some network abstractions are very commonly

usedSeveral examples (more later)

◦ Communication networks Companies, telephone wires

◦ Social networks People, friendship/contacts

◦ Information networks Web sites, hyperlinks

◦ Biological networks Species, predation (food web). Or metabolic pathways

Page 7: The Structure of Networks

ARPANET: Early Internet precursor

December 1970 with13 nodes

Page 8: The Structure of Networks

Graph representationOnly the connectivity matters

◦Could capture distance as weights if needed

Page 9: The Structure of Networks

Transportation networks

Graph terminology often derived from transportation metaphors◦E.g. “shortest path“, “flow“,

“diameter“

Page 10: The Structure of Networks

A structural network

The physics of rigidity theory is applied at every network node to design stable structures

Page 11: The Structure of Networks

An electrical network

The physics of Kirchoff‘s laws is applied at every network node and closed paths to design circuits

Page 12: The Structure of Networks

Graph concepts“Graph theory is a terminological

jungle in which every newcomer may plant a tree“ (Social scientist John Barnes)

We will focus on the most central concepts◦Paths between nodes◦Cycles◦Connectivity◦Components (and giant component)◦Distance (and search)

Page 13: The Structure of Networks

PathsThings often travel along the

edges of a graph◦Travel◦Information◦Physical quantities

Def: A path is sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge◦Can also be defined as a sequence of

edges

Page 14: The Structure of Networks

Paths

MIT – BBN – RAND – UCLA is a path

Page 15: The Structure of Networks

PathsYou could also traverse the same

node multiple times◦SRI – STAN – UCLA – SRI – UTAH – MIT

This is called a non-simple pathPaths in which nodes are not

repeated are called simple paths

Page 16: The Structure of Networks

CyclesThese are ring structures that

begin and end in the same node◦LINC – CASE – CARN – HARV – BBN –

MIT – LINC is a cycleDef: A cycle is a closed path with

at least three edgesIn the 1970 ARPANET, every edge

is on a cycle◦By design. Why?

Page 17: The Structure of Networks

ConnectivityCan every node in a graph be

reached from any other node through a path?◦If so, the graph is connected

The 1970 ARPANET graph is connected

In many cases, graphs may be disconnected◦Social networks◦Collaboration networks◦etc.

Page 18: The Structure of Networks

Connectivity

Is this graph connected?What about now?

◦A,B not connected to other nodes◦C,D,E not connected to other nodes

Page 19: The Structure of Networks

ComponentsIf a graph is not connected, it tends to break

into pieces that themselves are connected

Def: The connected components of an undirected graph are groups of nodes with the property that the groups are connected, and no two groups overlap

More precisely:◦A connected component is a subset of nodes

such that (1) every node has a path to every other node in the subset, and (2) the subset is not a part of a larger set with the property that every node can reach another

Page 20: The Structure of Networks

Components

Three connected components◦{A,B}, {C,D,E}, {F,G,...,M}

Page 21: The Structure of Networks

Components: AnalysisA first global way to look at graph

structure◦For instance, we can understand

what is holding a component together

Real-world biology collaboration graph

Page 22: The Structure of Networks

Components: AnalysisAnalyzing graphs in terms of

densely connected regions and the boundaries of regions◦For instance, only include edges with

weights above a threshold, then gradually increase the threshold

◦The graph will fragment into more and more components

We will later see how this becomes an important type of analysis

Page 23: The Structure of Networks

Giant componentsMany graphs are not connected, but

may include a very large connected components◦E.g. the financial graph from last lecture,

or the hyperlink graph of the webLarge complex networks often

contain a giant component◦A component that holds a large

percentage of all nodesRare that two or more of these will

exist in a graph. Why?

Page 24: The Structure of Networks

Romantic liasons in a high school

Existence of giant component means higher risk of STDs (the object of study)

Giant component

Page 25: The Structure of Networks

DistanceDef: The distance between a pair

of nodes is the edge length of the shortest path between them◦Just number of edges. Can be

thought of as all edges having weight of 1

What‘s the distance between MIT and SDC?

Page 26: The Structure of Networks

DistanceQ: Given a graph, how do we find

distances between a given node and all other nodes systematically?◦Need to define an algorithm!

How would you approach this problem?

Page 27: The Structure of Networks

Distance: Breadth-first search (BFS)From the given node (root)

◦Find all nodes that are directly connected These are labeled as “distance 1“

◦Find all nodes that are directly connected to nodes at distance 1 If these nodes are not at distance 1, we label

them as “distance 2“◦...◦Find all nodes that are directly connected

to nodes at distance j If these nodes are not already of distance at

most j+1, we label them as „distance j+1“

Page 28: The Structure of Networks

Distance: Breadth-first search (BFS)

Page 29: The Structure of Networks

Distance: Breadth-first search (BFS)

Page 30: The Structure of Networks

Six degrees of separationExplained thoroughly in the video we

sawFirst experiment done by Stanley

Milgram in 1960s (research budget $680)◦296 randomly chosen starters. Asked to

send a letter to a target, by forwarding to someone they know personally and so on. Number of steps counted.

Hypothesis: The number of steps to connect to anyone in a typical large-scale network is surprisingly small◦We can use BFS to check!

Page 31: The Structure of Networks

Six degrees of separation

Milgram found a median hop number of 6 for successful chains – six degrees of separation◦This study has since been largely

discredited

Page 32: The Structure of Networks

Six degrees of separationModern experiment by Leskovec

and Horvitz in 2008Look at the 240 million user

accounts of Microsoft Instant Messenger

Complete snapshop (MS employees)Found a giant component with very

small distancesA random sample of 1000 users

were tested◦Why do they look only at a sample?

Page 33: The Structure of Networks

Six degrees of separation

Estimated average distance of 6.6, median of 7

Page 34: The Structure of Networks

Six degrees of geekiness

Co-authorship graph centered on Paul Erdös

Page 35: The Structure of Networks

Network data setsChapter 2 includes an overview of some

massive data sets and networksCollaboration graphs

◦Wikipedia, World of Warcraft, Citation graphsWho-talks-to-whom graphs

◦Microsoft IM, Cell phone graphsInformation linkage

◦HyperlinksTechnological networks

◦Power grids, communication links, InternetNatural and biological networks

◦Food webs, neural interconnections, cell metabolism

Page 36: The Structure of Networks

Network data setsLeskovec‘s SNAP at Stanford has

a repository of large-scale networks◦http://snap.stanford.edu/data

I also have a few more data sets that could be appropriate for the group project

Keep thinking about whether there are some cool data sets that you might have access to ...

Page 37: The Structure of Networks

RecapA graph consists of nodes and edgesGraphs can be directed or undirected,

weighted, signed or unweighted◦Paths between nodes (simple vs. non-

simple)◦Cycles◦Connectivity◦Components (and the giant component)◦Distance (and BFS)

Six degrees of separation can be checked with BFS