Section 8 – Ec1818

Jeremy Barofsky, jbarofsk@hsph.harvard.edu

March 31st and April 1st, 2010

Section 8 Outline (lectures 15, 16)

• Social Network Introduction

• Types of Networks / Graphs: 1) Random 2) Regular 3) Small-world

• Erdos / Bacon Numbers

• Review Questions?

• Evaluations

• Office Hours - Thursday, 4/1/10 10-11am, outside 320 CGIS North.

Small World Phenomenon - Milgram

• Question: What is the probability that two randomly selected people know each other?

• Design: In 1967, Milgram sent packages to 160 random people living in Omaha, NE, asking each to send the package to a friend or acquaintance they thought might know or be connected to the final individual, a stockbroker in Boston. (Postcards were also sent back to Harvard to track progress.)

• Results:

1) Of those letters that found their destination, the average path length was 5.5-6.

2) Significant selection bias: in one experiment 232 of 296 were not sent on.

3) Most of the cards were given to the target by a few people. In the experiment with 160 packages sent, 24 reached the target at his home, and 16 of these were given to him by one person (a node in the network).

• Reasons for under-estimate or over-estimate of avg. path length?
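The figures quoted for the experiment can be turned into rates, and a rough attrition argument hints at one reason the measured average might understate the true one. This is only a sketch: the constant per-step forwarding probability `q` and the helper `completion_probability` are assumptions made for illustration, not part of Milgram's design.

```python
# Figures quoted in the results above.
sent, arrived, via_one_person = 160, 24, 16
not_forwarded, started = 232, 296

completion_rate = arrived / sent           # share of packages that arrived
funnel_share = via_one_person / arrived    # share of arrivals handed over by one person
dropout_rate = not_forwarded / started     # share never sent on in the other experiment

def completion_probability(q, length):
    """Chance that a chain of `length` hand-offs completes if every holder
    forwards independently with probability q (an assumed toy model)."""
    return q ** length

print(f"completion rate: {completion_rate:.0%}")
print(f"delivered via one person: {funnel_share:.0%}")
print(f"never forwarded: {dropout_rate:.0%}")
for length in (3, 6, 9):
    print(length, round(completion_probability(0.75, length), 3))
```

Under this toy model longer chains are less likely to finish, so the chains that do complete over-represent short paths, which would pull the measured average down.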

Social networks

• A graph G consists of a set V(G) of vertices (or nodes) together with a set E(G) of edges (or links) that connect vertices.

• Degree: the number of edges connected to a given vertex.

• Order: the number of vertices in V(G).

• Size: the number of edges in E(G).

• Directed graph / undirected graph: a graph is directed if all its edges are directional, i.e., the network tells us not just whether two people are friends but whether each person considers the other a friend. If none of the edges are directional, the graph G is undirected.
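The definitions above can be checked on a tiny example. The sketch below is illustrative only: the people and friendships are hypothetical, and storing edges as a set of pairs is just one convenient representation.

```python
# A small undirected friendship graph; the names are hypothetical.
vertices = {"Ann", "Bob", "Cat", "Dan"}
edges = {("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan")}

def degree(vertex, edges):
    """Number of edges that touch `vertex`."""
    return sum(vertex in edge for edge in edges)

order = len(vertices)   # order: number of vertices
size = len(edges)       # size: number of edges
degrees = {v: degree(v, edges) for v in sorted(vertices)}

# A directed version stores ordered pairs: ("Ann", "Bob") means Ann names
# Bob as a friend, which need not imply ("Bob", "Ann").
directed_edges = {("Ann", "Bob"), ("Bob", "Ann"), ("Dan", "Cat")}
mutual = {(a, b) for (a, b) in directed_edges if (b, a) in directed_edges}

print(order, size, degrees)
print(sorted(mutual))
```

In an undirected graph every edge adds one to the degree of two vertices, so the degrees always sum to twice the size.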

Networks / Graphs and 3 elements

Social Network Metrics

• Characteristic path length L(G, p): measures the average distance between vertices. By distance we mean the length of the shortest path that connects vertices v and v’.

• Clustering coefficient C(G, p): measures a vertex's (person’s) level of cliquishness within its neighborhood. It answers: are the friends of my friends also my friends?

• Formally, for each vertex: actual edges among its neighbors / maximum possible edges among those neighbors. C(G, p) is the average of this ratio over all vertices.

• Maximum number of edges in a graph (connections in a social network): n(n-1)/2, where n = number of vertices.
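Both metrics can be computed directly from their definitions. The sketch below assumes a small hypothetical graph stored as adjacency sets, uses breadth-first search for shortest paths, and treats the clustering of a vertex with fewer than two neighbors as zero (a convention, not something stated in the slides).

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph stored as adjacency sets.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}

def distances_from(source, adj):
    """Breadth-first search: shortest path length from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def characteristic_path_length(adj):
    """Mean shortest-path distance over all connected pairs of distinct vertices."""
    total, pairs = 0, 0
    for v in adj:
        for w, d in distances_from(v, adj).items():
            if w != v:
                total += d
                pairs += 1
    return total / pairs

def local_clustering(v, adj):
    """Edges among v's neighbors divided by the maximum possible, k(k-1)/2."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(adj):
    """Average of the local clustering values over all vertices."""
    return sum(local_clustering(v, adj) for v in adj) / len(adj)

n = len(adj)
print("max possible edges:", n * (n - 1) // 2)
print("L =", characteristic_path_length(adj))
print("C =", round(clustering_coefficient(adj), 3))
```

In this example vertices 0, 1 and 2 form a triangle, so 0 and 1 have clustering 1, while vertex 2 has only one of its three possible neighbor pairs linked.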

Types of Graphs

• Regular Network: each vertex is connected to the same number k of its nearest neighbors only, so all vertices have the same degree. Long characteristic path length, because it takes many steps to get from one vertex to a distant one; large clustering coefficient, because vertices are connected to the other vertices nearby.

• Random Network: edges between vertices occur randomly with probability p. Connectedness emerges non-linearly: a giant connected component appears abruptly once p passes roughly 1/|V(G)|. Small characteristic path length and small clustering coefficient.

• Adjacency matrix: a way to represent network data in which rows and columns are vertices and each entry records whether that pair of vertices has a connection.
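As an illustration of the adjacency-matrix representation, the sketch below builds the matrix for a small undirected network from an edge list. The edge list is hypothetical, and a plain list of lists is used rather than any particular matrix library.

```python
# Hypothetical edge list for an undirected network on vertices 0..3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Entry [i][j] is 1 if vertices i and j are connected, else 0.
matrix = [[0] * n for _ in range(n)]
for i, j in edges:
    matrix[i][j] = 1
    matrix[j][i] = 1  # undirected: the matrix is symmetric

degrees = [sum(row) for row in matrix]  # row sums give each vertex's degree

for row in matrix:
    print(row)
print("degrees:", degrees)
```

For a directed graph the second assignment would be dropped, and the matrix would no longer need to be symmetric.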

From Regular to Random Graphs via Small Worlds

Regular -> Small World -> Random Graphs

• Rewire: Start with a regular graph with vertices in a circle, each connected to its 4 closest neighbors. Rewire each edge at random with probability p.

• Changing p tunes the graph between the two extremes: p = 0 gives a regular graph, p = 1 a random graph.

• Watts and Strogatz define small-world networks by two characteristics:

1) Large clustering coefficient C(G, p): most of my neighbors are friends with each other as well as with me.

2) Small characteristic path length L(G, p): the presence of random, long-distance connections means that moving from one part of the graph to another can be done quickly.
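The rewiring procedure can be sketched in a few lines. This is one plausible reading of the steps above, not necessarily the exact procedure from the lecture: the graph size (200 vertices), the seed, and the function names are assumptions, and path length is averaged over connected pairs only in case rewiring disconnects the graph.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Regular graph: n vertices on a circle, each tied to its k nearest neighbors."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice edge to a random vertex."""
    n = len(adj)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # Exclude self-loops and edges that already exist.
                options = [u for u in range(n) if u != v and u not in adj[v]]
                if options:
                    new = rng.choice(options)
                    adj[v].remove(w)
                    adj[w].remove(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

def clustering(adj):
    """Average share of each vertex's neighbor pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

def path_length(adj):
    """Mean shortest-path length over connected pairs (breadth-first search)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)  # fixed seed so runs are repeatable
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 4), 4, p, rng)
    print(p, round(clustering(g), 3), round(path_length(g), 2))
```

Each printed row gives p, C and L. The pattern to look for is the small-world one: at small positive p the path length typically falls much faster than the clustering coefficient does.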

Characteristic path length L(p) and clustering coefficient C(p) for rewired graphs as p varies.

(Watts and Strogatz, 1998)

Empirical Examples (Watts and Strogatz, 1998)

Power Laws Again? Are you serious? (Random means normal distribution and small world means power law)

Erdos and Bacon Numbers

• Small-world networks exhibit strong connections between neighbors (cliques), but information can still travel quickly because of random connections to other highly connected groups of vertices.

• Erdos / Bacon numbers: degrees of separation measured through co-authorship of peer-reviewed journal articles (Erdos) or shared movie credits (Bacon). A Bacon number of 1 means the individual acted in the same movie as Kevin Bacon. Nearly all actors are connected in this way, which exhibits the characteristics of small-world networks.
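A Bacon number is a shortest-path distance in the co-appearance graph, so it can be found with breadth-first search. The sketch below uses made-up cast lists ("Movie 1", "Actor A" and so on are placeholders, not real credits), and `separation_numbers` is a hypothetical helper name.

```python
from collections import deque

# Hypothetical cast lists; only Kevin Bacon is a real name here.
movies = {
    "Movie 1": ["Kevin Bacon", "Actor A", "Actor B"],
    "Movie 2": ["Actor B", "Actor C"],
    "Movie 3": ["Actor C", "Actor D"],
    "Movie 4": ["Actor E"],  # never linked to the rest
}

# Two actors are connected if they share a movie credit.
costars = {}
for cast in movies.values():
    for actor in cast:
        costars.setdefault(actor, set()).update(a for a in cast if a != actor)

def separation_numbers(center, costars):
    """Breadth-first search outward from `center`; unreachable actors are omitted."""
    number = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for other in costars[actor]:
            if other not in number:
                number[other] = number[actor] + 1
                queue.append(other)
    return number

bacon = separation_numbers("Kevin Bacon", costars)
for actor in sorted(bacon, key=bacon.get):
    print(bacon[actor], actor)
```

Passing a different center to the same function gives that person's numbers instead, and the same search over a co-authorship graph would give Erdos numbers.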

Is Bacon Best?

• “By processing all of the 1.6 million people in the Internet Movie Database I discovered that there are currently 506 people who are better centers than Kevin Bacon!” –Oracle of Bacon website.

• Compute average Bacon number and compare to others.

Distribution of Bacon / Connery numbers for actors in the Internet Movie Database.

Bacon / Connery number    # of actors (Bacon)    # of actors (Connery)
0                         1                      1
1                         2349                   2446
2                         223940                 282628
3                         666941                 674960
4                         153220                 89845
5                         9662                   6533
6                         877                    598
7                         134                    118
8                         15                     10
Average                   2.95                   2.82
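The averages in the last row are weighted means of the separation number, weighted by the number of actors at each distance. A minimal sketch of that calculation, using the counts from the table (the function name `average_number` is an assumption):

```python
# Counts copied from the table above, indexed by Bacon / Connery number 0..8.
bacon_counts = [1, 2349, 223940, 666941, 153220, 9662, 877, 134, 15]
connery_counts = [1, 2446, 282628, 674960, 89845, 6533, 598, 118, 10]

def average_number(counts):
    """Weighted mean of the separation number, weighting by number of actors."""
    total_actors = sum(counts)
    return sum(number * count for number, count in enumerate(counts)) / total_actors

print("actors linked to Bacon:  ", sum(bacon_counts))
print("actors linked to Connery:", sum(connery_counts))
print("average Bacon number:  ", round(average_number(bacon_counts), 2))
print("average Connery number:", round(average_number(connery_counts), 2))
```

With these counts the Bacon average comes out at about 2.95, matching the table, while the Connery column gives roughly 2.83, a shade above the 2.82 shown, which may reflect truncation or a transcription difference. Either way the lower Connery average is what makes him a "better center" than Bacon.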

Midterm Questions?