EK Ch 17: Power laws and rich-get-richer phenomena (with an application of Web Spam detection Spam,...

Post on 16-Dec-2015


EK Ch 17: Power laws and rich-get-richer phenomena

(with an application to Web Spam detection: "Spam, Damn Spam and Statistics")

Numbers

Your grades so far in this class. The weight of an apple.

The temperature in Chicago on July 4th. The height of a Dutch man. The speed of a car on I-90.

Most instances are typical. Seeing a rare number is very surprising.

These numbers are well-characterized by the average and the standard deviation.
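For bell-shaped quantities like these, the mean and standard deviation summarize the data well: nearly all values fall within a few standard deviations of the mean. A small Python sketch on simulated data; the 183 cm mean and 7 cm spread for heights are illustrative assumptions, not measurements, and `fraction_within` is a hypothetical helper.

```python
import random
import statistics

def fraction_within(samples, k):
    """Fraction of samples lying within k standard deviations of the mean."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return sum(abs(x - mu) <= k * sigma for x in samples) / len(samples)

rng = random.Random(42)
# Hypothetical heights in cm, assumed normally distributed (illustrative parameters)
heights = [rng.gauss(183, 7) for _ in range(10_000)]
print(round(statistics.mean(heights)), round(statistics.stdev(heights)))
print(fraction_within(heights, 3))  # close to 0.997 for normally distributed data
```

A value ten standard deviations from the mean essentially never appears in data like this, which is why a rare number is so surprising.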

City populations

1. New York 8,310,212

2. Los Angeles 3,834,340

3. Chicago 2,836,658

230. Cambridge, MA 101,335

240. Gainesville, FL 95,447

250. McKinney, TX 54,369

A few cities with high population

Many cities with low population

City populations

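Power law: the fraction $f(k)$ of items with popularity $k$ is proportional to $k^{-c}$ for some constant $c > 0$.

$f(k) \propto k^{-c}$, i.e. $f(k) = C\,k^{-c}$ for some constant $C$

$\log f(k) = \log C + \log k^{-c}$

$\log f(k) = \log C - c \log k$

On a log-log plot, a power law is therefore a straight line with slope $-c$.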
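The log-log form of the power law suggests a quick check: if log f(k) plotted against log k is close to a straight line, its slope estimates -c. A minimal Python sketch of that least-squares fit on synthetic, noise-free data; `fit_power_law_exponent` is a hypothetical helper, and C = 0.5, c = 2.1 are arbitrary illustrative values.

```python
import math

def fit_power_law_exponent(ks, fs):
    """Estimate c by least-squares fitting a line to log f(k) versus log k."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in fs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # log f(k) = log C - c log k, so the fitted slope is -c
    return -slope

# Synthetic, noise-free data with C = 0.5 and c = 2.1 (illustrative values)
ks = list(range(1, 101))
fs = [0.5 * k ** -2.1 for k in ks]
print(fit_power_law_exponent(ks, fs))  # approximately 2.1
```

On real, noisy data a straight-line fit to a log-log histogram is only a rough diagnostic, since the sparse tail can bias it; maximum-likelihood estimators are generally preferred for estimating c.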

City populations

Number of Web page in-links (Broder et al.)

Other examples

Length of the URL’s host

Number of host name resolutions to a single IP

Web page out-degrees

Web page in-degrees

Word count variance

Content evolution

Cluster size

… because they care to know ;-)

Why does data exhibit power laws?

Imitation → Power law

Constructing the web

1. Pages are created in order, named 1, 2, …, N.

2. When created, page j links to an earlier page by

a) with probability p, picking a page i uniformly at random from 1, …, j-1 and linking to it;

b) with probability (1-p), picking a page i uniformly at random from 1, …, j-1 and linking to the page that i links to.
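This construction can be simulated directly. A minimal Python sketch, assuming each page creates exactly one link and that page 1, which has no earlier page to link to, makes no link; when rule (b) picks page 1, the sketch falls back to linking to page 1 itself. The names `build_web` and `in_degree_distribution` are hypothetical.

```python
import random
from collections import Counter

def build_web(n, p, seed=0):
    """Simulate the copying model for pages 1..n.

    Returns a dict target[j] = the page that page j links to.
    Assumption: page 1 has no earlier page, so it creates no link.
    """
    rng = random.Random(seed)
    target = {}
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)  # uniform choice among earlier pages
        if rng.random() < p or i not in target:
            # Rule (a): link to i directly. Also the fallback when rule (b)
            # picks page 1, which has no link to copy (assumption).
            target[j] = i
        else:
            # Rule (b): copy i's link, i.e. link to the page that i links to
            target[j] = target[i]
    return target

def in_degree_distribution(target, n):
    """counts[k] = number of pages (out of n) with exactly k in-links."""
    indeg = Counter(target.values())
    return Counter(indeg.get(page, 0) for page in range(1, n + 1))

n = 100_000
dist = in_degree_distribution(build_web(n, p=0.2), n)
for k in (1, 2, 4, 8, 16, 32, 64):
    print(k, dist[k] / n)
```

Plotted on log-log axes, these fractions should fall off roughly along a straight line. The textbook's approximate analysis of this model gives an exponent of 1 + 1/(1-p); a finite simulation will only match it approximately.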

Imitation

The rich get richer

2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to



Equivalently,

2 b) With prob. (1-p), pick a page with probability proportional to its in-degree and link to it
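Why the two versions of rule (b) agree, as a short calculation. This sketch assumes each of the j-1 earlier pages has exactly one outgoing link (a simplification, since page 1 has nothing earlier to link to); $d_\ell$ is notation introduced here for the current in-degree of page $\ell$. Page j lands on $\ell$ exactly when the uniformly chosen page i is one of the $d_\ell$ pages that link to $\ell$:

```latex
\Pr[\, j \text{ links to } \ell \mid \text{rule (b)} \,]
  = \sum_{i=1}^{j-1} \frac{1}{j-1}\,\mathbf{1}[\, i \text{ links to } \ell \,]
  = \frac{d_\ell}{j-1}
  = \frac{d_\ell}{\sum_{m=1}^{j-1} d_m}
```

The last step uses $\sum_m d_m = j-1$, one link per earlier page. So copying a random page's link is the same as choosing a target with probability proportional to its in-degree.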

Food for thought

Why is Harry Potter popular?

If we could re-play history, would we still read Harry Potter, or would it be some other book?

Information cascades and the rich get richer

Information cascade: some people get a little richer by chance,

and then rich-get-richer dynamics: those randomly enriched people get a lot richer very fast.
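The chance-then-amplify story can be sketched with a toy model. It assumes, hypothetically, that each new listener downloads one of a fixed set of equally good songs with probability proportional to 1 + its current download count; the +1 pseudocount is an assumption so that songs with no downloads can still be chosen, and `replay_world` is a hypothetical name. Running it with different random seeds plays the role of replaying history in separate "worlds".

```python
import random

def replay_world(num_songs, num_listeners, seed):
    """Toy rich-get-richer model (hypothetical): every song starts equal, and each
    listener picks a song with probability proportional to 1 + its downloads."""
    rng = random.Random(seed)
    downloads = [0] * num_songs
    for _ in range(num_listeners):
        weights = [1 + d for d in downloads]
        song = rng.choices(range(num_songs), weights=weights)[0]
        downloads[song] += 1
    return downloads

# Eight independent "worlds" with identical songs; only the random history differs
for world in range(8):
    d = replay_world(num_songs=10, num_listeners=2000, seed=world)
    print(f"world {world}: top song #{d.index(max(d))} with {max(d)} downloads")
```

Because the songs are identical in this model, any difference in which song wins, or by how much, comes only from early chance downloads being amplified.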

Music download site – 8 worlds

1. “Let’s go driving,” Barzin

2. “Silence is sexy,” Einsturzende Neubauten

3. “Go it alone,” Noonday Underground

10. “Picadilly Lilly,” Tiger Lillies

1. “Let’s go driving,” Barzin

2. “Silence is sexy,” Einsturzende Neubauten

3. “Go it alone,” Noonday Underground

10. “Picadilly Lilly,” Tiger Lillies

18

3

47

2