COMS 6998-06 Network Theory Week 4: September 29, 2010 Dragomir R. Radev Wednesdays, 6:10-8 PM 325 Pupin Terrace Fall 2010


Page 1:

COMS 6998-06 Network Theory
Week 4: September 29, 2010

Dragomir R. Radev
Wednesdays, 6:10-8 PM

325 Pupin Terrace
Fall 2010

Page 2:

(27) Self-similarity

Page 3:

Similarity and self-similarity

Page 4:

Sierpinski Gasket

See also Koch’s snowflake: http://en.wikipedia.org/wiki/Koch_snowflake http://www.arcytech.org/java/fractals/koch.shtml

Page 5:

The Cantor set

Page 6:

Measuring a fractal’s dimension

• In the Sierpinski gasket example, we need 1 triangle of side 1 at the first step, 3 triangles of side ½ at the second step, and 9 triangles of side ¼ at the third step.

• Let N(ε) be the number of triangles of side ε. Then the fractal dimension is:

$$D = \lim_{\epsilon \to 0} \frac{\ln N(\epsilon)}{\ln(1/\epsilon)}$$

Page 7:

Box counting

N(1) = 1
N(1/2) = 3
N(1/4) = N((1/2)^2) = 9 = 3^2
N(1/8) = N((1/2)^3) = 27 = 3^3
…
N((1/2)^n) = 3^n

http://classes.yale.edu/fractals/FracAndDim/BoxDim/GasketBoxDim/GasketLogLog.html
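The counts above can be reproduced with a small box-counting sketch. As an assumption for illustration, the gasket is approximated here by a discrete version on a 2^n x 2^n grid (cells (i, j) with i & j == 0) and covered with square boxes rather than triangles; the function names are hypothetical and not from the lecture.

```python
import math


def gasket_cells(n):
    """Cells (i, j) of a 2^n x 2^n grid with i & j == 0 (a discrete Sierpinski gasket)."""
    size = 1 << n
    return [(i, j) for i in range(size) for j in range(size) if i & j == 0]


def box_count(cells, box):
    """Number of box-by-box squares that contain at least one occupied cell."""
    return len({(i // box, j // box) for i, j in cells})


def box_dimension_estimates(n=6):
    """Return ln N(eps) / ln(1/eps) for eps = 1/2, 1/4, ..., (1/2)^n."""
    cells = gasket_cells(n)
    size = 1 << n
    estimates = []
    for k in range(1, n + 1):
        box = size >> k              # box side in cells, so eps = box / size = (1/2)^k
        count = box_count(cells, box)
        estimates.append(math.log(count) / math.log(size / box))
    return estimates


if __name__ == "__main__":
    for k, d in enumerate(box_dimension_estimates(), start=1):
        print(f"eps = (1/2)^{k}: D estimate = {d:.4f}")
```

For this discrete construction the box count at ε = (1/2)^k is 3^k, so every estimate equals ln 3 / ln 2.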

Page 8:

Effective fractal dimension

• For a compact triangle:
– At the beginning, D = ln4/ln2 = 2
– After one iteration, D = ln16/ln4 = 2

• For the Sierpinski gasket: D = ln3/ln2 = 1.5850

• For the Koch curve: D = ln4/ln3 = 1.2619

• For the Cantor set: D = ln2/ln3 = 0.6309
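The values on this slide all have the form D = ln(number of copies) / ln(scaling factor). A minimal sketch that evaluates them, assuming that reading of the slide (the function name and the table of shapes are hypothetical):

```python
import math


def similarity_dimension(copies, scale):
    """D = ln(copies) / ln(scale) for a shape made of `copies` pieces, each 1/scale the size."""
    return math.log(copies) / math.log(scale)


# (copies, scale) pairs taken from the ratios listed on the slide.
SHAPES = {
    "compact triangle": (4, 2),
    "Sierpinski gasket": (3, 2),
    "Koch curve": (4, 3),
    "Cantor set": (2, 3),
}

if __name__ == "__main__":
    for name, (copies, scale) in SHAPES.items():
        print(f"{name}: D = {similarity_dimension(copies, scale):.4f}")
```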

Page 9:

A self-similar fern

Page 10:

(7) Small world networks

Page 11:

The idea of a small world

• Milgram’s experiment (1960s)
• Send a package to a stockbroker in Boston
• 296 senders
• 20% reached target
• Chain length (avg) = 6.5
• Recent reenactment by Dodds et al. (2003) with 18 targets, 13 countries, 60K participants; only 384 reached the target, with a path length of 4.

Page 12:

The Watts-Strogatz model

• How to keep the diameter of a growing random graph small?

• Simple model: starts with a regular lattice.

• Two parameters:

– Coordination number z: how many neighbors each node has

– Shortcut probability p: for each existing edge, the probability of adding a shortcut between two random nodes

– The expected total number of shortcuts is mp = nzp/2, where n is the number of nodes and m = nz/2 is the number of lattice edges
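The construction described in the bullets above can be sketched in plain Python. This is only an illustration under stated assumptions: the lattice is a one-dimensional ring, z is even, shortcuts are added rather than rewired, and all function names are hypothetical.

```python
import random
from collections import deque


def ring_lattice(n, z):
    """Adjacency sets for a ring of n nodes, each linked to its z nearest neighbors (z even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, z // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_shortcuts(adj, p, rng):
    """For each lattice edge, with probability p add a shortcut between two random nodes."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # number of lattice edges, nz/2
    added = 0
    for _ in range(m):
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)
            if v not in adj[u]:  # skip pairs that are already connected
                adj[u].add(v)
                adj[v].add(u)
                added += 1
    return added


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs, via BFS (graph assumed connected)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))


if __name__ == "__main__":
    graph = ring_lattice(200, 4)
    print("lattice:", average_path_length(graph))
    shortcuts = add_shortcuts(graph, 0.1, random.Random(0))
    print("with", shortcuts, "shortcuts:", average_path_length(graph))
```

Because shortcuts are only ever added, the average path length after `add_shortcuts` can never exceed that of the original lattice.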

Page 13:

The Watts-Strogatz model

Page 14:

Diameter

• Example (Amaral and Barthelemy, 1999): for a lattice of dimension d=1 with N=1000, z=10, and p=0.25, the diameter is 3.6

• If p=0.016 (≈1/64), the diameter is 7.6

Page 15:

Clustering coefficient

• It mirrors the underlying lattice structure.

• According to (Barrat and Weigt, 2000):

$$C = \frac{3(z-1)}{2(2z-1)} (1-p)^3$$

• In the limit of large z (with p → 0), C = 3/4
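The limiting value 3/4 can be checked with a short worked step, assuming the Barrat and Weigt expression on this slide reads C = 3(z-1)/(2(2z-1)) · (1-p)^3 and taking p = 0:

```latex
\lim_{z \to \infty} C\big|_{p=0}
  = \lim_{z \to \infty} \frac{3(z-1)}{2(2z-1)}
  = \lim_{z \to \infty} \frac{3 - 3/z}{4 - 2/z}
  = \frac{3}{4}
```

For a finite case such as z = 10 and p = 0, the same expression gives 27/38 ≈ 0.71, already close to the limit.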

Page 16:

Properties

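For lattices:

$$l \approx \frac{N}{2K} \ \text{(large)}, \qquad C \approx \frac{3}{4} \ \text{(large)}$$

For random graphs:

$$l \approx \frac{\ln N}{\ln K}, \qquad C \approx \frac{K}{N}$$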
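To get a feel for the gap between the two regimes, the estimates on this slide can be evaluated for concrete values. This sketch assumes the slide's formulas are l ≈ N/2K and C ≈ 3/4 for lattices, and l ≈ ln N / ln K and C ≈ K/N for random graphs; the function names are hypothetical.

```python
import math


def lattice_estimates(n, k):
    """Approximate (path length, clustering) for a ring lattice: l ~ N/2K, C ~ 3/4."""
    return n / (2 * k), 0.75


def random_graph_estimates(n, k):
    """Approximate (path length, clustering) for a random graph: l ~ ln N / ln K, C ~ K/N."""
    return math.log(n) / math.log(k), k / n


if __name__ == "__main__":
    n, k = 1000, 10
    print("lattice:", lattice_estimates(n, k))
    print("random graph:", random_graph_estimates(n, k))
```

With N = 1000 and K = 10, the lattice path length is about 50 while the random graph value is about 3, and the clustering drops from 0.75 to 0.01.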

Page 17:

Degree distribution

From (Barrat and Weigt, 2000)

Page 18:

Kleinberg model

• Use geographical distance (e.g., p ~ 1/d^2)
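A distance-dependent shortcut rule of this kind can be sketched as follows. The setup is an assumption for illustration: nodes sit on a square grid, d is the Manhattan distance between grid points, and the function names are hypothetical.

```python
import random


def manhattan(u, v):
    """Grid (Manhattan) distance between two points given as (x, y) tuples."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def contact_probabilities(u, side, exponent=2.0):
    """Probability of each other grid node being u's long-range contact, p ~ 1/d^exponent."""
    weights = {
        (x, y): manhattan(u, (x, y)) ** -exponent
        for x in range(side)
        for y in range(side)
        if (x, y) != u
    }
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def long_range_contact(u, side, rng, exponent=2.0):
    """Draw one long-range contact for u according to contact_probabilities."""
    probs = contact_probabilities(u, side, exponent)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    print(long_range_contact((2, 2), 5, rng))
```

With exponent 2, a node at distance 1 is four times as likely to be chosen as a node at distance 2.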

Page 19:

HW 1

• Analyze a network data set
• Submit a PR-style 6 page paper
• Check class home page for examples and instructions

• Model papers
– How to become a superhero, P. M. Gleiser, J. Stat. Mech. (2007) P09020 http://arxiv.org/abs/0708.2410
– The Political Blogosphere and the 2004 U.S. Election: Divided They Blog (2005) http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf
– Patterns in syntactic dependency networks, Ramon Ferrer Cancho, Ricard V. Solé, and Reinhard Köhler, Physical Review E 69, 051915 (2004) http://complex.upf.es/~ricard/syntaxPRE51915.pdf
– Network properties of written human language, A. P. Masucci and G. J. Rodgers, Phys. Rev. E 74, 026102 (2006) http://arxiv.org/abs/physics/0605071
– An evaluation of human protein-protein interaction data in the public domain, BMC Bioinformatics 2006, 7(Suppl 5):S19 http://www.biomedcentral.com/1471-2105/7/S5/S19/abstract
  Database: This database is hand-curated. There are around 25,000 proteins and 35,000 interactions http://www.hprd.org/download

Page 20:

Examples

• program committees of conferences in NLP/CL or IR or ML
• Skitter (http://www.caida.org/tools/measurement/skitter/)
• syntactic dependencies
• mentions of named entities in text
• wikipedia
• social networking sites such as myspace, facebook, linkedin, etc.
• product recommendations for sites such as amazon, ebay, clothing sites, etc.
• youtube related videos
• adjective/noun network
• Two words are connected if one appears in the dictionary definition of another.
• analyze the AAN author network, collaboration network, or title network (two paper titles are connected if they share a non-stop word)
• people or locations that are mentioned in the same news story
• collocation networks (Dorogovtsev and Mendes)
• co-occurrence or other sentence graphs
• concept, thesaurus, and association graphs
• citation
• Web Related
• similarity-based (e.g., cosine)
• http://www.nd.edu/~networks/resources.htm
• http://deim.urv.cat/~aarenas/data/welcome.htm
• http://www-personal.umich.edu/~mejn/netdata/
• http://www.sciencemag.org/cgi/content/full/302/5651/1727