Investigating Mesoscale Structures of Networks via Transport Properties

seminar at National Institute for Mathematical Sciences (NIMS), December 10, 2014.

SHL and P. Holme, Pathlength scaling in graphs with incomplete navigational information, Physica A 390, 3996 (2011); Exploring Maps with Greedy Navigators, Phys. Rev. Lett. 108, 128701 (2012). SHL, M. Cucuringu, and M. A. Porter, Density-based and transport-based core-periphery structures in networks, Phys. Rev. E 89, 032810 (2014); SHL, M. D. Fricker, and M. A. Porter, Mesoscale Analyses of Fungal Networks, e-print arXiv:1406.5855; M. Cucuringu, M. P. Rombach, SHL, and M. A. Porter, Detection of Core-Periphery Structure in Networks Using Spectral Methods and Geodesic Paths, e-print arXiv:1410.6572; SHL, D. Kim, and H. Jeong, Is Nestedness in Networks Generalized Core-Periphery Structures?, in preparation. SHL, M. Farazmand, G. Haller, and M. A. Porter, Finding Lagrangian Coherent Structures Using Community Detection, in preparation.

Sang Hoon Lee, Department of Energy Science, Sungkyunkwan University

http://sites.google.com/site/lshlj82

statistical physics: micro interactions macro

regular/random networks (interactions)

magnet gas

microscale structure

macroscale properties



irregular, or complex (partially random) networks

How about this? Something new but ubiquitous topology

ref) M. E. J. Newman, Phys. Rev. E 74, 036104 (2006).

[Figure: collaboration network whose vertices are labeled with the surnames of network scientists]

Collaborations Between Network Scientists

This figure shows a network of collaborations between scientists working on networks. It was compiled from the bibliographies of two review articles, by M. Newman (SIAM Review 2003) and by S. Boccaletti et al. (Physics Reports 2006). Vertices represent scientists whose names appear as authors of papers in those bibliographies and an edge joins any two whose names appear on the same paper. A small number of other references were added by hand to bring the network up to date. This figure shows the largest component of the resulting network, which contains 379 individuals. Sizes of vertices are proportional to their so-called community centrality. Colors represent vertex degrees with redder vertices having higher degree.

a snapshot of the network of network scientists

They are everywhere, indeed.

the Internet biochemical network

brain

ad infinitum

The most complicated system in the universe known to itself

microscale structure: neuron

macroscopic structure: brain or cognition


with its growth by creation of new nodes, which preferentially form connections to existing hubs. One fMRI study has reported a power law degree distribution for a functional network of activated voxels (Eguíluz and others 2005). But the degree distribution of whole-brain fMRI networks of cortical regions has also been described as an exponentially truncated power law (Achard and others 2006), meaning broadly that the probability of very highly connected hubs is less in the brain than in the WWW, but there is more probability of a hub in the brain than in a random graph. The hubs of this network were predominantly regions of the heteromodal and unimodal association cortex.

Truncated power law degree distributions are widespread in complex systems that are physically embedded or constrained, such as transport or infrastructural networks, and in systems in which nodes have a finite life span, such as the social network of collaborating Hollywood

Fig. 6. Small-world functional brain networks (Achard and others 2006). Anatomical map of a small-world human brain functional network created by thresholding the scale 4 wavelet correlation matrix representing functional connectivity in the frequency interval 0.03 to 0.06 Hz. A, Four hundred five undirected edges, ~10% of the 4005 possible interregional connections, are shown in a sagittal view of the right side of the brain. Nodes are located according to the y and z coordinates of the regional centroids in Talairach space. Edges representing connections between nodes separated by a Euclidean distance 7.5 cm are blue. B, Degree distribution of a small-world brain functional network. Plot of the log of the cumulative probability of degree, log(P(ki)), versus log of degree, log(ki). The plus sign indicates observed data, the solid line is the best-fitting exponentially truncated power law, the dotted line is an exponential, and the dashed line is a power law. C, Resilience of the human brain functional network (right column) compared with random (left column) and scale-free (middle column) networks. Size of the largest connected cluster in the network (scaled to maximum; y axis) versus the proportion of total nodes eliminated (x axis) by random error (dashed line) or targeted attack (solid line). The size of the largest connected cluster in the brain functional network is more resilient to targeted attack and about equally resilient to random error compared with the scale-free network. Reprinted from J Neurosci, 26(1), Achard S, Salvador R, Whitcher B, Suckling J, Bullmore E, A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs, 63-72, 2006, with permission from the Society for Neuroscience.

[Figure panels: A, anatomical map with axes Anterior-Posterior and Ventral-Dorsal and regions labeled Parietal, Occipital, Inferior temporal, Temporal pole, Orbitofrontal, Prefrontal, Premotor, and Sensorimotor; B, log(cumulative distribution) versus log(k); C, largest cluster size versus proportion of nodes attacked for Random, Scale free, and Brain networks]

D. S. Bassett and E. Bullmore, Small-World Brain Networks, The Neuroscientist 12, 512 (2006).

system-level approach! (including mesoscale structures)

Network terminology

[Figure: directed, weighted example network with nodes 1 to 7]

N = |V| = 7: number of nodes
M = |E| = 13: number of edges

adjacency matrix

\[
W =
\begin{pmatrix}
0 & 8 & 12 & 0 & 0 & 0 & 0 \\
0 & 0 & 5 & 0 & 0 & 0 & 0 \\
12 & 0 & 0 & 0 & 0 & 0 & 2 \\
0 & 0 & 16 & 0 & 8 & 10 & 0 \\
0 & 0 & 0 & 0 & 0 & 8 & 2 \\
0 & 0 & 0 & 1 & 0 & 0 & 3 \\
0 & 0 & 0 & 0 & 6 & 0 & 0
\end{pmatrix}
\]

2 outgoing edges (out-degree), 1 incoming edge (in-degree)

macroscale structure: edge density = 13/(7 × 6) ≈ 0.31

some kind of nontrivial mesoscale structure?
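The quantities on this slide can be checked with a few lines of Python. This is only a sketch: it assumes the convention that row i and column j of the slide's matrix hold the weight of the edge from node i to node j (the slide does not state the orientation), and the helper name `network_summary` is made up for illustration.

```python
# Adjacency matrix as read from the slide. Assumed convention (not stated
# on the slide): W[i][j] is the weight of the edge from node i+1 to node j+1,
# and 0 means "no edge".
W = [
    [0, 8, 12, 0, 0, 0, 0],
    [0, 0, 5, 0, 0, 0, 0],
    [12, 0, 0, 0, 0, 0, 2],
    [0, 0, 16, 0, 8, 10, 0],
    [0, 0, 0, 0, 0, 8, 2],
    [0, 0, 0, 1, 0, 0, 3],
    [0, 0, 0, 0, 6, 0, 0],
]


def network_summary(W):
    """Hypothetical helper: node count, edge count, degrees, edge density."""
    n = len(W)
    # Out-degree: number of nonzero entries in each row.
    out_degree = [sum(1 for w in row if w != 0) for row in W]
    # In-degree: number of nonzero entries in each column.
    in_degree = [sum(1 for row in W if row[j] != 0) for j in range(n)]
    m = sum(out_degree)
    # Directed network without self-loops: n * (n - 1) possible edges.
    density = m / (n * (n - 1))
    return n, m, out_degree, in_degree, density


n, m, out_degree, in_degree, density = network_summary(W)
print(n, m, round(density, 2))
print(out_degree, in_degree)
```

Under this convention, node 1 is the only node with out-degree 2 and in-degree 1, which matches the degree annotation on the slide.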

Let's explore network structures!

(Even?) simple organisms use chemotaxis to find a target.

Transport/navigation on road networks


Which information do humans use?

cognitive map (map in mind/brain)

Simplify! (distance/directional information)

Information vs navigation efficiency (or navigability)

[Figure: search time versus amount of useful information, with three labeled cases: real optimal path with global information; random walk without any information; incomplete information]

Greedy Spatial Navigation (GSN) protocol

SHL and P. Holme, Phys. Rev. Lett. 108, 128701 (2012); Phys. Rev. E 86, 067103 (2012); Eur. Phys. J.-Spec. Top. 215, 135 (2013).

[Figure sequence: GSN search from source to target. Panel annotations: "1", "2", "1 > 2"; "stuck! (nowhere to go)"; "backtracking"]

Korean Air executive could face legal action following nuts-rage incident

Cho Hyun-ah reportedly orders attendant, who served her snack in a bag rather than on a plate, to leave plane just before takeoff

Justin McCurry in Tokyo, The Guardian, Monday 8 December 2014 09.05 GMT

South Korea's transport ministry is investigating Korean Air for possible breaches of aviation safety regulations over the macadamia nuts incident. Photograph: Alamy

Spare a thought for the flight attendant who fails to observe snack-serving etiquette in first class, particularly when the passenger in question happens to be the daughter of the airline's chief executive.

South Korea's government is investigating an air-rage incident on board a Korean Air flight from New York to Seoul last Friday that culminated in a senior member of crew being ordered to leave the plane by the passenger moments before it was scheduled to take off.

Reports from Seoul said Cho Hyun-ah, the airline's vice-president and the eldest daughter of its chief executive, Cho Yang-ho, could face legal action following the incident, which forced the aircraft to stop taxiing and return to its gate at JFK airport.

Cho Hyun-ah reportedly screamed at the flight attendant, who has not been named, for

[Figure sequence: GSN search from source to target with backtracking, compared with the real shortest path and a random search]

For a given network embedded in a metric space,

real shortest path length = d
random search path length = dr
greedy spatial navigation (GSN) path length = dg

GSN navigability = d/dg
random navigability = d/dr
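A small numerical sketch of these ratios may help. It assumes that d, dg, and dr are averages over sampled source-target pairs (the slide does not say how they are aggregated), and the path lengths below are made-up illustrative numbers, not data from the talk.

```python
from statistics import mean


def navigability(shortest_lengths, strategy_lengths):
    """Ratio d / d_strategy, using mean path lengths over sampled pairs.

    Assumption: both lists are aligned, one entry per source-target pair.
    """
    if not shortest_lengths or len(shortest_lengths) != len(strategy_lengths):
        raise ValueError("need two non-empty lists of equal length")
    return mean(shortest_lengths) / mean(strategy_lengths)


# Hypothetical path lengths for three source-target pairs.
d = [3.0, 5.0, 4.0]      # real shortest paths
dg = [4.0, 5.0, 6.0]     # greedy spatial navigation
dr = [14.0, 20.0, 26.0]  # random search

gsn_navigability = navigability(d, dg)     # 4 / 5
random_navigability = navigability(d, dr)  # 4 / 20
print(gsn_navigability, random_navigability)
```

Because no search can beat the real shortest path, both ratios lie between 0 and 1, with 1 meaning the strategy always finds an optimal route.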

GSN navigabilityrandom navigability

Application to real spatial networks

Boston road New York road

Switzerland railway European railway

data from H. Youn, M. T. Gastner, and H. Jeong, Phys. Rev. Lett. 101, 128701 (2008).

data from M. Kurant and P. Thiran, Phys. Rev. Lett. 96, 138701 (2006).

Quantified navigability of real transportation networks

SHL and P. Holme, Phys. Rev. Lett. 108, 128701 (2012).

type         network       GSN navigability   random navigability   N = |V|   M = |E|
                           d/dg               d/dr
road         Boston        0.84               0.19                  88        155
road         New York      0.82               0.15                  125       217
railway      Switzerland   0.32               0.06                  1613      1680
railway      European      0.35               0.03                  4853      5765
castle maze  Leeds Castle  0.33               0.24                  184       194

N = |V|: number of nodes; M = |E|: number of edges

maze in Leeds Castle, Kent, England

GSN in a maze

d/dg = 0.33 vs d/dr = 0.24

Let's get more systematic and reliable data! (unit area, uniform criterion for selecting roads, etc.)

Merkaartor program (available on Linux, Windows, Mac OS)

the first step to get the road network data with Merkaartor: New York City case

export to OSM (OpenStreetMap) file

detect road patterns

"straighten up" the roads and take the giant component in the unit area
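The last step, keeping only the giant (largest connected) component, can be sketched as follows. This is an illustration under assumptions: roads are given as an undirected edge list, and the function name and toy data are hypothetical; the talk does not describe how the actual pipeline was implemented.

```python
from collections import deque


def giant_component(edges):
    """Return the node set of the largest connected component.

    Assumption: `edges` is an iterable of undirected (u, v) pairs.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy road segments: one connected piece of four junctions plus a stray segment.
roads = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")]
core = giant_component(roads)
kept_roads = [(u, v) for u, v in roads if u in core and v in core]
print(sorted(core), kept_roads)
```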

Dataset: 20 largest cities in the US, Europe, Asia, Latin America, and Africa (100 cities in total)

US/Austin US/Charlotte US/Chicago US/Columbus US/Dallas US/Detroit US/ElPaso US/FortWorth US/Houston US/Indianapolis US/Jacksonville US/LosAngeles US/Memphis US/NewYork US/Philadelphia US/Phoenix US/SanAntonio US/SanDiego US/SanFrancisco US/SanJose

Africa/Abidjan Africa/Accra Africa/AddisAbaba Africa/Alexandria Africa/Algiers Africa/Cairo Africa/CapeTown Africa/Casablanca Africa/Dakar Africa/DarEsSalaam Africa/Durban Africa/Ibadan Africa/Johannesburg Africa/Khartoum Africa/Kinshasa Africa/Lagos Africa/Luanda Africa/Nairobi Africa/Pretoria Africa/Tunis

Asia/Bangkok Asia/Beijing Asia/Delhi Asia/Dhaka Asia/Guangzhou Asia/HongKong Asia/Jakarta Asia/Karachi Asia/Kolkata Asia/Manila Asia/Mumbai Asia/Nagoya Asia/Osaka Asia/Seoul Asia/Shanghai Asia/Shenzhen Asia/Taipei Asia/Tehran Asia/Tokyo Asia/Wuhan

Europe/Barcelona Europe/Berlin Europe/Brussels Europe/Bucharest Europe/Budapest Europe/Hamburg Europe/London Europe/Lyon Europe/Madrid Europe/Marseille Europe/Milan Europe/Munich Europe/Naples Europe/Paris Europe/Prague Europe/Rome Europe/Sofia Europe/Valencia Europe/Vienna Europe/Warsaw

LatinAmerica/BeloHorizonte LatinAmerica/Bogota LatinAmerica/Brasilia LatinAmerica/BuenosAires LatinAmerica/Caracas LatinAmerica/Fortaleza LatinAmerica/Guadalajara LatinAmerica/Guayaquil LatinAmerica/Lima LatinAmerica/Maracaibo LatinAmerica/Medellin LatinAmerica/MexicoCity LatinAmerica/Monterrey LatinAmerica/PortoAlegre LatinAmerica/Recife LatinAmerica/RioDeJaneiro LatinAmerica/Salvador LatinAmerica/Santiago LatinAmerica/SantoDomingo LatinAmerica/SaoPaulo

N = |V|: number of nodes

100 road network data: https://sites.google.com/site/lshlj82/road_data_2km.zip

diverse values for GSN navigability vs clear scaling for random navigability: GSN navigability reflects the real characteristics of city structures

Navigability profile for large urban areas (2 km × 2 km samples)

observe that its absence helps the large volume of traffic from the upper left part to avoid entering the central part to reach the lower right part, and induces it to take more efficient peripheral roads. Of course, the external geographic factors such as rivers, tunnels, bridges, and roads with various speed limits are also important in practice. We take the simplest approach and assume the geographical context primarily gives a sense of direction for the navigation, and neglect other effects. For future work it would be interesting to extend our work with other information into other navigability functions, e.g., Bureau of Public Roads (BPR) function [16]. We also notice that road 3 in Fig. 3(e) with the largest e value (and the second largest b value) corresponds to the Harvard bridge across the Charles River, illustrating the case of deducing the crucial infrastructure based solely on the geometric positions, without explicit awareness of the river.

The multiple linear regression results shown in Table II demonstrate that predicting e values is not plausible from the linear combination of those network and geometric measures, with low R^2 values. From the same regression analysis on much larger Switzerland and European railways, we observe even smaller R^2 values estimated by the 10^4 sampled source-target pairs for each removal of edge. Therefore, e or the Braessiness is uniquely measured only by considering this greedy behavior of navigators. Finally, we investigate whether there is any correlation between navigability and various socioeconomic indices. We selected the 20 largest cities in the United States (U.S.), Europe, Asia, Latin America, and Africa, respectively, (100 cities in total), and used the MERKAARTOR program [20] and extracted a representative sample of each city (a square of 2 km sides). First, we compared the GSN navigability and the random navigability to the numbers of vertices N, as shown in Fig. 4. There is a striking difference between those two cases, where there is a clear scaling relationship between the random navigability and N [Fig. 4(b)], meaning that the random navigation is statistically determined by the system sizes. In contrast, the widely scattered points in Fig. 4(a) strongly suggest that the numbers of vertices cannot predict the GSN navigability at all, in addition to the fact that purely topological measures cannot predict e in Table II. In this respect, the GSN navigability obviously reflects unique properties of different cities with vastly different developmental histories. We could not find such measures (or linear combinations of them), e.g., population density, median resident income, fraction of public transit commuters, etc., showing statistically significant correlations with the navigability. Again this leads to the conclusion that different cities have unique properties of navigability independent of other socioeconomic factors. One example is the correlation between the navigability and the population change ratio of the 20 cities in the U.S., defined as the ratio of the population change between 1960 and 2010 to the population in 1960 [21]. We observe a very weak negative correlation between the GSN navigability and the ratio (R^2 ≈ 0.093), too weak perhaps for claiming a meaningful conclusion about the dependence.

In summary, we have introduced a new routing strategy incorporating greedy movement and memory of navigators. This strategy, we believe, is a minimal model considering the basic concept of human psychology for navigation, namely, incomplete navigational information and the memory not to be lost. From the results from real-world road and railway structures, we demonstrate the important difference in terms of centralities for navigation and the fact that there exists the celebrated Braess's paradox caused by the navigators' behavior just equipped with this simple strategy. From the observation of correlation profiles for centralities in road structures, we have shown that the importance of each element heavily depends on the detailed layout of structures. We have focused on the final efficiency of the routing processes in this work, but the detailed process of GSN, e.g., the relative distance toward the target during the routing process or the prevalence of backtracking related to the structural properties of roads, can be worthwhile future work. This type of tool, linking spatial cognition, the environment, and emergent navigational properties, can be helpful for urban planners and architects [22].

This research is supported by the Swedish Research Council and the WCU program through NRF Korea funded by MEST R31-2008-10029 (P. H.). The authors thank

TABLE II. Coefficients for the multiple linear regression $e = m_1 b + m_2\,\mathrm{length} + m_3 c + m_4 k_i k_j$ for road networks, with some measures defined on edges: b, the edge length, the distance c from the midpoint of edges to the centroid of vertices, and the product $k_i k_j$ of degrees of vertices attached to edges. The statistical significance codes are

mesoscale structures of a network in terms of transport

[Figure: the example network with nodes 1 to 7]

two modes or modules (or communities)

R. Lambiotte, J.-C. Delvenne, and M. Barahona, arXiv:0812.1770; M. Rosvall and C. T. Bergstrom, PNAS 104, 7327 (2007); PNAS 105, 1118 (2008).

Community structure in networks

[Figure: adjacency matrix of a 100-node network, nz = 2730; p1=0.5, p2=0.05, p3=0.5; pS=0, dS=0]

modularity (the objective function to be maximized)

\[
Q = \frac{1}{2\mu} \sum_{ij} \left( W_{ij} - \frac{s_i s_j}{2\mu} \right) \delta(g_i, g_j)
\]

$g_i$: the community to which node i belongs; $\mu$: the sum of weights in the network; $s_i = \sum_j W_{ij}$

M. A. Porter, J.-P. Onnela, and P. J. Mucha, Not. Am. Math. Soc. 56, 1082 (2009); S. Fortunato, Phys. Rep. 486, 75 (2010).

Core-periphery structure in networks

P. Csermely, A. London, L.-Y. Wu, and B. Uzzi, J. Complex Networks 1, 93 (2013); M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, SIAM J. App. Math 74, 167 (2014).

SHL, M. Cucuringu, and M. A. Porter, Phys. Rev. E 89, 032810 (2014); M. Cucuringu, M. P. Rombach, SHL, and M. A. Porter, e-print arXiv:1410.6572.

[Figure: adjacency matrix of a 100-node network, nz = 2258; p1=0.5, p2=0.2, p3=0.02; pS=0, dS=0; blocks labeled core and periphery]

Wait, did this graph visualization layout just successfully detect the communities and core-periphery separation?!

Kamada-Kawai Model

In their paper, Kamada and Kawai [25] put forward that in some scenarios, the reduction of the number of edge crossings that a graph possesses is not a good aesthetic criterion for a layout algorithm to implement. They state that the total balance of the layout, which is related to the individual characteristics of the graph, is just as important as, or can be considered more important than, the reduction of edge crossings in the graph given a particular scenario. Kamada and Kawai calculate the total balance of the graph as the square summation of the differences between the ideal distance and the actual distance for all vertices:

\[
\mathrm{stress}(X) = \sum_{i<j} w_{ij} \left( \lVert X_i - X_j \rVert - d_{ij} \right)^2 \qquad (2.1)
\]

for each pair of nodes $i$ and $j$, where $d_{ij}$ is the ideal distance between vertices $i$ and $j$, corresponding to the shortest path between those vertices, $X$ is the set of 2D or 3D coordinates, and $w_{ij}$ is a weight. Kamada and Kawai choose $w_{ij} = d_{ij}^{-2}$, whereas Cohen in [4] chose $w_{ij} = 1$ or $w_{ij} = d_{ij}^{-1}$. Choosing $w_{ij} = d_{ij}^{-2}$ seems to produce the best layout as evidenced in [3,23]. Kamada and Kawai use the Newton-Raphson [10] method to optimise with respect to a single vertex. By iteratively solving for each vertex the overall stress is reduced.

By approximating and minimising the stress in Eq. (2.1), the Kamada-Kawai method preserves the total balance of a graph and produces layouts with small amounts of edge crossings.
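To make the stress idea concrete, here is a minimal sketch that only evaluates the stress of a given layout; it does not perform the Newton-Raphson optimisation. It assumes the commonly used form, a sum over vertex pairs of w_ij (|X_i − X_j| − d_ij)^2 with weights w_ij = d_ij^(−alpha), where alpha = 2 is the choice usually attributed to Kamada and Kawai. The function name, the `alpha` parameter, and the toy path graph are illustrative assumptions.

```python
import math


def layout_stress(positions, ideal, alpha=2):
    """Stress of a layout: weighted squared mismatch between layout
    distances and ideal (graph) distances over all vertex pairs.

    Assumptions: positions[i] is a coordinate tuple, ideal[i][j] > 0 is the
    shortest-path distance between i and j, and w_ij = ideal[i][j] ** -alpha.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            actual = math.dist(positions[i], positions[j])
            weight = ideal[i][j] ** (-alpha)
            total += weight * (actual - ideal[i][j]) ** 2
    return total


# Path graph 0 - 1 - 2: graph distances are 1, 1, and 2.
ideal = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # matches all ideal distances
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]      # end vertices too close

stress_straight = layout_stress(straight, ideal)
stress_bent = layout_stress(bent, ideal)
print(stress_straight, stress_bent)
```

The straight layout reproduces every graph distance and has zero stress; the bent layout places the two end vertices at distance √2 instead of 2 and is penalised accordingly.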


S.H. Lee, P. Holme / Physica A 390 (2011) 3996–4001

[Figure: small example graph with labeled vertices, Source and Target marked; legend: random DFS, biased DFS, shortest path]

Fig. 4. (Color online) Illustration of two geometric routing processes: biased DFS, random DFS, and the shortest path. Here the source vertex is 26, and the agents aim to find 9. In bDFS, the packet selects the neighbor 18 of the vertex 25, because it is geometrically closer to the target. This causes a suboptimal routing of 4 steps, instead of 3 steps. However, bDFS is more efficient than a random DFS realization of 14 steps using no geometric information.

Fig. 5. (Color online) Average navigation pathlengths (defined as number of hopping) for geometric routing: bDFS vs. random DFS. The upper and lower fields represent a linear and logarithmic scaling, respectively. (a) BA model [15] with m = 2, (b) HK model [16] with m = 2, (c) WS model [22] with k = 4 and p = 0.1. All the error bars from standard error are smaller than the symbols.

4. Routing by geometric information

Another way of assigning information to vertices than the indices of ASU and ASD, is to embed the vertices in space. In principle the methods are similar: the geographic information becomes a point in R^2 rather than an integer index. If the agent knows the coordinates of the target it can steer toward it in Euclidean space just like in graph space. In a recent paper, Boguñá et al. use geographic embedding of graph models as a basis for a routing strategy [3,4]. In real navigation, the knowledge about the direction to the target is available even if global information about the structure of connection is unknown. If a graph's layout in Euclidean space reflects the graph distances (so that a short Euclidean distance means a short graph distance) then the geographic information could be more valuable than the indices of the ASU and ASD schemes: one could just go toward the target as straight as the layout permits.

The geometric routing strategy we use in this work, for graphs embedded in Euclidean space, makes a DFS of the target where every step down the search tree goes to the vertex closest to the target. If there is no unvisited vertex available at a deeper level, the agent backtracks to the deepest level above with an available vertex to step down to. Note that according to this rule, if a target vertex is in the neighbor of a current vertex, the target vertex is selected in the next step because the Euclidean distance to the target, zero, will be the lowest. We call this navigation strategy biased DFS (bDFS), and investigate its performance compared to an unbiased random DFS. The routing strategies on a small graph are illustrated in Fig. 4, where the success of bDFS in Fig. 4 is more than just coincidence. The graph layout was done by the Neato layout of the Graphviz software (which is essentially the Kamada-Kawai algorithm [21]). One of the goals of the Kamada-Kawai layout is that vertices close in graph distance should be geometrically close too. Logically, moving closer in geometry should correspond to moving closer in graph space as well, which in turn helps the DFS routing protocol. Again, we emphasize that although the graph layout process itself can be time-consuming, especially for large graphs, our main interest is the situation when the graph is used by individual agents with the pieces of information provided.
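The rule described above can be sketched as a short routine. This is an illustrative reading of the text, not the authors' code: the function name, the toy graph, and its coordinates are hypothetical, ties are broken by neighbor order, and every move, including each backtracking move, is counted as one step (an assumption about how path length is measured).

```python
import math


def biased_dfs(adjacency, coords, source, target):
    """Greedy depth-first walk: always step to the unvisited neighbor that is
    geometrically closest to the target; backtrack when stuck.

    Returns the full walk (including backtracking moves), or None if the
    target cannot be reached from the source.
    """
    visited = {source}
    branch = [source]  # current DFS branch, used for backtracking
    walk = [source]    # every vertex the agent stands on, in order
    while branch:
        current = branch[-1]
        if current == target:
            return walk
        candidates = [v for v in adjacency[current] if v not in visited]
        if candidates:
            nxt = min(candidates,
                      key=lambda v: math.dist(coords[v], coords[target]))
            visited.add(nxt)
            branch.append(nxt)
            walk.append(nxt)
        else:
            # Dead end: step back to the previous vertex on the branch.
            branch.pop()
            if branch:
                walk.append(branch[-1])
    return None


# Toy graph: the geometrically tempting route s -> a -> c is a dead end.
coords = {"s": (0, 0), "a": (1, 0), "c": (2, 0),
          "b": (0, 1), "e": (2, 1), "t": (3, 0)}
adjacency = {"s": ["a", "b"], "a": ["s", "c"], "c": ["a"],
             "b": ["s", "e"], "e": ["b", "t"], "t": ["e"]}

walk = biased_dfs(adjacency, coords, "s", "t")
steps = len(walk) - 1
print(walk, steps)
```

In this toy case the greedy walk takes 7 steps because it first explores the dead end and has to backtrack, whereas the shortest path s, b, e, t has 3 steps, giving a ratio d/dg of 3/7 for this single pair.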

4.1. Simulation results on model networks

In Fig. 5, we investigate the performance of bDFS relative to random DFS and the average distances (lengths of shortest paths). In this case we replace the ER model graphs by the Watts-Strogatz (WS) small-world network model [22], to

value of the degree k (in particular, k20) [34] and rich club [39,40] but also by a knotty center of nodes that have a high geodesic betweenness centrality but not necessarily a high degree [36]. A k-core decomposition has also been applied to functional brain imaging data to demonstrate a relationship between network reconfiguration and errors in task performance [41].

A novel approach that is able to overcome many of these conceptual limitations is the geometrical core-score [30], which is an inherently continuous measure, is defined for weighted networks, and can be used to identify regions of a network core without relying solely on their degree or strength (i.e., weighted degree). Moreover, by using this measure, one can produce (i) continuous results, which make it possible to measure whether a brain region is more core-like or periphery-like; (ii) a discrete classification of core versus periphery; or (iii) a finer discrete division (e.g., into 3 or more groups). In addition, this method can identify multiple geometrical cores in a network and rank nodes in terms of how strongly they participate in different possible cores. This sensitivity is particularly helpful for the examination of brain networks for which multiple cores are hypothesized to mediate multimodal integration [42]. In this paper, we have demonstrated that functional brain networks derived from task-based data acquired during goal-directed brain activity exhibit geometrical core-periphery organization. Moreover, they are specifically characterized by a straightforward core-periphery landscape that includes a relatively small core composed of roughly 10% or so of the nodes in the network.

In this paper, we have introduced a method and associateddefinitions to identify a temporal core-periphery organization basedon changes in a nodes module allegiance over time. We havedefined the notion of a temporal core as a set of regions thatexhibit fewer changes in module allegiance over time thanexpected in a dynamic-network null model. Neurobiologically,the temporal core contains brain areas that show consistent task-based mesoscale functional connectivity over the course of anexperiment , and it is therefore perhaps unsurprising that their

Figure 6. Relationship between temporal and geometrical core-periphery organizations. A strong negative correlation exists between flexibility and the geometrical core score for networks constructed from blocks of (A) extensively, (B) moderately, and (C) minimally trained sequences on scanning session 1 (day 1; circles), session 2 (after approximately 2 weeks of training; squares), session 3 (after approximately 4 weeks of training; diamonds), and session 4 (after approximately 6 weeks of training; stars). This negative correlation indicates that the temporal core-periphery organization is mimicked in the geometrical core-periphery organization and therefore that the core of dynamically stiff regions also exhibits dense connectivity. We show temporal core nodes in cyan, temporal bulk nodes in gold, and temporal periphery nodes in maroon. The darkness of data points indicates scanning session; darker colors indicate earlier scans, so the darkest colors indicate scan 1 and the lightest ones indicate scan 4. The grayscale lines indicate the best linear fits; again, darker colors indicate earlier scans, so session 1 is in gray and session 4 is in light gray. The Pearson correlation between the flexibility (averaged over 100 multilayer modularity optimizations, 20 participants, and 4 scanning sessions) and the geometrical core score (averaged over 20 participants and 4 scanning sessions) is significant for the EXT ($r \approx -0.92$, $p \approx 3.4 \times 10^{-45}$), MOD ($r \approx -0.93$, $p \approx 2.2 \times 10^{-49}$), and MIN ($r \approx -0.93$, $p \approx 4.8 \times 10^{-50}$) data. doi:10.1371/journal.pcbi.1003171.g006

Figure 7. Core-periphery organization of brain dynamics during learning. The relationship between temporal and geometrical core-periphery organization and their associations with learning are present in individual subjects. We represent this relationship using spirals in a plane; data points in this plane represent brain regions located at the polar coordinates $(f s, -f k)$, where $f$ is the flexibility of the region, $s$ is the skewness of flexibility over all regions, and $k$ is the learning parameter (see the Materials and Methods) that describes each individual's relative improvement between sessions. The skewness predicts individual differences in learning; the Spearman rank correlation is $r \approx -0.480$ and $p \approx 0.034$. Poor learners (straighter spirals) tend to have a low skewness (short spirals), whereas good learners (curvier spirals) tend to have high skewness (long spirals). Color indicates flexibility: blue nodes have lower flexibility, and brown nodes have higher flexibility. doi:10.1371/journal.pcbi.1003171.g007

Core-Periphery Organization of Brain Dynamics

PLOS Computational Biology | www.ploscompbiol.org 7 September 2013 | Volume 9 | Issue 9 | e1003171

D. S. Bassett, N. F. Wymbs, M. A. Porter, P. J. Mucha, J. M. Carlson, and S. T. Grafton, PNAS 108, 7641 (2011); D. S. Bassett, N. F. Wymbs, M. P. Rombach, M. A. Porter, P. J. Mucha, and S. T. Grafton, PLOS Comput. Biol. 9, e1003171 (2013).

anatomy where few modules uncovered at large spatial scales are complemented by more modules at smaller spatial scales (27).

Dynamic Modular Structure. We next consider evolvability, which is most readily detected when the organism is under stress (29) or when acquiring new capacities such as during external training in our experiment. We found that the community organization of brain connectivity reconfigured adaptively over time. Using a recently developed mathematical formalism to assess the presence of dynamic network reconfigurations (25), we constructed multilayer networks in which we link the network for each time window (Fig. 3A) to the network in the time windows before and after (Fig. 3B) by connecting each node to itself in the neighboring windows. We then measured modular organization (30-32) on this linked multilayered network to find long-lasting modules (25).

To verify the reliability of our measurements of dynamic modular architecture, we introduced three null models based on permutation testing (Fig. 3C). We found that cortical connectivity is specifically patterned, which we concluded by comparison to a connectional null model in which we scrambled links between nodes in each time window (33). Furthermore, cortical regions maintain these individual connectivity signatures that define community organization, which we concluded by comparison to a nodal null model in which we linked a node in one time window to a randomly chosen node in the previous and next time windows. Finally, we found that functional communities exhibit a smooth temporal evolution, which we identified by comparing diagnostics computed using the true multilayer network structure to those computed using a temporally permuted version (Fig. 3D). We constructed this temporal null model by randomly reordering the multilayer network layers in time.

By comparing the structure of the cortical network to those of the null models, we found that the human brain exhibited a heightened modular structure in which more modules of smaller size were discriminable as a consequence of the emergence and extinction of modules in cortical network evolution. The stationarity of communities, defined by the average correlation between partitions over consecutive time steps (34), was also higher in the human brain than in the connectional or nodal null models, indicating a smooth temporal evolution.

Learning. Given the dynamic architecture of brain connectivity, it is interesting to ask whether the specific architecture changes

Fig. 1. Structure of the investigation. (A) To characterize the network structure of low-frequency functional connectivity (24) at each temporal scale, we partitioned the raw fMRI data (Upper Left) from each subject's brain into signals originating from $N = 112$ cortical structures, which constitute the network's nodes (Upper Right). The functional connectivity, constituting the network edges, between two cortical structures is given by a Pearson correlation between the mean regional activity signals (Lower Right). We then statistically corrected the resulting $N \times N$ correlation matrix using a false discovery rate correction (54) to construct a subject-specific weighted functional brain network (Lower Left). (B) Schematic of the investigation that was performed over the temporal scales of days, hours, and minutes. The complete experiment, which defines the largest scale, took place over the course of three days. At the intermediate scale, we conducted further investigations of the experimental sessions that occurred on each of those three days. Finally, to examine higher-frequency temporal structure, we cut each experimental session into 25 nonoverlapping windows, each of which was a few minutes in duration.

Fig. 2. Multiscale modular architecture. (A) Results for the modular decomposition of functional connectivity across temporal scales. (Left) The network plots show the extracted modules; different colors indicate different modules and larger separation between modules is used to visualize weaker connections between them. (A) and (B) correspond to the entire experiment and individual sessions, respectively. Boxplots show the modularity index Q (Left) and the number of modules (Right) in the brain network compared to randomized networks. See Materials and Methods for a formal definition of Q. (C) Modularity index Q and the number of modules for the cortical (blue) compared to randomized networks (red) over the 75 time windows. Error bars indicate standard deviation in the mean over subjects.

7642 www.pnas.org/cgi/doi/10.1073/pnas.1018985108 Bassett et al.

edge: functional connection
node: brain region; ROI (regions of interest)

Core-periphery structure in functional brain networks

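edge-density-based definition: Core Score (CS) for nodes

core quality:

$$R(\alpha,\beta) = \sum_{i,j} W_{ij}\, C_i(\alpha,\beta)\, C_j(\alpha,\beta)$$

$C(\alpha,\beta)$: core vector

optimization: deciding the node sequence $i = (1, \ldots, N)$ that maximizes $R(\alpha,\beta)$

[Figure: the core-vector element $C_i(\alpha,\beta)$ as a function of the node index $i$ (from 1 to $N$), with the periphery-to-core transition at $i = \lfloor \beta N \rfloor$ and the levels $(1-\alpha)/2$, $(1+\alpha)/2$, and 1 marked.]

for $\alpha \in [0,1]$ and $\beta \in [0,1]$

final (normalized) core score:

$$\mathrm{CS}(i) = Z \sum_{(\alpha,\beta)} C_i(\alpha,\beta)\, R(\alpha,\beta)$$

M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha, SIAM J. Appl. Math. 74, 167 (2014).

Core-periphery structure from edge density

SHL, M. Cucuringu, and M. A. Porter, Phys. Rev. E 89, 032810 (2014).

[Figure: adjacency matrix of a 100-node example network with core and periphery blocks (nz = 2258); p1 = 0.5, p2 = 0.2, p3 = 0.02; pS = 0, dS = 0.]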
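The core quality $R(\alpha,\beta)$ for a fixed node ordering can be sketched as follows. This is a hedged illustration rather than the published code: the piecewise-linear form of `core_vector` is an assumption, chosen so that it reproduces the levels $(1-\alpha)/2$, $(1+\alpha)/2$, and 1 marked in the slide's figure, and the function names are hypothetical. The optimization over node orderings (and the sum over $(\alpha,\beta)$ that yields the CS) is deliberately left out.

```python
import math

def core_vector(n, alpha, beta):
    """Assumed transition function C_i(alpha, beta) for ranks i = 1..n:
    linear from 0 up to (1 - alpha)/2 at i = floor(beta * n), then a jump
    to about (1 + alpha)/2 and a linear rise to 1 at i = n."""
    b = math.floor(beta * n)
    values = []
    for i in range(1, n + 1):
        if i <= b:
            values.append(i * (1 - alpha) / (2 * b))
        else:
            values.append((i - b) * (1 - alpha) / (2 * (n - b)) + (1 + alpha) / 2)
    return values

def core_quality(W, order, alpha, beta):
    """R(alpha, beta) = sum_{i,j} W_ij C_i C_j, where order[r] is the node
    placed at rank r + 1 (the last ranks are the most core-like)."""
    n = len(W)
    c = core_vector(n, alpha, beta)
    score = [0.0] * n
    for rank, node in enumerate(order):
        score[node] = c[rank]
    return sum(W[i][j] * score[i] * score[j] for i in range(n) for j in range(n))

# Hypothetical 3-node star: node 0 is the hub.
W = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
hub_last = core_quality(W, [1, 2, 0], alpha=0.5, beta=0.5)
hub_first = core_quality(W, [0, 1, 2], alpha=0.5, beta=0.5)
print(hub_last > hub_first)  # True: ranking the hub as most core-like scores higher
```

A search over orderings (for example by simulated annealing) would call `core_quality` repeatedly; that search strategy is not specified on the slide and is not attempted here.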

backup-pathway-based definition of coreness measure: Path Score (PS) for nodes and edges

$$\mathrm{PS}(i) = \frac{1}{|E|} \sum_{(j,k) \in E} \sum_{\{p_{jk}\}} \sigma_{jik}[E \setminus (j,k)]$$

where $\sigma_{jik}[E \setminus (j,k)] = 1/|\{p_{jk}\}|$ if node $i$ is in the set $\{p_{jk}\}$ that consists of optimal backup paths from node $j$ to node $k$, where we stress that the edge $(j,k)$ is removed from $E$, and $\sigma_{jik}[E \setminus (j,k)] = 0$ otherwise.

the set of edges: $E = \{(j,k) \mid \text{node } j \text{ is connected to } k\}$

Core-periphery structure from transport

The optimal backup path can be the actual shortest path or GSN, or even random search.
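As a rough illustration of the node Path Score, the sketch below handles the simplest case: an unweighted, undirected, simple graph, with shortest paths as the optimal backup paths. The names `path_score` and `_bfs_counts` are hypothetical, and counting only the intermediate nodes of each backup path (not the end nodes j and k) is an assumption of this example.

```python
from collections import deque

def _bfs_counts(adj, source, banned):
    """BFS distances and numbers of shortest paths from source,
    ignoring the single banned edge (given as a frozenset {j, k})."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if {u, v} == banned:
                continue
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count

def path_score(adj):
    """PS(i): for every edge (j, k), remove it, find all shortest backup
    paths from j to k, and give each intermediate node the fraction of
    those paths that pass through it; finally divide by |E|."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    ps = {v: 0.0 for v in adj}
    for edge in edges:
        j, k = tuple(edge)
        dist_j, cnt_j = _bfs_counts(adj, j, edge)
        if k not in dist_j:
            continue  # no backup path exists for this edge
        dist_k, cnt_k = _bfs_counts(adj, k, edge)
        total = cnt_j[k]
        for v in adj:
            if v in (j, k) or v not in dist_j or v not in dist_k:
                continue
            if dist_j[v] + dist_k[v] == dist_j[k]:
                ps[v] += cnt_j[v] * cnt_k[v] / total
    return {v: s / len(edges) for v, s in ps.items()}

# Hypothetical example: a square 0-1-2-3 with one diagonal 0-2.
square = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(path_score(square))
```

Replacing the BFS with a weighted shortest-path routine or a greedy spatial navigator would give the weighted and GSN variants mentioned on the slide; those are not covered by this sketch.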

[Figure: schematic with core and periphery labels; the nodes on the backup paths for an edge (j, k) are marked +1, and these contributions are added for all the edges $(j,k) \in E$.]

[Figure panels: (a) p1 = 0.7, p2 = 0.7, p3 = 0.2; (b) p1 = 0.8, p2 = 0.6, p3 = 0.4.]

Fig. 5.1. First: original adjacency matrices corresponding to the random ensemble with edge probabilities p1, p2, p3 (for the core-core, core-periphery, and periphery-periphery edges). Second: plot of the spectrum of the initial adjacency matrices. Third: grayscale plot of the rank-2 projection. Fourth: plot of the rank-2 projection matrix, after rounding the entries to the nearest integer.

sequence, in other words, look for a natural separation between the high and low P-SCORES, if one exists. An alternative approach would be to detect two clusters in this one-dimensional vector, using a clustering algorithm such as the popular kmeans. The example in Figure 3.2, corresponding to the ensemble generated by k = 1.7 (p = [0.7225, 0.425, 0.25]), illustrates this heuristic very well, as there exists a natural cut point corresponding to a P-SCORE of approximately 20, which correctly assigns the first 50 nodes to the core and the remaining 50 nodes to the periphery.

Unfortunately, for very noisy networks, with less clear separation between the edge probabilities p1, p2, p3, such a heuristic may yield unsatisfactory results and a more systematic approach is desirable. To this end, we introduce in Section 4 the FIND-CUT algorithm, which maximizes a suitably chosen objective function for the core-periphery problem. Using the vector of P-SCORE values as an input, or any other vector of scores that reflect the likelihood of each node being in the core, we find the optimal separation of nodes into core and periphery that maximizes the objective function ??.

Fig. 3.2. P-SCORES (sorted in decreasing order) for the graph ensemble with parameter k = 1.7 (p = [0.7225, 0.425, 0.25]). A natural separation of the nodes into core and periphery is obtained when using a threshold of approximately 20.

We summarize the approach of PATH-CORE in Algorithm 1, for the case of unweighted and undirected graphs, although the approach can be easily generalized for both weighted and directed graphs [15].

Algorithm 1 PATH-CORE: Detecting the core-periphery structure of a graph, where C denotes the set of core nodes, and P the set of periphery nodes.

Require: Simple undirected graph G = (V, E) with n nodes and m edges. Initialize the P-SCORES of all nodes to 0.
1: For each edge $(i,j) \in E$, compute the shortest path in $G \setminus (i,j)$, i.e., in the graph G with edge (i, j) temporarily removed. All nodes contained on the above shortest path increase their P-SCORE value by 1.
2: If d is known, identify the set of core nodes as the top d with largest P-SCORE values.
3: If d is unknown, use the P-SCORE values as an input to the FIND-CUT algorithm.

3.1. SPARSE-PATH-CORE and computational complexity. In the PATH-CORE algorithm, the number of shortest paths we compute is equal to the number of edges in the graph, a task which may be prohibitive for dense graphs where the number of edges is significantly larger than the number of nodes. An alternative approach would be to randomly sample edges in the graph G, and compute shortest paths only for these pairs of adjacent nodes. In other words, we pick an edge $(i,j) \in E$ with a given probability, compute the shortest path $P_{ij}$ in $G \setminus (i,j)$ between nodes i and j, and increase the P-SCORE values of nodes in $P_{ij}$ by one unit. Note that we do not further sparsify the graph G, but only choose to compute shortest paths between a subset of the adjacent nodes. This approach has the potential of significantly reducing the computational complexity of the algorithm, thus making it amenable to very large networks. The drawback of this approach is that the variance of the P-SCORE values would increase, and some nodes may cross over the threshold, and thus be classified incorrectly. Investigating the trade-off between accuracy and computational efficiency, and the impact of the size of the core on the performance of the algorithm, are other questions of interest that we defer to future research.
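The edge-sampling idea can be sketched as below for unweighted, undirected graphs. The function names, the `sample_prob` parameter, and the choice to credit every node on the path (end nodes included) are assumptions of this example; with `sample_prob=1.0` the sketch reduces to scoring every edge once, as in step 1 of Algorithm 1.

```python
import random
from collections import deque

def shortest_path_without_edge(adj, i, j):
    """One BFS shortest path from i to j with the edge (i, j) temporarily
    removed; returns the node list, or None if no such path exists."""
    parent = {i: None}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            break
        for v in adj[u]:
            if (u == i and v == j) or (u == j and v == i):
                continue  # skip the removed edge
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if j not in parent:
        return None
    path = []
    node = j
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def sparse_path_core(adj, sample_prob=1.0, seed=None):
    """P-SCOREs from a random subset of edges: each edge is used with
    probability sample_prob, and every node on its backup path gets +1."""
    rng = random.Random(seed)
    scores = {v: 0 for v in adj}
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    for i, j in edges:
        if rng.random() >= sample_prob:
            continue  # edge not sampled
        path = shortest_path_without_edge(adj, i, j)
        if path is not None:
            for node in path:
                scores[node] += 1
    return scores

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(sparse_path_core(triangle))                        # all edges used
print(sparse_path_core(triangle, sample_prob=0.5, seed=1))  # random subset
```

Comparing the scores from several seeds at a fixed `sample_prob` gives a quick feel for the variance increase that the text warns about.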

For simplicity, we compute the shortest path between pairs of (adjacent) nodes in an undirected unweighted

[Figure: PS versus node index (sorted by PS).]

Core-periphery structure from transport

Zachary's karate club network with the PS values on the nodes and edges

Zachary's karate club network with a community structure

?!

Real Data

[Figure labels: Entrance hole 1; Entrance hole 2; Entrance hole 3; Initial area, 2 rabbits; Not sure - failed bolt hole?; Collapsed + widened out. Could be passing place; Secondary entrance; Exploratory tunnel - hit rock?; Secondary area; Tertiary area or extension of secondary; PRIMARY HUB; SECONDARY HUB; A, B, C; Resting area.]

FIG. 2: (color online) The photo (courtesy of Hannah Sneyd) of the rabbit warren excavation site, where some known characteristic regions are marked.

[Figure labels: Entrance hole 1; Entrance hole 2; Breeding chamber; No idea; Bolt hole; PRIMARY HUB; SECONDARY HUB; Tertiary entrance; Secondary entrance; Possible bolt-hole turned entrance; A, B, C, D, E; Resting area; Collapsed + widened out. Could be passing place; Entrance.]

FIG. 3: (color online) The rabbit warren's 3D structures (courtesy of Hannah Sneyd) from different angles, where some important locations for transportation and breeding chambers A-E are indicated.

between banks ($W_{b'b}$ from bank $b$ to $b'$) should refer to the interbank exposures from the lending bank to the borrowing bank in principle, but the data is only available in the unit of a country $c$ to a bank $b$, $E_{cb} = \sum_{b' \in C(c)} W_{b'b}$, where the set $C(c)$ is composed of the list of banks belonging to the country $c$. Therefore, for each $E_{cb}$, we equally distribute it to each bank $b'$ in that country $c$ (except for the lending bank $b$ itself) as $W_{b'b} = E_{cb}/|C(c) \setminus \{b\}|$.

Figure 7 shows the CS (the resolution $\Delta\alpha = \Delta\beta = 0.01$ for $\alpha \in [0,1]$ and $\beta \in [0,1]$ is used) and PS (with optimal pathways maximizing the sum of weights) of the interbank network, where a few very large PS values dominate the system: GB089, BE004, FR013, DE017, and ES060 (sorted by the PS values). Similar to another non-transportation-based dolphin social network in Sec. III A, the CS and PS are correlated with each other (Pearson: 0.430, Spearman: 0.499) more than or comparably to CS vs BC (Pearson: 0.122, Spearman: 0.538) and PS vs BC (Pearson: 0.212, Spearman: 0.573).
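The equal-split rule $W_{b'b} = E_{cb}/|C(c) \setminus \{b\}|$ can be written out as a short sketch. The data layout (a dictionary keyed by `(country, lender)` and a dictionary of bank lists per country), the function name, and the bank identifiers in the example are all hypothetical; they are not taken from the data set described in the text.

```python
def distribute_exposures(exposure, banks_by_country):
    """Split each country-level exposure E_cb equally among the banks of
    country c, excluding the lending bank b itself.
    Returns weights keyed as (b_prime, b), i.e., W_{b'b}."""
    weights = {}
    for (country, lender), amount in exposure.items():
        borrowers = [b for b in banks_by_country[country] if b != lender]
        if not borrowers:
            continue  # nothing to distribute to
        share = amount / len(borrowers)
        for borrower in borrowers:
            key = (borrower, lender)
            weights[key] = weights.get(key, 0.0) + share
    return weights

# Hypothetical example: bank "A" (itself in country "X") has an exposure
# of 6.0 to country "X", which contains banks A, B, C, and D.
banks = {"X": ["A", "B", "C", "D"]}
E = {("X", "A"): 6.0}
print(distribute_exposures(E, banks))
# {('B', 'A'): 2.0, ('C', 'A'): 2.0, ('D', 'A'): 2.0}
```

Summing the returned weights over the banks of a country recovers the original $E_{cb}$, which is a convenient sanity check on any real input.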

Rabbit warren as a 3D road network

Stock-market correlation network

Fungal networks

Brain

non-transportation networks

transportation networks

Urban (2D) road networks


data: a complete, undirected, and weighted stock-market network of Standard and Poor (S&P) 500 indices and exchange-traded funds (ETFs). data source: http://finance.yahoo.com/

larger Core Score (CS) larger Path Score (PS)




The nodes sorted by the core and path scores, in the correlation network of 478 S&P 500 constituents and 26 exchange-traded funds (ETFs) (504 indices in total)

ETFs

individual stocks





DENSITY-BASED AND TRANSPORT-BASED CORE- . . . PHYSICAL REVIEW E 89, 032810 (2014)

TABLE III. Top core nodes in the S&P 500 and exchange-traded funds (ETFs) correlation network. We show the rank ordering based on both CS and PS values, and we mark ETFs with a symbol.

| Rank | CS (value) | PS (value) |
| 1 | Vanguard Large-Cap Index Fund (1.000) | Guggenheim S&P 500 Equal Weight ($3.42 \times 10^{-1}$) |
| 2 | Guggenheim S&P 500 Equal Weight (0.999) | iShares Russell 1000 Index Fund ($9.52 \times 10^{-2}$) |
| 3 | iShares Russell 1000 Index Fund (0.992) | Vanguard Large-Cap Index Fund ($8.90 \times 10^{-2}$) |
| 4 | iShares Core S&P 500 ETF (0.990) | iShares Core S&P 500 ETF ($8.82 \times 10^{-2}$) |
| 5 | SPDR S&P 500 ETF (0.982) | Consumer Discret Select Sector SPDR ($7.72 \times 10^{-2}$) |
| 6 | iShares S&P 100 Index Fund (0.979) | Financial Select Sector SPDR ($4.80 \times 10^{-2}$) |
| 7 | iShares Morningstar Large Core Index Fund (0.978) | Energy Select Sector SPDR ($3.91 \times 10^{-2}$) |
| 8 | First Trust Large Cap Core AlphaDEX Fund (0.978) | SPDR S&P 500 ETF ($3.77 \times 10^{-2}$) |
| 9 | Vanguard Mega Cap ETF (0.971) | Utilities Select Sector SPDR ($3.35 \times 10^{-2}$) |
| 10 | RevenueShares Large Cap Fund (0.968) | Industrial Select Sector SPDR ($3.00 \times 10^{-2}$) |
| 11 | Consumer Discret Select Sector SPDR (0.967) | Health Care Select Sector SPDR ($2.50 \times 10^{-2}$) |
| 12 | Industrial Select Sector SPDR (0.963) | Consumer Staples Select Sector SPDR ($2.46 \times 10^{-2}$) |
| 13 | Financial Select Sector SPDR (0.961) | Technology Select Sector SPDR ($2.33 \times 10^{-2}$) |
| 14 | Guggenheim Russell Top 50 ETF (0.957) | Technology Select Sector SPDR ($2.08 \times 10^{-2}$) |
| 15 | PowerShares Value Line Timeliness Select Portfolio (0.955) | iShares S&P 100 Index Fund ($1.60 \times 10^{-2}$) |
| 16 | Technology Select Sector SPDR (0.951) | iShares Morningstar Large Core Index Fund ($2.43 \times 10^{-3}$) |
| 17 | Technology Select Sector SPDR (0.950) | Vanguard Mega Cap ETF ($2.11 \times 10^{-3}$) |
| 18 | iShares KLD Select Social Index Fund (0.947) | Guggenheim Russell Top 50 ETF ($2.06 \times 10^{-3}$) |
| 19 | Energy Select Sector SPDR (0.947) | First Trust Large Cap Core AlphaDEX Fund ($2.00 \times 10^{-3}$) |
| 20 | Invesco Ltd. (0.932) | Vornado Realty Trust ($9.86 \times 10^{-4}$) |

the correlation between CS and PS is much stronger for W than that for Wraw, so our two coreness values are more consistent with each other for the normalized flow than for the raw flow. We also observe a correlation between coreness and county population for both W and Wraw. (The correlation values are larger for the latter; this is understandable, given the normalization by populations for the former.) Therefore, even after the normalization of the flow by the populations of source and target counties, more populous counties also tend to be core counties (see Table II). As shown in Table IV, the different choices of flow and coreness measures yield rather different results when aggregated at the state level (although Washington D.C. has the top coreness value in every case).

It is useful to compare our observations to the intrastate versus interstate migration patterns that were discussed in Ref. [63], which reported that the top 14 states with maximum ratio degree (i.e., the ratio of incoming flux to outgoing flux) are (in order) Virginia, Michigan, Georgia, Indiana, Texas,

FIG. 3. (Color online) Core values for (a,b) normalized flow W and (c,d) raw flow Wraw for migration between US counties. We color the counties according to their (a,c) CS values and (b,d) $\ln(\mathrm{PS} + 10^{-6})$. We use the logarithm because of the heterogeneity in the PS values. We also indicate the state boundaries.

Urban (2D) road networks: West End area, London

Core Score (CS)

optimal-path-based Path Score (PS)

GSN-path-based Path Score (GSNP)

FIG. 6. A square sample (2 km × 2 km) of the road network in London. We color (a) the nodes based on their CS values, (b) the nodes and edges based on their geodesic PS values, and (c) the nodes and edges based on their GSNP values.

more detail in Sec. IV C by examining networks produced by generative models for 2D and 3D roadlike networks.

C. Generative Models for 2D and 3D Road-Like Networks

To examine correlations between the coreness measures and BC values in roadlike networks, we generate 2D and 3D roadlike structures from a recently introduced navigability-based model for road networks [78]. We start by determining the locations of nodes either in the unit square (for 2D roadlike networks) or in the unit cube (for the 3D case). We then add edges by constructing a minimum spanning tree (MST) via Kruskal's algorithm [79]. Let $l_{\mathrm{MST}}$ denote the total (Euclidean) length of the MST. We then add the shortcut that minimizes the mean shortest path length over all node pairs, and we repeat this step until the total length of the network reaches a certain threshold. (When there is a tie, we pick one shortcut uniformly at random from the set of all shortcuts that minimize the shortest path length.) Our final network is the set of nodes and edges right before the step that would force us to exceed this threshold by adding a new shortcut. Reference [78] called this procedure a greedy shortcut construction. In adding shortcuts, we also apply an additional constraint to emulate real road networks: new edges are not allowed to cross any existing edges.

Consider a candidate edge $e_{\mathrm{cand}}$ (among all of the possible pairs of nodes that are not currently connected by an edge) that connects the position vectors $\mathbf{q}$ and $\mathbf{q} + \delta\mathbf{q}$. We start by examining the 2D case. Suppose that there is an edge $e_{\mathrm{ext}}$ (which exists before the shortcut addition) that connects $\mathbf{p}$ and $\mathbf{p} + \delta\mathbf{p}$. The equation of intersection,

$$\mathbf{p} + t\,\delta\mathbf{p} = \mathbf{q} + u\,\delta\mathbf{q},$$

then implies that

$$t = \frac{[(\mathbf{q} - \mathbf{p}) \times \delta\mathbf{q}]_z}{(\delta\mathbf{p} \times \delta\mathbf{q})_z}, \quad t \in [0,1], \qquad u = \frac{[(\mathbf{q} - \mathbf{p}) \times \delta\mathbf{p}]_z}{(\delta\mathbf{p} \times \delta\mathbf{q})_z}, \quad u \in [0,1]. \tag{5}$$

In Eq. (5), the z component (indicated by the subscripts) is perpendicular to the plane that contains the network. If Eq. (5) has a solution, then $e_{\mathrm{ext}}$ intersects with $e_{\mathrm{cand}}$, so $e_{\mathrm{cand}}$ is excluded and we try another candidate edge. We continue until we exhaust every pair of nodes that are currently not connected to each other by an edge. We now consider the singular cases, in which the denominator in Eq. (5) equals 0. When $\delta\mathbf{p} \times \delta\mathbf{q} = 0$, it follows that $\delta\mathbf{p} \parallel \delta\mathbf{q}$ (i.e., they are parallel to each other), so they cannot intersect; therefore, $e_{\mathrm{cand}}$ is not excluded. When $\delta\mathbf{p} \times \delta\mathbf{q} = 0$ and $(\mathbf{q} - \mathbf{p}) \times \delta\mathbf{p} = 0$ [which is equivalent to $(\mathbf{q} - \mathbf{p}) \times \delta\mathbf{q} = 0$ because $\delta\mathbf{p} \parallel \delta\mathbf{q}$ implies that $(\mathbf{q} - \mathbf{p})$, $\delta\mathbf{p}$, and $\delta\mathbf{q}$ are all parallel to each other], $e_{\mathrm{cand}}$ and $e_{\mathrm{ext}}$ are collinear and share infinitely many points, so $e_{\mathrm{cand}}$ is excluded from consideration in that case as well [80]. We now consider the 3D case. The distance between (the closest points of) $e_{\mathrm{cand}}$ and $e_{\mathrm{ext}}$ is

$$d = \frac{|(\delta\mathbf{p} \times \delta\mathbf{q}) \cdot (\mathbf{q} - \mathbf{p})|}{|\delta\mathbf{p} \times \delta\mathbf{q}|}.$$

Thus, if $d > 0$, then it is guaranteed that $e_{\mathrm{cand}}$ and $e_{\mathrm{ext}}$ do not intersect. Again, $\delta\mathbf{p} \times \delta\mathbf{q} = 0$ corresponds to the parallel case, so $e_{\mathrm{cand}}$ and $e_{\mathrm{ext}}$ cannot intersect [81]. If $d = 0$, then the vectors $\delta\mathbf{p}$ and $\delta\mathbf{q}$ yield a plane, so we obtain the same solution as in the 2D case, where we replace the z component in Eq. (5) with the component that lies in the direction perpendicular to the relevant 2D plane.

We generate synthetic roadlike networks by placing 100 nodes uniformly at random inside of a unit square (2D) or cube (3D), and we use a threshold of $2\,l_{\mathrm{MST}}$ for the total length of the edges. In Fig. 7, we show examples of 2D and 3D roadlike networks. For each embedding dimension, we consider 50 different networks in our ensemble. We consider 50 different initial node locations in each case, but that is the only source of stochasticity (except for another small source of stochasticity from the tie-breaking rule) because the construction process itself is deterministic. Our main observation from examining these synthetic networks is that correlations of CS values with other quantities (geodesic PS values, GSNP values, and BC values) are much larger in the 3D networks than in the 2D networks (see Table V). This suggests that the embedding dimension of the roadlike networks is related to the
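The edge-crossing test used in the shortcut construction can be illustrated with a short sketch. It follows the 2D rule of Eq. (5) and the 3D line-distance formula as reconstructed from the text; the function names and the tolerance `tol` are assumptions of this example. Note that, with closed intervals for t and u, two edges that merely share an end node count as intersecting, so in practice such pairs would have to be filtered out before calling the test.

```python
import math

def cross_z(a, b):
    """z component of the cross product of two 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]

def candidate_excluded_2d(p, dp, q, dq, tol=1e-12):
    """True if the candidate edge q -> q + dq must be excluded because of
    the existing edge p -> p + dp, following Eq. (5) and its singular cases."""
    denom = cross_z(dp, dq)
    diff = (q[0] - p[0], q[1] - p[1])
    if abs(denom) > tol:
        t = cross_z(diff, dq) / denom
        u = cross_z(diff, dp) / denom
        return 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0
    # Parallel edges: excluded only when they are also collinear.
    return abs(cross_z(diff, dp)) <= tol

def line_distance_3d(p, dp, q, dq):
    """Distance d between the two supporting lines in 3D, or None when the
    directions are parallel (the cross product vanishes)."""
    n = (dp[1] * dq[2] - dp[2] * dq[1],
         dp[2] * dq[0] - dp[0] * dq[2],
         dp[0] * dq[1] - dp[1] * dq[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        return None
    diff = tuple(q[i] - p[i] for i in range(3))
    return abs(sum(n[i] * diff[i] for i in range(3))) / norm

# Two diagonals of a unit-width strip cross at (0.5, 0.5).
print(candidate_excluded_2d((0, 0), (1, 1), (0, 1), (1, -1)))   # True
# Perpendicular skew lines one unit apart in 3D.
print(line_distance_3d((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)))  # 1.0
```

When `line_distance_3d` returns 0, the text reduces the problem to the 2D test inside the plane spanned by the two directions; that projection step is not implemented here.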

data: 100 samples (2 km x 2 km) of urban road structures in the world
ref) SHL and P. Holme, Phys. Rev. Lett. 108, 128701 (2012).

100 road network data: https://sites.google.com/site/lshlj82/road_data_2km.zip

Rabbit warren as a 3D road network

data: a European rabbit (Oryctolagus cuniculus) warren located in Bicton Gardens, Exeter, Devon, United Kingdom excavated for the purpose of making a documentary series The Burrowers: Animals Underground that was broadcast recently by the BBC (http://www.bbc.co.uk/programmes/b038p45r)


3D modeling by Simon Buckley


Hypothetical story of this warren

- It could have taken 2-3 seasons (over 2-3 years) to dig out this whole warren.
- In January year one you start with 2 adults who start making first burrow via entrances 1&2. They then add in entrance 3.
- During season one the doe has, say, 20 offspring, of which 4 survive. There will be some extra digging over that year as the numbers increase.
- By the next January those youngsters will be of breeding age and start digging their own tunnels and breeding. Some of their young will stay but most will migrate.
- By year three you have the warren how it looks now with 8-10 rabbits in it. A couple might be male and the rest will be female. One of the secondary males would establish the satellite warren with a subordinate female.

KEY

- We could spot 5 breeding chambers, which we've labeled A-E. Other subordinate breeding females would be using breeding stops outside the warren.
- The different areas (Primary, Secondary and Tertiary) denote the order in which the sections of the warren would have been dug out.
- A hub is a very busy area of the warren with lots of rabbits coming and going. This initially could have been a tunnel intersection or a breeding chamber but got constantly expanded into a sprawling chamber.

[Figure labels: Breeding chamber; Primary area; SECONDARY HUB.]

TABLE V. Pearson and Spearman correlation values between various core and centrality values (CS, PS, and BC) for transport and synthetic roadlike networks. For the rabbit warren, we give (in parentheses) two-tailed p-values for the null hypothesis of absence of correlation. The values that we show for road networks (Roads) are the mean correlation values for all 100 roads, and we give standard errors in parentheses. The results of 2D and 3D null-model 100-node roadlike networks are from an ensemble of 100 initial node locations. (In each case, we report mean values and standard errors over an ensemble.) We use the same SciPy package in Python [44] as in Tables I and II.

| Network | Correlation | CS vs PS (nodes) | CS vs BC (nodes) | PS vs BC (nodes) | PS vs BC (edges) | CS vs GSNP (nodes) | PS vs GSNP (nodes) | PS vs GSNP (edges) |
| Rabbit Warren [70] | Pearson | 0.231 ($1.61 \times 10^{-2}$) | 0.284 ($2.87 \times 10^{-3}$) | 0.561 ($2.70 \times 10^{-10}$) | 0.303 ($1.00 \times 10^{-3}$) | 0.371 ($7.85 \times 10^{-5}$) | 0.348 ($2.22 \times 10^{-4}$) | $8.25 \times 10^{-2}$ (0.381) |
| Rabbit Warren [70] | Spearman | 0.318 ($8.08 \times 10^{-4}$) | 0.437 ($2.29 \times 10^{-6}$) | 0.568 ($1.48 \times 10^{-10}$) | 0.403 ($7.84 \times 10^{-6}$) | 0.331 ($4.69 \times 10^{-4}$) | 0.293 ($2.09 \times 10^{-3}$) | 0.198 ($3.42 \times 10^{-2}$) |
| 3D Null Model [78] | Pearson | 0.570(9) | 0.609(8) | 0.869(4) | 0.472(9) | 0.52(1) | 0.842(4) | 0.13(1) |
| 3D Null Model [78] | Spearman | 0.572(9) | 0.668(7) | 0.762(4) | 0.394(7) | 0.51(1) | 0.710(5) | 0.14(1) |
| Roads [37] | Pearson | $7(8) \times 10^{-4}$ | $2(7) \times 10^{-4}$ | $4.1(4) \times 10^{-2}$ | $5.8(6) \times 10^{-2}$ | $1(9) \times 10^{-4}$ | $4.2(4) \times 10^{-2}$ | $1.5(3) \times 10^{-2}$ |
| Roads [37] | Spearman | $3(8) \times 10^{-4}$ | $4(7) \times 10^{-4}$ | $6.7(6) \times 10^{-2}$ | $8.8(8) \times 10^{-2}$ | $2(8) \times 10^{-4}$ | $6.0(6) \times 10^{-2}$ | $2.7(3) \times 10^{-2}$ |
| 2D Null Model [78] | Pearson | 0.29(2) | 0.33(1) | 0.668(7) | 0.247(9) | 0.25(2) | 0.675(7) | $9.7(8) \times 10^{-2}$ |
| 2D Null Model [78] | Spearman | 0.36(1) | 0.45(1) | 0.683(6) | 0.288(8) | 0.31(2) | 0.692(6) | 0.216(9) |
| 2D Null Model with Edge Crossing [78] | Pearson | 0.27(2) | 0.36(1) | 0.694(6) | 0.35(1) | 0.23(1) | 0.713(6) | 0.170(1) |
| 2D Null Model with Edge Crossing [78] | Spearman | 0.35(2) | 0.46(1) | 0.707(6) | 0.389(9) | 0.31(2) | 0.692(6) | 0.216(9) |

FIG. 4. Visualization of the rabbit-warren network. An edge's thickness is linearly proportional to the mean width of the tunnel segment that it represents. (It is difficult to discern the differences in width, as the widths are rather homogeneous.) An edge's length is linearly proportional to its real length. We color (a) the nodes according to CS values, (b) the nodes and edges according to geodesic PS values, and (c) the nodes and edges according to GSNP values. We project the three-dimensional positions of nodes into a plane using a bird's-eye view. The labels "primary hub" and "secondary hub" were applied by experts [71], and the secondary hub was populated later in time than the primary one. (The term "primary" is not being used to indicate relative importance.)

measures of coreness.

The node with the largest PS value in terms of both geodesic distance and GSN is the secondary hub marked in Fig. 4(b) and was pointed out by an expert on rabbits. The descriptor "secondary" refers to the fact that it was the second hub in temporal order; it is not a statement of relative importance. The secondary hub has the second largest geodesic BC value. The primary hub region marked in Fig. 4(a) has nodes with larger CS values than geodesic and GSNP values. As one can see in Table V, the geodesic and GSNP values are highly correlated in the rabbit-warren network. According to the rabbit experts and the documentary [71], stronger rabbits are able to acquire better breeding areas. The best breeding areas experience lower traffic, and the breeding areas with the lowest PS values a