2015 ORNL-WomenInComputing-Seminar

COMPUTER SCIENCE [email protected] http://www.csc.ncsu.edu/faculty/bdsullivan FINDING BOTH THEORY AND PRACTICE IN NETWORK SCIENCE Oak Ridge National Laboratory Women in Computing Seminar July 9, 2015 BLAIR D. SULLIVAN

Transcript of 2015 ORNL-WomenInComputing-Seminar


2

Motivation

Bad news - big data is big:

• Human brain: 83B neurons, 1Q synaptic links
• Climate simulation: 42M+ grid cells, 100+ variables, 1000 years
• Genome: 6B nucleotides, 20K genes, 1.4K DNA transcription factors
• Social Networks: Facebook, Twitter each have ~1B users, up to 300K tweets/min, 4.5B likes/day, avg. 338 friends / 208 followers per user.

Images credit (L to R): WU-Minn HCP consortium, Dovrolis et al., Human Genome Project, Zephoria

3

Motivation

Good news – graphs naturally encode relationships!

• Human brain: 83B neurons, 1Q synaptic links
• Climate simulation: 42M+ grid cells, 100+ variables, 1000 years
• Genome: 6B nucleotides, 20K genes, 1.4K DNA transcription factors
• Social Networks: Facebook, Twitter each have ~1B users, up to 300K tweets/min, 4.5B likes/day, avg. 338 friends / 208 followers per user.

Images credit (L to R): Hagmann et al., Wright et al., Nayak et al., and Weng et al.

4

A quick tour of (big) graph data

Image Credits: various

5

Challenges

• Ad hoc methods
• Existing analysis finds only local patterns or global properties
• High computational complexity
• Hard to compare data collected at varied resolutions
• Random sampling is often ineffective

Example: motif counting currently scales to patterns of size 5 (or 6)!

There is an urgent need for new methods to quantitatively analyze massive (dynamic) graphs and their components under uncertainty.

C. elegans image by Tormikotkas

6

Our Approach

7

Cool Theory (with a little Practice)

1. Real-world networks have useful sparse structure – both theoretically and empirically

2. Sparse structure leads to fast algorithms for things we care about!

3. Implementation! A proof-of-concept pipeline for motif counting

8

Fast [Graph] Algorithms?

Fact: Most interesting graph problems are NP-complete for general graphs

• Subgraph isomorphism (motifs)
• Vertex cover (sensor networks)
• Independent set (linear algebra)
• Max clique (protein groups)

Image credit: Garey & Johnson (L), Wikimedia (R)

9

Parameterization to the rescue!

VERTEX COVER: Find min. number of vertices that hit all edges.

k-VERTEX COVER: Can you find at most k vertices hitting all edges?

• Brute force: consider all subsets of size k
• This algorithm takes time $O(n^k)$

Can we do better?

Image credit: Wolfram Alpha
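The brute-force idea on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `brute_force_vertex_cover` and the vertex-list/edge-list input format are assumptions made here, not something from the talk. Because any superset of a cover is still a cover, it is enough to try subsets of size exactly min(k, n).

```python
from itertools import combinations


def brute_force_vertex_cover(vertices, edges, k):
    """Return a vertex cover with at most k vertices, or None if none exists.

    Hypothetical helper: `vertices` is a list, `edges` a list of pairs, k >= 0.
    Tries every subset of size min(k, n), i.e. roughly n^k candidates.
    """
    size = min(k, len(vertices))
    for candidate in combinations(vertices, size):
        chosen = set(candidate)
        # A candidate is a cover if every edge has an endpoint in it.
        if all(u in chosen or v in chosen for u, v in edges):
            return chosen
    return None


triangle = [(1, 2), (2, 3), (1, 3)]
print(brute_force_vertex_cover([1, 2, 3], triangle, 1))  # no single vertex hits all three edges
print(brute_force_vertex_cover([1, 2, 3], triangle, 2))
```

The number of candidates grows like n^k, which is what motivates the question on the slide.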

10

Improvement through Kernelization

• k-VERTEX COVER: Are there at most k vertices hitting all edges?

• Each time we apply the rule, decrease k by one (so at most k times)

When we can't apply the rule, either there are at most $k^2$ vertices left, or we know there is not a solution (cover of size at most k).

• Brute-force remainder in time $O(2^{k^2})$, so algorithm is $O(f(k)\,n)$
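A minimal sketch of kernelization for k-VERTEX COVER follows. The transcript does not show the reduction rule itself, so the code assumes the standard high-degree rule (a vertex with more than k neighbours must be in every cover of size at most k). The function name `has_vertex_cover` and the edge-list input are likewise assumptions. The stopping test is phrased on edges: once every degree is at most k, a cover of size k can hit at most k·k edges.

```python
from itertools import combinations


def has_vertex_cover(edges, k):
    """Decide whether a simple graph (list of vertex pairs) has a cover of size <= k."""
    edges = {frozenset(e) for e in edges}
    while True:
        if k < 0:
            return False  # we were forced to pick more vertices than allowed
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        # Assumed rule: a vertex of degree > k must be in the cover.
        high = next((v for v, d in degree.items() if d > k), None)
        if high is None:
            break
        edges = {e for e in edges if high not in e}
        k -= 1
    # Kernel: all degrees <= k, so k vertices cover at most k*k edges.
    if len(edges) > k * k:
        return False
    vertices = list({v for e in edges for v in e})
    # Brute-force the small remainder.
    for size in range(min(k, len(vertices)) + 1):
        for candidate in combinations(vertices, size):
            chosen = set(candidate)
            if all(e & chosen for e in edges):
                return True
    return False


star = [("c", leaf) for leaf in range(5)]
print(has_vertex_cover(star, 1))  # the centre alone covers every edge
```

The brute-force step only ever sees a graph whose size depends on k, which is where the f(k)·n shape of the running time comes from.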

11

Parameterized Complexity in a nutshell

• NP: solvable in polynomial time with a non-deterministic Turing machine

• XP: has an $O(n^{f(k)})$ algorithm.

• FPT: has an $f(k)\,n^{O(1)}$ algorithm.

• Poly-kernel: the kernel has size $k^{O(1)}$

12

What if the problem size isn't small?

13

Saved by sparsity?

Observation 1 (v. 0.0): Many real-world networks have low average degree.

Facebook: |V| ~ 1.3B, |E| ~ 500B
Yeast PPI: |V| = 3000, |E| ~ 3000
Power Grid: |V| ~ 5K, |E| ~ 13K
Neurome: |V| ~ 10^10, |E| ~ 10^14
Twitter: |V| ~ 1B, |E| ~ 200B

Consequences: Some algorithms get faster (but NP-hard problems remain)

Observation 2: Edges are not evenly distributed in real graphs. (easy to see if you think social)

Twitter: avg followers: 200, max followers ~59M
Facebook: # friends varies inter- vs intra-community

Hypothesis: This "should" help. But how exactly?

Images credit Felix Reidl (RWTH Aachen)

What could it mean to be structurally sparse?
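As a quick sanity check on "low average degree", the average degree of an undirected graph is 2|E|/|V|. The sketch below plugs in some of the approximate counts quoted on the slide. Treating every network as undirected is an assumption made here for simplicity, and the helper name `average_degree` is illustrative.

```python
def average_degree(num_vertices, num_edges):
    """Average degree of an undirected graph: each edge has two endpoints."""
    return 2 * num_edges / num_vertices


# Approximate (|V|, |E|) pairs as quoted in the talk.
networks = {
    "Power Grid": (5_000, 13_000),
    "Twitter": (1_000_000_000, 200_000_000_000),
    "Facebook": (1_300_000_000, 500_000_000_000),
}

for name, (n, m) in networks.items():
    print(f"{name}: average degree about {average_degree(n, m):.1f} on {n:,} vertices")
```

In each case the average degree is tiny compared with the number of vertices, which is the sense of "sparse" used in Observation 1.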

14

Useful Sparsity (v. 1.0): Tree Structure

No cycles makes things easy!

• MAXWIS: Find the maximum weighted independent set in G

This NP-hard problem has a linear algorithm on trees!

[Figure: a vertex-weighted tree with a pair of dynamic-programming values at each node, e.g. (3,0), (8,10), (17,15)]

[Some prefer to think of belief propagation -- also efficient (and exact) on trees]
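The linear-time claim for trees rests on a bottom-up dynamic program that keeps two values per vertex: the best weight in its subtree if the vertex is taken, and the best weight if it is not. The sketch below is one way to write it. The adjacency-dictionary input, the function name and the (include, exclude) naming are assumptions chosen for illustration, not the slide's own notation.

```python
def max_weight_independent_set(tree, weight, root):
    """Weight of a maximum weighted independent set in a tree.

    tree:   dict mapping each vertex to a list of its neighbours (undirected tree)
    weight: dict mapping each vertex to a non-negative weight
    root:   any vertex, used only to orient the tree
    """
    # Iterative DFS so that every vertex appears after its parent in `order`.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in tree[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)

    include = {}  # best weight in v's subtree when v is in the set
    exclude = {}  # best weight in v's subtree when v is not in the set
    for v in reversed(order):  # children are processed before their parent
        include[v] = weight[v]
        exclude[v] = 0
        for u in tree[v]:
            if u != parent[v]:
                include[v] += exclude[u]  # children of a chosen vertex must be skipped
                exclude[v] += max(include[u], exclude[u])
    return max(include[root], exclude[root])


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(max_weight_independent_set(path, {"a": 3, "b": 2, "c": 3}, "a"))  # picks a and c
```

Each vertex and edge is touched a constant number of times, so the running time is linear in the size of the tree.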

15

Sparsity 1.1: Bounded Treewidth

• A graph class G has bounded treewidth if there is a constant c such that every graph in the class has a tree decomposition of width at most c.

Usefulness: Most algorithms which are polynomial-time on trees can be extended to run in poly-time on bounded treewidth graphs (you pay an exponential factor in terms of the width)

Image credit Wikipedia
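To make "tree decomposition of width at most c" concrete, here is a small checker for the standard definition: every vertex appears in some bag, every edge is contained in some bag, and the bags containing any given vertex form a connected subtree. The width is the largest bag size minus one. The function name and input format are assumptions for illustration, and the code assumes (without checking) that `tree_edges` really forms a tree on the bag identifiers.

```python
def tree_decomposition_width(vertices, edges, bags, tree_edges):
    """Return the width of a valid tree decomposition, or None if it is invalid.

    bags:       dict mapping a bag identifier to a set of graph vertices
    tree_edges: pairs of bag identifiers (assumed to form a tree)
    """
    adjacent = {t: set() for t in bags}
    for s, t in tree_edges:
        adjacent[s].add(t)
        adjacent[t].add(s)

    # 1. Every vertex lies in at least one bag.
    covered = set().union(*bags.values()) if bags else set()
    if not set(vertices) <= covered:
        return None

    # 2. Both endpoints of every edge share some bag.
    for u, v in edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return None

    # 3. The bags containing a vertex are connected in the tree.
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen = {start}
        stack = [start]
        while stack:
            t = stack.pop()
            for s in adjacent[t]:
                if s in nodes and s not in seen:
                    seen.add(s)
                    stack.append(s)
        if seen != nodes:
            return None

    return max(len(bag) for bag in bags.values()) - 1


cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
print(tree_decomposition_width("abcd", cycle, bags, [(0, 1)]))  # width 2 for the 4-cycle
```

Verifying a given decomposition is easy; finding one of minimum width is the hard part.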

16

Sparse Graph Hierarchy

[Figure: inclusion diagram of sparse graph classes. Labels: Star forests, Linear forests, Forests, Bounded treedepth, Outerplanar, Bounded treewidth, Planar, Bounded genus, Bounded degree, Excluding a minor, Excluding a topological minor, Locally bounded treewidth, Locally excluding a minor, Bounded expansion, Locally bounded expansion, Nowhere dense]

17

FPT Algorithms & Graph Structure: Pros & Cons

Lots of problems become FPT:

• STEINER TREE (bounded degeneracy)
• DOMINATING SET (bounded genus)
• SUBGRAPH ISOMORPHISM (bounded expansion)
• MAXWIS (bounded treewidth)

And there are meta-theorems!

• FO-model checking on nowhere-dense graphs* (k = formula size)
• EMSO-model checking parameterized by treewidth

BUT

Real-world networks might not fall into any of these categories!

Worse, it's hard to test membership (& existing results are negative)

The algorithms often have [huge] hidden constants

*Recall, this was broadest class. And lots of problems are expressible in FO-logic.

18

How do we know if networks are structurally sparse?

19

Connecting (with) the dots

• Challenge: instances vs. classes
• Goal: classify networks by their features/characteristics
• Typical Approach: find randomized models that match desired features – use these to represent the class

Random Intersection Graphs (shared attributes)

Barabási-Albert (pref. attachment)

Configuration / Chung-Lu (degree distribution)

Kleinberg (small world)

20

A bit of bad news…

Bounded Degree:
• Erdős-Rényi - O(log n)
• Molloy-Reed [depends on specified degree sequence]
• Random Intersection Graphs (α ≤ 1)

Bounded Degeneracy:
• Barabási-Albert
• Random Intersection Graphs (α ≤ 1)

Bounded Treewidth:
• Erdős-Rényi - O(n) [Gao, 2009]
• Barabási-Albert – O(n) [Gao, 2009]

21

Sparsity 2.0: Bounded Expansion

• A graph class $\mathcal{G}$ has bounded expansion if every r-shallow (topological) minor has density at most f(r).

$$\nabla_r(G) = \max_{H \in G \,\triangledown\, r} \frac{|E(H)|}{|V(H)|}$$

$$\nabla_r(\mathcal{G}) := \sup_{G \in \mathcal{G}} \nabla_r(G) \le f(r)$$

Note: related FPT algorithms don't require knowing f(r)

22

Not only intuitive, but predicted!

* Includes configuration with households (high clustering) & inhomogeneous random graphs.

23

An Interesting Implication

Corollary: Preferential Attachment generates a non-representative sample of graphs with a power-law degree distribution

Why?

• Thm 1: Graphs chosen uniformly at random to have a power-law degree distribution (exponent > 2) or be "scale-free" (à la Cooper-Lu) have bounded expansion asymptotically almost surely.

• Thm 2: Graphs generated with preferential attachment (BA model) are somewhere dense (not bounded expansion) with high probability.

Is this good or bad?

It depends.

24

Does it hold in real data?

25

Great. So what?

Demaine, O'Brien, Reidl, Rossmanith, Sanchez Villaamil, Sikdar, S.

26

Exploitation ("the big picture")

27

When does it help?

Problem: Count number of occurrences of motif H in host graph G

Motifs: recurrent and statistically significant subgraphs or patterns. (Milo et al. '07)
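For reference, the naive way to solve the counting problem is to test every h-subset of host vertices, which is the kind of general XP baseline that structure-aware algorithms aim to beat. The sketch below counts vertex subsets whose induced subgraph is isomorphic to the motif. Counting induced occurrences, the function name and the input format are all assumptions made for illustration. This is not the bounded-expansion pipeline described in the talk.

```python
from itertools import combinations, permutations


def count_induced_motifs(host_vertices, host_edges, motif_vertices, motif_edges):
    """Count h-subsets of the host whose induced subgraph is isomorphic to the motif.

    Brute force over all subsets and all orderings: roughly n^h * h! checks.
    """
    host = {frozenset(e) for e in host_edges}
    motif_vertices = list(motif_vertices)
    h = len(motif_vertices)
    count = 0
    for subset in combinations(host_vertices, h):
        inside = set(subset)
        induced = {e for e in host if e <= inside}
        if len(induced) != len(motif_edges):
            continue  # different edge counts can never be isomorphic
        for image in permutations(subset):
            mapping = dict(zip(motif_vertices, image))
            # Equal edge counts plus an injective edge-preserving map give an isomorphism.
            if all(frozenset((mapping[u], mapping[v])) in induced for u, v in motif_edges):
                count += 1
                break
    return count


k4 = [(a, b) for a, b in combinations(range(4), 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(count_induced_motifs(range(4), k4, range(3), triangle))  # every 3-subset of K4 is a triangle
```

Even for modest h this is hopeless on large hosts, consistent with the earlier remark that motif counting currently scales only to patterns of size 5 (or 6).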

28

Algorithmic Pipeline

29

Is that ever really faster?

At scale, BE algorithm (solid) quickly out-performs general XP algorithms (dashed)!

[Plot: Time (10^0 to 10^300) against Number of vertices (10^2 to 10^7), with curves for h = 10, h = 20, h = 50 and h = 100]

Counting motifs of size h in bounded-expansion graphs takes time [formula image] with small constants

30

Shameless Plug

We're Hiring!

Postdoc positions available! 3-5 openings likely in 2015-2020.

Know a great undergrad? Encourage them to apply to NC State CSC and list me as faculty they're interested in working with!

31

Acknowledgements

This talk covered joint work with subsets of the following collaborators:

Aaron Adcock (Stanford)
Robert Bridges (ORNL)
Erik Demaine (MIT)
Martin Demaine (MIT)
Pal Drange (U. Bergen)
Markus Dregi (U. Bergen)
Timothy Goodrich (NC State)
Matthew Farrell (Cornell)
Erik Ferragut (ORNL)
Jason Laska (ORNL)
Nathan Lemons (LANL)
Daniel Lokshtanov (U. Bergen)
Michael Mahoney (UC Berkeley)
Michael P. O'Brien (NC State)
Felix Reidl (RWTH Aachen)
Peter Rossmanith (RWTH Aachen)
Fernando Sanchez Villaamil (RWTH Aachen)
Somnath Sikdar (RWTH Aachen)

More information available at: www.csc.ncsu.edu/faculty/bdsullivan

B. Sullivan funded in part by the Gordon & Betty Moore Foundation, the National Consortium for Data Science and the Defense Advanced Research Projects Agency GRAPHS program