2015 ORNL-WomenInComputing-Seminar
-
Upload
blairdsullivan -
Category
Science
-
view
191 -
download
0
Transcript of 2015 ORNL-WomenInComputing-Seminar
COMPUTER)SCIENCE)[email protected]
http://www.csc.ncsu.edu/faculty/bdsullivan
FINDING0BOTH00THEORY0AND0PRACTICE00IN0NETWORK0SCIENCE0
Oak$Ridge$National$Laboratory$$Women$in$Computing$Seminar$
July$9,$2015$
BLAIR0D.0SULLIVAN0
2
Bad)news0F0big0data0is0big:0
0
0
0
00
• Human0brain:083B0neurons,01Q0synaptic0links0
• Climate0simulation:042M+0grid0cells,0100+0variables,010000years00
• Genome:06B0nucleotides,020K0genes,01.4K0DNA0transcription0factors0
• Social0Networks:0Facebook,0Twitter0each0have0~1B0users,0up0to0300K0tweets/min,04.5B0likes/day,0avg.03380friends0/02080followers0per0user.)
$
Images$credit$(L$to$R):$WUGMinn$HCP$consortium,$Dovrolis$et$al.,$Human$Genome$Project,$Zephoria$
Motivation0
3
Good)news)–0graphs0naturally0encode0relationships!00
0
0
0
00
• Human0brain:083B0neurons,01Q0synaptic0links0
• Climate0simulation:042M+0grid0cells,0100+0variables,010000years00
• Genome:06B0nucleotides,020K0genes,01.4K0DNA0transcription0factors0
• Social0Networks:0Facebook,0Twitter0each0have0~1B0users,0up0to0300K0tweets/min,04.5B0likes/day,0avg.03380friends0/02080followers0per0user.)
$
Images$credit$(L$to$R):$$Hagmann$et$al,$Wright$et$al,$Nayak$et$al,$and$Weng$et$al$
Motivation0
5
Challenges0
C.$Elegans$image$by$Tormikotkas$
Ad$hoc$methods$
Existing$analysis$finds$only$local$patterns$or$global$properties$
High$computational$complexity$
Hard$to$compare$data$collected$at$varied$resolutions$
Random$sampling$is$often$ineffective$
Example:$motif$counting$currently$scales$to$patterns$of$size$5$(or$6)!$0
There0is0an0urgent)need)for0new0methods0to0quantitatively)analyze)massive'(dynamic)'graphs)and0their0components0under)uncertainty.))
7
Cool0Theory0(with0a0little0Practice)0
1.0RealFworld0networks0have0useful)sparse)structure0–00both0theoretically0and0empirically$
2.0Sparse0structure0leads0to0fast)algorithms00
for$things$we$care$about!00
3.)Implementation!00A0proofFofFconcept0
pipeline0for0motif0counting0
8
Fast0[Graph]0Algorithms?0
Fact:$Most$interesting$graph$problems$are$$NPGcomplete$for$general$graphs$
• Subgraph0isomorphism0(motifs)0• Vertex0cover0(sensor0networks)0• Independent0set0(linear0algebra)0• Max0clique0(protein0groups)0
Image$credit:$Garey$&$Johnson$(L),$Wikimedia$(R)$
9
Parameterization0to0the0rescue!0
VERTEXCOVER:0Find0min.0number0of0vertices0that0hit0all0edges.0
0
0
0
0
0
KFVERTEXCOVER:0Can0you0find0at0most0k0vertices0hitting0all0edges?0
• Brute)force:0consider0all0subsets0of0size0k$• This0algorithm0takes0time0O(nk)))
Can$we$do$better?$
Image$credit:$Wolfram$Alpha$
10
Improvement0through0Kernelization0
• KFVERTEXCOVER:0Are0there0at0most0k0vertices0hitting0all0edges?0
0
• Each0time0we0apply0the0rule,0decrease0k0by0one0(so0at0most0k0times)0
When0we0can’t0apply0the0rule,0either0there0are0at0most0k2$vertices0left,0or0we0know0there0is0not0a0solution0(cover0of0size0at0most0k).$
• BruteFforce0remainder0in0time0O(2k20),0so0algorithm0is0O(f(k)n)0
11
Parameterized0Complexity0in0a0nutshell0
• NP:0solvable0with0nonFdeterministic0Turing0machine0
• XP:00has0an0O(nf(k))0algorithm.00
• FPT:0has0an0f(k)nO(1)$algorithm.0
• PolyKkernel:0the0kernel0has0size0kO(1)00
13
Saved0by0sparsity?0
Images$credit$Felix$Reidl$(RWTH$Aachen)$
Observation)2:)Edges$are$not$evenly$$
distributed$in$real$graphs.$$
(easy$to$see$if$you$think$social)$Twitter:0avg0followers:0200,00
max0followers0~59M0Facebook:0#0friends0varies0interF0vs0intraFcommunity0
0Hypothesis:)
This$“should”$help.$$But$how$exactly?0
0
Observation)1)(v.)0.0))Many$realGworld$networks$have$
low$average$degree.$$
Facebook:0|V|0~01.3B,0|E|0~0500B0Yeast0PPI:0|V|0=03000,0|E|0~030000Power0Grid:0|V|0~05K,0|E|0~013K0Neurome:0|V|0~01010,0|E|0~010140
Twitter:0|V|0~01B,0|E|0~0200B0
0Consequences:)
Some$algorithms$get$faster$$(but$NPGhard$problems$remain)$
What'could'it'mean'to'be'structurally'sparse?'
14
No'cycles'makes'things'easy!))
• MAXWIS:0Find0the0maximum$weighted$independent$set0in0G$
$$
This)NPKhard)problem)has)a)linear)algorithm)on)trees!)
30 20 10 10
30 2030 40
70
20
10
(3,0)
(3,6)
(1,0)
(2,0) (3,0)
(1,0) (2,0)
(7,5)
(4,1)
(8,10)
(17,15)
Useful0Sparsity0(v.01.0):0Tree0Structure0
[Some prefer to think of belief propagation -- also efficient (and exact) on trees]
15
• A0graph0class0G0has0bounded$treewidth$if0every0graph0has0a0tree$decomposition$of0width0at0most0c.00
0
0
0
0
0
Sparsity01.1:0Bounded0Treewidth0
Usefulness:)0Most0algorithms0which0are0polynomialFtime0on0trees00can0be0extended0to0run)in)polyKtime)on)bounded)treewidth00
(you0pay0an0exponential0factor0in0terms0of0the0width)0
Image$credit$Wikipedia$
16
Star forests
Bounded treedepth
Bounded treewidth
Excluding a minor
Excluding a topological minor
Bounded expansion
Outerplanar
Planar
Bounded genus
Linear forests
Bounded degree
Locally bounded treewidth
Locally excluding a minor
Forests
r
rr
Locally bounded expansion
Nowhere dense
r
Sparse0Graph0Hierarchy0
17
FPT0Algorithms0&0Graph0Structure:0Pros0&0Cons0
Lots0of0problems0become0FPT:0
00
And0there0are0metaFtheorems!00
• FOFmodel0checking0on$nowhereGdense$graphs*0(k0=0formula0size)00
• EMSOFmodel0checking0parameterized$by$treewidth'
BUT'$
RealBworld'networks'might$not$fall$into$any$of$these$categories!$$
Worse,$it’s$hard'to'test'membership$(&$existing$results$are$negative)$
The$algorithms$often$have$[huge]$hidden'constants''
• STEINER0TREE0(bounded0degeneracy)0• DOMINATINGSET0(bounded0genus)0
• SUBGRAPHISOMORPHISM0(bounded0expansion)0• MAXWIS0(bounded0treewidth)00
*Recall, this was broadest class. And lots of problems are expressible in FO-logic.
19
Connecting0(with)0the0dots00
• Challenge:0instances$vs.$classes$• Goal:0classify0networks0by0their0features/characteristics0
• Typical0Approach:0find0randomized$models$that0match0desired0features0–0use0these0to0represent0the0class00
41
3 3
32
41
3 3
32
Random$Intersection$Graphs$$
(shared$attributes)$
BarabasiGAlbert$(pref.$attachment)$
Configuration/$ChungGLu$(degree$distribution)$
Kleinberg$(small$world)$
20
A0bit0of0bad0news…0
Bounded)Degree:)• ErdösFRényi00F0O(log0n)0
• MolloyFReed0[depends0on0specified0degree0sequence]0
• Random0Intersection0Graphs0(α0≤01)0
Bounded)Degeneracy:)• BarabasiFAlbert00• Random0Intersection0Graphs0(α0≤01))
Bounded)Treewidth:)• ErdösFRényi00F0O(n)00[Gao,02009]0
• BarabasiFAlbert0–0O(n)0[Gao,02009]00
21
Sparsity02.0:0Bounded0Expansion0
• A0graph0class0G0has0bounded$expansion$if0every0rFshallow0(topological)0minor0has0density0at0most0f(r).00
0
0
0
0
0
0
Note:)related0FPT0algorithms0don’t0require0knowing0f(r)$
Or(G) = max
H2GOr
|E(H)||V (H)|
Or(G) := supG2G
Or(G) f(r)
22
Not0only0intuitive,0but0predicted!0
*0Includes0configuration0with0households0(high0clustering)0&0inhomogenous0random0graphs.0
23
An0Interesting0Implication0
Corollary:0Preferential0Attachment0generates0a0nonBrepresentative'sample$$
of0graphs0with0a0powerFlaw0degree0distribution0
Why?'• Thm01:0Graphs0chosen0uniformly0at0random0to0have0a0powerFlaw0
degree0distribution0(exponent0>02)0or0be0“scaleFfree”0(ala0CooperFLu)'have)bounded)expansion'asymptotically$almost$surely.$$
• Thm02:0Graphs0generated0with0preferential0attachment0(BA0model)0are)somewhere)dense0(not0bounded0expansion)$with$high$probability.$
Is'this'good'or'bad?'0
It0depends.0
27
When0does0it0help?0
Problem:0Count0number0of0occurrences0of0motif0H00
00
in0host0graph0G$
Motifs:0recurrent$and$statistically$significant$subgraphs$or$patterns.$(Milo0et0al0‘07)0
29
Is0that0ever0really0faster?0
At$scale,$BE$algorithm$(solid)$quickly$outGperforms$general$XP$algorithms$(dashed)!$
102 103 104 105 106 107100
1050
10100
10150
10200
10250
10300
Number of vertices
Tim
e
h = 10 h = 20 h = 50 h = 100
Counting0motifs0of0size0h0in0boundedFexpansion0
takes0time0
with'small'constants)
30
Shameless0Plug0
0
Postdoc)positions)available!)3F50openings0likely0in02015F2020.0
Know)a)great)undergrad?)Encourage0them0to0apply0to0NC0State0CSC0and0list0me0as0faculty0they’re0
interested0in0working0with!00
)
We’re'Hiring!'
31
Acknowledgements0$
This$talk$covered$joint$work$with$subsets$of$the$following$collaborators:$$)
Aaron)Adcock)(Stanford)0Robert)Bridges)(ORNL)0Erik)Demaine)(MIT)0
Martin)Demaine0(MIT)0Pal)Drange)(U.0Bergen)0
Markus)Dregi)(U.0Bergen))Timothy)Goodrich)(NC0State)))Matthew)Farrell)(Cornell)0
Erik)Ferragut)(ORNL)0Jason)Laska0(ORNL)0
Nathan)Lemons)(LANL)0Daniel)Lokshtanov)(U.0Bergen)0Michael)Mahoney)(UC0Berkeley)))Michael)P.)O’Brien0(NC0State)00Felix)Reidl0(RWTH0Aachen)0
Peter)Rossmanith0(RWTH0Aachen))Fernando)Sanchez)Villaamil0(RWTH0Aachen))
Somnath)Sikdar0(RWTH0Aachen))0
More$information$available$at:$www.csc.ncsu.edu/faculty/bdsullivan)0
B.0Sullivan0funded0in0part0by0the0Gordon0&0Betty0Moore0Foundation,0the00National0Consortium0for0Data0Science0and0the0Defense0Advanced0Research0Projects0Agency0GRAPHS0program0