The power of graphs to analyze biological data
the power of graphs for analyzing biological datasets
Davy Suvee
Janssen Pharmaceutica
about me
➡ working as an IT lead / software architect @ Janssen Pharmaceutica
• dealing with big scientific data sets
• hands-on expertise in big data and NoSQL technologies
who am i ...
Davy Suvee @DSUVEE
➡ founder of datablend
• provide big data and NoSQL consultancy
• share practical knowledge and big data use cases via blog
outline
➡ getting visual insights into big data sets
★ gene expression clustering (MongoDB, Neo4j, Gephi)
★ mutation prevalence (Cassandra, Neo4j, Gephi)
➡ FluxGraph, a time machine for your graphs ...
insights in big data
➡ typical approach through warehousing
★ star schema with fact tables and dimension tables
insights in big data
★ real-time visualization
★ filtering
★ metrics
★ layouting
★ modular [1, 2]
1. http://gephi.org/plugins/neo4j-graph-database-support/
2. http://github.com/datablend/gephi-blueprints-plugin
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ question:
★ for a particular subset of samples, which genes are co-expressed?
mongodb for storing gene expressions
{
  "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [
    { "gene" : "X1_at", "expression" : 5.54217719084415 },
    { "gene" : "X10_at", "expression" : 3.92335121981739 },
    { "gene" : "X100_at", "expression" : 7.81638155662255 },
    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
    …
  ]
}
pearson correlation through map-reduce
pearson correlation
x y
43 99
21 65
25 79
42 75
57 87
59 81
r ≈ 0.53
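A minimal sketch of how the correlation of the six (x, y) pairs above can be computed from running sums. Because the sums of two chunks can simply be added, this form lends itself to a map-reduce style split; the class and method names (`PearsonSketch`, `Sums`, `add`, `merge`) are made up for illustration, and this is not necessarily how the original MongoDB job was written.

```java
import java.util.Locale;

public class PearsonSketch {
    /** Running sums for one pair of series; two partial results can be merged. */
    static final class Sums {
        long n;
        double sx, sy, sxy, sxx, syy;

        // "Map" side: fold one (x, y) observation into the running sums.
        Sums add(double x, double y) {
            n++;
            sx += x;
            sy += y;
            sxy += x * y;
            sxx += x * x;
            syy += y * y;
            return this;
        }

        // "Reduce" side: combine partial sums computed on different chunks.
        Sums merge(Sums other) {
            n += other.n;
            sx += other.sx;
            sy += other.sy;
            sxy += other.sxy;
            sxx += other.sxx;
            syy += other.syy;
            return this;
        }

        // Pearson correlation expressed in terms of the accumulated sums.
        double correlation() {
            double cov = n * sxy - sx * sy;
            double varX = n * sxx - sx * sx;
            double varY = n * syy - sy * sy;
            return cov / Math.sqrt(varX * varY);
        }
    }

    public static void main(String[] args) {
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};

        // Pretend the six samples are spread over two workers.
        Sums first = new Sums();
        Sums second = new Sums();
        for (int i = 0; i < 3; i++) first.add(x[i], y[i]);
        for (int i = 3; i < 6; i++) second.add(x[i], y[i]);

        double r = first.merge(second).correlation();
        System.out.println(String.format(Locale.ROOT, "r = %.4f", r)); // prints r = 0.5298
    }
}
```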
co-expression graph
➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
(figure: the resulting co-expression graph)
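The edge rule can be sketched in plain Java, without any graph database: every gene is a node, and a pair gets an edge when its Pearson correlation reaches the threshold. The probe names are borrowed from the MongoDB document shown earlier, but the expression values and the names `CoExpressionSketch` and `edges` are invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class CoExpressionSketch {
    /** Plain Pearson correlation of two equally long expression profiles. */
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double sa = 0, sb = 0, sab = 0, saa = 0, sbb = 0;
        for (int i = 0; i < n; i++) {
            sa += a[i];
            sb += b[i];
            sab += a[i] * b[i];
            saa += a[i] * a[i];
            sbb += b[i] * b[i];
        }
        return (n * sab - sa * sb)
                / Math.sqrt((n * saa - sa * sa) * (n * sbb - sb * sb));
    }

    /** One "geneA -- geneB" edge per pair whose correlation is >= threshold. */
    static List<String> edges(String[] genes, double[][] profiles, double threshold) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < genes.length; i++) {
            for (int j = i + 1; j < genes.length; j++) {
                if (pearson(profiles[i], profiles[j]) >= threshold) {
                    result.add(genes[i] + " -- " + genes[j]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up expression values for three probes over four samples.
        String[] genes = {"X1_at", "X10_at", "X100_at"};
        double[][] profiles = {
            {1.0, 2.0, 3.0, 4.0},
            {2.1, 3.9, 6.2, 7.8},
            {4.0, 1.0, 3.5, 1.2}
        };
        System.out.println(edges(genes, profiles, 0.8)); // prints [X1_at -- X10_at]
    }
}
```

Comparing all pairs is quadratic in the number of genes, so for 27,000 genes the pairwise step is the part that benefits from being distributed.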
graphs and time ...
➡ fluxgraph: a blueprints-compatible graph on top of Datomic
➡ make FluxGraph fully time-aware
★ travel your graph through time
★ time-scoped iteration of vertices and edges
★ temporal graph comparison
➡ towards a time-aware graph ...
➡ reproducible graph state
travel through time
FluxGraph fg = new FluxGraph();
Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...
Edge e1 = fg.addEdge(davy, peter, "knows");
(figure: Davy knows Peter; Michael has no edges yet)
travel through time
Date checkpoint = new Date();
davy.setProperty("name", "David");
Edge e2 = fg.addEdge(davy, michael, "knows");
(figure: David knows Peter and David knows Michael)
travel through time
(figure: the graph at the checkpoint, where Davy knows Peter and Michael has no edges, next to the graph at the current time, where David knows Peter and Michael; the current time is used by default)
fg.setCheckpointTime(checkpoint);
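The idea behind reading "as of" a checkpoint can be illustrated with a toy model that keeps every write and answers reads with the latest value at or before a given time. This is only a sketch of the concept: `CheckpointSketch` and `VersionedProperty` are hypothetical names, logical timestamps stand in for dates, and nothing here reflects how FluxGraph or Datomic are implemented.

```java
import java.util.Map;
import java.util.TreeMap;

public class CheckpointSketch {
    /** A single property whose every write is kept, keyed by a logical timestamp. */
    static final class VersionedProperty {
        private final TreeMap<Long, String> history = new TreeMap<>();

        void set(long time, String value) {
            history.put(time, value);
        }

        /** Value as of the given time: the latest write at or before it, or null. */
        String get(long asOf) {
            Map.Entry<Long, String> entry = history.floorEntry(asOf);
            return entry == null ? null : entry.getValue();
        }
    }

    public static void main(String[] args) {
        VersionedProperty name = new VersionedProperty();
        name.set(1L, "Davy");
        long checkpoint = 2L;       // taken after the first write
        name.set(3L, "David");

        System.out.println(name.get(Long.MAX_VALUE)); // current time: David
        System.out.println(name.get(checkpoint));     // at the checkpoint: Davy
    }
}
```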
time-scoped iteration
(timeline: Davy at t1, Davy' at t2, Davy'' at t3, Davy''' at tcurrent; every change yields the next version, and versions are linked by next / previous)
➡ how to find the version of the vertex you are interested in?
time-scoped iteration
Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
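To make the version chain concrete, here is a toy model in which a change never mutates a version but creates a new one that points back to its predecessor; walking the `previous` links with a filter then selects the versions of interest. `VersionChainSketch`, `Version`, `change` and `previousVersions` are hypothetical names for this sketch and are not the FluxGraph API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class VersionChainSketch {
    /** One immutable version of a vertex, linked to the version it replaced. */
    static final class Version {
        final String name;
        final long validFrom;
        final Version previous;

        Version(String name, long validFrom, Version previous) {
            this.name = name;
            this.validFrom = validFrom;
            this.previous = previous;
        }

        /** A change returns a new version that points back to this one. */
        Version change(String newName, long time) {
            return new Version(newName, time, this);
        }

        /** Older versions, most recent first, restricted to those the filter accepts. */
        List<Version> previousVersions(Predicate<Version> filter) {
            List<Version> result = new ArrayList<>();
            for (Version v = previous; v != null; v = v.previous) {
                if (filter.test(v)) {
                    result.add(v);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Version current = new Version("Davy", 1, null)
                .change("Davy'", 2)
                .change("Davy''", 3)
                .change("Davy'''", 4);

        // Only keep earlier versions that became valid at t2 or later.
        for (Version older : current.previousVersions(p -> p.validFrom >= 2)) {
            System.out.println(older.name + " since t" + older.validFrom);
        }
    }
}
```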
time-scoped iteration
➡ when does an element change?
➡ vertex:
★ setting or removing a property
★ adding it to or removing it from an edge
★ being removed
➡ edge:
★ setting or removing a property
★ being removed
➡ ... and each element is time-scoped!
temporal graph comparison
(figure: the current graph, where David knows Peter and Michael, next to the checkpoint graph, where Davy knows Peter)
➡ what changed?
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as an (immutable) graph!
(figure: difference (current , checkpoint) = the vertex David with its new knows edge)
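The difference formula can be tried out on a toy representation in which a graph state is just a set of facts (property values and edges). The fact strings and the names `GraphDifferenceSketch` and `difference` are assumptions made for this sketch; FluxGraph itself returns a graph, not a set of strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class GraphDifferenceSketch {
    /** difference(A, B) = union(A, B) - B, returned as an unmodifiable set. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<>(a);
        union.addAll(b);
        union.removeAll(b);
        return Collections.unmodifiableSet(union);
    }

    public static void main(String[] args) {
        // Graph state at the checkpoint, written as a set of facts.
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
                "v1.name=Davy", "v2.name=Peter", "v3.name=Michael",
                "v1-knows-v2"));
        // Graph state at the current time.
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
                "v1.name=David", "v2.name=Peter", "v3.name=Michael",
                "v1-knows-v2", "v1-knows-v3"));

        // Only the facts that are new since the checkpoint remain.
        System.out.println(difference(current, checkpoint));
        // prints [v1.name=David, v1-knows-v3]
    }
}
```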
use case: longitudinal patient data
(timeline figure, t1 to t5: a patient vertex that over time gains edges to smoking, cancer and death nodes)
use case: longitudinal patient data
➡ historical data for 15,000 patients over a period of 10 years (2001-2010)
➡ example analysis:
★ if a male patient is no longer smoking in 2005
★ what are the chances of getting lung cancer in 2010, comparing
  • patients that smoked before 2005
  • patients that never smoked
use case: longitudinal patient data
➡ get all male non-smokers in 2005
fg.setCheckpointTime(new DateTime(2005,12,31).toDate());
Iterator<Vertex> males = fg.getVertices("gender", "male").iterator();
while (males.hasNext()) {
    Vertex p2005 = males.next();
    boolean smoking2005 = p2005.getEdges(OUT, "smokingStatus").iterator().hasNext();
}
use case: longitudinal patient data
➡ which patients were smoking before 2005?
boolean smokingBefore2005 = ((FluxVertex) p2005).getPreviousVersions(new TimeAwareFilter() {
    public TimeAwareElement filter(TimeAwareVertex element) {
        return element.getEdges(OUT, "smokingStatus").iterator().hasNext() ? element : null;
    }
}).iterator().hasNext();
use case: longitudinal patient data
➡ which patients have cancer in 2010?
Graph g = fg.difference(smokerws, time2010.toDate(), time2005.toDate());
(smokerws is the working set of smokers)
➡ extract the patients that have an edge to the cancer node
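Once the per-patient flags have been extracted, the comparison in the example analysis comes down to two proportions. The sketch below assumes the flags are already available as booleans; the `Patient` class, the `cancerRate` helper and the records in `main` are invented purely to exercise the code and say nothing about real patients.

```java
import java.util.Arrays;
import java.util.List;

public class CohortSketch {
    /** Hypothetical per-patient flags, as they might be extracted from the graph. */
    static final class Patient {
        final boolean male, smokingIn2005, smokedBefore2005, cancerIn2010;

        Patient(boolean male, boolean smokingIn2005,
                boolean smokedBefore2005, boolean cancerIn2010) {
            this.male = male;
            this.smokingIn2005 = smokingIn2005;
            this.smokedBefore2005 = smokedBefore2005;
            this.cancerIn2010 = cancerIn2010;
        }
    }

    /**
     * Fraction of male non-smokers in 2005, with the given smoking history,
     * that have cancer in 2010. Returns NaN when the group is empty.
     */
    static double cancerRate(List<Patient> patients, boolean smokedBefore2005) {
        int total = 0;
        int withCancer = 0;
        for (Patient p : patients) {
            if (p.male && !p.smokingIn2005 && p.smokedBefore2005 == smokedBefore2005) {
                total++;
                if (p.cancerIn2010) {
                    withCancer++;
                }
            }
        }
        return total == 0 ? Double.NaN : (double) withCancer / total;
    }

    public static void main(String[] args) {
        // Invented flags, only here to exercise the code; not real patient data.
        List<Patient> patients = Arrays.asList(
                new Patient(true, false, true, true),
                new Patient(true, false, true, false),
                new Patient(true, false, false, false),
                new Patient(true, true, true, true),      // still smoking in 2005: excluded
                new Patient(false, false, false, false)); // not male: excluded

        System.out.println("stopped before 2005: " + cancerRate(patients, true));
        System.out.println("never smoked:        " + cancerRate(patients, false));
    }
}
```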
Questions?