Integration of GO, Pathway data and Interaction data
Chris Mungall, Peter D’Eustachio
The GO was originally intended to integrate databases
• How are we doing?
"Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address."
  Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000
[Figure: SGD, FB, GOA; GO]
The GO was originally intended to integrate databases
• How are we doing?
– Not as well as we could!
[Figure: SGD, FB, GOA, BioGRID, IntAct, Reactome, Cyc, IMEx, Pathway Commons, …; GO]
Integration enhances analyses and reduces workload
• Division of labor
– leave specialized curation to specialized systems biology databases
– but data needs to be re-combined to prevent siloing
• GO is an invaluable single-stop shop for term enrichment etc.
• Can we quantify how integrating with systems biology databases helps users?
– Yes! We can do the experiment:
– GO term enrichment analysis on all MSigDB (MolSigDB) gene sets
  • with Reactome annotations
    – Also include Reactome inputs/outputs, not currently in GOA
  • without Reactome annotations
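The experiment above can be sketched in a few lines. This is a minimal illustration, assuming a one-sided hypergeometric test as the enrichment statistic; the slides do not say which test or tool was used. The gene names, the term name `term_x`, and the helper functions are all hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn from `population`
    genes, `annotated` of which carry the term (one-sided test)."""
    upper = min(annotated, sample)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(population, sample)


def merge_annotations(base, extra):
    """Union two {term: set_of_genes} annotation sets."""
    merged = {term: set(genes) for term, genes in base.items()}
    for term, genes in extra.items():
        merged.setdefault(term, set()).update(genes)
    return merged


def enrich(gene_set, annotations, background):
    """Return {term: p-value} for every term hitting the gene set."""
    background = set(background)
    gene_set = set(gene_set) & background
    pvalues = {}
    for term, genes in annotations.items():
        genes = set(genes) & background
        overlap = len(gene_set & genes)
        if overlap:
            pvalues[term] = hypergeom_tail(
                len(background), len(genes), len(gene_set), overlap
            )
    return pvalues


# Toy data (hypothetical): 20 background genes, a 5-gene query set.
background = [f"g{i}" for i in range(20)]
query = ["g0", "g1", "g2", "g3", "g4"]
goa = {"term_x": {"g0", "g10", "g11"}}       # stand-in for GOA alone
reactome = {"term_x": {"g1", "g2", "g3"}}    # stand-in for added Reactome annotations

p_without = enrich(query, goa, background)["term_x"]
p_with = enrich(query, merge_annotations(goa, reactome), background)["term_x"]
print(f"GOA without R: {p_without:.3g}   GOA with R: {p_with:.3g}")
```

In the toy data the added annotations raise the overlap with the query set, so the p-value for the same term drops. A real analysis would also correct for multiple testing across terms, which this sketch omits.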
Integration enhances analyses
• GOA+R (GOA with Reactome): Many p-values were significantly improved
– Recapitulated biologically valid results that would have been suppressed had only a single resource been used
– Examples:
  • Genes down-regulated in Alzheimer's

Term                              GOA without R    GOA with R (enhanced)
oxidative phosphorylation         7 x 10^-29       1.2 x 10^-44
regulation of insulin secretion   0.72             4 x 10^-46
How are we currently integrating systems biology datasets?
• Interaction data
– Currently IntAct, soon IMEx
– “protein binding” and “self-protein binding” only (+with)
• Pathway data
– Currently Reactome only
– Loses much of what is in Reactome
  • E.g. inputs and outputs
– Manually curated GO<->Reactome links
  • incomplete
  • not always to the most specific term
  • labor-intensive
  • become stale over time
  • other pathway databases?
• This can be improved!
Automating integration using cross-product definitions – pathway databases
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
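A rough sketch of how such a cross-product definition could drive automatic mapping: parse the stanza into a genus (`GO:0006810`, transport) and its differentia (`results_in_transport_of CHEBI:15354`), then match pathway events against it. The `event` dictionary is a hypothetical, simplified stand-in for a pathway database record, and exact-ID matching is an assumption; a real implementation would use a reasoner so that subclasses of the genus and of the chemical entity also match.

```python
STANZA = """
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""


def parse_cross_product(stanza):
    """Parse one OBO [Term] stanza into id, name, genus and differentia."""
    term = {"differentia": {}}
    for line in stanza.strip().splitlines():
        line = line.split("!")[0].strip()      # drop trailing "! label" comment
        if not line or line.startswith("["):
            continue
        tag, value = line.split(": ", 1)
        if tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:                # bare class ID: the genus
                term["genus"] = parts[0]
            else:                              # "relation filler": a differentia
                term["differentia"][parts[0]] = parts[1]
        else:
            term[tag] = value
    return term


def match_event(event, definitions):
    """Return IDs of GO terms whose definition the event satisfies exactly."""
    return [
        d["id"]
        for d in definitions
        if d["genus"] == event["process"]
        and all(event.get(rel) == filler for rel, filler in d["differentia"].items())
    ]


choline_transport = parse_cross_product(STANZA)

# Hypothetical pathway event: a transport step that moves choline.
event = {"process": "GO:0006810", "results_in_transport_of": "CHEBI:15354"}
hits = match_event(event, [choline_transport])
print(hits)
```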
Automating integration using cross-products – pathway databases
• We can also automatically map:
– catalysis terms [165*]
– transport [373]
– binding [133]
– phosphorylation and other modifications
– metabolism [278]
– signaling
– …
• All this relies on different cross-product files
• Any pathway database that exports BioPAX-OWL can be used
– E.g. HumanCyc, MouseCyc, Pathway Commons, …
*Numbers for Reactome-human
Automating integration using cross-products – interaction databases
[Term]
id: GO:0043184
name: vascular endothelial growth factor receptor 2 binding
intersection_of: GO:0005488 ! binding
intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2
[Figure: FIGF binds VEGFR; edges labeled is_a and has_function]
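The same idea applied to interaction data, as a minimal sketch: given "A binds B" records and a lookup from the bound entity to its cross-product binding term, infer a specific function annotation for A, falling back to generic "protein binding" (GO:0005515) when no definition covers the partner. The interaction tuples, the `UNMAPPED_PARTNER` placeholder, and the assumption that partners are already identified by PRO IDs are all illustrative; interaction databases typically need an identifier-mapping step first.

```python
PROTEIN_BINDING = "GO:0005515"  # generic term used when nothing more specific applies

# Bound entity -> GO binding term, as derived from cross-product definitions.
BINDING_TERMS = {
    "PRO:000002112": "GO:0043184",  # VEGFR 2 -> VEGFR 2 binding
}


def infer_binding_annotations(interactions, binding_terms):
    """Map each interactor to the set of GO binding terms it can be given."""
    annotations = {}
    for protein, partner in interactions:
        term = binding_terms.get(partner, PROTEIN_BINDING)
        annotations.setdefault(protein, set()).add(term)
    return annotations


# Hypothetical interaction records: (interactor, bound entity).
interactions = [
    ("FIGF", "PRO:000002112"),      # FIGF binds VEGFR 2
    ("FIGF", "UNMAPPED_PARTNER"),   # placeholder partner with no cross-product term
]

inferred = infer_binding_annotations(interactions, BINDING_TERMS)
print(inferred)
```

Each record is treated as directed here for brevity; since binding is symmetric, a fuller version would also annotate the partner.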
Automated Integration: Results
• Reactome
– Evaluation in progress
– Many manually assigned equivalencies recapitulated
– Inferred equivalencies differed in some cases
  • sometimes better than manually assigned
  • sometimes required info not in BioPAX export
  • ongoing discussions
• BioGRID
– not evaluated (all trivial)
– inferred annotations improve some enrichment results
  • E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
– Obvious but useful as proof of concept
Conclusions and future work
• We can be more efficient:
– Coordinate with systems biology databases to divide labor
– Prevent siloing through semi-automated integration
– GO acts as a high-level ‘window’ on systems biology databases
• Still to be done:
– Make integration tool production-ready
– Reconcile existing misalignments, particularly signaling
  • highly inconsistent between GO and Reactome
– Explore open questions, e.g. auto-generate terms?
– Finish cross-products; they are vital
  • particularly PRO, CHEBI