Go pathway-interaction-integration

Integration of GO, Pathway data and Interaction data

Chris Mungall, Peter D’Eustachio

The GO was originally intended to integrate databases

“Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address.”
(Gene Ontology: Tool for the Unification of Biology. Nat Genet 2000)

• How are we doing?

SGD FB GOA

GO

The GO was originally intended to integrate databases

• How are we doing? Not as well as we could!

SGD FB GOA

BioGRID IntAct Reactome Cyc

IMEx Pathway Commons

……

GO

Integration enhances analyses and reduces workload

• Division of labor
  – leave specialized curation to specialized systems biology databases
  – but data needs to be recombined to prevent siloing

• GO is an invaluable one-stop shop for term enrichment etc.

• Can we quantify how integrating with systems biology databases helps users?
  – Yes! We can do the experiment:
  – GO term enrichment analysis on all MolSigDB gene sets
    • with Reactome annotations
      – Also include Reactome inputs/outputs, not currently in GOA
    • without Reactome annotations
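The with/without comparison can be sketched in a few lines. This is a minimal illustration, assuming a one-sided hypergeometric test for enrichment (the slides do not say which test was used) and toy annotation tables; the helper names `hypergeom_pvalue`, `merge_annotations` and `enrichment` are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn from `population` genes,
    `annotated` of which carry the term (one-sided hypergeometric test)."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


def merge_annotations(first, second):
    """Union two term -> gene-set tables (e.g. GOA plus pathway-derived)."""
    merged = {term: set(genes) for term, genes in first.items()}
    for term, genes in second.items():
        merged.setdefault(term, set()).update(genes)
    return merged


def enrichment(term_to_genes, gene_set, background):
    """Return term -> p-value for one gene set against a background."""
    sample = gene_set & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = genes & background
        hits = annotated & sample
        results[term] = hypergeom_pvalue(
            len(background), len(annotated), len(sample), len(hits)
        )
    return results


# Toy data (assumption): 20 background genes, a 5-gene query set.
background = {f"g{i}" for i in range(20)}
query = {"g0", "g1", "g2", "g3", "g4"}
goa_only = {"example term": {"g0", "g10"}}
pathway_derived = {"example term": {"g1", "g2", "g3"}}

p_without = enrichment(goa_only, query, background)["example term"]
p_with = enrichment(
    merge_annotations(goa_only, pathway_derived), query, background
)["example term"]
print(p_without, p_with)
```

In this toy case the merged table gives the smaller p-value because the pathway-derived annotations add hits inside the query set.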

Integration enhances analyses

• GOA+R: many p-values are significantly improved
  – Recapitulates biologically valid results that would have been suppressed had a single resource been used

  – Examples:
    • Genes down-regulated in Alzheimer's

GO term                            GOA without R    GOA with R (enhanced)
oxidative phosphorylation          7 x 10^-29       1.2 x 10^-44
regulation of insulin secretion    0.72             4 x 10^-46

How are we currently integrating systems biology datasets?

• Interaction data
  – Currently IntAct, soon IMEx
  – “protein binding” and “self-protein binding” only (+with)
• Pathway data
  – Currently Reactome only
  – Loses much of what is in Reactome
    • E.g. inputs and outputs
  – Manually curated GO<->Reactome links
    • incomplete
    • not always to the most specific term
    • labor-intensive
    • become stale over time
    • other pathway databases?
• This can be improved!

Automating integration using cross-product definitions – pathway databases

[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
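A rough sketch of how such a definition could drive automatic mapping: parse the stanza into a genus plus relation/filler pairs, index it, and look up a pathway event described the same way. The event representation and the function names are assumptions for illustration, and exact-match lookup stands in for the ontology reasoning a production tool would need (it ignores subclasses of the filler).

```python
STANZA = """
[Term]
id: GO:0015871
name: choline transport
intersection_of: GO:0006810 ! transport
intersection_of: results_in_transport_of CHEBI:15354 ! choline
"""


def parse_cross_product(stanza):
    """Return (term_id, genus, {relation: filler}) for one OBO [Term] stanza."""
    term_id, genus, differentia = None, None, {}
    for raw in stanza.strip().splitlines():
        line = raw.split("!")[0].strip()  # drop the trailing "! label" comment
        if line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif line.startswith("intersection_of:"):
            parts = line[len("intersection_of:"):].split()
            if len(parts) == 1:      # genus: the parent class
                genus = parts[0]
            elif len(parts) == 2:    # differentia: relation + filler
                differentia[parts[0]] = parts[1]
    return term_id, genus, differentia


def build_index(stanzas):
    """Map (genus, differentia) -> GO term id for exact-match lookup."""
    index = {}
    for stanza in stanzas:
        term_id, genus, differentia = parse_cross_product(stanza)
        index[(genus, frozenset(differentia.items()))] = term_id
    return index


def classify_event(genus, properties, index):
    """Return the GO term whose definition matches the event, or None."""
    return index.get((genus, frozenset(properties.items())))


index = build_index([STANZA])
# Hypothetical pathway event: a transport step whose cargo is choline.
print(classify_event(
    "GO:0006810", {"results_in_transport_of": "CHEBI:15354"}, index
))
```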

Automating integration using cross-products – pathway databases

• We can also automatically map:
  – catalysis terms [165*]
  – transport [373]
  – binding [133]
  – phosphorylation and other modifications
  – metabolism [278]
  – signaling
  – …
• All this relies on different cross-product files
• Any pathway database that exports BioPAX-OWL can be used
  – E.g. humancyc, mousecyc, pathwaycommons, …

*Numbers for Reactome-human

Automating integration using cross-products – interaction databases

[Term]
id: GO:0043184
name: vascular endothelial growth factor receptor 2 binding
intersection_of: GO:0005488 ! binding
intersection_of: results_in_binding_of PRO:000002112 ! VEGFR 2

[Diagram: FIGF binds VEGFR; edges labelled is_a and has_function]
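The inference for interaction data can be sketched as follows: if a protein's interaction partner is (or is_a) a class named in a results_in_binding_of definition, the protein receives the corresponding binding term. This is an illustrative sketch only; the partner identifier `EX:VEGFR2_protein`, the single-parent is_a table and the function names are assumptions, not part of the actual tool.

```python
# From the cross-product above: binding of PRO:000002112 defines GO:0043184.
BINDING_TERMS = {"PRO:000002112": "GO:0043184"}

# Hypothetical is_a link from an interaction-database protein to its PRO class.
IS_A = {"EX:VEGFR2_protein": "PRO:000002112"}


def ancestors(cls, is_a):
    """Return cls followed by its is_a ancestors (single-parent chain)."""
    chain = []
    while cls is not None and cls not in chain:
        chain.append(cls)
        cls = is_a.get(cls)
    return chain


def infer_binding_annotations(interactions, is_a, binding_terms):
    """Turn binary interactions into (protein, GO binding term) pairs."""
    annotations = set()
    for a, b in interactions:
        # Binding is symmetric, so consider each protein as the subject.
        for subject, partner in ((a, b), (b, a)):
            for cls in ancestors(partner, is_a):
                term = binding_terms.get(cls)
                if term is not None:
                    annotations.add((subject, term))
    return annotations


interactions = [("FIGF", "EX:VEGFR2_protein")]
print(infer_binding_annotations(interactions, IS_A, BINDING_TERMS))
```

Only FIGF is annotated here, because no binding term in the toy table is defined by a class that FIGF itself belongs to.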

Automated Integration: Results

• Reactome
  – Evaluation in progress
  – Many manually assigned equivalencies recapitulated
  – Inferred equivalencies differed in some cases
    • sometimes better than manually assigned
    • sometimes required info not in BioPAX export
    • ongoing discussions
• BioGRID
  – not evaluated (all trivial)
  – inferred annotations improve some enrichment results
    • E.g. Brentani angiogenesis gene sets, increased enrichment for VEGFR binding
  – Obvious but useful as proof of concept

Conclusions and future work

• We can be more efficient:
  – Coordinate with systems bio databases to divide labor
  – Prevent siloing through semi-automated integration
  – GO acts as a high-level ‘window’ on systems biology databases
• Still to be done:
  – Make integration tool production-ready
  – Reconcile existing misalignments, particularly signaling
    • highly inconsistent between GO and Reactome
  – Explore open questions, e.g. auto-generate terms?
  – Finish cross-products, they are vital
    • particularly PRO, CHEBI
