EcoCyc, MetaCyc, and the Pathway Tools Software
1 SRI International Bioinformatics
EcoCyc, MetaCyc, and the Pathway Tools Software
Peter D. Karp, Ph.D.
Bioinformatics Research Group
http://www.ai.sri.com/pkarp/talks/
BioCyc.org, EcoCyc.org, MetaCyc.org
2 SRI International Bioinformatics
MetaCyc Family of Pathway/Genome Databases
1,700+ databases from multiple institutions
Cover all domains of life, with microbial emphasis
All DBs derived from MetaCyc via computational pathway prediction
Common schema
Common controlled vocabularies
Common methodologies
Archives of Toxicology 2011
3 SRI International Bioinformatics
Curated Databases Within the MetaCyc Family
Database   Organism       Organization      Curated From (publications)
MetaCyc    Multiorganism  SRI               26,000
EcoCyc     E. coli        SRI               21,000
HumanCyc   H. sapiens     SRI
AraCyc     A. thaliana    Carnegie Instit.  2,282
YeastCyc   S. cerevisiae  Stanford Univ     565
MouseCyc   M. musculus    Jackson Labs
4 SRI International Bioinformatics
BioCyc Collection of 1,100 Pathway/Genome Databases
Pathway/Genome Database (PGDB) – combines information about
  Pathways, reactions, substrates
  Enzymes, transporters
  Genes, replicons
  Transcription factors/sites, promoters, operons
Tier 1: Literature-Derived PGDBs
  MetaCyc
  EcoCyc -- Escherichia coli K-12
Tier 2: Computationally-derived DBs, Some Curation -- 28 PGDBs
  HumanCyc, BsubCyc, Mycobacterium tuberculosis
Tier 3: Computationally-derived DBs, No Curation -- The remainder
5 SRI International Bioinformatics
EcoCyc Project – EcoCyc.org
E. coli Encyclopedia
  Review-level Model-Organism Database for E. coli
  Tracks evolving annotation of the E. coli genome and cellular networks
  The two paradigms of EcoCyc
“Multi-dimensional annotation of the E. coli K-12 genome”
  Positions of genes; functions of gene products – 76% / 66% exp
  Gene Ontology terms; MultiFun terms
  Gene product summaries and literature citations
  Evidence codes
  Multimeric complexes
  Metabolic pathways
  Regulation of gene expression and of protein activity
Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040
Karp, Gunsalus, Collado-Vides, Paulsen
6 SRI International Bioinformatics
EcoCyc = E.coli Dataset + Pathway/Genome Navigator
Genes: 4,489
Proteins: 4,479
Complexes: 895
RNAs: 285
Reactions:
  Metabolic: 1,446
  Transport: 287
Pathways: 260
Compounds: 1,830
URL: EcoCyc.org
Regulation:
  Operons: 3,409
  Trans Factors: 206
  Promoters: 1,878
  TF Binding Sites: 2,394
  Reg Interactions: 5,345
EcoCyc v15.0
Citations: 21,000
7 SRI International Bioinformatics
EcoCyc on the iPhone
9 SRI International Bioinformatics
PortEco.org
EcoCyc + PortEco = E. coli model-organism database
Query multiple E. coli databases simultaneously
E. coli gene expression archive
E. coli Wiki
~40 E. coli and Shigella databases available at BioCyc.org
10 SRI International Bioinformatics
MetaCyc: Metabolic Encyclopedia
Describe a representative sample of every experimentally determined metabolic pathway
Describe properties of metabolic enzymes
Literature-based DB with extensive references and commentary
Pathways, reactions, enzymes, substrates
MetaCyc vs BioCyc: MetaCyc contains only experimentally elucidated pathways
Jointly developed by
  P. Karp, R. Caspi, C. Fulcher, SRI International
  L. Mueller, A. Pujar, Boyce Thompson Institute
  S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2010
11 SRI International Bioinformatics
Applications of MetaCyc
Reference source on metabolic pathways and enzymes
Predict pathways from genomes
Metabolic engineering
  Find desired metabolic pathways and reactions
  Find enzymes with desired activities, regulatory properties
  Determine cofactor requirements
12 SRI International Bioinformatics
MetaCyc Data -- Version 15.4
Pathways 1,747
Reactions 9,460
Enzymes 7,424
Small Molecules 9,188
Organisms 2,170
Citations 29,900
17 SRI International Bioinformatics
Pathway Tools Software
[Figure: Pathway Tools components: Annotated Genome, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator, plus Genome-Scale Flux Model]
Briefings in Bioinformatics 11:40-79 2010
18 SRI International Bioinformatics
Pathway Tools Software: PathoLogic
Computational creation of new Pathway/Genome Databases
Transforms genome into Pathway Tools schema and layers inferred information above the genome
  Predicts operons
  Predicts metabolic network
  Predicts which genes code for missing enzymes in metabolic pathways
  Infers transport reactions from transporter names
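The idea of predicting a metabolic network and flagging missing enzymes can be illustrated with a deliberately simple sketch. This is not the PathoLogic algorithm: it only scores each reference pathway by the fraction of its reactions that have an annotated enzyme in the genome and reports the rest as "holes". The pathway names, reaction identifiers, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Set


def score_pathways(
    reference_pathways: Dict[str, List[str]],
    genome_reactions: Set[str],
    threshold: float = 0.5,  # hypothetical cutoff, chosen only for illustration
) -> Dict[str, dict]:
    """Keep pathways whose fraction of enzyme-backed reactions meets the threshold.

    Reactions without an annotated enzyme are reported as pathway holes,
    i.e. candidates for a later "missing enzyme" search.
    """
    predictions = {}
    for name, reactions in reference_pathways.items():
        holes = [r for r in reactions if r not in genome_reactions]
        score = (len(reactions) - len(holes)) / len(reactions)
        if score >= threshold:
            predictions[name] = {"score": score, "holes": holes}
    return predictions


# Hypothetical reference pathways (stand-ins for MetaCyc pathways).
reference = {
    "pathway-X": ["RXN-1", "RXN-2", "RXN-3", "RXN-4"],
    "pathway-Y": ["RXN-5", "RXN-6", "RXN-7"],
}
# Hypothetical set of reactions with an enzyme in the annotated genome.
annotated = {"RXN-1", "RXN-2", "RXN-4", "RXN-5"}

predicted = score_pathways(reference, annotated)
print(predicted)
```

Here pathway-X is kept with one hole (RXN-3), while pathway-Y falls below the threshold and is dropped.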
19 SRI International Bioinformatics
Pathway Tools Software: Pathway/Genome Editors
Interactively update PGDBs with graphical editors
Support geographically distributed teams of curators with object database system
  Gene and protein editor
  Reaction editor
  Compound editor
  Pathway editor
  Operon editor
  Publication editor
20 SRI International Bioinformatics
Pathway Tools Software: Pathway/Genome Navigator
Querying and visualization of:
  Pathways
  Reactions
  Metabolites
  Genes/Proteins/RNA
  Regulatory interactions
  Chromosomes
Two modes of operation:
  Web mode
  Desktop mode
  Most functionality shared, but each has unique functionality
23 SRI International Bioinformatics
Cellular Overview Diagram
Combines metabolic map and transporters
Automatically generated for each organism
Zoomable, queryable
Web-based and desktop
BioCyc.org: Tools > Cellular Overview
BioCyc.org: Tools > Regulatory Overview
Fastest with Safari, Chrome, Firefox
27 SRI International Bioinformatics
Omics Data Graphing on Cellular Overview
30 SRI International Bioinformatics
Genome Overview
31 SRI International Bioinformatics
Genome Poster
32 SRI International Bioinformatics
Regulatory Overview and Omics Viewer
Show regulatory relationships among gene groups
33 SRI International Bioinformatics
Genome Browser
ChIP-Chip Data Shown in Graph Track
34 SRI International Bioinformatics
Enrichment Analysis
“My experiments yielded a set of genes/metabolites. What do they have in common?”
Given a set of genes:
  What GO terms are statistically over-represented in that set?
  What metabolic pathways are over-represented?
  What transcriptional regulators are over-represented?
Given a set of metabolites:
  What metabolic pathways are statistically over-represented in that set?
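One common way to ask whether a category is over-represented in a gene set is a one-sided hypergeometric test. The sketch below assumes that statistic for illustration; the slides do not say which test Pathway Tools uses, and the counts in the example are made up.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes in the genome
    annotated:  genes in the genome that carry the term (e.g. belong to a pathway)
    sample:     size of the experimental gene set
    hits:       genes in the experimental set that carry the term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical numbers: 8 of 100 experimental genes fall in a 40-gene pathway
# of a 4,000-gene genome.
p_value = hypergeom_enrichment_p(population=4000, annotated=40, sample=100, hits=8)
print(f"p = {p_value:.3g}")
```

When many GO terms, pathways, or regulators are tested at once, the resulting p-values would normally be corrected for multiple testing before interpretation.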
35 SRI International Bioinformatics
Automated Generation of Metabolic Flux Models from PGDBs
Joint work with Mario Latendresse
36 SRI International Bioinformatics
Goals
Decrease the time required to construct FBA models from 9-12 months to several weeks
Create richer FBA models that are tightly coupled to genome and regulatory information
Make FBA models and results more transparent
37 SRI International Bioinformatics
Approach: Derive FBA Models from PGDBs
Store and update metabolic model within Pathway Tools
Export to constraint solver for model execution/solving
Fast generation of metabolic model from annotated genome
Pathway Tools schema
Associate a wealth of information with each metabolic model
Unique identifiers and controlled vocabulary for model components
Tools for querying and visualization of metabolic models
Tools for model debugging and analysis
  Reaction balance checking
  Dead-end metabolite analysis
  Visualize reaction flux using cellular overview
  Multiple gap filling
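Dead-end metabolite analysis can be sketched in a few lines. The version below is a simplified illustration, not the Pathway Tools implementation: it treats every reaction as irreversible and reports metabolites that are produced but never consumed, or consumed but never produced. The toy model and its identifiers are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy model: reaction id -> {metabolite: stoichiometric coefficient}.
# Negative coefficients are consumed, positive are produced.
REACTIONS: Dict[str, Dict[str, float]] = {
    "uptake_A": {"A": 1.0},
    "R1": {"A": -1.0, "B": 1.0},
    "R2": {"B": -1.0, "C": 1.0, "D": 1.0},
    "biomass": {"C": -1.0},
}


def dead_end_metabolites(reactions: Dict[str, Dict[str, float]]) -> Dict[str, Set[str]]:
    """Find metabolites that only ever appear on one side of the network."""
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for stoich in reactions.values():
        for metabolite, coeff in stoich.items():
            if coeff > 0:
                produced.add(metabolite)
            elif coeff < 0:
                consumed.add(metabolite)
    return {
        "never_consumed": produced - consumed,
        "never_produced": consumed - produced,
    }


dead_ends = dead_end_metabolites(REACTIONS)
print(dead_ends)
```

In this toy model D is produced by R2 but has no consumer, so any steady-state solution would force zero flux through R2 unless a secretion or consuming reaction is added.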
40 SRI International Bioinformatics
FBA Model Execution
Runs SCIP solver on .lp file
  (SCIP is developed at Konrad-Zuse-Zentrum für Informationstechnik Berlin)
Interpret SCIP output
  Determine if SCIP found a solution
  Map fluxes to PGDB reactions
Display resulting fluxes on the Cellular Overview
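To make the export step concrete, here is a minimal sketch of writing a toy FBA problem as text, assuming the .lp file uses the CPLEX LP format (which SCIP can read). The function name, the reaction names, and the bounds are hypothetical, and the file Pathway Tools actually generates may be laid out differently.

```python
from typing import Dict, Tuple


def fba_to_lp(
    reactions: Dict[str, Dict[str, int]],
    bounds: Dict[str, Tuple[float, float]],
    objective: str,
) -> str:
    """Write 'maximize objective flux subject to S v = 0 and flux bounds' as LP text."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    lines = ["Maximize", f" obj: {objective}", "Subject To"]
    # One mass-balance row per metabolite: sum of coeff * flux equals zero.
    for met in metabolites:
        terms = []
        for rxn, stoich in reactions.items():
            coeff = stoich.get(met, 0)
            if coeff:
                sign = "+" if coeff > 0 else "-"
                terms.append(f"{sign} {abs(coeff)} {rxn}")
        lines.append(f" mb_{met}: {' '.join(terms)} = 0")
    lines.append("Bounds")
    for rxn in reactions:
        lower, upper = bounds[rxn]
        lines.append(f" {lower} <= {rxn} <= {upper}")
    lines.append("End")
    return "\n".join(lines) + "\n"


# Hypothetical three-reaction model: uptake of A, conversion A -> B, biomass drain of B.
toy_reactions = {
    "uptake_A": {"A": 1},
    "R1": {"A": -1, "B": 1},
    "biomass": {"B": -1},
}
toy_bounds = {"uptake_A": (0, 10), "R1": (0, 1000), "biomass": (0, 1000)}

lp_text = fba_to_lp(toy_reactions, toy_bounds, objective="biomass")
print(lp_text)
```

The resulting text can be saved to a file and handed to an LP/MILP solver; mapping the solver's reported variable values back to reaction identifiers is then a lookup by name.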
41 SRI International Bioinformatics
Model Debugging via Multiple Gap Filling
Most FBA models are not initially solvable because of incomplete or incorrect information
Use meta-optimization to postulate alterations to a model to render it solvable
Each alteration has an associated cost; minimize cost of alterations
Formulate as MILP and submit to SCIP
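A generic way to write such a cost-minimizing MILP is shown below. It is a textbook-style sketch, not necessarily the formulation used in Pathway Tools. Assumed symbols: S is the stoichiometric matrix of the existing model with fluxes v over reaction set R, S^C holds the candidate alterations (set C) with fluxes w, y_j is a binary switch for candidate j, c_j is its cost, M is a large constant, and ε is a small positive biomass threshold.

```latex
\begin{aligned}
\min_{v,\,w,\,y}\quad & \sum_{j \in C} c_j\, y_j \\
\text{s.t.}\quad & S\,v + S^{C} w = 0 \\
& l_i \le v_i \le u_i && \forall i \in R \\
& -M\, y_j \le w_j \le M\, y_j && \forall j \in C \\
& v_{\text{biomass}} \ge \varepsilon \\
& y_j \in \{0, 1\} && \forall j \in C
\end{aligned}
```

A candidate can carry flux only when its switch y_j is 1, so the objective pays c_j exactly for the alterations that are needed to make biomass production feasible.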
42 SRI International Bioinformatics
Multiple Gap Filling of FBA Models
Reaction gap filling (Kumar et al, BMC Bioinf 2007 8:212):
  Reverse directionality of selected reactions
  Add a minimal number of reactions from MetaCyc to the model to enable a solution
  Reaction cost is a function of reaction taxonomic range
Metabolite gap filling:
  Postulate additional nutrients and secretions
Partial solutions:
  Identify maximal subset of biomass components for which model can yield positive production rates
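The reaction gap-filling idea can be illustrated on a toy network. The sketch below deliberately swaps in simpler techniques than the slides describe: it enumerates candidate subsets by brute force instead of solving a MILP, and it checks producibility by network expansion (reachability from nutrients) instead of steady-state FBA. All reaction names and costs are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (substrates, products)


def producible(reactions: List[Reaction], nutrients: Set[str]) -> Set[str]:
    """Network expansion: fire any reaction whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def cheapest_gap_fill(
    model: List[Reaction],
    candidates: Dict[str, Tuple[float, Reaction]],
    nutrients: Set[str],
    targets: Set[str],
) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Return (cost, names) of the cheapest candidate subset that makes all targets producible."""
    best = None
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(candidates[n][0] for n in subset)
            if best is not None and cost >= best[0]:
                continue  # cannot improve on the best solution found so far
            trial = model + [candidates[n][1] for n in subset]
            if targets <= producible(trial, nutrients):
                best = (cost, subset)
    return best


# Hypothetical model that cannot make "pyr" from "glc" on its own.
model = [({"glc"}, {"g6p"})]
# Hypothetical candidates (stand-ins for MetaCyc reactions) with made-up costs.
candidates = {
    "c1": (1, ({"g6p"}, {"f6p"})),
    "c2": (5, ({"glc"}, {"pyr"})),
    "c3": (1, ({"f6p"}, {"pyr"})),
}

solution = cheapest_gap_fill(model, candidates, nutrients={"glc"}, targets={"pyr"})
print(solution)
```

Brute force is exponential in the number of candidates, which is one reason a real gap filler would hand the problem to a MILP solver instead.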
47 SRI International Bioinformatics
Comparative Analysis
Via Cellular Overview
Comparative genome browser
Comparative pathway table
Comparative analysis reports
  Compare reaction complements
  Compare pathway complements
  Compare transporter complements
48 SRI International Bioinformatics
Advanced Query Form
Intuitive construction of complex database queries with the power of SQL
49 SRI International Bioinformatics
Work in Progress
Computation of reaction atom mappings
Program to generate metabolic pathways that synthesize target compound from feedstock compound
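As a rough illustration of what a feedstock-to-target route search involves, the sketch below runs a breadth-first search over a toy reaction graph. It makes strong simplifying assumptions that a real tool would not: each reaction has one substrate and one product, and cofactors and atom mappings are ignored. The reaction and compound names are hypothetical, and this is not the program described in the slides.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def find_route(
    reactions: Dict[str, Tuple[str, str]],
    feedstock: str,
    target: str,
) -> Optional[List[str]]:
    """Return the shortest list of reaction names leading from feedstock to target."""
    by_substrate: Dict[str, List[Tuple[str, str]]] = {}
    for name, (substrate, product) in reactions.items():
        by_substrate.setdefault(substrate, []).append((name, product))

    queue = deque([(feedstock, [])])
    seen = {feedstock}
    while queue:
        compound, route = queue.popleft()
        if compound == target:
            return route
        for name, product in by_substrate.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, route + [name]))
    return None  # no route exists in this reaction set


# Hypothetical reaction set: name -> (substrate, product).
toy = {
    "r1": ("A", "B"),
    "r2": ("B", "C"),
    "r3": ("A", "D"),
    "r4": ("D", "E"),
    "r5": ("E", "C"),
}

route = find_route(toy, feedstock="A", target="C")
print(route)
```

Breadth-first search returns the route with the fewest steps; a practical tool would also need to rank routes by other criteria such as cofactor use or host compatibility.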
50 SRI International Bioinformatics
How to Learn More
BioCyc.org Help menu
BioCyc Webinars: Biocyc.org/webinar.shtml
Publications page: Biocyc.org/publications.shtml
Tutorials held at SRI
  Next week: FBA