SRI International Bioinformatics
The BioCyc Ontologies
Markus Krummenacker
Bioinformatics Research Group
SRI International
kr@ai.sri.com
BioCyc.org
EcoCyc.org, MetaCyc.org, HumanCyc.org
Overview
- Pathway/Genome Databases (PGDBs)
  - BioCyc collection
  - EcoCyc, MetaCyc
- Pathway Tools Software & Applications
  - Visualization, Editing, Analysis, Omics data
  - Inference tools: PathoLogic, Operon predictor, Pathway hole filler
  - Tools for debugging a predicted metabolic network
- Some Ontology Details
  - Pathways, Reactions and Compounds, Enzymes, Genes
  - Regulation
  - Integration with other efforts: BioPAX, GO, NCBI Taxonomy
Model Organism Databases / PGDBs
- DBs that describe the genome and molecular machinery of one specific organism
- Integrate many diverse types of data into a coherent model of a cell
- Every sequenced organism with an active experimental community requires a model organism database (MOD)
  - Integrate genome data with information about the biochemical and genetic network of the organism
  - Integrate literature-based information with computational predictions
  - Ongoing updating of sequence, gene positions and functions, regulatory sites, pathways
- MODs are platforms for global analyses of the organism
  - Interpret omics data in a pathway context
  - In silico prediction of essential genes
  - Characterize systems properties of metabolic and genetic networks
BioCyc Collection of Pathway/Genome Databases
- Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Literature-derived PGDBs
  - MetaCyc
  - EcoCyc: Escherichia coli K-12
- Tier 2: Computationally derived DBs, some curation (20 PGDBs)
  - HumanCyc
  - Mycobacterium tuberculosis
- Tier 3: Computationally derived DBs, no curation (349 DBs)
Pathway Tools: PathoLogic Inference
[Diagram. Labels: Annotated Genome, MetaCyc Reference Pathway DB, PathoLogic, Pathway/Genome Database, Pathway/Genome Editors, Pathway/Genome Navigator]
Pathway Tools Software: PGDBs Created Outside SRI
- 1,300+ licensees: 75+ groups applying software to 200+ organisms
- Saccharomyces cerevisiae, SGD project, Stanford University
- Mouse, MGD, Jackson Laboratory
- dictyBase, Northwestern University
- Under development:
  - CGD (Candida albicans), Stanford University
  - Drosophila, P. Ebert in collaboration with FlyBase
  - C. elegans, P. Ebert in collaboration with WormBase
- Planned:
  - RGD (Rat), Medical College of Wisconsin
- Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
- PlantCyc, ~20 plant PGDBs, Carnegie Institution of Washington
- Six Solanaceae species, Cornell University
- GrameneDB, Cold Spring Harbor Laboratory
- Medicago truncatula, Samuel Roberts Noble Foundation
Pathway Tools Software: PGDBs Created Outside SRI
- BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB
- Gary Xie, Los Alamos Lab, Dental pathogens
- F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
- V. Schachter, Genoscope, Acinetobacter
- M. Bibb, John Innes Centre, Streptomyces coelicolor
- G. Church, Harvard, Prochlorococcus marinus, multiple strains
- E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
- R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
- Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
- Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
- Sergio Encarnacion, UNAM, Sinorhizobium meliloti
- Gregory Fournier, MIT, Mesoplasma florum
- Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
- Michael Göttfert, Technische Universität Dresden, Bradyrhizobium japonicum
- Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472
Pathway Tools Software: PGDBs Created Outside SRI
- Large scale users:
  - C. Medigue, Genoscope, 150+ PGDBs
  - G. Burger, U Montreal, 60+ PGDBs
  - Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
- Partial listing of outside PGDBs at BioCyc.org
Pathway Evidence
Pathway Tools Overviews and Omics Viewers
- Designed to avoid the hairball effect
- Generated automatically from PGDB
- Magnify, interrogate
- Omics viewers paint omics data onto overview diagrams
  - Different perspectives on same dataset
  - Use animation for multiple time points or conditions
  - Paint any data that associates numbers with genes, proteins, reactions, or metabolites
- Provide genome-scale visualizations of cellular networks
- Harness human visual system to interpret patterns in biological contexts
Regulatory Overview and Omics Viewer
Show regulatory relationships among gene groups
Comparative Analysis
- Via Cellular Overview
- Comparative genome browser
- Comparative pathway table
- Comparative analysis reports
  - Compare reaction complements
  - Compare pathway complements
  - Compare transporter complements
Pathway Tools Ontology
- 1621 Classes
  - Main classes such as: Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions (EC), Compounds
  - Cell Component Ontology
  - Protein Feature Ontology
- 221 Slots for attributes and relationships
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Evidence codes, supporting citations
Pathway/Genome Database Schema
Protein Feature Ontology
Advanced Query Form
- Intuitive construction of complex database queries with the expressive power of SQL
Enzymatic-Reactions
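[Diagram: chain of frames from genes to pathway for succinate dehydrogenase]
- Genes: sdhA, sdhB, sdhC, sdhD
- Polypeptides (linked to the genes by "product"): Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
- Protein complex (linked to the polypeptides by "component-of"): Succinate dehydrogenase
- Enzymatic-reaction (linked to the complex by "catalyzes")
- Reaction (linked to the enzymatic-reaction by "reaction"): Succinate + FAD = fumarate + FADH2
- Pathway (linked to the reaction by "in-pathway"): TCA Cycle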
Need for Enzymatic-Reactions
- Reactions can have isozymes
- Enzymes can be multi-functional
- Enzymatic-Reaction frames are needed to decouple the many-to-many relationships
  - Isozymes may have different inhibitors, etc.
- Gene-Reaction schema diagrams [figure]
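The role of the Enzymatic-Reaction frame can be illustrated with a minimal Python sketch. It is not the Pathway Tools implementation; the class, the helper functions, and the enzyme, reaction, and inhibitor names are hypothetical and only show how an association object carries properties that belong to one enzyme/reaction pairing.

```python
from dataclasses import dataclass, field


@dataclass
class EnzymaticReaction:
    """Association frame pairing exactly one enzyme with one reaction."""
    enzyme: str
    reaction: str
    # Properties of this specific pairing, e.g. inhibitors of one isozyme only.
    inhibitors: list[str] = field(default_factory=list)


def enzymes_for(reaction: str, links: list[EnzymaticReaction]) -> list[str]:
    """All enzymes (isozymes) linked to the given reaction."""
    return [link.enzyme for link in links if link.reaction == reaction]


def reactions_for(enzyme: str, links: list[EnzymaticReaction]) -> list[str]:
    """All reactions catalyzed by a (possibly multi-functional) enzyme."""
    return [link.reaction for link in links if link.enzyme == enzyme]


# Hypothetical data: R1 has two isozymes, and EnzB has two functions.
links = [
    EnzymaticReaction("EnzA", "R1", inhibitors=["compound-X"]),
    EnzymaticReaction("EnzB", "R1"),  # isozyme without that inhibitor
    EnzymaticReaction("EnzB", "R2"),  # second function of EnzB
]

print(enzymes_for("R1", links))
print(reactions_for("EnzB", links))
```

Because the inhibitor list lives on the pairing rather than on the enzyme or the reaction, EnzA and EnzB can differ in how they are inhibited while still sharing reaction R1.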
New Representation of Regulation
- Previously, regulation was represented idiosyncratically:
  - One representation for modulation of enzymes
  - Completely different representation for regulation of transcription initiation
- Now unified under a single Regulation class with subclasses
- This enables us to easily add support for new kinds of regulation, e.g.
  - Transcriptional attenuation (done)
  - Regulation of translation by small RNAs (in progress)
- New tools for display and editing of new Regulation classes
Operons and Transcription Units
Operon: A set of two or more genes that are transcribed as a unit. May include multiple promoters.
Transcription Unit: A set of one or more genes that are transcribed as a unit from a single promoter.
Pathway Tools schema does not represent operons explicitly, only transcription-units
Ontology for Transcriptional Regulation
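[Diagram: frames and slots representing transcriptional regulation of the trp transcription unit]
- Frames: trpLEDCBA, trpLEDCBAp1, trpL, trpE, trpD, trpC, trpB, trpA, reg001, site001, TrpR*trp, trp, apoTrpRBR001
- Slots linking the frames: components, left, right, regulated-by, associated-binding-site, regulator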
Representation of Transcriptional Regulation
- Transcription-Unit
  - Components include genes, a single promoter, zero or more terminators
- Binding-Sites
  - Linked to regulation frames
- Regulation frames
  - Transcriptional Initiation: defines a 3-way pairing between promoter, transcription factor and binding-site
  - Transcriptional Attenuation: defines relationship between terminator and the entity (tRNA, protein, small molecule) that regulates it
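The representation described above can be sketched as a small class hierarchy. This is an illustrative Python sketch, not the actual Pathway Tools schema: the class and field names are assumptions, the frame names trpLEDCBA, trpLEDCBAp1, site001, and TrpR*trp are taken from the trp example, and the terminator name and its tRNA regulator are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Regulation:
    """Common parent: a regulator acts on a regulated entity."""
    regulator: str
    regulated_entity: str


@dataclass
class TranscriptionInitiationRegulation(Regulation):
    """3-way pairing: promoter (regulated entity), transcription factor, binding site."""
    binding_site: str = ""


@dataclass
class TranscriptionalAttenuationRegulation(Regulation):
    """Terminator (regulated entity) and the tRNA, protein, or small molecule regulating it."""


@dataclass
class TranscriptionUnit:
    name: str
    genes: list[str]
    promoter: str  # exactly one promoter per transcription unit
    terminators: list[str] = field(default_factory=list)  # zero or more


def regulators_of(unit: TranscriptionUnit, regulations: list[Regulation]) -> set[str]:
    """Collect regulators acting on the unit's promoter or any of its terminators."""
    targets = {unit.promoter, *unit.terminators}
    return {r.regulator for r in regulations if r.regulated_entity in targets}


trp_unit = TranscriptionUnit(
    name="trpLEDCBA",
    genes=["trpL", "trpE", "trpD", "trpC", "trpB", "trpA"],
    promoter="trpLEDCBAp1",
    terminators=["term001"],  # hypothetical terminator name
)
regulations = [
    TranscriptionInitiationRegulation("TrpR*trp", "trpLEDCBAp1", binding_site="site001"),
    TranscriptionalAttenuationRegulation("charged-tRNA-Trp", "term001"),  # hypothetical
]

print(regulators_of(trp_unit, regulations))
```

Adding a further kind of regulation, such as translation regulation by small RNAs, would then amount to adding one more subclass of `Regulation` without touching the query function.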
Infer Anti-Microbial Drug Targets
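Infer drug targets as genes coding for enzymes that catalyze chokepoint reactions
Two types of chokepoint reactions: reactions that uniquely consume a specific substrate, and reactions that uniquely produce a specific product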
Genome Research 14:917 2004
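A minimal sketch of chokepoint detection follows, assuming the definition that a chokepoint reaction is the only consumer of some metabolite or the only producer of some metabolite in the network. The function name, the toy network, and the gene mapping are hypothetical, and the sketch omits any filtering a real analysis might apply (for example, excluding dead-end metabolites).

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """reactions: {name: (reactants, products)}; returns the set of chokepoint names."""
    consumers = defaultdict(set)  # metabolite -> reactions that consume it
    producers = defaultdict(set)  # metabolite -> reactions that produce it
    for name, (reactants, products) in reactions.items():
        for metabolite in reactants:
            consumers[metabolite].add(name)
        for metabolite in products:
            producers[metabolite].add(name)

    chokepoints = set()
    for index in (consumers, producers):
        for reaction_names in index.values():
            if len(reaction_names) == 1:  # unique consumer or unique producer
                chokepoints |= reaction_names
    return chokepoints


# Toy network: A is consumed twice and B is produced twice, so R1 and R2 are
# not chokepoints; R3 is the only consumer of B and the only producer of C.
toy_network = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"B"}, {"C"}),
}
genes_for_reaction = {"R1": ["gene1"], "R2": ["gene2"], "R3": ["gene3"]}  # hypothetical

chokepoints = find_chokepoints(toy_network)
candidate_targets = sorted(g for r in chokepoints for g in genes_for_reaction[r])
print(chokepoints, candidate_targets)
```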
Reachability Analysis of Metabolic Network
- Given:
  - A PGDB for an organism
  - A set of initial metabolites
- Infer:
  - What set of products can be synthesized by the small-molecule metabolism of the organism
- Can known growth medium yield known essential compounds?
- Romero and Karp, Pacific Symposium on Biocomputing, 2001
Algorithm: Forward Propagation Through Production System
- Each reaction becomes a production rule
- Each metabolite in nutrient set becomes an axiom
[Diagram. Labels: Nutrient set, Transport, Metabolite set, Reactants, "Fire" reactions, Products, PGDB reaction pool]
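The forward-propagation idea can be sketched in a few lines of Python. This is an illustrative sketch rather than the published implementation: the function name and the toy reactions are assumptions, transport is not modeled, and a reaction "fires" as soon as all of its reactants are in the metabolite set.

```python
def forward_propagate(nutrients, reactions):
    """Return every metabolite reachable from the nutrient set.

    reactions: iterable of (reactants, products) pairs, each a set of names.
    """
    metabolites = set(nutrients)  # axioms: the initial nutrient set
    pending = list(reactions)     # production rules not yet fired
    fired_any = True
    while fired_any:              # repeat until no new rule can fire
        fired_any = False
        still_pending = []
        for reactants, products in pending:
            if reactants <= metabolites:  # all reactants are available
                metabolites |= products   # fire: add products to the set
                fired_any = True
            else:
                still_pending.append((reactants, products))
        pending = still_pending
    return metabolites


# Toy example with hypothetical reactions.
toy_reactions = [
    ({"G6P"}, {"F6P"}),
    ({"glucose", "ATP"}, {"G6P", "ADP"}),
    ({"X"}, {"Y"}),  # never fires: X is neither a nutrient nor produced
]
reachable = forward_propagate({"glucose", "ATP"}, toy_reactions)

essential = {"F6P", "Y"}          # hypothetical essential compounds
missing = essential - reachable   # compounds the network cannot produce
print(sorted(reachable), sorted(missing))
```

Comparing the reachable set against a list of essential compounds, as in the last lines, mirrors the question of whether a known growth medium can yield the known essential compounds.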
Results
- Phase I: Forward propagation
  - 21 initial compounds yielded only half of the 41 essential compounds for E. coli
- Phase II: Manually identify
  - Bugs in EcoCyc (e.g., two objects for tryptophan): A → B, B' → C
  - Incomplete knowledge of E. coli metabolic network
  - "Bootstrap compounds": A + B → C + D
  - Missing initial protein substrates (e.g., ACP)
  - Protein synthesis not represented
- Phase III: Forward propagation with 11 more initial metabolites
  - Yielded all 41 essential compounds
Integration with other efforts
- Export of
  - BioPAX
  - SBML
- Import of
  - Enzyme DB (EC hierarchy of reactions)
  - GO
  - NCBI Taxonomy
  - BioPAX (work in progress)
Near Future
- Signalling pathways
  - Validating the design
- Regulation
  - Small RNAs, and other additional types
- Higher Eukaryotes
  - Gene expression, Multiple splice forms
  - Cell types, localization
Summary
- Pathway/Genome Databases
  - MetaCyc non-redundant DB of literature-derived pathways
  - 370 organism-specific PGDBs available through SRI at BioCyc.org
  - Computational theories of biochemical machinery
- Pathway Tools software
  - Extract pathways from genomes
  - Morph annotated genome into structured ontology
  - Distributed curation tools for MODs
  - Query, visualization, WWW publishing
BioCyc and Pathway Tools Availability
- BioCyc.org Web site and database files freely available to all
- Pathway Tools freely available to non-profits
  - Macintosh, PC/Windows, PC/Linux
- References
  - Pathway Tools User's Guide
    - Appendix A: Guide to the Pathway Tools Schema
  - Ontology Papers section of http://biocyc.org/publications.shtml
Acknowledgements
- SRI: Suzanne Paley, Ron Caspi, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Alex Shearer, Tomer Altman, Joe Dale, Fred Gilham, Pallavi Kaipa
- EcoCyc Collaborators: Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
- MetaCyc Collaborators: Sue Rhee, Peifen Zhang, Kate Dreher, Lukas Mueller, Anuradha Pujar
- Funding sources:
  - NIH National Center for Research Resources
  - NIH National Institute of General Medical Sciences
  - NIH National Human Genome Research Institute
- BioCyc.org
- Learn more from BioCyc webinars: biocyc.org/webinar.shtml
BioWarehouse: A Bioinformatics Database Warehouse
Peter D. Karp, Tom J. Lee, Valerie Wagner
[Diagram: BioWarehouse on Oracle (10g) or MySQL (4.1.11), surrounded by source databases UniProt, ENZYME, Genbank, Taxonomy, BioCyc, BioPAX, GO, MAGE-ML, KEGG, CMR, Eco2DBase]
BMC Bioinformatics 7:170 2006
bioinformatics.ai.sri.com/biowarehouse/
Motivations
Hundreds of bioinformatics DBs exist
Important problems involve queries across multiple DBs
Why is the Multidatabase Approach Alone Not Sufficient?
Multidatabase query approaches assume databases are in a queryable DBMS
Most sites that do operate DBMSs do not allow remote query access because of security and loading concerns
Users want to control data stability
Users want to control speed of their hardware
Internet bandwidth limits query throughput
Users need to capture, integrate and publish locally produced data of different types
Multidatabase and Warehouse approaches complementary
Key Challenges for BioWarehouse
Designing a schema that accurately captures the contents of source DBs
Designing a schema that is understandable and scalable
Addressing poorly-specified syntax & semantics of source DBs
Balancing the preservation of source data with mapping into common semantics
Technical Approach
- Multi-platform support: Oracle (10g) and MySQL
- Schema support for multitude of bioinformatics datatypes
- Create loaders for public bioinformatics DBs
  - Parse file format of the source DB
  - Semantic transformations
  - Insert DB contents into warehouse tables
- Provide Warehouse query access mechanisms
  - SQL queries via ODBC, JDBC, OAA
- Operate public BioWarehouse server: publichouse
- BMC Bioinformatics 7:170 2006
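The three loader steps (parse, transform, insert) can be sketched as below. This is not BioWarehouse code: the pipe-delimited source format, the `protein` table, and all function names are hypothetical, and Python's built-in `sqlite3` stands in for the Oracle or MySQL back end.

```python
import sqlite3

# Hypothetical source file: one record per line, fields separated by "|".
SOURCE_TEXT = """\
P001|Succinate dehydrogenase|EC 1.3.5.1
P002|Tryptophan synthase|ec 4.2.1.20
"""


def parse(text):
    """Step 1: parse the file format of the source DB into raw records."""
    for line in text.splitlines():
        if line.strip():
            source_id, name, ec = line.split("|")
            yield {"id": source_id, "name": name, "ec": ec}


def transform(record):
    """Step 2: semantic transformation, here normalizing the EC number field."""
    ec = record["ec"].strip()
    if ec.lower().startswith("ec"):
        ec = ec[2:].strip()  # drop the inconsistent "EC"/"ec" prefix
    return (record["id"], record["name"].strip(), ec)


def load(conn, text):
    """Step 3: insert the transformed records into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS protein "
        "(source_id TEXT, name TEXT, ec_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        (transform(record) for record in parse(text)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, SOURCE_TEXT)
rows = conn.execute(
    "SELECT source_id, ec_number FROM protein ORDER BY source_id"
).fetchall()
print(rows)
```

Keeping parsing and transformation separate means a new source database would mainly require a new `parse` function, while the insert step stays the same.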
PublicHouse Server
- Publicly queryable BioWarehouse server operated by SRI
- Manages a set of biological DBs constructed using BioWarehouse
  - CMR
  - Open BioCyc DBs
  - ENZYME
  - NCBI Taxonomy
  - UniProt
- Large-scale data mining using
  - Dashboard Warehouse Query Analyzer
  - MySQL client command line
- See: http://bioinformatics.ai.sri.com/biowarehouse/publichouse.html
- Host: publichouse.sri.com
- Port: 3306
- Database: biospice
BioWarehouse Schema
- Manages many bioinformatics datatypes simultaneously
  - Pathways, Reactions, Chemicals
  - Proteins, Genes, Replicons
  - Sequences, Sequence Features
  - Organisms, Taxonomic relationships
  - Computations (sequence matches)
  - Citations, Controlled vocabularies
  - Links to external databases
  - Gene expression datasets
  - Protein-protein interactions datasets
  - Flow cytometry datasets
- Each type of warehouse object implemented through one or more relational tables (currently ~150)
Warehouse Schema
Manages multiple datasets simultaneously
Dataset = Single version of a database
Version comparison
Multiple software tools or experiments that require access to different versions
Each dataset is a warehouse entity
Every warehouse object is registered in a dataset
Warehouse Schema
Different databases storing the same biological datatypes are coerced into same warehouse tables
Design of most datatypes inspired by multiple databases
Representational tricks to decrease schema bloat
- Single space of primary keys
- Single set of satellite tables such as for synonyms, citations, comments, etc.
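The two schema tricks, together with dataset registration, can be illustrated with a small sketch. The table and column names below are hypothetical and do not reproduce the actual BioWarehouse schema; `sqlite3` is used only so the example is self-contained, and for brevity the dataset table keeps its own key rather than drawing it from the shared key space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (wid INTEGER PRIMARY KEY, name TEXT, version TEXT);

-- Single space of primary keys: every object of any type gets its id here,
-- and is registered in exactly one dataset.
CREATE TABLE warehouse_object (
    wid INTEGER PRIMARY KEY AUTOINCREMENT,
    dataset_wid INTEGER NOT NULL REFERENCES dataset(wid)
);

CREATE TABLE protein (wid INTEGER PRIMARY KEY REFERENCES warehouse_object(wid), name TEXT);
CREATE TABLE gene    (wid INTEGER PRIMARY KEY REFERENCES warehouse_object(wid), name TEXT);

-- Single satellite table: usable for all object types because ids never collide.
CREATE TABLE synonym (other_wid INTEGER REFERENCES warehouse_object(wid), syn TEXT);
""")


def new_object(dataset_wid, table, name):
    """Allocate an id from the shared key space, then insert the typed row."""
    if table not in ("protein", "gene"):
        raise ValueError(f"unknown object table: {table}")
    cursor = conn.execute(
        "INSERT INTO warehouse_object (dataset_wid) VALUES (?)", (dataset_wid,)
    )
    wid = cursor.lastrowid
    conn.execute(f"INSERT INTO {table} (wid, name) VALUES (?, ?)", (wid, name))
    return wid


conn.execute("INSERT INTO dataset (wid, name, version) VALUES (1, 'ExampleDB', 'v1')")
protein_wid = new_object(1, "protein", "TrpR")
gene_wid = new_object(1, "gene", "trpR")
conn.execute("INSERT INTO synonym VALUES (?, ?)", (protein_wid, "trp repressor"))
conn.execute("INSERT INTO synonym VALUES (?, ?)", (gene_wid, "example gene synonym"))

synonyms = conn.execute(
    "SELECT other_wid, syn FROM synonym ORDER BY other_wid"
).fetchall()
print(synonyms)
```

Without the shared key space, each object type would need its own synonym, citation, and comment tables, which is the schema bloat the design aims to avoid.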