Bio onttalk 30minutes-june2003[1]
joanne-luciano -
![Page 1: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/1.jpg)
BioPAX
A Data Exchange Format for Biological Pathways
BioPAX Workgroup
www.biopax.org
![Page 2: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/2.jpg)
Introduction
• BioPAX = Biopathways Exchange Language
• A data exchange format intended to facilitate sharing of pathway data
• BioPAX will provide a consistent format for pathway data so it will be easier for consumers of pathway data (e.g. tool developers, DB curators) to integrate data from multiple sources.
![Page 3: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/3.jpg)
Exchange Formats in the Pathway Data Space
(Diagram: BioPAX in relation to other exchange formats)
• BioPAX
• Molecular Interactions (Pro:Pro, All:All): PSI
• Biochemical Reactions: SBML, CellML
• Regulatory Pathways (Low Detail, High Detail)
• Genetic Interactions
• Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic)
• Metabolic Pathways (Low Detail, High Detail)
• Database Exchange Formats
• Simulation Model Exchange Formats
• Small Molecules (CML)
• Rate Formulas
![Page 4: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/4.jpg)
High Throughput Experimental Methods
Expression, Interaction Data, Function, Protein modifications
PubMed
Existing Literature
Microarray, Two-Hybrid, Mass Spectrometry, Genetics
Multiple Pathway Databases
Integration Nightmare!
![Page 5: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/5.jpg)
Goals
• Accommodate representations used in existing databases such as BioCyc, BIND, WIT, aMAZE, KEGG, etc.
• Include support for these pathway types:
– Metabolic pathways
– Signaling pathways
– Protein-protein interactions
– Genetic regulatory pathways
![Page 6: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/6.jpg)
Goals
• Extensible: Specific classes of data in BioPAX have been marked as extensible to allow addition of new types of data in the future
• Encapsulation: An entire pathway can be encapsulated in a single BioPAX record
• Compatible: BioPAX will try to use existing standards for encoding biological pathway related information wherever possible
• Flexible: Different preferred representations of pathway data can be described using BioPAX
![Page 7: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/7.jpg)
Ontology Syntax
• Ontology and data exchange format (DEF) are not identical
• Ontology is implemented as the DEF
• Multiple implementations are possible…
– Multiple syntax languages to choose from
– Multiple ways to organize the data within each syntax
![Page 8: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/8.jpg)
Syntax Languages
• Currently translating BioPAX ontology into:
– An XML Schema
• Widely used syntax language
– An OWL Ontology
• More powerful data representation abilities
• Community appears to be moving toward OWL (e.g. GO)
• Both are XML-based
• Both versions will be compatible with and fully translatable to each other
• Rationale for dual syntaxes: BioPAX must be widely accepted to be useful, dual syntaxes will facilitate this
![Page 9: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/9.jpg)
Data stream organization
Simple data packets for each object
1. No nesting
2. Fully normalized (no repeat data)
3. Many internal pointers (i.e. “extra” data) needed
4. Objects (i.e. instances) are not self-contained
Fully defined objects at every occurrence
1. Highly nested
2. Not normalized (much repetition of data)
3. No internal pointers needed
4. Self-contained objects, even those of complex classes like “interaction” and “pathway”
• Ultimate structure of BioPAX record will likely lie between these two extremes
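The two extremes can be sketched in a few lines of Python. This is only an illustration: the record layout, the `{"ref": ...}` pointer convention, and the `expand` helper are assumptions made for the example, not part of any BioPAX specification. The normalized store holds one flat packet per object; `expand` turns it into the fully nested, self-contained form by copying each referenced object into place.

```python
# Hypothetical normalized records: one flat packet per object, keyed by ID.
# Cross-references are pointers of the form {"ref": <id>} (an assumed convention).
store = {
    "p1": {"class": "Protein", "name": "Grb2"},
    "p2": {"class": "Protein", "name": "Sos"},
    "i1": {"class": "Interaction", "participants": [{"ref": "p1"}, {"ref": "p2"}]},
    "w1": {"class": "Pathway", "interactions": [{"ref": "i1"}]},
}


def expand(value, store):
    """Recursively replace {"ref": id} pointers with the full target object.

    Assumes the references contain no cycles; a cyclic store would recurse forever.
    """
    if isinstance(value, dict):
        if set(value) == {"ref"}:
            return expand(store[value["ref"]], store)
        return {key: expand(item, store) for key, item in value.items()}
    if isinstance(value, list):
        return [expand(item, store) for item in value]
    return value


# Fully nested, self-contained pathway: no pointers, but data may repeat.
nested = expand(store["w1"], store)
print(nested)
```

The normalized form stores each protein once but cannot be read without the rest of the store; the nested form can be read on its own but repeats a protein every time it participates in an interaction.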
![Page 10: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/10.jpg)
BioPAX Ontology: Root
• Root class: Entity
– Any concept that we will refer to as a discrete unit when describing the biology of pathways.
– Does not include metadata
• E.g. “DB source”, “PubMed ID”, “Experimental technique”, etc.
![Page 11: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/11.jpg)
BioPAX Ontology: Root
• Entity Subclass: Part
– A building block of simple interactions
– E.g. Small molecules, Proteins, DNA, RNA
• Entity Subclass: Interaction
– A set of entities and some relationship between them
– E.g. Reactions, Molecular Associations, Catalyses
• Entity Subclass: Pathway
– A set of interactions
– E.g. Glycolysis, MAPK, Apoptosis
(Diagram legend: IS A, HAS A)
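One way to picture the IS A and HAS A relationships is as inheritance and composition. The sketch below uses Python dataclasses; the field names (`name`, `participants`, `interactions`) are assumptions chosen for illustration and are not the slot names of the BioPAX ontology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Root class: any discrete unit used when describing pathway biology."""
    name: str


@dataclass
class Part(Entity):
    """IS A Entity: a building block of simple interactions."""


@dataclass
class Interaction(Entity):
    """IS A Entity; HAS A set of participating entities."""
    participants: List[Entity] = field(default_factory=list)


@dataclass
class Pathway(Entity):
    """IS A Entity; HAS A set of interactions."""
    interactions: List[Interaction] = field(default_factory=list)


glucose_6p = Part("glucose-6-phosphate")
fructose_6p = Part("fructose-6-phosphate")
reaction = Interaction("Glucose-6-p to fructose-6-p", [glucose_6p, fructose_6p])
glycolysis = Pathway("Glycolysis", [reaction])

print(isinstance(glycolysis, Entity))  # every class derives from Entity
```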
![Page 12: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/12.jpg)
BioPAX Ontology: Parts
• Cell
– A specific type of cell (e.g. cardiac myocyte, B lymphocyte).
• Cell Component
– Part of a cell (e.g. nucleus, mitochondrion). The Gene Ontology contains a large list in the ‘cellular component’ ontology.
• DNA
– Deoxyribonucleic acid (e.g. the EGFR DNA sequence; see GenBank for more examples).
• Environment
– A physical or environmental effect (e.g. calcium wave, electric shock, heat, mechanical stress).
• Photon
– Light at some intensity and wavelength (e.g. UV light).
• Protein
– A protein (e.g. the EGFR protein sequence; see Swiss-Prot for more examples).
• RNA
– Ribonucleic acid (e.g. messenger RNA, microRNA, ribosomal RNA)
• Small Molecule
– A non-polymeric biomolecule. Generally, any bioactive molecule that is not a peptide, protein, DNA, RNA or possibly not a complex carbohydrate (e.g. glucose, penicillin)
![Page 13: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/13.jpg)
BioPAX Ontology: Interactions
• Control
– The control of a process (e.g. enzyme catalysis controls a biochemical reaction, gene regulation controls gene expression).
• Conversion
– A conversion process, which converts one set of entities to another set (e.g. a biochemical reaction converts substrates to products, the process of complex assembly converts single molecules to a complex, transport converts entities in one compartment to the same entities in another compartment).
• Molecular Association
– An association between a set of molecules (e.g. Arp2-Arp3 protein-protein interaction; protein complex e.g. the result of a co-immunoprecipitation experiment; hexokinase-glucose).
• Co-occurrence
– The co-occurrence of entities in some context. That context could be time, space, a sentence, sequence similarity space, etc. (e.g. Colocalization of a few receptors e.g. in a GPI anchored lipid raft; co-migration of cells; genes expressed at the same time).
• Equivalence Class
– A set of entities that can be considered equivalent in some context (e.g. a set of paralogs that can replace each other as enzymes in a biochemical reaction, a set of enzymes that may not be homologs, but are functionally identical e.g. glucose-6-phosphatase).
• Genetic
– A genetic interaction (e.g. a synthetic lethal interaction). An interaction between elements of a genotype that results in a change in phenotype.
![Page 14: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/14.jpg)
BioPAX Ontology
• Current structure of class hierarchy:
• Will be implemented in:
– XML Schema
• Widely used
– OWL
• Powerful data representation
![Page 15: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/15.jpg)
Use Cases
A scientist studies a particular pathway
• Toxicology study: given a pathway, are new compounds/analogs connected?
– Requires: compounds/analogs, cross-database search
• RNAi, KO: know genes, construct network, identify functional disruptor genes
![Page 16: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/16.jpg)
Representing Metabolic Data in BioPAX
Reaction
ID: 1
Name: Glucose-6-p to fructose-6-p
Substrate: <cml>glucose-6-phosphate</cml>
Product: <cml>fructose-6-phosphate</cml>
Delta G: 0.4 kcal/mole
EC: 5.3.1.9
EcoCyc: Reaction; BioPAX Class: Reaction
![Page 17: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/17.jpg)
Representing Metabolic Data in BioPAX (cont 1)
Catalysis
ID: 2
Name: Catalysis of glucose-6-p to fructose-6-p
Enzyme: glucose-6-phosphate isomerase
Reaction: BioPAX ID=1
Inhibitors: Low pH
EcoCyc: Enzyme Catalysis; BioPAX Class: Catalysis
![Page 18: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/18.jpg)
Representing Metabolic Data in BioPAX (cont 2)
Pathway
ID: 10
Name: Glycolysis
Interactions:
1. BioPAX ID=2
2. BioPAX ID=4
3. BioPAX ID=6
etc.
EcoCyc: Pathway; BioPAX Class: Pathway
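The Reaction, Catalysis and Pathway records above refer to one another by BioPAX ID. As a rough sketch of how a consumer might follow those pointers, the Python below stores the records in a dictionary and walks from the pathway to each catalysis and on to its reaction. The dictionary keys and the `catalyzed_steps` helper are assumptions for illustration; only IDs 1, 2 and 10 are given in the slides, so interactions 4, 6, etc. are left out rather than invented.

```python
# Records transcribed from the slides; dictionary keys are the BioPAX IDs.
# Field names are lower-cased for convenience (an assumption, not BioPAX syntax).
records = {
    1: {"class": "Reaction", "name": "Glucose-6-p to fructose-6-p",
        "substrate": "glucose-6-phosphate", "product": "fructose-6-phosphate",
        "ec": "5.3.1.9"},
    2: {"class": "Catalysis", "name": "Catalysis of glucose-6-p to fructose-6-p",
        "enzyme": "glucose-6-phosphate isomerase", "reaction": 1},
    # The slide lists interactions 2, 4, 6, etc.; only ID 2 is defined there.
    10: {"class": "Pathway", "name": "Glycolysis", "interactions": [2]},
}


def catalyzed_steps(pathway_id, records):
    """Return (enzyme, substrate, product) for each Catalysis in a pathway."""
    steps = []
    for interaction_id in records[pathway_id]["interactions"]:
        interaction = records[interaction_id]
        if interaction["class"] != "Catalysis":
            continue
        reaction = records[interaction["reaction"]]
        steps.append((interaction["enzyme"], reaction["substrate"], reaction["product"]))
    return steps


for enzyme, substrate, product in catalyzed_steps(10, records):
    print(f"{enzyme}: {substrate} -> {product}")
```

Keeping the enzyme on the Catalysis record rather than on the Reaction means the same reaction can be reused by several catalyses without repeating its substrates and products.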
![Page 19: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/19.jpg)
Converting PSI Data into BioPAX
Molecular Association
ID: 1
Name: hGHR binds to hGH
Participants: hGHR; hGH
DB Source: PDB:3HHR
Reference: PMID = 1549776
Experiment Description: X-ray Crystallography
PSI XML; BioPAX Class: Molecular Association
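A conversion of this kind amounts to copying fields from the source record into the slots of the target class. The sketch below assumes the PSI entry has already been parsed into a plain Python dictionary; that input layout and the `psi_to_molecular_association` function are hypothetical, and real PSI XML is considerably richer than this.

```python
def psi_to_molecular_association(psi_interaction, biopax_id):
    """Map a simplified, already-parsed PSI interaction onto the slide's fields.

    The input keys (name, participants, xref, pubmed_id, method) are assumed
    for this example and do not mirror the actual PSI XML element names.
    """
    xref = psi_interaction["xref"]
    return {
        "BioPAX Class": "Molecular Association",
        "ID": biopax_id,
        "Name": psi_interaction["name"],
        "Participants": list(psi_interaction["participants"]),
        "DB Source": f"{xref['db']}:{xref['id']}",
        "Reference": f"PMID = {psi_interaction['pubmed_id']}",
        "Experiment Description": psi_interaction["method"],
    }


# Values taken from the slide; the dictionary shape is hypothetical.
psi_entry = {
    "name": "hGHR binds to hGH",
    "participants": ["hGHR", "hGH"],
    "xref": {"db": "PDB", "id": "3HHR"},
    "pubmed_id": 1549776,
    "method": "X-ray Crystallography",
}

record = psi_to_molecular_association(psi_entry, biopax_id=1)
for slot, value in record.items():
    print(f"{slot}: {value}")
```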
![Page 20: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/20.jpg)
Signal Transduction DBs
• CSNDB
– Stores signal events as interactions between two proteins.
• E.g. “Grb2 -> Sos”
– Generates pathways automatically by:
• Displaying downstream interactions within a specific distance from a starting point
• Or, finding the shortest path between two proteins
• TRANSPATH – no longer publicly available; based on CSNDB
CSNDB Pathway
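Both pathway-generation strategies can be expressed as breadth-first searches over a directed graph of pairwise signal events. The Python below is a generic sketch of that idea, not CSNDB's actual implementation. Only the "Grb2 -> Sos" edge comes from the slide; the other edges and all function names are illustrative assumptions.

```python
from collections import deque

# Directed signal events "A -> B". Only Grb2 -> Sos appears in the slide;
# the remaining edges are illustrative assumptions.
edges = [("Grb2", "Sos"), ("Sos", "Ras"), ("Ras", "Raf"), ("Raf", "MEK"), ("Ras", "PI3K")]


def build_graph(edges):
    """Build an adjacency list; every protein gets an entry, even sinks."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])
    return graph


def downstream_within(graph, start, max_distance):
    """Return the interactions reachable from start in at most max_distance steps."""
    distance = {start: 0}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for neighbor in graph[node]:
            found.append((node, neighbor))
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return found


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of proteins, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


graph = build_graph(edges)
print(downstream_within(graph, "Grb2", 2))
print(shortest_path(graph, "Grb2", "MEK"))
```

Because every edge has the same weight, breadth-first search is enough to find a shortest path; a weighted graph would need a different algorithm.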
![Page 21: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/21.jpg)
BioPAX Subgroups
• Created for multiple purposes:
– Tackling specific conceptual problems
– Developing spin-off projects
• Small Molecule Database
• Database of Pathway Resources
– Gathering specific resources for core group
• Typically consist of:
– Core group members (1-3)
– Experts from external community (1-2)
![Page 22: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/22.jpg)
BioPAX Subgroups: Small Molecule
• Evaluated CML 2.0 as means for exchanging small molecules
– No comprehensive small molecule DB exists
• Need to transfer entire small molecule structure, not just DB x-ref
– Proof of concept:
1. EcoCyc small molecules → CML 2.0 file
2. CML 2.0 file → Shah lab visualization program
• No loss of information
![Page 23: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/23.jpg)
BioPAX Subgroups: States
• Determining best mechanism to represent biological states
– E.g. post-translational modification states of proteins, cell-cycle states
![Page 24: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/24.jpg)
BioPAX Subgroups: Examples
• Gathering sample data from various sources to illustrate use cases, promote practical development of BioPAX
![Page 25: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/25.jpg)
Current Status
• Holding biweekly conference calls, bimonthly meetings
• Finishing Level 1 Ontology
– Finishing slot definitions on Level 1 main-tree classes
– Finishing class structure of side-trees
• States, provenance, evidence, timing
• Working feverishly on presentation materials for ISMB 2003
![Page 26: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/26.jpg)
Next Steps
• Finish Level 1 ontology in GKB
• Implement ontology
– In OWL (easy)
– In XML Schema (slightly less easy)
• Translate data from a few major DBs into BioPAX Level 1
– Make revisions if necessary
• Release Level 1
– By end of summer 2003 (hopefully)
![Page 27: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/27.jpg)
How to Contribute
• Participate in email list discussions to make your views heard
– sign up via web site: http://www.biopax.org
• Join a subgroup (if space)
• Make your data available in BioPAX format, when complete
![Page 28: Bio onttalk 30minutes-june2003[1]](https://reader034.fdocuments.in/reader034/viewer/2022051611/54b3e2d14a7959bf068b458e/html5/thumbnails/28.jpg)
BioPAX Supporting Groups
Groups
• Memorial Sloan-Kettering Cancer Center: C. Sander, J. Luciano, M. Cary, G. Bader
• University of Colorado Health Sciences Center: I. Shah
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• BioPathways Consortium: J. Luciano (www.biopathways.org)
• Argonne National Laboratory: N. Maltsev
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: Aviv Regev
• BioPathways Consortium: Eric Neumann, Vincent Schachter
Collaborating Organizations:
• Proteomics Standards Initiative (psidev.sf.net)
• Chemical Markup Language (www.xml-cml.org)
• SBML (www.sbw-sbml.org)
• CellML (www.cellML.org)
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
Grants
• Department of Energy