Pathway/Genome Databases: Concepts and Software Tools Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International [email protected] http://www.ai.sri.com/pkarp/ http://BioCyc.org/

Page 1: Pathway/Genome Databases: Concepts and Software Toolsisoft.postech.ac.kr/Course/NLP_for_Bioinformatics/LectureNotes... · Pathway/Genome Databases: Concepts and Software Tools Peter

Pathway/Genome Databases: Concepts and Software Tools

Peter D. Karp, Ph.D.
Bioinformatics Research Group

SRI International

[email protected]

http://www.ai.sri.com/pkarp/

http://BioCyc.org/

Page 2: Overview

• Pathway/genome databases
• Pathway Tools software
• EcoCyc and MetaCyc
• Characterization of the E. coli metabolic network

Page 3: What to do When Theories Become Larger than Minds can Grasp?

• Example: E. coli genetic network
  • Control by 97 transcription factors of 1174 genes in 630 transcription units
• Example: E. coli metabolic network
  • 160 pathways involving 744 reactions and 791 substrates
• Partition theories across multiple minds
• Rely on the printed word

Page 4: Limitations

• Cannot effectively
  • Evaluate them for internal consistency
  • Evaluate them for consistency with new data: microarrays
  • Refine them with respect to new data
  • Integrate across them to produce system understanding
• They are too large and complex
• The printed word cannot be manipulated effectively

Page 5: Solution: Biological Knowledge Bases

• Store biological knowledge and theories in computers in a declarative form
  • Amenable to computational analysis and generative user interfaces
• It is accepted practice to store data in computers, but not knowledge
  • Refined, interpreted, consensus views
• Establish ongoing efforts to curate (maintain, refine, embellish) these knowledge bases
• Such knowledge bases are an integral part of the scientific enterprise

Page 6: Organism-Specific Pathway/Genome Databases

• Layer functional information above the genome
• Rich ontology to encode biological information with high fidelity
  • Chromosomes, genes, operons, gene products, reactions, pathways
• Curated by experts for that organism
• Integrate literature and computational predictions

Page 7: Pathway/Genome Database

[Diagram labels: CELL; Chromosomes, Plasmids; Genes; Proteins; Reactions; Pathways; Compounds; Operons, Promoters, DNA Binding Sites]

Page 8: Pathway Tools Software

• PathoLogic
  • Prediction of metabolic network from genome
  • Computational creation of new Pathway/Genome Databases
• Pathway/Genome Editors
  • Distributed curation of genome annotations
  • Distributed object database system
  • Interactive editing tools
• Pathway/Genome Navigator
  • WWW publishing of PGDBs
  • Graphic depictions of pathways, chromosomes, operons
  • Analysis operations
    • Pathway visualization of gene-expression data
    • Global comparisons of metabolic networks

Page 9: Sequence Project Workflow

[Workflow diagram; boxes as listed: Raw Sequence; Phred; Phrap; CONSED; BLAST, BLOCKS; GeneMark/Glimmer; PathoLogic; P/G Navigator; P/G Editors; WWW Publishing; Analyses]

Page 10: BioCyc Collection of Pathway/Genome DBs

Literature-based Datasets:
• MetaCyc
• Escherichia coli (EcoCyc)

Computationally Derived Datasets:
• Agrobacterium tumefaciens
• Caulobacter crescentus
• Chlamydia trachomatis
• Bacillus subtilis
• Helicobacter pylori
• Haemophilus influenzae
• Mycobacterium tuberculosis
• Mycoplasma pneumoniae
• Pseudomonas aeruginosa
• Saccharomyces cerevisiae
• Treponema pallidum

http://BioCyc.org/

Page 11: EcoCyc Project Overview

• E. coli Encyclopedia
  • Model-Organism Database for E. coli
  • Tracks the evolving annotation of the E. coli genome
  • Over 3500 literature citations
• Collaborative development via Internet
  • Karp (SRI) -- Bioinformatics architect
  • Riley (MBL) -- Metabolic pathways, signal transduction
  • Saier (UCSD) and Paulsen (TIGR) -- Transport
  • Collado (UNAM) -- Regulation of gene expression
• Ontology: 1000 biological classes
• Database content: 16,000 instances

Page 12: EcoCyc = E. coli Dataset + Pathway/Genome Navigator

Genes: 4,393
Proteins: 4,273
Reactions: 2,760
Pathways: 165
Compounds: 774
Transcription Units: 684
Factors: 108
Enzymes: 914
Transporters: 162
Promoters: 781
TransFac Sites: 910
Citations: 3,508

http://BioCyc.org/

Page 13: EcoCyc Pathways

• Biosynthesis of amino acids, purines, pyrimidines, fatty acids, cofactors (heme, biotin, folic acid, etc.)
• Catabolism of fatty acids, D-glucuronate, L-alanine, L-arabinose, fucose, galactonate, galactose, glucose, mannose, ribose, xylose
• Entner-Doudoroff pathway, TCA cycle, fermentation, gluconeogenesis, glycerol metabolism, glycolysis, glyoxylate cycle, pentose phosphate pathway

Page 14: Motivations for Understanding Schema

• Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
• When writing complex Lisp queries to PGDBs, those queries must name classes and slots within the schema
• A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Page 15: Web of Relationships for One Enzyme

[Diagram labels: sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle]

Page 16: Frames

• Entities with which facts are associated
• Kinds of frames:
  • Classes: Genes, Pathways, Biosynthetic Pathways
  • Instances (objects): trpA, TCA cycle
• Classes:
  • Superclass(es)
  • Subclass(es)
  • Instance(s)
• A symbolic frame name (id, key) uniquely identifies each frame
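The frame model on this slide can be sketched in a few lines of Python. This is only an illustration of classes, instances, and unique frame names; the `Frame` and `KB` types, the hyphenated frame names, and the `tryptophan-biosynthesis` instance are assumptions made for the example and are not the Pathway Tools or Ocelot API.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a uniquely named entity that facts are attached to."""
    name: str                 # symbolic frame name (id, key)
    is_class: bool = False
    parents: list = field(default_factory=list)  # superclasses, or classes of an instance


class KB:
    """Toy knowledge base keyed by frame name (hypothetical, for illustration)."""

    def __init__(self):
        self.frames = {}

    def add(self, name, parents=(), is_class=False):
        if name in self.frames:
            raise ValueError(f"duplicate frame name: {name}")
        for parent in parents:
            if parent not in self.frames:
                raise KeyError(f"unknown parent frame: {parent}")
        self.frames[name] = Frame(name, is_class, list(parents))
        return self.frames[name]

    def all_superclasses(self, name):
        """Every class reachable by following parent links upward."""
        seen = []
        stack = list(self.frames[name].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                stack.extend(self.frames[parent].parents)
        return seen

    def instances_of(self, class_name):
        """Direct and inherited instances of a class, sorted by name."""
        return sorted(
            f.name for f in self.frames.values()
            if not f.is_class and class_name in self.all_superclasses(f.name)
        )


kb = KB()
kb.add("Genes", is_class=True)
kb.add("Pathways", is_class=True)
kb.add("Biosynthetic-Pathways", parents=["Pathways"], is_class=True)
kb.add("trpA", parents=["Genes"])
kb.add("TCA-cycle", parents=["Pathways"])
kb.add("tryptophan-biosynthesis", parents=["Biosynthetic-Pathways"])  # assumed instance

print(kb.instances_of("Pathways"))
```

Because `instances_of` walks superclass links, an instance of a subclass such as Biosynthetic-Pathways is also reported as an instance of Pathways.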

Page 17: Slots

• Encode attributes/properties of a frame
  • Integer, real number, string
• Represent relationships between frames
  • The value of a slot is the identifier of another frame
• Every slot is described by a “slot frame” in a KB that defines meta information about that slot

Page 18: Slot Links

[Diagram labels: sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle. Slot names on the links: product, component-of, catalyzes, reaction, in-pathway]
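Since a slot value can be the identifier of another frame, a query amounts to following a chain of slots from frame to frame. The Python sketch below walks from a gene to its pathway using the five slot names shown on this slide. Which frame each link starts from, the dictionary layout, and the frame names `enzrxn-sdh` and `rxn-sdh` are assumptions for illustration, not taken from a real PGDB.

```python
# Hypothetical frame store: frame name -> {slot name -> list of frame names}.
frames = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "sdhB": {"product": ["Sdh-Fe-S"]},
    "sdhC": {"product": ["Sdh-membrane-1"]},
    "sdhD": {"product": ["Sdh-membrane-2"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-Fe-S": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-membrane-1": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-membrane-2": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["enzrxn-sdh"]},  # assumed frame name
    "enzrxn-sdh": {"reaction": ["rxn-sdh"]},                   # assumed frame name
    "rxn-sdh": {"in-pathway": ["TCA Cycle"]},
    "TCA Cycle": {},
}


def follow(start, slot_path):
    """Follow a sequence of slots from one frame; return the set of frames reached."""
    current = {start}
    for slot in slot_path:
        reached = set()
        for name in current:
            reached.update(frames.get(name, {}).get(slot, []))
        current = reached
    return current


GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]

print(follow("sdhA", GENE_TO_PATHWAY))
```

A frame that lacks the requested slot simply contributes nothing, so a broken chain yields an empty set rather than an error.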

Page 19: Representation of Function

[Diagram labels: sdhA, sdhB, sdhC, sdhD; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle. Attribute labels: EC#, Keq; Cofactors, Inhibitors; Molecular wt, pI; Left-end-position]

Page 20: Monofunctional Monomer

[Diagram labels: Gene; Monomer; Enzymatic-reaction; Reaction; Pathway]

Page 21: Bifunctional Monomer

[Diagram labels: Gene; Monomer; Enzymatic-reaction (two); Reaction (two); Pathway]

Page 22: Monofunctional Multimer

[Diagram labels: Gene (four); Monomer (four); Multimer; Enzymatic-reaction; Reaction; Pathway]

Page 23: Pathway and Substrates

[Diagram labels: Pathway; Reaction (four); Reactant-1, Reactant-2; Product-1, Product-2. Slot names on the links: in-pathway, left, right]

Page 24: Transcriptional Regulation

[Diagram labels: site001; pro001; trpL, trpE, trpD, trpC, trpB, trpA; trpLEDCBA; RpoSig70; TrpR*trp; apoTrpR; trp; Int001, Int003, Int005]

Page 25: Principal Classes

• Class names are capitalized, plural
• Genetic-Elements, with subclasses:
  • Chromosomes
  • Plasmids
• Genes
• Transcription-Units
• RNAs
• Proteins, with subclasses:
  • Polypeptides
  • Protein-Complexes

Page 26: Principal Classes

• Reactions, with subclasses:
  • Transport-Reactions
• Enzymatic-Reactions
• Pathways
• Compounds-And-Elements

Page 27: Slots in Multiple Classes

• Common-Name
• Synonyms
• Names (computed as union of Common-Name, Synonyms)
• Comment
• Citations
• DB-Links
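The Names slot is described as computed from Common-Name and Synonyms rather than stored. A minimal Python sketch of such a computed slot follows; the dictionary representation and the synonym values are placeholders invented for the example, and the ordering (common name first) is an assumption.

```python
def names(frame):
    """Union of Common-Name and Synonyms: common name first, duplicates dropped."""
    common = frame.get("Common-Name")
    candidates = ([common] if common else []) + list(frame.get("Synonyms", []))
    ordered = []
    for name in candidates:
        if name not in ordered:
            ordered.append(name)
    return ordered


# Placeholder synonyms; real values would come from the PGDB.
trpA = {"Common-Name": "trpA", "Synonyms": ["synonym-1", "trpA", "synonym-2"]}

print(names(trpA))
```

Computing the union on demand means that an edit to either source slot is reflected in Names without a second update.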

Page 28: Genes Slots

• Chromosome
• Left-End-Position
• Right-End-Position
• Centisome-Position
• Transcription-Direction
• Product

Page 29: Proteins Slots

• Molecular-Weight-Seq
• Molecular-Weight-Exp
• pI
• Locations
• Modified-Form
• Unmodified-Form
• Component-Of

Page 30: Polypeptides Slots

• Gene

Page 31: Protein-Complexes Slots

• Components

Page 32: Reactions Slots

• EC-Number
• Left, Right
• Substrates (computed as union of Left, Right)
• DeltaG0
• Keq
• Spontaneous?
• Species
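Substrates is another computed slot: the union of the Left and Right sides of a reaction. The Python sketch below applies that rule to the succinate dehydrogenase reaction shown earlier in the deck; the dictionary layout and compound spellings are assumptions for illustration.

```python
def substrates(reaction):
    """Union of the Left and Right slots, in reading order, without duplicates."""
    seen = []
    for side in ("Left", "Right"):
        for compound in reaction.get(side, []):
            if compound not in seen:
                seen.append(compound)
    return seen


# Succinate + FAD = fumarate + FADH2
sdh_reaction = {"Left": ["succinate", "FAD"], "Right": ["fumarate", "FADH2"]}

print(substrates(sdh_reaction))
```

Note that "substrates" here covers every compound in the reaction, products included, which matches the slide's definition rather than the narrower everyday meaning.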

Page 33: Enzymatic-Reactions Slots

• Enzyme
• Reaction
• Activators
• Inhibitors
• Physiologically-Relevant
• Cofactors
• Prosthetic-Groups
• Alternative-Substrates
• Alternative-Cofactors

Page 34: Pathways Slots

• Reaction-List
• Predecessors
• Primaries

Page 35: MetaCyc Overview

• Meta Metabolic Encyclopedia
  • 445 pathways, 1115 enzymes, 4218 reactions
  • 173 E. coli pathways; 158 organisms
  • 2381 citations
• Literature-based DB with extensive references and commentary
• Pathways, reactions, enzymes, substrates

Page 36: MetaCyc Frequent Organisms

Organism            Pathways
M. pneumoniae       7
P. putida           7
S. cerevisiae       8
M. capricolum       12
Hp. influenzae      15
Pseudomonas         17
Soybean             18
B. subtilis         18
Sf. sulfataricus    20
Ho. sapiens         31
Sm. typhimurium     35
E. coli             173

Page 37: MetaCyc Data

• MetaCyc contains one DB object for each distinct pathway
  • Distinct in terms of reaction steps
  • Each pathway labeled with species it occurs in
• MetaCyc pathways are experimentally determined
• 4218 reactions in MetaCyc
  • 401 lack EC numbers

Page 38: MetaCyc Enzyme Data

• Reaction(s) catalyzed
• Alternative substrates
• Cofactors / prosthetic groups
• Activators and inhibitors
• Subunit structure
• Molecular weight, pI
• Comment, literature citations
• Species

Page 39: MetaCyc Super-Pathways

• Groups of pathways linked by common substrates
• Example: Super-pathway containing
  • Chorismate biosynthesis
  • Tryptophan biosynthesis
  • Phenylalanine biosynthesis
  • Tyrosine biosynthesis
• Super-pathways defined by listing their component pathways
• Multiple levels of super-pathways can be defined
• Pathway layout algorithms accommodate super-pathways
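Because a super-pathway is defined by listing its components, and components may themselves be super-pathways, recovering the base pathways is a recursive expansion. The Python sketch below illustrates this; both super-pathway names are invented for the example, and the cycle check is an added safeguard rather than something the slide describes.

```python
def expand(pathway, components, _active=frozenset()):
    """Return the base pathways under `pathway`, flattening nested super-pathways."""
    if pathway in _active:
        raise ValueError(f"super-pathway cycle through {pathway!r}")
    if pathway not in components:
        return [pathway]  # a base pathway: nothing to expand
    result = []
    for sub in components[pathway]:
        for base in expand(sub, components, _active | {pathway}):
            if base not in result:
                result.append(base)
    return result


# Super-pathway name -> list of component pathways (names are hypothetical).
components = {
    "aromatic amino acid super-pathway": [
        "Chorismate biosynthesis",
        "Tryptophan biosynthesis",
        "Phenylalanine biosynthesis",
        "Tyrosine biosynthesis",
    ],
    "example top-level super-pathway": [
        "aromatic amino acid super-pathway",
        "TCA Cycle",
    ],
}

print(expand("example top-level super-pathway", components))
```

Storing only the component list keeps each base pathway defined once, so an edit to Chorismate biosynthesis is seen by every super-pathway that includes it.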

Page 40: Comparison of MetaCyc to KEGG

• Data
  • KEGG has no literature citations, no comments
  • KEGG has no detailed information about enzymes (inhibitors, subunits)
  • KEGG pathways are composites of pathways found in many organisms
    • Unclear what sub-pathways occur in what organisms
• Software tools
  • KEGG has no algorithmic visualization tools
  • KEGG has no queryable metabolic-map overview diagram
  • KEGG has no interactive editing tools

Page 41: EcoCyc/MetaCyc Availability

• WWW EcoCyc-Plus freely available
  • EcoCyc, MetaCyc
  • Pathway/genome DBs for 12 other organisms
  • http://BioCyc.org/
• On-site EcoCyc-Plus freely available to non-profits
  • Flatfiles
  • Binary executable: Hardware requirements
    • Sun UltraSparc-170 w/ 64MB memory
    • PC, 500MHz CPU, 64MB memory, Windows-98

Page 42: EcoCyc and MetaCyc: Resources for Microbial Genome Analysis

• E. coli has a large fraction of gene functions identified experimentally
• Assigning function by similarity to E. coli genes is less likely to introduce annotation errors
• Predict metabolic pathways of other microbes using MetaCyc

Page 43: Applications of EcoCyc and MetaCyc

• Reference sources on E. coli and metabolism
• Sequence/pathway analysis of microbial genomes
• Analysis of gene-expression data
• Computer-aided education
• Anti-microbial drug discovery
• Pathway engineering
• Investigations of
  • Comparative metabolism
  • Global properties of E. coli metabolic network

Page 44: Pathway Tools Software

• PathoLogic
  • Prediction of metabolic network from genome
  • Computational creation of new Pathway/Genome Databases
• Pathway/Genome Editors
  • Distributed curation of genome annotations
  • Distributed object database system
  • Interactive editing tools
• Pathway/Genome Navigator
  • WWW publishing of PGDBs
  • Graphic depictions of pathways, chromosomes, operons
  • Analysis operations
    • Pathway visualization of gene-expression data
    • Global comparisons of metabolic networks

Page 45: Implementation Details

• Allegro Common Lisp
• Sun and PC platforms
• Ocelot object database
• Lisp-based WWW server at BioCyc.org
  • CWEST-based
  • Manages 14 organism DBs

Page 46: Pathway Tools Architecture

[Diagram labels: Pathway/Genome Navigator; WWW Server; X-Windows Graphics; Object Editor, Pathway Editor, Reaction Editor; GFP API; Object DBMS; Oracle]

Page 47: Ocelot Knowledge Server Architecture

• Frame data model
  • Classes, instances, inheritance
  • Classes and instances both treated as data
• Persistent storage via disk files, Oracle DBMS
  • Concurrent development: Oracle
  • Single-user development: disk files
  • Read-only delivery: bundle data into binary program
• Transaction logging facility
• Optimistic concurrency-control protocol
• Schema evolution
• Local disk cache to improve Internet performance
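The slide names an optimistic concurrency-control protocol without giving details. As a generic illustration of the idea, and not a description of how Ocelot actually implements it, the Python sketch below lets editors work without locks and rejects a commit only if the frame changed after the editor read it. The `Store` class, the per-frame version counter, and `ConflictError` are all assumptions.

```python
class ConflictError(Exception):
    """Raised when a frame changed between an editor's read and commit."""


class Store:
    """Toy versioned store: frame name -> (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown frames start at version 0 with no value.
        return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        current_version, _ = self.read(key)
        if current_version != read_version:
            raise ConflictError(
                f"{key}: read version {read_version}, now {current_version}"
            )
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1


store = Store()
version_a, _ = store.read("galE")   # curator A reads
version_b, _ = store.read("galE")   # curator B reads the same version
store.commit("galE", version_a, "edit by curator A")
try:
    store.commit("galE", version_b, "edit by curator B")
except ConflictError as err:
    print("conflict detected:", err)
```

The design suits distributed curation where conflicting edits to the same frame are expected to be rare: no locks are held while a curator works, at the cost of occasionally redoing an edit.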

Page 48: EcoCyc WWW Server

Page 49: Visualization and Editing Tools

• Full Metabolic Map
• Pathways
• Reactions
• Compounds
• Enzymes, Transporters, Transcription Factors
• Genes
• Chromosomes
• Operons

Page 50: Inference of Metabolic Pathways

[Diagram labels: ANNOTATED GENOME (Structured ASCII Text File): List of Genes/ORFs, List of Gene Products, DNA Sequence; MetaCyc; PathoLogic; Pathway/Genome Database: Genomic Map, Genes, Gene Products, Reactions, Pathways, Compounds]

Page 54: PathoLogic Analysis Phases

• Trial parsing of input data files
• Automated build of initial PGDB
  • Initialize schema of new PGDB
  • Create DB objects for chromosomes, genes, proteins
  • Predict reactions and pathways present
• Define protein complexes
• Define metabolic overview diagram

Page 55: PathoLogic Pathway Prediction

• Create associations between enzymes and metabolic reactions
  • Reactions and substrates imported from MetaCyc
    • Automatically via EC numbers
    • Automatically via enzyme name matching
    • Manually
  • CC0092 / galE / “UDP-glucose-4-epimerase” / EC 5.1.3.2
  • UDP-D-glucose → UDP-galactose
• Import from MetaCyc all pathways associated with inferred reactions
  • UDP-D-glucose → UDP-galactose is a reaction of:
    • galactose metabolism, UDP-glucose conversion, lactose degradation 4, colanic acid building blocks biosynthesis
• Prune out pathways with insufficient evidence
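The first two steps on this slide (match enzymes to reactions, then import every pathway containing a matched reaction) can be sketched in Python using the slide's own galE example. The lookup tables stand in for MetaCyc and contain only that one reaction; the function names, the dictionary layout, and the choice of a case-insensitive exact match for enzyme names are assumptions, not PathoLogic's actual matching rules.

```python
# Tiny stand-in for MetaCyc, holding only the example from the slide.
REACTION = "UDP-D-glucose -> UDP-galactose"
REACTIONS_BY_EC = {"5.1.3.2": REACTION}
REACTIONS_BY_ENZYME_NAME = {"udp-glucose-4-epimerase": REACTION}
PATHWAYS_BY_REACTION = {
    REACTION: [
        "galactose metabolism",
        "UDP-glucose conversion",
        "lactose degradation 4",
        "colanic acid building blocks biosynthesis",
    ],
}


def infer_reaction(gene):
    """Match by EC number first, then by enzyme name; None means manual review."""
    ec = gene.get("ec")
    if ec in REACTIONS_BY_EC:
        return REACTIONS_BY_EC[ec]
    name = gene.get("product", "").strip().lower()
    return REACTIONS_BY_ENZYME_NAME.get(name)


def candidate_pathways(genes):
    """Map each pathway to the set of its reactions supported by some gene."""
    evidence = {}
    for gene in genes:
        reaction = infer_reaction(gene)
        if reaction is None:
            continue
        for pathway in PATHWAYS_BY_REACTION.get(reaction, []):
            evidence.setdefault(pathway, set()).add(reaction)
    return evidence


genes = [
    {"id": "CC0092", "name": "galE",
     "product": "UDP-glucose-4-epimerase", "ec": "5.1.3.2"},
]

for pathway, reactions in sorted(candidate_pathways(genes).items()):
    print(pathway, sorted(reactions))
```

All four pathways become candidates from a single enzyme, which is why the final pruning step on the slide is needed.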

Page 56: PathoLogic Prunes Pathways With Insufficient Evidence

• No unique enzyme AND EITHER
  • Only 1 reaction present for a pathway of more than 2 steps
  • Set of reactions present is a subset of reactions present in another pathway
  • There exists a variant pathway with more evidence
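The pruning rule reads as a boolean predicate: a pathway is dropped when it has no unique enzyme and at least one of the three listed conditions holds. The Python sketch below encodes that reading. The `Candidate` fields, the `variant_group` key for identifying variant pathways, and the interpretation of "more evidence" as "more reactions present" are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    steps: frozenset          # all reactions in the reference pathway
    present: frozenset        # steps with an enzyme found in the genome
    has_unique_enzyme: bool   # an enzyme whose reaction occurs in no other pathway
    variant_group: str = ""   # hypothetical key shared by variants of one pathway


def should_prune(candidate, candidates):
    """No unique enzyme AND (single hit OR subsumed OR better variant exists)."""
    if candidate.has_unique_enzyme:
        return False
    others = [o for o in candidates if o.name != candidate.name]
    single_hit = len(candidate.present) == 1 and len(candidate.steps) > 2
    # Non-strict subset, as the slide words it; equal sets would prune each other.
    subsumed = any(candidate.present <= o.present for o in others)
    better_variant = any(
        candidate.variant_group != ""
        and o.variant_group == candidate.variant_group
        and len(o.present) > len(candidate.present)
        for o in others
    )
    return single_hit or subsumed or better_variant


a = Candidate("pathway A", frozenset({"r1", "r2", "r3"}), frozenset({"r1"}), False)
b = Candidate("pathway B", frozenset({"r1", "r4"}), frozenset({"r1", "r4"}), True)

kept = [c.name for c in (a, b) if not should_prune(c, [a, b])]
print(kept)
```

Here pathway A is pruned because its only supported reaction also supports pathway B, while pathway B is kept on the strength of its unique enzyme.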

Page 57: PathoLogic: Inference of Pathway Complement

• Extends the paradigm of genome analysis
  • Predicted genes placed in their biochemical context
  • Information reduction device
  • Assess coherence of the set of genes in a genome
  • Identifies pathway holes and singleton enzymes
• Provides a framework for analysis of functional-genomics data

Page 60: Pathway Comparisons

       Eco   Mtb   Bsu   Hin   Sce   Hpy
Eco    130   103    92    90    84    73
Mtb          103    84    79    82    70
Bsu                 96    77    72    65
Hin                       90    67    61
Sce                             84    64
Hpy                                   74

Mp
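A triangular table like the one above can be produced by intersecting the pathway sets of each pair of organisms, assuming the diagonal holds each organism's own pathway count and the off-diagonal cells hold the number of shared pathways. The Python sketch below shows the computation on made-up data; the organism keys and pathway sets are toy values, not the figures from the slide.

```python
from itertools import combinations_with_replacement


def shared_pathway_table(pathways_by_organism):
    """Upper-triangular table: (org_a, org_b) -> number of pathways in both."""
    organisms = list(pathways_by_organism)
    return {
        (a, b): len(pathways_by_organism[a] & pathways_by_organism[b])
        for a, b in combinations_with_replacement(organisms, 2)
    }


# Toy data for illustration only.
toy = {
    "OrgA": {"glycolysis", "TCA cycle", "pentose phosphate pathway"},
    "OrgB": {"glycolysis", "pentose phosphate pathway"},
    "OrgC": {"glycolysis", "fermentation"},
}

table = shared_pathway_table(toy)
for pair, count in table.items():
    print(pair, count)
```

Only pairs in input order are stored, so each comparison appears once, matching the layout of an upper-triangular table.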

Page 61: Summary

• Pathway/Genome Databases
  • 14 PGDBs available through SRI at BioCyc.org
  • Computational theories of biochemical machinery
• Pathway Tools software
  • Extract pathways from genomes
  • Distributed curation tools
  • Query, visualization, WWW publishing
  • Analysis algorithms

Page 62: Acknowledgements

• SRI: Suzanne Paley, Pedro Romero, John Pick
• EcoCyc Project: Milton Saier, Julio Collado, Ian Paulsen, Monica Riley
• Stanford: Harley McAdams, Lucy Shapiro, Gary Schoolnik, Russ Altman
• Funding sources:
  • NIH National Center for Research Resources
  • Department of Energy Microbial Cell Project
  • DARPA BioSpice, UPC

[email protected]
http://BioCyc.org/