Getting Started in Biological Pathway Construction and Analysis


  • 7/30/2019 Getting Started in Biological Pathway Construction and Analysis


    Message from ISCB

    Getting Started in Biological Pathway Construction and Analysis

    Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, Stuart C. Sealfon*

    Introduction

    Life depends on the capacity of individual cells to respond effectively to cues about their changing internal and external environments. Cellular decision making and responses are orchestrated by complex molecular networks consisting of entities such as proteins or RNAs connected by interactions such as activation or synthesis. Information contained in primary databases and in the experimental literature relevant to these networks is so extensive and rapidly growing that it is increasingly difficult to integrate. As an aid to theoretical and experimental research, it is convenient to distill the inferences contained in the experimental literature and databases into knowledgebases that consist of annotated representations of biological pathways.

    Pathway building has been performed by individual groups studying a network of interest (e.g., Kitano's group, who assembled an immune signaling pathway [1]), by large bioinformatics consortia (e.g., the Reactome Project [2]), and by commercial entities (e.g., Ingenuity Systems). Pathway building is the process of identifying and integrating the entities, interactions, and associated annotations, and populating the knowledgebase. Pathway construction can have either a data-driven objective (DDO) or a knowledge-driven objective (KDO). Data-driven pathway construction is used to generate relationship information for genes or proteins identified in a specific experiment, such as a microarray study. Knowledge-driven pathway construction entails development of a detailed pathway knowledgebase for a particular domain of interest, such as a cell type, disease, or system. To help researchers get their bearings in this field, the following sections provide a brief, practical orientation to existing knowledgebases and to the methods of pathway construction and analysis.

    Biological Pathway Construction Workflow

    Curating a biological pathway entails identifying and structuring content, mining information manually and/or computationally, and assembling a knowledgebase using appropriate software tools. Figure 1 shows a schematic of the major steps in the data-driven and knowledge-driven construction processes. For either DDO or KDO pathway construction, the first step is to mine pertinent information about the entities and interactions from relevant information sources (discussed in Public and Private Information Sources). The retrieved information is assembled using appropriate formats, information standards, and pathway building tools (discussed in Formats, Standards, and Pathway Building Tools) to obtain a pathway prototype. The pathway is then refined to include context-specific annotations such as species, cell or tissue type, or disease type. The pathway can then be verified by domain experts and updated by the curators based on their feedback. In the section Illustration of the Pathway Building Process, we describe an example of the KDO approach to building a pathway.
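    The workflow above revolves around entities, interactions, supporting evidence, and context-specific annotations. As a minimal sketch (not the data model of CellDesigner or any other tool), a pathway prototype can be held as a list of annotated interactions, and the refinement step can be viewed as keeping only interactions annotated with the required context. The class and field names (`Interaction`, `Pathway`, `evidence`, `context`) and the toy records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """One directed interaction in a pathway prototype (hypothetical schema)."""
    source: str                       # upstream entity
    target: str                       # downstream entity
    kind: str                         # interaction type, e.g. "activation"
    evidence: tuple = ()              # identifiers of supporting literature/database records
    context: frozenset = frozenset()  # annotations such as species or cell type


@dataclass
class Pathway:
    name: str
    interactions: list = field(default_factory=list)

    def entities(self):
        """All entities mentioned by any interaction (the assembled prototype)."""
        return {e for i in self.interactions for e in (i.source, i.target)}

    def refine(self, required_context):
        """Return a new pathway keeping interactions annotated with every required term."""
        required = frozenset(required_context)
        kept = [i for i in self.interactions if required <= i.context]
        return Pathway(f"{self.name} [{', '.join(sorted(required))}]", kept)


prototype = Pathway("toy prototype", [
    Interaction("A", "B", "activation", ("source-1",), frozenset({"human", "dendritic cell"})),
    Interaction("B", "C", "activation", ("source-2",), frozenset({"mouse"})),
])
refined = prototype.refine({"human"})
print(refined.name, sorted(refined.entities()))
```

    Keeping refinement as a pure filter means the unrefined prototype stays intact, so curators can re-derive context-specific views after expert feedback changes the annotations.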

    Public and Private Information Sources

    The extension of reductive biology, begun with Aristotle's Parts of Animals, to the molecular realm has defined large numbers of entities and interactions in various cells and organisms. Recent attempts to improve knowledge integration have led to refined classifications of cellular entities, such as Gene Ontology (GO), and to the assembly of structured knowledge repositories. Data repositories, which contain information regarding sequence data, metabolism, signaling, reactions, and interactions, are a major source of information for pathway building. A few useful databases are described in Table 1. A comprehensive list of resources can be found at http://www.pathguide.org.

    Formats, Standards, and Pathway Building Tools

    Various standard, computer-readable, object-oriented formats have been developed to facilitate the organization, storage, exchange, and parsing of pathway knowledgebases and the relevant experimental evidence. Important pathway and pathway-related formats, all of which are XML-based, include the Systems Biology Markup Language (SBML), the Proteomics Standards Initiative-Molecular Interactions format (PSI-MI), and Biological Pathways eXchange (BioPAX) [3]. SBML, which is used mainly to represent pathways and mathematical models and is supported by more than 100 software systems, is currently the best-suited format for mathematical modeling and simulation. PSI-MI is designed for the structured representation of experimental evidence, such as molecular interaction data. The richest format, BioPAX, integrates PSI-MI within a pathway representation format and provides general representation mechanisms that permit storage of additional information, such as mathematical models. However, BioPAX is relatively new and its features are rapidly evolving, which makes it technically challenging to implement.

    Standards have also been developed for representing other kinds of biological information, such as the nomenclature of entities and interactions (e.g., HUGO, the Human Genome Organization) and experimental data (e.g., MIAME, Minimum Information About a Microarray Experiment). The ability to extract information automatically and to make inferences is furthered by using the controlled vocabularies of established taxonomies and ontologies [4]. GO classifies genes to provide insight into their function and relationships, and it serves as a model for other biological ontologies. A comprehensive review of biological information standards can be found in [5].

    Editor: Olga Troyanskaya, Princeton University, United States of America

    Citation: Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC (2008) Getting started in biological pathway construction and analysis. PLoS Comput Biol 4(2): e16. doi:10.1371/journal.pcbi.0040016

    Copyright: © 2008 Viswanathan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

    Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, and Stuart C. Sealfon are with the Center for Translational Systems Biology and Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America.

    * To whom correspondence should be addressed. E-mail: [email protected]

    PLoS Computational Biology | www.ploscompbiol.org February 2008 | Volume 4 | Issue 2 | e160001
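    Because SBML, PSI-MI, and BioPAX are all XML-based, even standard-library XML tools can read them. The sketch below parses a small, hand-written fragment modeled on SBML Level 2 Version 4 and lists its species and reactions. The model content (`toy_model`, species `A` and `B`, reaction `A_to_B`) is hypothetical and has not been validated against the SBML schema; for real models a dedicated library such as libSBML would be the more robust choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written SBML-style fragment (not schema-validated).
SBML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="1"/>
      <species id="B" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="A_to_B" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

# SBML elements live in a default namespace, so searches must be qualified.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def summarize_sbml(text):
    """Return (species ids, {reaction id: (reactant ids, product ids)})."""
    root = ET.fromstring(text.encode("utf-8"))
    species = [s.get("id") for s in root.findall(".//sbml:species", NS)]
    reactions = {}
    for r in root.findall(".//sbml:reaction", NS):
        reactants = [x.get("species") for x in
                     r.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [x.get("species") for x in
                    r.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[r.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarize_sbml(SBML_TEXT)
print(species, reactions)
```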

    Pathway building tools are required to populate, visualize, and store a pathway. Various pathway building tools are currently available [3] that can extract information and support multiple standard formats. Cytoscape, CellDesigner, and JDesigner are graphical environments for constructing pathways that can import and export SBML models for simulation. Cytoscape can also access large databases of protein and gene interactions, with additional support for the PSI-MI and BioPAX formats. The Pathway Analysis Tool for Integration and Knowledge Acquisition (PATIKA) provides a Web-based interface to public databases such as Reactome, HPRD, and IntAct, and supports both the SBML and BioPAX formats; its visualization and layout tools facilitate pathway analysis. Reactome displays reactions as pathway diagrams and provides online tools for authoring, curation, and visualization, as well as export to the SBML and BioPAX formats. The Ingenuity Pathway Analysis tool, a Web-based interface to the Ingenuity Knowledgebase available by paid subscription, enables users to query molecular interactions, biological functions, and diseases to generate customized pathways and analyses.

    Illustration of the Pathway Building Process

    Pathway curation can be either manual or automated. Manual curation provides the most reliable extraction of information from the literature. However, the pace of new discovery can make manually populated databases difficult to maintain. In the mining process, using appropriate keywords increases the chances of identifying the relevant information. Automated text mining through natural language processing reduces the personnel required to recover information, but it has severe limitations in accuracy: information in the scientific literature is highly specialized, semantically unpredictable, and often not textual, and agreeing on facts is difficult even for expert curators. The present generation of text mining tools is probably most useful as an aid to manual curation.
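    Keyword-driven mining used as an aid to manual curation can be sketched as a simple triage: rank abstracts by how many curator-chosen keywords they mention, and flag sentences in which two entities of interest co-occur, so that a curator reads the most promising text first. This is a naive co-occurrence filter, not natural language processing, and the function name `triage`, the keywords, entity names, and abstracts below are all hypothetical.

```python
import re
from itertools import combinations


def triage(abstracts, keywords, entities):
    """Rank abstracts by keyword hits and collect entity co-occurrence sentences.

    abstracts: dict of document id -> text (hypothetical inputs)
    Returns a list of (doc_id, keyword_hits, [(entity1, entity2, sentence), ...]),
    with the documents that mention the most keywords first.
    """
    results = []
    for doc_id, text in abstracts.items():
        lowered = text.lower()
        hits = sum(1 for k in keywords if k.lower() in lowered)
        pairs = []
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            present = sorted(e for e in entities
                             if re.search(rf"\b{re.escape(e)}\b", sentence))
            for a, b in combinations(present, 2):
                pairs.append((a, b, sentence))
        results.append((doc_id, hits, pairs))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


docs = {
    "doc1": "KinaseA phosphorylates FactorB in stimulated cells. GeneC was unchanged.",
    "doc2": "We describe a new imaging method.",
}
ranked = triage(docs, ["phosphorylat", "signaling"], ["KinaseA", "FactorB", "GeneC"])
for doc_id, hits, pairs in ranked:
    print(doc_id, hits, [(a, b) for a, b, _ in pairs])
```

    The output is a reading list, not a set of extracted facts; every flagged sentence still needs a curator's judgment, consistent with the limitations described above.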

    Efficient mining of information from the plethora of resource databases hinges on identifying the most useful primary literature and databases for the biological area of interest. This often poses a challenge, as the choice of databases and mining strategies is specific to the biological area. We find Reactome, UniHI, and Ingenuity Systems useful and appropriate for many biological areas.

    Figure 1. Schematic Illustrating the Biological Pathway Building Process

    Pathway curators initially mine information (Step 1). The mining process can be initiated by two broad pathway building objectives: (a) DDO, wherein a list of genes and/or proteins is obtained from high-throughput experiments such as microarray or mass spectrometry, or (b) KDO, wherein a broad topic of interest is chosen and the knowledge concerning this topic is then mined from resources such as the primary literature and knowledgebases. Information from the mining process is assembled (Step 2), using pathway building tools, into a pathway, which, following many iterations of feedback from domain experts (Step 3) and refinement (Step 4), leads to the desired specific annotated pathway.

    doi:10.1371/journal.pcbi.0040016.g001

    We provide here an example of the assembly of a human dendritic cell signaling pathway involved in responding to microbes, built in CellDesigner using a KDO-based information mining approach. A snapshot of the pathway is shown in Figure 2. We extracted information about entities such as TLRs, TRIF, MyD88, RIG-I, IRF3, and IFN-β predominantly from the primary literature and from comprehensive review papers obtained from databases such as PubMed. The presorted, manually curated information and search tools of Reactome and Ingenuity Systems enabled us to reliably identify and extract the pertinent entities and interactions. Identifying and extracting relevant information from the primary literature is a slow and tedious task; drawing on the curated pathway resources expedited the identification step. The relevant primary literature is also recorded as annotations for entities and interactions while the pathway is being created (unpublished data). Efficient building and visualization of a pathway requires appropriate software. We chose to assemble the pathway in CellDesigner because its flexible graphics capabilities facilitate a clear presentation of high-granularity pathways.
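    Once entities and interactions have been extracted, even a bare directed graph lets a curator ask which signaling routes connect a receptor to an output. The sketch below enumerates every simple path from a receptor to IFN-β. The edge list is a deliberately simplified illustration loosely based on the entities named above (intermediate adaptors and kinases are omitted); it is not the curated network of Figure 2, and the function names are hypothetical.

```python
# Simplified, illustrative edge list (intermediate adaptors/kinases omitted);
# this is NOT the curated network shown in Figure 2.
EDGES = [
    ("TLR3", "TRIF"), ("TLR4", "TRIF"), ("TLR4", "MyD88"),
    ("TRIF", "IRF3"), ("RIG-I", "IRF3"), ("MyD88", "NF-kB"),
    ("IRF3", "IFN-beta"),
]


def build_graph(edges):
    """Adjacency list: entity -> list of downstream entities."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def simple_paths(graph, start, goal, path=None):
    """Yield every path from start to goal that visits no entity twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid revisiting entities (no cycles)
            yield from simple_paths(graph, nxt, goal, path)


graph = build_graph(EDGES)
for receptor in ("TLR3", "TLR4", "RIG-I"):
    for p in simple_paths(graph, receptor, "IFN-beta"):
        print(" -> ".join(p))
```

    In a real knowledgebase each edge would also carry its literature annotations, so every printed route could be traced back to its supporting evidence.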

    DDO pathway building, which can follow a similar process, differs in that the starting point is typically a collection of genes or proteins identified in a global experiment whose relationships are not well understood. In this case, the pathway building process is used to elucidate the pathways and functional relationships shared by the regulated entities.
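    One common first step for a DDO gene list (one approach among several, not necessarily the authors' procedure) is an over-representation test that asks which known pathways contain more regulated genes than chance would predict. If N genes were measured, K of them belong to a pathway, and n are regulated, the probability of seeing at least k regulated pathway members under random sampling is the hypergeometric upper tail, P(X >= k) = sum over i from k to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). The function names and gene sets below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(regulated, pathways, universe):
    """Rank pathways by over-representation of regulated genes (smallest p first).

    regulated: set of genes flagged in the experiment
    pathways:  dict of pathway name -> set of member genes
    universe:  set of all genes measured in the experiment
    """
    regulated = regulated & universe
    N, n = len(universe), len(regulated)
    rows = []
    for name, members in pathways.items():
        members = members & universe  # only genes that could have been detected
        k = len(members & regulated)
        rows.append((name, k, len(members), hypergeom_upper_tail(k, N, len(members), n)))
    return sorted(rows, key=lambda r: r[3])


universe = {f"G{i}" for i in range(100)}
pathways = {
    "pathway_X": {f"G{i}" for i in range(10)},
    "pathway_Y": {f"G{i}" for i in range(50, 60)},
}
regulated = {"G0", "G1", "G2", "G3", "G4", "G90"}
for name, k, size, p in pathway_enrichment(regulated, pathways, universe):
    print(f"{name}: {k}/{size} regulated, p = {p:.2e}")
```

    With many pathways tested at once, the p-values would also need a multiple-testing correction, which this sketch omits.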

    Pathway Analysis

    Pathway analysis refers to the computational approaches used to investigate network behavior as a system. Pathway analysis can be broadly classified into two types: topological/structural network analysis and dynamical analysis.

    Table 1. A List of Databases Commonly Used during Biological Pathway Construction, Classified by the Type of Information Represented

    Database            Description

    Protein-Protein Interaction Databases: organize experimental and/or in silico interactions
    BIND                200,000 documented biomolecular interactions and complexes
    MINT                Experimentally verified interactions
    HPRD                Elegant and comprehensive presentation of the interactions, entities, and evidence
    MPact               Yeast interactions; part of MIPS
    DIP                 Experimentally determined interactions
    IntAct              Database and analysis system of binary and multiprotein interactions
    PDZBase             PDZ domain-containing proteins
    GNPV                Based on specific experiments and literature
    BioGRID             Physical and genetic interactions
    UniHI               Comprehensive human protein interactions
    OPHID               Combines protein-protein interactions from BIND, HPRD, and MINT

    Metabolic Pathway Databases: compendium of pathways describing metabolic and physical processes (primary source for metabolic information, initiated by SRI International, formerly the Stanford Research Institute)
    EcoCyc              Entire genome and biochemical machinery of E. coli
    MetaCyc             Pathways of more than 165 species
    HumanCyc            Human metabolic pathways and the human genome
    BioCyc              Collection of databases for several organisms

    Signaling Pathway Databases: pathways pertaining to signal transduction
    KEGG                Comprehensive; links to several useful databases
    PANTHER             Compendium of pathways built using CellDesigner
    Reactome            Hierarchical layout; extensive links to relevant databases
    BioModels           Pathways and associated mathematical models curated by domain experts
    STKE                Repository of canonical pathways
    Ingenuity Systems   Commercial mammalian biological knowledgebase
    PID                 Compendium of several assembled signaling pathways
    BioPP               Repository of biological pathways built using CellDesigner

    Most databases have a graphics viewer for displaying entities and interactions. Refer to Table S1 for a more detailed description and URLs of these databases. BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MetaCyc, a Metabolic Pathway database; MINT, Molecular INTeraction database; MIPS, Munich Information Center for Protein Sequences; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship database; PID, Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UniHI, Unified Human Interactome.
    doi:10.1371/journal.pcbi.0040016.t001

    Topological analysis of a pathway identifies the global qualitative properties of the system [6]. One approach uses classical graph theory to identify various motifs in a pathway represented as a directed graph. A motif is a small group of interacting entities, capable of information processing, that appears repeatedly in the network, more often than expected in comparable randomized networks. If the graph is signed (i.e., the positive or negative regulatory effect of each interaction, which may be obtained from the primary literature, is specified), Boolean network analysis can be used to identify semi-quantitative features such as positive and negative feedback loops and minimal cut sets in the pathway. Feedback loops strongly affect the behavior of the system. A minimal cut set is a set of entities whose joint disruption abolishes the particular network behavior of interest and that contains no smaller subset with the same effect. Identifying minimal cut sets aids the assessment of the robustness of a system. Motifs, feedback loops, and minimal cut sets of a pathway connecting, for example, a receptor and a transcription factor that regulates many genes, such as NF-κB, illustrate the global properties of the system. Probabilistic graphical model approaches such as Bayesian network analysis are used to learn about cellular networks from quantitative experimental data and to infer indirect relationships.
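    Motif identification starts by counting occurrences of a small subgraph. As a minimal sketch, the function below finds every feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z directly), one of the motifs commonly discussed in this context. Judging whether a motif is over-represented additionally requires comparing the count with randomized networks, which this sketch omits; the function name and toy edges are hypothetical.

```python
def feed_forward_loops(edges):
    """Return all (x, y, z) with x->y, y->z and x->z for three distinct nodes."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    found = []
    for x, ys in succ.items():
        for y in ys:
            if y == x:  # ignore self-regulation
                continue
            for z in succ.get(y, ()):
                # z must be distinct from x and y and also directly regulated by x.
                if z != x and z != y and z in ys:
                    found.append((x, y, z))
    return sorted(found)


toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(feed_forward_loops(toy_edges))
```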
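    On a signed graph, a feedback loop's character follows from its edges: the loop is positive if it contains an even number of inhibitory interactions and negative if it contains an odd number. The sketch below enumerates every simple directed cycle (each reported once, starting from its alphabetically smallest entity) and labels it by the product of its edge signs. It works directly on the signed interaction graph rather than simulating Boolean update rules, and the function name and toy network are hypothetical.

```python
def signed_feedback_loops(signed_edges):
    """Enumerate simple directed cycles and label each as positive or negative.

    signed_edges: dict mapping (source, target) -> +1 (activation) or -1 (inhibition).
    The sign of a loop is the product of its edge signs.
    """
    succ = {}
    for u, v in signed_edges:
        succ.setdefault(u, []).append(v)
    loops = []

    def dfs(start, node, path, sign):
        for nxt in succ.get(node, []):
            s = sign * signed_edges[(node, nxt)]
            if nxt == start:
                loops.append((path[:], "positive" if s > 0 else "negative"))
            # Only visit entities "larger" than start so each cycle is found once.
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt], s)

    for start in sorted(succ):
        dfs(start, start, [start], 1)
    return loops


toy_network = {("A", "B"): 1, ("B", "A"): -1, ("B", "C"): 1, ("C", "B"): 1}
for loop, kind in signed_feedback_loops(toy_network):
    print(" -> ".join(loop + loop[:1]), kind)
```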
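    Minimal cut sets can be illustrated in a simplified graph setting, assuming that "disruption" means knocking out an entity and that the "network behavior of interest" is whether a signal can still travel from a receptor to a target. The brute-force sketch below tests candidate knockout sets in order of increasing size and keeps a set only if it blocks the signal and contains no smaller cut set already found. The names `reachable`, `minimal_cut_sets`, and `max_size`, and the toy edges, are hypothetical; exhaustive search is only practical for small pathways.

```python
from itertools import combinations


def reachable(succ, source, target, removed):
    """True if target can be reached from source without visiting removed nodes."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                stack.append(nxt)
    return False


def minimal_cut_sets(edges, source, target, max_size=3):
    """All minimal sets of intermediate entities whose removal disconnects source from target."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    candidates = sorted({n for e in edges for n in e} - {source, target})
    cuts = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            s = set(combo)
            if any(c <= s for c in cuts):
                continue  # contains a smaller cut set, so it cannot be minimal
            if not reachable(succ, source, target, s):
                cuts.append(s)
    return cuts


toy = [("R", "A"), ("R", "B"), ("A", "C"), ("B", "C"), ("C", "T")]
print(minimal_cut_sets(toy, "R", "T"))
```

    Here C is a single point of failure, whereas A and B are redundant branches that must both be removed; this kind of contrast is what makes minimal cut sets useful for assessing robustness.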

    Figure 2. Example of KDO Pathway Assembly: Signal Transduction Pathways Involved during Infection by Pathogens, such as Viruses and Bacteria, in Mammalian Dendritic Cells

    Starting from a broad topic of interest (infection in mammalian dendritic cells) and using the resources in Table 1, we built this network of pathways.

    doi:10.1371/journal.pcbi.0040016.g002

    Dynamical analysis, a higher-resolution form of mathematical modeling, elucidates the detailed local and certain global quantitative behaviors of the system. Dynamical analysis requires more information on reaction parameters and initial conditions than topological approaches do [6]. Deterministic dynamical analysis uses differential equations to describe reactions. Deterministic partial least squares (PLS) models treat the network of pathways as a processing unit. Based on appropriate quantitative experimental measurements of key entities in an a priori known network of pathways, PLS models can be used to predict the time-dependent cross-talk between pathways of the network under certain conditions. Another approach is stochastic modeling, which uses a probabilistic representation. Deterministic models describe average behavior, whereas stochastic approaches are important when the absolute number of reactant molecules in each cell is small. Under these conditions, the probabilistic nature of chemical reactions may affect system behavior, and deterministic models may not be valid. Many software tools are available for topological and dynamical pathway analysis [7,8].
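    As a minimal example of deterministic dynamical analysis, consider a hypothetical protein that is activated in proportion to an input signal and deactivated at a constant rate: dA/dt = k_act * S * (A_total - A) - k_deact * A, where A is the amount of activated protein. The sketch integrates this equation with the classical fourth-order Runge-Kutta method in plain Python; the model, the function name `simulate_activation`, and all parameter values are illustrative assumptions. Starting from A = 0, the exact solution is A(t) = A_ss * (1 - exp(-(k_act * S + k_deact) * t)) with A_ss = A_total * k_act * S / (k_act * S + k_deact), which provides a check on the numerics.

```python
import math


def simulate_activation(k_act, k_deact, signal, a_total, a0=0.0, t_end=10.0, dt=0.01):
    """Integrate dA/dt = k_act*signal*(a_total - A) - k_deact*A with classical RK4.

    All parameters are hypothetical. Returns lists of time points and A values.
    """
    def rate(a):
        return k_act * signal * (a_total - a) - k_deact * a

    steps = int(round(t_end / dt))
    times, values = [0.0], [a0]
    a = a0
    for i in range(1, steps + 1):
        k1 = rate(a)
        k2 = rate(a + 0.5 * dt * k1)
        k3 = rate(a + 0.5 * dt * k2)
        k4 = rate(a + dt * k3)
        a += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        times.append(i * dt)
        values.append(a)
    return times, values


times, values = simulate_activation(k_act=1.0, k_deact=1.0, signal=1.0, a_total=2.0)
exact_at_1 = 1.0 * (1 - math.exp(-2.0))  # A_ss = 1 and rate constant 2 for these values
print(f"A(1) numerical = {values[100]:.6f}, exact = {exact_at_1:.6f}")
```

    A library ODE solver would normally replace the hand-written integrator; it is spelled out here only to make the deterministic update rule explicit.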
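    The stochastic alternative can be sketched with Gillespie's stochastic simulation algorithm for a hypothetical molecule produced at a constant rate k_prod and degraded at rate k_deg per molecule. For this linear birth-death process the deterministic model dn/dt = k_prod - k_deg * n correctly predicts the average copy number, k_prod / k_deg, but the stationary distribution is Poisson, so with only about five molecules per cell the standard deviation (about 2.2) is a large fraction of the mean. For nonlinear reaction networks, small copy numbers can also shift the average itself away from the deterministic prediction. The function names and parameter values are illustrative assumptions.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Gillespie SSA for 0 -> X (propensity k_prod) and X -> 0 (propensity k_deg * n).

    Returns the jump times and molecule counts of one stochastic trajectory.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod, a_deg = k_prod, k_deg * n
        a_total = a_prod + a_deg
        if a_total == 0:
            break
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean of a piecewise-constant trajectory up to t_end."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end


times, counts = gillespie_birth_death(k_prod=5.0, k_deg=1.0, n0=0, t_end=2000.0, seed=1)
avg = time_average(times, counts, 2000.0)
print(f"time-averaged copy number: {avg:.2f} (deterministic steady state: 5.0)")
```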

    Supporting Information

    Table S1. A List of Databases Frequently Used during Biological Pathway Construction, Classified by the Type of Information Represented, with Their Properties and URLs

    A comprehensive list of databases can be found in Pathguide (http://www.pathguide.org). A, automated curation; B, both manual and automated curation; BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; M, manual curation; MetaCyc, a Metabolic Pathway database; MINT, Molecular INTeraction Database; MIPS, Munich Information Center for Protein Sequences; N, no; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship Database; PID, Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UniHI, Unified Human Interactome; Y, yes.

    Found at doi:10.1371/journal.pcbi.0040016.st001 (61 KB DOC)

    Acknowledgments

    Author contributions. GAV, JS, SP, GN, and SCS wrote the paper.

    Funding. Our pathway research is supported by US National Institutes of Health NIAID contract HHSN2662000500021C.

    Competing interests. The authors have declared that no competing interests exist.

    References

    1. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol 2: 2006.0015.

    2. Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, et al. (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33: D428-D432.

    3. Stromback L, Jakoniene V, Tan H, Lambrix P (2006) Representing, storing and accessing molecular interaction data: a review of models and tools. Brief Bioinform 7: 331-338.

    4. Baclawski K, Niu T (2006) Ontologies for bioinformatics. Cambridge (Massachusetts): The MIT Press.

    5. Brazma A, Krestyaninova M, Sarkans U (2006) Standards for systems biology. Nat Rev Genet 7: 593-605.

    6. Alon U (2007) An introduction to systems biology: design principles of biological circuits. Boca Raton (Florida): Chapman & Hall/CRC.

    7. Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20: 1746-1758.

    8. Alves R, Antunes F, Salvador A (2006) Tools for kinetic modeling of biochemical networks. Nat Biotechnol 24: 667-672.
