Converting the KEGG Pathway Database to SBML



Akira Funahashi 1,2,3,*, Akiya Jouraku 3,*, Hiroaki Kitano 1,3,4,5

1 JST/ERATO-SORST Kitano Symbiotic Systems Project, Tokyo, Japan
2 School of Medicine, Keio University, Tokyo, Japan
3 School of Fundamental Science and Technology, Keio University, Kanagawa, Japan
4 The Systems Biology Institute, Tokyo, Japan

Abstract

Systems biology is characterized by the synergistic integration of theory, computational modeling, and experiment. Though software infrastructure is one of the most critical components of systems biology research, the field still lacks common infrastructure and standards to enable the integration of computational resources. The Systems Biology Markup Language (SBML) was developed to help address this problem. SBML is an open, XML-based format for representing biochemical reaction networks. Several dozen simulation and analysis packages already support SBML and more are in the process of being extended to support it.

Identification of gene-regulatory logic and biochemical networks is a major challenge of systems biology. Several attempts are underway to create large-scale, comprehensive databases of gene-regulatory and biochemical networks. Making the contents of these databases available in SBML format is useful for the following reasons:

(1) it will enable researchers to apply many SBML-aware software tools to the networks in those databases, and
(2) the experience gained from developing the translation tools will provide valuable feedback for the continued evolution of SBML.

As a first attempt at writing translation tools, we have decided to convert the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. We have implemented a converter called KEGG2SBML that automatically converts KEGG pathway database files into SBML. With this converter, we have succeeded in converting 12,122 KEGG pathways into SBML Level 1 and Level 2 documents. All converted SBML documents are freely available. KEGG2SBML is available as an open-source package.

KEGG: Kyoto Encyclopedia of Genes and Genomes
・The KEGG database contains more than 13,000 metabolic pathways for more than 170 organisms.
・http://www.genome.jp/kegg/
・Pathway Database
  1. Metabolism
  2. Genetic Information Processing
  3. Environmental Information Processing
  4. Cellular Processes
  5. Human Diseases


KEGG to SBML converter

We have implemented a converter called KEGG2SBML that automatically converts KEGG pathway database files into SBML Level 1 and Level 2 files. KEGG2SBML uses the PATHWAY database, the LIGAND database, and KEGG Markup Language (KGML) files as input to generate SBML documents. The LIGAND database is a collection of information about biochemical compounds and reactions, and KGML is a specification of graph objects in the KEGG PATHWAY database. Further, KEGG2SBML can parse diagram layout information from KEGG and add it to SBML; the result can be used in CellDesigner, a process network diagram editor we have also developed. (See http://www.systems-biology.org for information about CellDesigner.)

KEGG2SBML is implemented in Perl 5 (>= 5.6.1) and requires the following libraries: expat (>= 1.95.2), XML::Parser (>= 2.1.9), and libxml-perl (>= 0.07). KEGG2SBML has been tested on UNIX-based operating systems such as Linux and FreeBSD. The converter should also run on other UNIX platforms and on Cygwin under Windows.

KEGG2SBML is open source and available from http://sbml.org/software/kegg2sbml/
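The conversion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KEGG2SBML implementation (which is written in Perl): the KGML-like input is a simplified, hand-written fragment with hypothetical identifiers, the function names `sbml_id` and `kgml_to_sbml_fragment` are invented for this sketch, and the output contains only the species and reaction lists, with no compartments, kinetic laws, or layout information.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written KGML-like input (an assumption for illustration;
# the identifiers are hypothetical and real KGML files carry more detail).
KGML = """
<pathway name="path:example">
  <reaction name="rn:R_EXAMPLE" type="irreversible">
    <substrate name="cpd:C_A"/>
    <product name="cpd:C_B"/>
  </reaction>
</pathway>
"""


def sbml_id(kegg_name):
    """Turn a KEGG-style name such as 'cpd:C_A' into an XML-friendly id."""
    return kegg_name.replace(":", "_")


def kgml_to_sbml_fragment(kgml_text):
    """Build <listOfSpecies> and <listOfReactions> from KGML-like reactions."""
    pathway = ET.fromstring(kgml_text)
    model = ET.Element("model", name=sbml_id(pathway.get("name", "pathway")))
    species_list = ET.SubElement(model, "listOfSpecies")
    reaction_list = ET.SubElement(model, "listOfReactions")
    seen = set()

    for rxn in pathway.iter("reaction"):
        reaction = ET.SubElement(
            reaction_list, "reaction",
            name=sbml_id(rxn.get("name")),
            reversible=str(rxn.get("type") == "reversible").lower(),
        )
        for kgml_tag, sbml_tag in (("substrate", "listOfReactants"),
                                   ("product", "listOfProducts")):
            refs = ET.SubElement(reaction, sbml_tag)
            for compound in rxn.findall(kgml_tag):
                sid = sbml_id(compound.get("name"))
                if sid not in seen:  # declare each species only once
                    seen.add(sid)
                    ET.SubElement(species_list, "species", name=sid)
                ET.SubElement(refs, "speciesReference", species=sid)
    return model


model = kgml_to_sbml_fragment(KGML)
if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
    ET.indent(model)
print(ET.tostring(model, encoding="unicode"))
```

Each compound named in a reaction becomes one species, and each reaction keeps references to its substrates and products. A full converter would additionally need compound and enzyme details from the LIGAND database, which this sketch does not attempt.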

[Figure: a pathway in the KEGG database and the converted SBML document displayed in CellDesigner]

KEGG metabolic pathways

The KEGG database contains a variety of metabolic pathways, which consist of compounds and enzymes. One example is the citrate cycle (TCA cycle) of Saccharomyces cerevisiae. In a KEGG pathway diagram, compounds are drawn as circle nodes and enzymes as rectangle nodes. A reaction between compounds is drawn as an arrow, and the enzyme that catalyzes the reaction is placed on the arrow.
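The diagram convention described above maps naturally onto a small data structure. The following is a minimal sketch, assuming hypothetical class names (`Compound`, `Enzyme`, `Reaction`) and a helper `arrows` that are not part of KEGG or KEGG2SBML. It uses the citrate synthase step of the TCA cycle (water omitted for brevity) as the sample reaction.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Compound:
    name: str  # drawn as a circle node


@dataclass(frozen=True)
class Enzyme:
    name: str  # drawn as a rectangle node placed on the arrow


@dataclass(frozen=True)
class Reaction:
    substrates: Tuple[Compound, ...]
    products: Tuple[Compound, ...]
    enzyme: Enzyme


def arrows(reactions):
    """List every (substrate, enzyme, product) arrow in the diagram."""
    return [(s.name, r.enzyme.name, p.name)
            for r in reactions
            for s in r.substrates
            for p in r.products]


# Citrate synthase: oxaloacetate + acetyl-CoA -> citrate + CoA
step = Reaction(
    substrates=(Compound("oxaloacetate"), Compound("acetyl-CoA")),
    products=(Compound("citrate"), Compound("CoA")),
    enzyme=Enzyme("citrate synthase"),
)

for substrate, enzyme, product in arrows([step]):
    print(f"{substrate} --[{enzyme}]--> {product}")
```

Keeping the enzyme as an attribute of the reaction, instead of as a separate node, mirrors the way the enzyme sits on the arrow in the diagram.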

SBML

The Systems Biology Markup Language (SBML) is an XML dialect for representing and exchanging quantitative and qualitative models of biochemical reaction networks. SBML is applicable to metabolic networks, cell-signaling pathways, genomic regulatory networks, and many other areas in systems biology.

[Figure: Citrate cycle (TCA cycle) of Saccharomyces cerevisiae]

[Figure: Example of a reaction used in KEGG metabolic pathways: two compound nodes joined by an arrow labeled with an enzyme]

<listOfSpecies>
  <species name="X0"/>
  <species name="S1"/>
</listOfSpecies>
<listOfReactions>
  <reaction name="reaction_1">
    <listOfReactants>
      <speciesReference species="X0"/>
    </listOfReactants>
    <listOfProducts>
      <speciesReference species="S1"/>
    </listOfProducts>
    <kineticLaw formula="k1 * X0">
      <listOfParameters>
      </listOfParameters>
    </kineticLaw>
  </reaction>
</listOfReactions>
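To show how an SBML-aware tool might read such a representation, here is a minimal Python sketch that extracts the reactants, products, and rate formula of each reaction. It makes several assumptions: the fragment is wrapped in a `model` element so that it forms a single well-formed tree, the empty parameter list is omitted, the function name `summarize_reactions` is hypothetical, and the XML namespace that real SBML documents declare on their `sbml` root element is left out.

```python
import xml.etree.ElementTree as ET

# The reaction X0 -> S1 with rate k1 * X0, wrapped in a <model> element so
# that the two lists form a single well-formed XML tree.
SBML_FRAGMENT = """
<model>
  <listOfSpecies>
    <species name="X0"/>
    <species name="S1"/>
  </listOfSpecies>
  <listOfReactions>
    <reaction name="reaction_1">
      <listOfReactants>
        <speciesReference species="X0"/>
      </listOfReactants>
      <listOfProducts>
        <speciesReference species="S1"/>
      </listOfProducts>
      <kineticLaw formula="k1 * X0"/>
    </reaction>
  </listOfReactions>
</model>
"""


def summarize_reactions(sbml_text):
    """Return one dict per reaction: name, reactants, products, rate formula."""
    model = ET.fromstring(sbml_text)
    summaries = []
    for reaction in model.iter("reaction"):
        law = reaction.find("kineticLaw")
        summaries.append({
            "name": reaction.get("name"),
            "reactants": [ref.get("species") for ref in
                          reaction.findall("listOfReactants/speciesReference")],
            "products": [ref.get("species") for ref in
                         reaction.findall("listOfProducts/speciesReference")],
            "formula": law.get("formula") if law is not None else None,
        })
    return summaries


for r in summarize_reactions(SBML_FRAGMENT):
    print(f'{r["name"]}: {" + ".join(r["reactants"])} -> '
          f'{" + ".join(r["products"])}  (rate: {r["formula"]})')
```

For the fragment shown, the loop prints `reaction_1: X0 -> S1  (rate: k1 * X0)`.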

[Figure: a biochemical reaction in which reactant X0 is converted to product S1 at rate k1 * X0, shown alongside its SBML representation]

Converted SBML models are available from http://systems-biology.org

[Figure: Structure of KEGG-to-SBML converter. PATHWAY, LIGAND, and KGML data from the KEGG database are read by KEGG2SBML, which writes SBML Level 1 and Level 2 documents.]

Converted SBML models

We have succeeded in converting 12,122 KEGG metabolic pathways into SBML Level 1 and Level 2 documents. Existing SBML-aware applications (e.g., CellDesigner) can directly use these pathways.

5 Sony Computer Science Labs, Tokyo, Japan
* These authors contributed equally to this work.

Acknowledgement

Support for KEGG2SBML development comes from the ERATO-SORST Kitano Symbiotic Systems Project, JST and Special Coordination Funds for Promoting Science, and the Ministry of Education, Culture, Sports, Science and Technology, Grant-in-Aid for the 21st century COE Program entitled "Understanding and Control of Life's Function via Systems Biology", Keio University.