Bioinformatics 2005 Torrance 2537 8

download Bioinformatics 2005 Torrance 2537 8

of 2

Transcript of Bioinformatics 2005 Torrance 2537 8

  • 7/27/2019 Bioinformatics 2005 Torrance 2537 8

    1/2

    BIOINFORMATICS APPLICATIONS NOTEVol. 21 no.10 2005, pages 25372538

    doi:10.1093/bioinformatics/bti331

    Structural bioinformatics

    Protein structure topological comparison, discovery and

    matching serviceG. M. Torrance1, D. R. Gilbert1,, I. Michalopoulos2 and D. W. Westhead2

    1Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ,

    UK and 2School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK

    Received on July 23, 2004; revised on February 4, 2005; accepted on February 14, 2005

    Advance Access publication March 1, 2005

    ABSTRACT

    Summary: We describe a fold level fast protein comparison and motif

    matching facility based on the TOPS representation of structure. Thisprovides an updateto a previous serviceat theEBI, with a better graph

    matching with faster results and visualization of both the structures

    being compared against and the common pattern of each with the

    target domain.

    Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via

    the main TOPS site at http://www.tops.leeds.ac.uk. Software is also

    available for download from these sites.

    Contact: [email protected]

    INTRODUCTION

    Protein topological comparison provides a means of assessing struc-

    tural similarities between distantly related domains. In addition, the

    abstract idea of a fold can be expressed as a motif (pattern), andsearched for in the database of known structures. Finally, discovery

    of patterns for sets of structures is a simple extension of the pairwise

    pattern discovery used for comparison.

    Two of these services, comparison and matching, have been avail-

    able from a site at the EBI for a while (Gilbert et al., 1999, 2001),

    and this update provides faster computation and visualization of the

    results both as topology cartoons (Westhead et al., 1999) and dia-

    grams. The availability of the results immediately (Fig. 1) rather

    than as an email reply is due to the incorporation of a new faster

    graph-matching algorithm which exploits the fact that the graphs are

    vertex oriented. Full details of the algorithm can be found in Viksna

    and Gilbert (2001). Some of these services have beenintegrated with

    the main TOPS site at Leeds (Michalopoulos et al., 2004).

    Comparison is here defined as pairwise pattern discovery; match-ing is determination of the presence or absence of a pattern as a

    subgraph of a structure graph. Indeed, pattern discovery relies on

    matching as it tries to find the largest pattern that will match all the

    members of a set of structures.

    COMPARISON

    Structures for comparison are submitted as coordinate files that are

    converted by DSSP (Kabsch and Sander, 1983) and the TOPS pro-

    gram (Gilbert et al., 2001) into a more abstract topological form.

    This topology graph is then compared by pairwise pattern discovery

    To whom correspondence should be addressed.

    with subsets of a database of other, precomputed, domain graphs.

    Domain definitions from both the CATH and SCOP classifications

    are available, with subsets of these classifications at different levels;homologous superfamily, topology for CATH; superfamily, fold

    for SCOP.

    The purpose of the comparison service is to give an overview of

    how a structure relates to all of its neighboursand this can quickly

    be achieved by a comparisonwith the CATH topology representatives

    (orthe SCOP folds). Theresultsare rankedby a valuecalled compres-

    sion, which is a measure of how well the common pattern between

    two structures describes those structures. When comparing a probe

    structure to a database, this value can be useful for distinguishing

    reasonable hits from more distant similarities.

    DISCOVERY

    Common patterns can be discovered for sets of more than two struc-tures. As pattern discovery scales well with the number of graphs

    in the input set, there is no practical limit to the size of family that

    could be used. A service is provided for browsing the common pat-

    terns of groups of structures based on the representative structures

    for a particular level of the CATH classification tree. These dynamic-

    ally discovered patterns can be matched back against the database to

    determine the number of structures it matches outside of the group

    that it was generated from. This gives an idea of how specific the

    pattern is for that group.

    MATCHING

    Several predefined classicpatterns are provided to matchagainst the

    database (Fig. 2). These are based on widely known fold types (thesuperfolds: Rossmann folds, TIM barrels, jellyrolls, OB fold, plaits

    and immunoglobulins). To determine the answers to more abstract

    research questions like what sizes of porin exist?, there is a form for

    user-defined patterns which must be encoded in a compact string

    graph. This format is similar in principle to that recently used for

    fold comparison by alignment of strings (Jonassen and Taylor, 2003)

    although in our case the strings are only a representation, not a data

    format.

    MULTIPLE STRUCTURE ALIGNMENT

    The patterns discovered by TOPS can be thought of as aligned cores

    of a set of structures. Two other programs use this capability of

    The Author 2005 . Published by Oxford University Press. All rights reserved. For Permissions, please email: jour [email protected] 2537

    byguestonSeptember8,2013

    http://bioinformatics.oxfordjourn

    als.org/

    Downloadedfrom

    http://balabio.dcs.gla.ac.uk/topshttp://www.tops.leeds.ac.uk/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://www.tops.leeds.ac.uk/http://balabio.dcs.gla.ac.uk/tops
  • 7/27/2019 Bioinformatics 2005 Torrance 2537 8

    2/2

    G.M.Torrance et al.

    Fig. 1. Matching results.

    Fig. 2. Predefined patterns

    multiple structural alignment as the basis for alignment of multiple

    sequences.

    The MSATmultiple sequence alignment by topologysite (Ren

    et al., 2004) has alignments of sequences driven by the common

    TOPSpattern of theirstructures. The sequence segments correspond-

    ing to those secondary structure elements that have beenmatched are

    aligned with ClustalW. The Topsalign program, on the other hand,

    uses simulated annealing to optimize the initial TOPS alignment of

    the secondary structure elements (Williams et al., 2003).

    DISPLAY

    The graphical display of each common pattern as a diagram (similar

    to those first produced by Koch et al., 1992) gives a more obvious,

    and more comprehensible, idea of why a particular structure was

    ranked at that position on the result list. Since the common patternis an intermediate result in the process of comparison, it is easily

    available for display and does not require any further calculation.

    Cartoons have also been pre-generated for all the structures whose

    graphsare in the database, and these are shown alongside each result.

    These cartoons are easily scaled since they are stored as coordinates

    and images are generated anew each time; they are also available in

    postscript, pdf and svg formats.

    The cartoon drawing is largely independent of the comparison,

    matching and discovery services. The same goes for the linear dia-

    grams, which more easily allows the two of them to be reused as a

    web page result. Therefore, it can be used by other sites to visualize

    their results. For example, MSAT patterns can be displayed as high-

    lights on the cartoons of a set of protein topologies. Even though the

    pictures are generated dynamically, the cgi parameters are construc-

    ted into a pseudo-url so that browsers will use the correct filename

    when saving the image.

    IMPLEMENTATION

    All the software has been written in Java, apart from two accessory C

    programs (DSSP and TOPS). The service is implemented as servlets

    running in an Apache Tomcat container on a linux webserver. Picture

    generation is enabled by the pure java AWT (pja) toolkit. A mysql

    database is also part of the system.

    ACKNOWLEDGEMENT

    GMT and IM were supported by a joint BBSRC/EPSRC grant,project number 320/BIO14429.

    REFERENCES

    Gilbert,D. et al. (1999) Motif-based searching in TOPS protein topology databases.

    Bioinformatics, 15, 317326.

    Gilbert,D. etal. (2001) A computersystem to perform structure comparisonusingTOPS

    representations of protein structure. Comput. Chem., 26, 2330.

    Johannissen,L.O. and Taylor,W.R. (2003) Protein fold comparison by the alignment of

    topological strings. Protein Eng., 12, 949955.

    Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern

    recognition of hydrogen-bonded and geometrical features. Biopolymers , 22,

    25772637.

    Koch,I. et al. (1992) Analysis of protein sheet topologies by graph theoretical methods.

    Proteins, 12, 314323.

    Michalopoulos,I. et al. (2004) TOPS: an enhanced database of protein structuraltopology. Nucleic Acid Res., 32, D251D254.

    Ren,T. et al. (2004). MSAT: a multiple sequence alignment tool based on TOPS. Appl.

    Bioinformatics, 3(23), 149158

    Viksna,J. and Gilbert,D. (2001) Pattern matching and pattern discovery algorithms for

    protein topologies. In Algorithms in Bioinformatics: First International Workshop,

    WABI2001 Proceedings, Vol. 2149, LectureNotes in ComputerScience. pp. 98111.

    ISBN 3-540-42516-0.

    Westhead,D. etal. (1999)Protein structuraltopology: automated analysis, diagrammatic

    representation and database searching. Protein Sci., 8, 897904.

    Williams,A. et al. (2003) Multiple structural alignment for distantly related all struc-

    tures using TOPS pattern discovery and simulated annealing Protein Eng., 16,

    91323.

    2538

    byguestonSeptember8,2013

    http://bioinformatics.oxfordjourn

    als.org/

    Downloadedfrom

    http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/