Bioinformatics 2005 Torrance 2537 8
-
Upload
niranjan-bhuvanaratnam -
Category
Documents
-
view
216 -
download
0
Transcript of Bioinformatics 2005 Torrance 2537 8
-
7/27/2019 Bioinformatics 2005 Torrance 2537 8
1/2
BIOINFORMATICS APPLICATIONS NOTEVol. 21 no.10 2005, pages 25372538
doi:10.1093/bioinformatics/bti331
Structural bioinformatics
Protein structure topological comparison, discovery and
matching serviceG. M. Torrance1, D. R. Gilbert1,, I. Michalopoulos2 and D. W. Westhead2
1Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ,
UK and 2School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK
Received on July 23, 2004; revised on February 4, 2005; accepted on February 14, 2005
Advance Access publication March 1, 2005
ABSTRACT
Summary: We describe a fold level fast protein comparison and motif
matching facility based on the TOPS representation of structure. Thisprovides an updateto a previous serviceat theEBI, with a better graph
matching with faster results and visualization of both the structures
being compared against and the common pattern of each with the
target domain.
Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via
the main TOPS site at http://www.tops.leeds.ac.uk. Software is also
available for download from these sites.
Contact: [email protected]
INTRODUCTION
Protein topological comparison provides a means of assessing struc-
tural similarities between distantly related domains. In addition, the
abstract idea of a fold can be expressed as a motif (pattern), andsearched for in the database of known structures. Finally, discovery
of patterns for sets of structures is a simple extension of the pairwise
pattern discovery used for comparison.
Two of these services, comparison and matching, have been avail-
able from a site at the EBI for a while (Gilbert et al., 1999, 2001),
and this update provides faster computation and visualization of the
results both as topology cartoons (Westhead et al., 1999) and dia-
grams. The availability of the results immediately (Fig. 1) rather
than as an email reply is due to the incorporation of a new faster
graph-matching algorithm which exploits the fact that the graphs are
vertex oriented. Full details of the algorithm can be found in Viksna
and Gilbert (2001). Some of these services have beenintegrated with
the main TOPS site at Leeds (Michalopoulos et al., 2004).
Comparison is here defined as pairwise pattern discovery; match-ing is determination of the presence or absence of a pattern as a
subgraph of a structure graph. Indeed, pattern discovery relies on
matching as it tries to find the largest pattern that will match all the
members of a set of structures.
COMPARISON
Structures for comparison are submitted as coordinate files that are
converted by DSSP (Kabsch and Sander, 1983) and the TOPS pro-
gram (Gilbert et al., 2001) into a more abstract topological form.
This topology graph is then compared by pairwise pattern discovery
To whom correspondence should be addressed.
with subsets of a database of other, precomputed, domain graphs.
Domain definitions from both the CATH and SCOP classifications
are available, with subsets of these classifications at different levels;homologous superfamily, topology for CATH; superfamily, fold
for SCOP.
The purpose of the comparison service is to give an overview of
how a structure relates to all of its neighboursand this can quickly
be achieved by a comparisonwith the CATH topology representatives
(orthe SCOP folds). Theresultsare rankedby a valuecalled compres-
sion, which is a measure of how well the common pattern between
two structures describes those structures. When comparing a probe
structure to a database, this value can be useful for distinguishing
reasonable hits from more distant similarities.
DISCOVERY
Common patterns can be discovered for sets of more than two struc-tures. As pattern discovery scales well with the number of graphs
in the input set, there is no practical limit to the size of family that
could be used. A service is provided for browsing the common pat-
terns of groups of structures based on the representative structures
for a particular level of the CATH classification tree. These dynamic-
ally discovered patterns can be matched back against the database to
determine the number of structures it matches outside of the group
that it was generated from. This gives an idea of how specific the
pattern is for that group.
MATCHING
Several predefined classicpatterns are provided to matchagainst the
database (Fig. 2). These are based on widely known fold types (thesuperfolds: Rossmann folds, TIM barrels, jellyrolls, OB fold, plaits
and immunoglobulins). To determine the answers to more abstract
research questions like what sizes of porin exist?, there is a form for
user-defined patterns which must be encoded in a compact string
graph. This format is similar in principle to that recently used for
fold comparison by alignment of strings (Jonassen and Taylor, 2003)
although in our case the strings are only a representation, not a data
format.
MULTIPLE STRUCTURE ALIGNMENT
The patterns discovered by TOPS can be thought of as aligned cores
of a set of structures. Two other programs use this capability of
The Author 2005 . Published by Oxford University Press. All rights reserved. For Permissions, please email: jour [email protected] 2537
byguestonSeptember8,2013
http://bioinformatics.oxfordjourn
als.org/
Downloadedfrom
http://balabio.dcs.gla.ac.uk/topshttp://www.tops.leeds.ac.uk/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://www.tops.leeds.ac.uk/http://balabio.dcs.gla.ac.uk/tops -
7/27/2019 Bioinformatics 2005 Torrance 2537 8
2/2
G.M.Torrance et al.
Fig. 1. Matching results.
Fig. 2. Predefined patterns
multiple structural alignment as the basis for alignment of multiple
sequences.
The MSATmultiple sequence alignment by topologysite (Ren
et al., 2004) has alignments of sequences driven by the common
TOPSpattern of theirstructures. The sequence segments correspond-
ing to those secondary structure elements that have beenmatched are
aligned with ClustalW. The Topsalign program, on the other hand,
uses simulated annealing to optimize the initial TOPS alignment of
the secondary structure elements (Williams et al., 2003).
DISPLAY
The graphical display of each common pattern as a diagram (similar
to those first produced by Koch et al., 1992) gives a more obvious,
and more comprehensible, idea of why a particular structure was
ranked at that position on the result list. Since the common patternis an intermediate result in the process of comparison, it is easily
available for display and does not require any further calculation.
Cartoons have also been pre-generated for all the structures whose
graphsare in the database, and these are shown alongside each result.
These cartoons are easily scaled since they are stored as coordinates
and images are generated anew each time; they are also available in
postscript, pdf and svg formats.
The cartoon drawing is largely independent of the comparison,
matching and discovery services. The same goes for the linear dia-
grams, which more easily allows the two of them to be reused as a
web page result. Therefore, it can be used by other sites to visualize
their results. For example, MSAT patterns can be displayed as high-
lights on the cartoons of a set of protein topologies. Even though the
pictures are generated dynamically, the cgi parameters are construc-
ted into a pseudo-url so that browsers will use the correct filename
when saving the image.
IMPLEMENTATION
All the software has been written in Java, apart from two accessory C
programs (DSSP and TOPS). The service is implemented as servlets
running in an Apache Tomcat container on a linux webserver. Picture
generation is enabled by the pure java AWT (pja) toolkit. A mysql
database is also part of the system.
ACKNOWLEDGEMENT
GMT and IM were supported by a joint BBSRC/EPSRC grant,project number 320/BIO14429.
REFERENCES
Gilbert,D. et al. (1999) Motif-based searching in TOPS protein topology databases.
Bioinformatics, 15, 317326.
Gilbert,D. etal. (2001) A computersystem to perform structure comparisonusingTOPS
representations of protein structure. Comput. Chem., 26, 2330.
Johannissen,L.O. and Taylor,W.R. (2003) Protein fold comparison by the alignment of
topological strings. Protein Eng., 12, 949955.
Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern
recognition of hydrogen-bonded and geometrical features. Biopolymers , 22,
25772637.
Koch,I. et al. (1992) Analysis of protein sheet topologies by graph theoretical methods.
Proteins, 12, 314323.
Michalopoulos,I. et al. (2004) TOPS: an enhanced database of protein structuraltopology. Nucleic Acid Res., 32, D251D254.
Ren,T. et al. (2004). MSAT: a multiple sequence alignment tool based on TOPS. Appl.
Bioinformatics, 3(23), 149158
Viksna,J. and Gilbert,D. (2001) Pattern matching and pattern discovery algorithms for
protein topologies. In Algorithms in Bioinformatics: First International Workshop,
WABI2001 Proceedings, Vol. 2149, LectureNotes in ComputerScience. pp. 98111.
ISBN 3-540-42516-0.
Westhead,D. etal. (1999)Protein structuraltopology: automated analysis, diagrammatic
representation and database searching. Protein Sci., 8, 897904.
Williams,A. et al. (2003) Multiple structural alignment for distantly related all struc-
tures using TOPS pattern discovery and simulated annealing Protein Eng., 16,
91323.
2538
byguestonSeptember8,2013
http://bioinformatics.oxfordjourn
als.org/
Downloadedfrom
http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/