Bioinformatics 2005 Torrance 2537 8

7/27/2019 Bioinformatics 2005 Torrance 2537 8

1/2

BIOINFORMATICS APPLICATIONS NOTEVol. 21 no.10 2005, pages 25372538

doi:10.1093/bioinformatics/bti331

Structural bioinformatics

Protein structure topological comparison, discovery and

matching serviceG. M. Torrance1, D. R. Gilbert1,, I. Michalopoulos2 and D. W. Westhead2

1Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ,

UK and 2School of Biochemistry and Microbiology, University of Leeds, Leeds LS2 9JT, UK

Received on July 23, 2004; revised on February 4, 2005; accepted on February 14, 2005

Advance Access publication March 1, 2005

ABSTRACT

Summary: We describe a fold level fast protein comparison and motif

matching facility based on the TOPS representation of structure. Thisprovides an updateto a previous serviceat theEBI, with a better graph

matching with faster results and visualization of both the structures

being compared against and the common pattern of each with the

target domain.

Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via

the main TOPS site at http://www.tops.leeds.ac.uk. Software is also

available for download from these sites.

Contact: [email protected]

INTRODUCTION

Protein topological comparison provides a means of assessing struc-

tural similarities between distantly related domains. In addition, the

abstract idea of a fold can be expressed as a motif (pattern), andsearched for in the database of known structures. Finally, discovery

of patterns for sets of structures is a simple extension of the pairwise

pattern discovery used for comparison.

Two of these services, comparison and matching, have been avail-

able from a site at the EBI for a while (Gilbert et al., 1999, 2001),

and this update provides faster computation and visualization of the

results both as topology cartoons (Westhead et al., 1999) and dia-

grams. The availability of the results immediately (Fig. 1) rather

than as an email reply is due to the incorporation of a new faster

graph-matching algorithm which exploits the fact that the graphs are

vertex oriented. Full details of the algorithm can be found in Viksna

and Gilbert (2001). Some of these services have beenintegrated with

the main TOPS site at Leeds (Michalopoulos et al., 2004).

Comparison is here defined as pairwise pattern discovery; match-ing is determination of the presence or absence of a pattern as a

subgraph of a structure graph. Indeed, pattern discovery relies on

matching as it tries to find the largest pattern that will match all the

members of a set of structures.

COMPARISON

Structures for comparison are submitted as coordinate files that are

converted by DSSP (Kabsch and Sander, 1983) and the TOPS pro-

gram (Gilbert et al., 2001) into a more abstract topological form.

This topology graph is then compared by pairwise pattern discovery

To whom correspondence should be addressed.

with subsets of a database of other, precomputed, domain graphs.

Domain definitions from both the CATH and SCOP classifications

are available, with subsets of these classifications at different levels;homologous superfamily, topology for CATH; superfamily, fold

for SCOP.

The purpose of the comparison service is to give an overview of

how a structure relates to all of its neighboursand this can quickly

be achieved by a comparisonwith the CATH topology representatives

(orthe SCOP folds). Theresultsare rankedby a valuecalled compres-

sion, which is a measure of how well the common pattern between

two structures describes those structures. When comparing a probe

structure to a database, this value can be useful for distinguishing

reasonable hits from more distant similarities.

DISCOVERY

Common patterns can be discovered for sets of more than two struc-tures. As pattern discovery scales well with the number of graphs

in the input set, there is no practical limit to the size of family that

could be used. A service is provided for browsing the common pat-

terns of groups of structures based on the representative structures

for a particular level of the CATH classification tree. These dynamic-

ally discovered patterns can be matched back against the database to

determine the number of structures it matches outside of the group

that it was generated from. This gives an idea of how specific the

pattern is for that group.

MATCHING

Several predefined classicpatterns are provided to matchagainst the

database (Fig. 2). These are based on widely known fold types (thesuperfolds: Rossmann folds, TIM barrels, jellyrolls, OB fold, plaits

and immunoglobulins). To determine the answers to more abstract

research questions like what sizes of porin exist?, there is a form for

user-defined patterns which must be encoded in a compact string

graph. This format is similar in principle to that recently used for

fold comparison by alignment of strings (Jonassen and Taylor, 2003)

although in our case the strings are only a representation, not a data

format.

MULTIPLE STRUCTURE ALIGNMENT

The patterns discovered by TOPS can be thought of as aligned cores

of a set of structures. Two other programs use this capability of

The Author 2005 . Published by Oxford University Press. All rights reserved. For Permissions, please email: jour [email protected] 2537

byguestonSeptember8,2013

http://bioinformatics.oxfordjourn

als.org/

Downloadedfrom
http://balabio.dcs.gla.ac.uk/topshttp://www.tops.leeds.ac.uk/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://www.tops.leeds.ac.uk/http://balabio.dcs.gla.ac.uk/tops

7/27/2019 Bioinformatics 2005 Torrance 2537 8

2/2

G.M.Torrance et al.

Fig. 1. Matching results.

Fig. 2. Predefined patterns

multiple structural alignment as the basis for alignment of multiple

sequences.

The MSATmultiple sequence alignment by topologysite (Ren

et al., 2004) has alignments of sequences driven by the common

TOPSpattern of theirstructures. The sequence segments correspond-

ing to those secondary structure elements that have beenmatched are

aligned with ClustalW. The Topsalign program, on the other hand,

uses simulated annealing to optimize the initial TOPS alignment of

the secondary structure elements (Williams et al., 2003).

DISPLAY

The graphical display of each common pattern as a diagram (similar

to those first produced by Koch et al., 1992) gives a more obvious,

and more comprehensible, idea of why a particular structure was

ranked at that position on the result list. Since the common patternis an intermediate result in the process of comparison, it is easily

available for display and does not require any further calculation.

Cartoons have also been pre-generated for all the structures whose

graphsare in the database, and these are shown alongside each result.

These cartoons are easily scaled since they are stored as coordinates

and images are generated anew each time; they are also available in

postscript, pdf and svg formats.

The cartoon drawing is largely independent of the comparison,

matching and discovery services. The same goes for the linear dia-

grams, which more easily allows the two of them to be reused as a

web page result. Therefore, it can be used by other sites to visualize

their results. For example, MSAT patterns can be displayed as high-

lights on the cartoons of a set of protein topologies. Even though the

pictures are generated dynamically, the cgi parameters are construc-

ted into a pseudo-url so that browsers will use the correct filename

when saving the image.

IMPLEMENTATION

All the software has been written in Java, apart from two accessory C

programs (DSSP and TOPS). The service is implemented as servlets

running in an Apache Tomcat container on a linux webserver. Picture

generation is enabled by the pure java AWT (pja) toolkit. A mysql

database is also part of the system.

ACKNOWLEDGEMENT

GMT and IM were supported by a joint BBSRC/EPSRC grant,project number 320/BIO14429.

REFERENCES

Gilbert,D. et al. (1999) Motif-based searching in TOPS protein topology databases.

Bioinformatics, 15, 317326.

Gilbert,D. etal. (2001) A computersystem to perform structure comparisonusingTOPS

representations of protein structure. Comput. Chem., 26, 2330.

Johannissen,L.O. and Taylor,W.R. (2003) Protein fold comparison by the alignment of

topological strings. Protein Eng., 12, 949955.

Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern

recognition of hydrogen-bonded and geometrical features. Biopolymers , 22,

25772637.

Koch,I. et al. (1992) Analysis of protein sheet topologies by graph theoretical methods.

Proteins, 12, 314323.

Michalopoulos,I. et al. (2004) TOPS: an enhanced database of protein structuraltopology. Nucleic Acid Res., 32, D251D254.

Ren,T. et al. (2004). MSAT: a multiple sequence alignment tool based on TOPS. Appl.

Bioinformatics, 3(23), 149158

Viksna,J. and Gilbert,D. (2001) Pattern matching and pattern discovery algorithms for

protein topologies. In Algorithms in Bioinformatics: First International Workshop,

WABI2001 Proceedings, Vol. 2149, LectureNotes in ComputerScience. pp. 98111.

ISBN 3-540-42516-0.

Westhead,D. etal. (1999)Protein structuraltopology: automated analysis, diagrammatic

representation and database searching. Protein Sci., 8, 897904.

Williams,A. et al. (2003) Multiple structural alignment for distantly related all struc-

tures using TOPS pattern discovery and simulated annealing Protein Eng., 16,

91323.

2538

byguestonSeptember8,2013

http://bioinformatics.oxfordjourn

als.org/

Downloadedfrom
http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/http://bioinformatics.oxfordjournals.org/

Bioinformatics 2005 Torrance 2537 8

Documents

Transcript of Bioinformatics 2005 Torrance 2537 8