CATH DATABASE
Presented by Shri Vaishnavi & Pinky
HIERARCHICAL DOMAIN CLASSIFICATION OF PROTEIN STRUCTURES
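CLASS: Determined by the secondary structure composition and packing within the protein structure. The main classes are mainly alpha (alpha-helices), mainly beta (beta-sheets) and alpha-beta, which includes both alpha/beta and alpha+beta structures.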
ARCHITECTURE: Distinguishes structures within the same class by their overall shape, i.e. the orientation of the secondary structure elements, regardless of how they are connected. Groupings can be rather broad, as they describe general features of protein fold shape. Ex: the TIM barrel, or the number of layers in a sandwich (C.A. Orengo et al., 1997).
TOPOLOGY: Structures share the same number, arrangement and connectivity of secondary structure elements. Structures at this level have the same fold but may differ in function. Ex: the globin fold or the immunoglobulin fold.
HOMOLOGY: Structures are grouped by high structural similarity and similar functions, suggesting they may have evolved from a common ancestor. Ex: among the non-bundle globin-like folds, the erythrocruorins, colicins, phycocyanins and domain 1 of diphtheria toxin all have the same CAT number (1.10.340), but are differentiated by their H numbers 10, 20, 30 and 40, respectively.
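Because a CATH number is a dotted C.A.T.H code, the globin-like example can be checked mechanically: all four superfamilies share the C.A.T prefix and differ only in H. The sketch below is illustrative only. The `CathCode` helper is hypothetical, and the full four-part codes are assembled from the CAT number and H numbers quoted above rather than looked up in CATH.

```python
from collections import defaultdict
from typing import NamedTuple


class CathCode(NamedTuple):
    """A CATH node number split into its four levels (C.A.T.H)."""
    class_: int        # C: class
    architecture: int  # A: architecture
    topology: int      # T: topology (fold)
    homology: int      # H: homologous superfamily

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected a C.A.T.H code, got {code!r}")
        return cls(*(int(p) for p in parts))

    def cat(self) -> str:
        """C.A.T prefix: domains sharing it have the same fold (topology)."""
        return f"{self.class_}.{self.architecture}.{self.topology}"


# Codes assembled from the globin-like example (CAT 1.10.340, H = 10..40).
examples = {
    "erythrocruorins": "1.10.340.10",
    "colicins": "1.10.340.20",
    "phycocyanins": "1.10.340.30",
    "diphtheria toxin, domain 1": "1.10.340.40",
}

# Group superfamilies by their shared fold (C.A.T prefix).
by_fold = defaultdict(list)
for name, code in examples.items():
    node = CathCode.parse(code)
    by_fold[node.cat()].append((node.homology, name))

for fold, members in by_fold.items():
    print(fold, members)
```

All four entries end up under the single fold key `1.10.340`, each tagged with its own H number.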
SEQUENCE FAMILY: Domains with sequence identity >35%. They are presumed to have extremely similar structures and functions, and may be slightly different examples of the same protein from different species within the same superfamily. This is the first of the SOLID sequence-clustering levels.
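The >35% cut-off is a percent-identity test on a pairwise alignment. A minimal sketch, assuming the two sequences are already aligned with `-` for gaps and that identity is measured over columns where neither sequence has a gap (CATH's exact denominator may differ). The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def same_sequence_family(aligned_a: str, aligned_b: str, threshold: float = 35.0) -> bool:
    """True if the pair clears the sequence-family identity threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("ACDEFGHIKL", "ACDEYGHWKL"))  # 80.0
print(same_sequence_family("ACDEFGHIKL", "ACDEYGHWKL"))  # True
```

Excluding gapped columns keeps a short, well-matched overlap from being diluted by indels; other conventions divide by the shorter sequence length instead.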
FLOW CHART OF CATH DATABASE
CATH SERVER PROTOCOL (FRANCES PEARL ET AL., 2005)
Input the structure to the server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl).
Generate derived data from the PDB coordinate files.
Identify more remote homologues.
Set a threshold E-value derived from validated structural homologues.
If a match is found, the matched superfamilies are structurally compared with the query structure using the SSAP structure alignment program.
Any unmatched query structure is scanned against a library of representative structures from each close sequence family in CATH.
The top 10 matches are displayed
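The decision flow of the server protocol can be sketched as follows. This is a simplified illustration, not the CATH server's code: the `Hit` record, the `assign_query` function, the placeholder E-value threshold of 0.001 and the `scan_representatives` callback (standing in for the representative-library scan) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    superfamily: str   # label of the matched CATH superfamily
    e_value: float     # sequence-search E-value (lower is better)
    ssap_score: float  # SSAP structural similarity score (higher is better)


def assign_query(
    sequence_hits: List[Hit],
    scan_representatives: Callable[[], List[Hit]],
    e_threshold: float = 0.001,  # placeholder; the protocol derives it from validated homologues
    top_n: int = 10,
) -> List[Hit]:
    """Return the top-N matches for a query structure.

    1. Keep sequence hits whose E-value passes the threshold.
    2. If any pass, rank those superfamilies by their SSAP comparison score.
    3. Otherwise, fall back to scanning the representative-structure library.
    """
    significant = [h for h in sequence_hits if h.e_value <= e_threshold]
    candidates = significant if significant else scan_representatives()
    return sorted(candidates, key=lambda h: h.ssap_score, reverse=True)[:top_n]


hits = [Hit("sf_A", 1e-8, 85.0), Hit("sf_B", 0.5, 90.0)]
print([h.superfamily for h in assign_query(hits, lambda: [])])  # ['sf_A']
```

Passing the fallback scan in as a callback keeps the expensive structure scan lazy: it only runs when no sequence hit clears the threshold.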
DICTIONARY OF HOMOLOGOUS SUPERFAMILIES (DHS) (J.E. BRAY ET AL., 2000)
A database of validated multiple structural alignments, annotated with consensus functional information, for evolutionary protein families.
A powerful resource to validate, examine and visualize key structural and functional features of each homologous superfamily.
It also provides a tool for examining sequence-structure relationships for proteins within each fold group.
GENERATION OF DATA FOR THE DHS
Generation of structure comparison data using SSAP
The comparisons provide a complete data set for analyzing analogues and homologues, and for checking for incorrect classifications.
Automatic validation of structural relatives (DHS-VALID)
The DHS-VALID program automatically checks all the pairwise sequence and structure comparison data generated for each fold group and homologous superfamily in CATH.
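One way to picture this kind of automatic check is to compare each pair's current classification against a consensus rule on its comparison data. The sketch below is hypothetical throughout: the record fields, the rule in `looks_homologous` and its thresholds (35% identity, or SSAP score of at least 80 with at least 20% identity) are placeholders chosen for illustration, not DHS-VALID's actual criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PairComparison:
    domain_a: str
    domain_b: str
    superfamily_a: str
    superfamily_b: str
    seq_identity: float  # percent identity from the pairwise sequence alignment
    ssap_score: float    # SSAP structural similarity score


def looks_homologous(p: PairComparison) -> bool:
    # Placeholder consensus rule (assumed thresholds): high sequence identity
    # alone, or strong structural similarity backed by moderate identity.
    return p.seq_identity >= 35.0 or (p.ssap_score >= 80.0 and p.seq_identity >= 20.0)


def flag_inconsistent(pairs: Iterable[PairComparison]) -> List[Tuple[str, str, str]]:
    """Flag pairs whose classification disagrees with the consensus rule, for review."""
    flags = []
    for p in pairs:
        same = p.superfamily_a == p.superfamily_b
        homologous = looks_homologous(p)
        if same and not homologous:
            flags.append((p.domain_a, p.domain_b, "weak evidence within superfamily"))
        elif not same and homologous:
            flags.append((p.domain_a, p.domain_b, "strong evidence across superfamilies"))
    return flags
```

Flagged pairs are candidates for inspection, not automatic errors: remote homologues can legitimately fall below any fixed identity threshold.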
Generation of multiple structural alignments using CORA
CORA (Conserved Residue Attributes) uses the pairwise structural comparison data from SSAP to determine the initial set of proteins to be aligned. It identifies conserved characteristics and expresses them as a 3D structural profile. These profiles encapsulate the core.
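The idea of picking out a conserved core from a multiple alignment can be illustrated with a simple per-column score. This is a deliberately simplified stand-in: CORA's profiles record structural attributes of the aligned residues, whereas this sketch only looks at residue identity, and the 0.8 core threshold and function names are hypothetical.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """For each column, the fraction of all sequences carrying its most common residue.

    Gap characters ('-') are not counted as residues, so gappy columns score lower.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [row[i] for row in alignment if row[i] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores


def core_columns(alignment: List[str], threshold: float = 0.8) -> List[int]:
    """Indices of columns conserved at or above the (hypothetical) threshold."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= threshold]


aln = ["ACDE", "ACDF", "AC-E"]
print(core_columns(aln))  # [0, 1]
```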
Annotation of structural alignments
GENE3D (DANIEL W.A. BUCHAN ET AL., 2002)
Gene3D focuses on providing structural annotation for protein sequences that have no structural representatives. The protein sequences have also been clustered into whole-chain families to aid functional prediction.
The structural annotation is generated using hidden Markov models (HMMs) built from the CATH domain families.
Applications:
Annotate hypothetical proteins and genes (Corin Yeats et al., 2006).
Examine the functions of homologous superfamilies that are multiply expanded within genomes or sets of genomes.
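When CATH domain HMMs are scanned against a whole protein sequence, several hits can overlap, so a domain assignment has to choose among them. The sketch below uses a simple greedy rule (keep the most significant hit, discard anything overlapping it, repeat). It is a swapped-in illustration, not Gene3D's own resolution method; the `HmmHit` record and the 1e-3 cut-off are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HmmHit:
    superfamily: str  # CATH superfamily whose HMM produced the hit
    start: int        # 1-based start residue on the query sequence
    end: int          # 1-based end residue (inclusive)
    e_value: float


def overlaps(a: HmmHit, b: HmmHit) -> bool:
    return a.start <= b.end and b.start <= a.end


def assign_domains(hits: List[HmmHit], e_cutoff: float = 1e-3) -> List[HmmHit]:
    """Greedily keep the most significant non-overlapping hits, returned in sequence order."""
    chosen: List[HmmHit] = []
    for hit in sorted(hits, key=lambda h: h.e_value):
        if hit.e_value > e_cutoff:
            break  # all remaining hits are even less significant
        if all(not overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


hits = [
    HmmHit("A", 1, 100, 1e-20),
    HmmHit("B", 50, 150, 1e-5),   # overlaps the stronger hit A, so it is dropped
    HmmHit("C", 120, 220, 1e-10),
    HmmHit("D", 230, 300, 0.5),   # fails the E-value cut-off
]
print([h.superfamily for h in assign_domains(hits)])  # ['A', 'C']
```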
APPLICATIONS OF CATH DATABASE
The CATH database has been used as a guide to select proteins from a wide variety of protein families (Jonathan G. Lees et al., 2006).
To capture evolutionary divergence (Lesley H. Greene et al., 2007).
To identify remote homologues (J.E. Bray et al., 2000).
The organization of proteins by global structural similarity helps improve prediction algorithms based on fold recognition.
It allows the distribution of common motifs to be explored more easily.
It gives insights into which combinations of motifs generate stable protein architectures.
It allows newly determined structures to be easily examined for recognizable folds (C.A. Orengo et al., 1997).
[Crossword grid; partially recoverable entries: GENE3D, SSAP]
1. Boundary assignment by inheriting from another chain
2. Predicts hypothetical proteins
3. Database of validated multiple structural alignments
4. Scores used for identifying matches
REFERENCES
C.A. Orengo et al., 1997. CATH: a hierarchic classification of protein domain structures.
C.A. Orengo et al., 1999. The CATH Database provides insights into protein structure/function relationships.
J.E. Bray et al., 2000. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues.
Daniel W.A. Buchan et al., 2002. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database.
Frances Pearl et al., 2005. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis.
Corin Yeats et al., 2006. Gene3D: modelling protein structure, function and evolution.
Lesley H. Greene et al., 2007. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution.