CATH Database

Presented by Shri Vaishnavi & Pinky

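Class: Determined by secondary structure content and packing within the protein structure.

The main classes are mainly alpha (alpha-helices), mainly beta (beta-sheets) and alpha-beta.

The alpha-beta class includes both alpha/beta and alpha+beta structures.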

Architecture: Distinguishes structures that belong to the same class but have different architectures, that is, different overall arrangements of secondary structures.

Groupings can sometimes be rather broad, as they describe general features of protein-fold shape.

Ex: TIM barrel; the number of layers in an α-β sandwich (C.A. Orengo et al., 1997)

Topology: Structures share the same number, arrangement and connectivity of secondary structure elements.

Within a topology level, structures are similar but may differ in function.

Ex: Globin or immunoglobulin fold.

Homology: Structures are grouped by their high structural similarity and similar functions.

They may have evolved from a common ancestor.

Ex: The α non-bundle globin-like folds (the erythrocruorins, colicins, phycocyanins and domain 1 of diphtheria toxin) all have the same CAT number (1.10.340) but are differentiated by their H numbers 10, 20, 30 and 40, respectively.
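The numbering in the example above can be illustrated with a minimal Python sketch that splits a four-part code such as 1.10.340.10 into its C, A, T and H numbers. The names `CathCode`, `parse_cath_code`, `same_topology` and `same_superfamily` are hypothetical helpers introduced here for illustration; they are not part of any CATH software.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """Hypothetical container for the four numbers of a C.A.T.H code."""
    class_: int
    architecture: int
    topology: int
    homology: int


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '1.10.340.10' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    c, a, t, h = (int(p) for p in parts)
    return CathCode(c, a, t, h)


def same_topology(x: CathCode, y: CathCode) -> bool:
    """True when two domains share the same C, A and T numbers."""
    return x[:3] == y[:3]


def same_superfamily(x: CathCode, y: CathCode) -> bool:
    """True when all four numbers (C, A, T and H) agree."""
    return x == y


# H numbers 10 and 40 under the shared CAT number 1.10.340
first = parse_cath_code("1.10.340.10")
last = parse_cath_code("1.10.340.40")
print(same_topology(first, last), same_superfamily(first, last))
```

The two codes share a CAT number, so they fall in the same topology but in different homologous superfamilies.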

Sequence family: Members have sequence identities >35%.

They are presumed to have extremely similar structures and functions; they may be slightly different examples of the same protein from different species belonging to the same sequence superfamily.
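As a rough illustration of the >35% criterion, the sketch below computes percent identity between two sequences that have already been aligned. It assumes equal-length aligned strings with `-` as the gap character and counts identity over non-gap column pairs only; CATH's actual identity calculation and overlap requirements are not described in these slides, so the function names and the choice of denominator are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def same_sequence_family(aligned_a: str, aligned_b: str,
                         threshold: float = 35.0) -> bool:
    """Apply the >35% identity cut-off from the slide (assumed strict)."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("ACDE", "ACDF"))  # 3 of 4 aligned columns match
```

Other denominators, such as the length of the shorter sequence, are also in common use and can give different percentages for the same alignment.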

SOLID.

Input structure to the server (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl)

Generate derived data from the PDB coordinate files

Identify more remote homologues

If a match is found, the matched superfamilies are structurally compared with the query structure using the SSAP structure alignment program

Set threshold E-value from validated structural homologs

Any unmatched query structure is scanned against a library of representative structures from each close sequence family in CATH

The top 10 matches are displayed
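The final filtering and ranking steps above can be sketched as follows. This is only an illustration of applying an E-value threshold and keeping the ten best hits; it does not reproduce the server's scoring. The `Match` record, the `top_matches` function and the assumption that the threshold is applied before the top 10 are selected are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one hit returned by a structure scan."""
    domain_id: str   # illustrative identifier, not a real CATH domain name
    e_value: float   # lower values indicate more significant matches


def top_matches(matches: Iterable[Match], e_value_threshold: float,
                limit: int = 10) -> List[Match]:
    """Keep hits at or below the threshold and return the best `limit` of them."""
    significant = [m for m in matches if m.e_value <= e_value_threshold]
    # Sort ascending so the most significant (smallest E-value) hit comes first.
    return sorted(significant, key=lambda m: m.e_value)[:limit]


# Fifteen made-up hits with E-values 1, 0.1, 0.01, ... for demonstration only
hits = [Match(f"d{i}", 10.0 ** -i) for i in range(15)]
for hit in top_matches(hits, e_value_threshold=5e-3):
    print(hit.domain_id, hit.e_value)
```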

The Dictionary of Homologous Superfamilies (DHS) is a database of validated multiple structural alignments annotated with consensus functional information for evolutionary protein families.

A powerful resource to validate, examine and visualize key structural and functional features of each homologous superfamily.

Also provides a tool for examining sequence-structure relationships for proteins within each fold group.

Generation of structure comparison data using SSAP: Comparisons provide a complete data set for analyzing analogues and homologues and for checking for incorrect classifications.

Automatic validation of structural relatives (DHS-VALID): The DHS-VALID program is used to check automatically all the pairwise sequence and structure comparison data generated for each fold group and homologous superfamily in CATH.

Generation of multiple structural alignments using CORA (Conserved Residue Attributes):

Uses the pairwise structural comparison data from SSAP to determine the initial set of proteins to be aligned

Identifies conserved characteristics and expresses them as a 3D structural profile

Profiles encapsulate the ‘core’

Annotation of structural alignments
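To make the idea of identifying conserved characteristics across a multiple alignment concrete, here is a minimal sketch that scores each alignment column by how often its most common residue occurs. This is a simple residue-identity measure swapped in for illustration; it is not the CORA algorithm, which works on structural attributes. The function names and the `-` gap convention are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str], gap: str = "-") -> List[float]:
    """Fraction of rows carrying the most common residue in each column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # column made up entirely of gaps
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of rows so gaps lower the score.
        scores.append(most_common_count / len(column))
    return scores


def conserved_positions(alignment: Sequence[str],
                        min_fraction: float = 1.0) -> List[int]:
    """Indices of columns whose conservation reaches `min_fraction`."""
    scores = column_conservation(alignment)
    return [i for i, s in enumerate(scores) if s >= min_fraction]


toy_alignment = ["ACDE", "ACDF", "AC-E"]  # made-up sequences
print(conserved_positions(toy_alignment))
```

In the toy alignment only the first two columns are identical in every row, so those are the positions reported as fully conserved.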

Gene3D is focused on providing structural annotation for protein sequences without structural representatives.

The protein sequences have also been clustered into whole-chain families to aid functional prediction.

The structural annotation is generated using HMMs based on the CATH domain families.

Applications: Annotate hypothetical proteins and genes (Corin Yeats et al., 2006)

Examine the functions of homologous superfamilies that are multiply expanded within genomes or sets of genomes.

The CATH database was used as a guide to select proteins from a wide variety of protein families (Jonathan G. Lees et al., 2006)

To capture evolutionary divergence (Lesley H. Greene et al., 2007)

For identifying remote homologs (J.E. Bray et al., 2000)

The organization of proteins by global structural similarity helps improve prediction algorithms based on fold recognition

Allow the distribution of common motifs to be explored more easily

Gives insights into which combinations of motifs generate stable protein architectures

Allows newly determined structures to be easily examined for recognizable folds (C.A. Orengo et al., 1997)

1. Boundary assignment by inheriting from other chain

2. Predicts hypothetical proteins

3. Database of validated multiple structural alignments

4. Scores used for identifying matches

Answers: 1. CHOPCLOSE, 2. GENE3D, 3. DHS, 4. SSAP

C.A. Orengo et al., 1997. CATH - a hierarchic classification of protein domain structures

J.E. Bray et al., 2000. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues

C.A. Orengo et al., 1999. The CATH Database provides insights into protein structure/function relationships

Lesley H. Greene et al., 2007. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Frances Pearl et al., 2005. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

Daniel W.A. Buchan et al., 2002. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

Corin Yeats et al., 2006. Gene3D: modelling protein structure, function and evolution
