Domain database: The CATH domain database and associated resources - DHS, Gene3D

  • Slide 1

Domain database

The CATH domain database and associated resources - DHS, Gene3D

- How do we determine domain boundaries?
- How do we identify fold groups and evolutionary superfamilies?
- What is the distribution of the CATH domain families in the PDB and in the genomes?

CATH: Class, Architecture, Topology or Fold Group, Homologous Superfamily (Orengo & Thornton 1994)

  • Slide 2

- ~20,000 chains from Protein Databank (PDB)
- ~50,000 domains in CATH structure database
- ~40% of the entries in CATH are multidomain

Multidomain proteins

  • Slide 3

Domains are important evolutionary units. Analysis by Teichmann and others suggests that ~60-80% of genes in genomes may be multidomain.

  • Slide 4

Carboxypeptidase G2 (1cg2A), Carboxypeptidase A (2ctc)

~30% of multidomains in CATH are discontinuous

  • Slide 5

Algorithms for Recognising Domain Boundaries

- DETECTIVE (Swindells 1995): each domain should have a recognisable hydrophobic core
- DOMAK (Siddiqui & Barton, 1995): residues comprising a domain make more internal contacts than external ones
- PUU (Holm & Sander, 1994): parser for protein folding units; maximal interaction within domains and minimal interaction between domains

Consensus is sought between the three methods; on average this occurs about 20% of the time.

  • Slide 6

[Figure: values 74%, 29%, 21%, 4%, 11%; labels: Close homologues, Twilight zone, Midnight zone, Homologues/analogues]

  • Slide 7

Algorithms for Recognising Homologues

Sequence Based Methods

close homologues
- BLAST (Altschul et al.)
- SSEARCH (Smith & Waterman)

remote homologues
- SAM-T99 (Karplus et al.)

Structure Based Methods

close & remote homologues
- CATHEDRAL (Harrison, Thornton, Orengo)
- SSAP (Taylor & Orengo)
- CORA (Orengo)

  • Slide 8

[Figure: values 74%, 29%, 21%, 4%, 11%; labels: Close homologues, Twilight zone, Midnight zone, Homologues/analogues; methods: SSEARCH; HMMs, SSAP; CATHEDRAL, SSAP]

  • Slide 9

Hidden Markov Models (HMMs)

[Diagram labels: query sequence; Non redundant GenBank database; hits]

These methods can currently identify ~70% of remote homologues (3 times more powerful than BLAST).

- SAM-T99 (Karplus Group)
- SAMOSA (Orengo Group)

  • Slide 10

Percentage of PDB structures classified in CATH by different methods over the last 2 years

[Chart: values 59.2, 20.7, 7.6, 8.6, 1.9, 2.0; legend: Near-identical; SSEARCH close homologues (>30%); SSEARCH remote homologues (...]

Percentage of structural genomics PDB structures classified in CATH by different methods over the last 2 years

[Chart: values 22.0, 8.0, 22.0, 28.4, 7.7, 11.8; legend: near-identical; SSEARCH close homologues (>30%); SSEARCH remote homologues (...]

CATH http://www.biochem.ucl.ac.uk/bsm/cath

CATH sequence families (>=35% identity) in each superfamily

  • Slide 53

CATH http://www.biochem.ucl.ac.uk/bsm/cath

CATH classification information for individual domains

  • Slide 54

CATH http://www.biochem.ucl.ac.uk/bsm/cath

CATH structural relatives listed for each domain

  • Slide 55

CATH server http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl

  • Slide 56

CATH server http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl

  • Slide 57

CATH server http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl

Structural matches and statistics listed for query domain

  • Slide 58

Expanding CATH with sequence relatives from genomes

- Library of HMMs built for representative sequences from each CATH domain superfamily
- Scan protein sequences from genomes against the CATH HMM library
- Assign domains to CATH superfamilies

  • Slide 59

Expanding CATH

[Diagram labels: H, S1, S2, S3; H, S1, S2, S3, S4, S5; Homologous Superfamily; Sequence family; CATH-HMMs; sequences added from GenBank, genomes, SWPT-TrEMBL]

~1400 Domain Structure Superfamilies

- ~50,000 sequences, ~4,000 sequence families
- ~600,000 sequences, ~24,000 sequence families

Up to 70% of sequences in completed genomes can be assigned to CATH domain superfamilies.

  • Slide 60

Rossmann Fold, Jelly Roll, Alpha/Beta Plaits, TIM Barrel, Immunoglobulin-like, Arc repressor-like, OB Fold, Four helix bundle, SH3-type barrel, Alpha horseshoe fold

Gene3D: Rossmann, Alpha-beta plait, TIM barrel, Jelly Roll, Arc repressor-like, Up-down, SH3-like, OB fold, Immunoglobulin, Alpha horseshoe

  • Slide 61

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

CATH domain structure annotations for complete genomes

  • Slide 62

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Individual genome statistics

  • Slide 63

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Assignment of sequences to Gene3D protein families

  • Slide 64

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Functional annotations for individual sequences

  • Slide 65

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Functional annotations for individual sequences

  • Slide 66

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Domain annotations for individual sequences

  • Slide 67

Gene3D http://www.biochem.ucl.ac.uk/bsm/Gene3D

Domain annotations for individual sequences

  • Slide 68

Summary

- CATH currently identifies ~1500 superfamilies in the ~50,000 structural domains from the PDB
- These domain families contain over 600,000 domain sequences from the genomes and sequence databases
- Up to 70% of genome sequences can be assigned to domain structure families using HMMs and threading

  • Slide 69

Frances Pearl, Ian Sillitoe, Oliver Redfern, Mark Dibley, Tony Lewis, Chris Bennett, Andrew Harrison, Gabrielle Reeves, Alastair Grant, David Lee

Acknowledgements

Janet Thornton

Medical Research Council, Wellcome Trust, NIH, Biotechnology and Biological Sciences Research Council

http://www.biochem.ucl.ac.uk/bsm/cath
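
Slide 5 states that a consensus is sought between three domain-boundary methods (DETECTIVE, DOMAK, PUU). The slides do not say how agreement is measured, so the following Python is only a minimal sketch under stated assumptions: each method's output is represented as a list of residue-number sets (which also covers the discontinuous domains mentioned on Slide 4), and two assignments "agree" when they have the same number of domains and each domain pair has a Jaccard overlap of at least a hypothetical `min_overlap` threshold. The names `boundaries_agree`, `consensus` and the 0.8 default are illustrative, not taken from CATH.

```python
from itertools import combinations
from typing import List, Set

Domain = Set[int]          # residue numbers in one domain (may be discontinuous)
Assignment = List[Domain]  # one method's domain assignment for a chain


def boundaries_agree(a: Assignment, b: Assignment, min_overlap: float = 0.8) -> bool:
    """True if a and b have the same domain count and every domain in a
    pairs with a distinct domain in b at Jaccard overlap >= min_overlap.
    Assumes every domain is a non-empty set."""
    if len(a) != len(b):
        return False
    unmatched = list(b)
    for dom_a in a:
        partner = None
        for dom_b in unmatched:
            jaccard = len(dom_a & dom_b) / len(dom_a | dom_b)
            if jaccard >= min_overlap:
                partner = dom_b
                break
        if partner is None:
            return False
        unmatched.remove(partner)
    return True


def consensus(assignments: List[Assignment], min_overlap: float = 0.8) -> bool:
    """True only if every pair of methods agrees (assumed definition)."""
    return all(
        boundaries_agree(x, y, min_overlap)
        for x, y in combinations(assignments, 2)
    )


# Hypothetical outputs for a 200-residue chain.
detective = [set(range(1, 101)), set(range(101, 201))]
domak = [set(range(1, 98)), set(range(98, 201))]
puu = [set(range(1, 201))]  # calls the chain a single domain

two_methods_agree = consensus([detective, domak])
all_three_agree = consensus([detective, domak, puu])
```

In this toy input the first two methods differ by only three residues at the boundary and are treated as agreeing, while the third disagrees on the number of domains, so no three-way consensus is reached.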
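
The slides refer to CATH sequence families at >=35% identity within each superfamily. As a rough illustration of what such a grouping involves, the sketch below computes percent identity between pre-aligned sequences and groups them by single linkage at a 35% threshold. Several points are assumptions rather than anything stated in the slides: the sequences are already aligned to equal length with `-` as the gap character, identity is counted over columns where neither sequence has a gap, and single-linkage clustering is used. The function names are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def sequence_families(seqs: Dict[str, str], threshold: float = 35.0) -> List[List[str]]:
    """Single-linkage grouping: two sequences share a family if they are
    connected by a chain of pairs with identity >= threshold."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    families: Dict[str, List[str]] = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())


# Toy aligned sequences with made-up domain names.
toy = {
    "dom1": "ACDEFGHIKL",
    "dom2": "ACDEFGHIKV",
    "dom3": "WWWWWWWWWW",
}
toy_families = sequence_families(toy)
```

Here `dom1` and `dom2` are 90% identical and fall into one family, while `dom3` shares no residues with either and forms its own.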