LSM2104/CZ2251 Essential Bioinformatics and Biocomputing
![Page 1: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/1.jpg)
LSM2104/CZ2251 Essential Bioinformatics and Biocomputing
Protein Structure and Visualization (2)
Chen Yu Zong [email protected]
6874-6877
![Page 2: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/2.jpg)
Lecture 10
Protein structure databases; visualization; and classifications
1. Introduction to Protein Data Bank (PDB)
2. Free graphic software for 3D structure visualization
3. Hierarchical classification of protein domains: SCOP & CATH & DALI
![Page 3: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/3.jpg)
1. Protein Data Bank (PDB)
• Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics (RCSB)
• http://www.rcsb.org/pdb/
– 30060 Structures 15-Mar-2005
– 27570 Structures 05-Oct-2004
– 23997 Structures 20-Jan-2004
• Also contains structures of other bio-macromolecules: DNA, carbohydrates and protein-DNA complexes.
![Page 4: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/4.jpg)
1. Protein Data Bank (PDB)
![Page 5: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/5.jpg)
1. Protein Data Bank (PDB)
![Page 6: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/6.jpg)
PDB Content Growth
![Page 7: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/7.jpg)
PDB Presentation of Selected Molecules
![Page 8: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/8.jpg)
Deficiencies in our structural knowledge
• Only deposited data are actually available. Many structures are not deposited in the PDB; why?
• Structures are available mainly for soluble proteins. There are only a few dozen entries for membrane protein domains; why?
• X-ray data exist only for those proteins that crystallize well or diffract properly. Why?
• NMR structures are usually for small proteins. How can one survey the size of NMR-determined proteins?
• It is estimated that structural data are available for only 10-15% of all known proteins.
![Page 9: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/9.jpg)
Alternative Source of Structure: NCBI
![Page 10: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/10.jpg)
Protein Structure in PDB
• Text files
• Each entry is specified by a unique 4-character alphanumeric code (PDB code), e.g. 1HUY for a variant of GFP and 1BGK for a 37-residue toxin protein isolated from sea anemone
• 1HUY and 1BGK
– Header information
– Atomic coordinates in Å (1 Ångström = 1.0e-10 m)
![Page 11: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/11.jpg)
Header Details
• Identifies the molecule, modifications, date of release
• Host organism, keywords, method of study
• Authors, reference, resolution for X-ray structure
– The smaller the number, the better the structure.
• Sequence, reference
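As a rough illustration of how the header fields listed above can be read programmatically, the sketch below scans header records for the experimental method and the resolution. It assumes the legacy PDB text format, in which the method appears in an `EXPDTA` record and the resolution in a `REMARK   2 RESOLUTION.` line. The sample lines, the placeholder ID `XXXX`, and the helper name `read_header` are invented for illustration and are not copied from 1HUY or 1BGK.

```python
import re

# Invented header lines in legacy PDB layout (not from a real entry).
SAMPLE_HEADER = """\
HEADER    EXAMPLE PROTEIN                         01-JAN-01   XXXX
EXPDTA    X-RAY DIFFRACTION
REMARK   2 RESOLUTION.    2.20 ANGSTROMS.
"""


def read_header(text):
    """Return (method, resolution in angstrom or None) from PDB header text."""
    method, resolution = None, None
    for line in text.splitlines():
        if line.startswith("EXPDTA"):
            # Everything after the record name is the experimental method.
            method = line[6:].strip()
        elif line.startswith("REMARK   2"):
            # NMR entries typically state "NOT APPLICABLE" here, so no match.
            match = re.search(r"RESOLUTION\.\s+([0-9.]+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
    return method, resolution


method, resolution = read_header(SAMPLE_HEADER)
print(method, resolution)
```

A smaller resolution value returned here corresponds to a better-resolved X-ray structure; for entries without a numeric resolution the function returns `None` rather than guessing.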
![Page 12: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/12.jpg)
![Page 13: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/13.jpg)
The Atomic Coordinates
• XYZ coordinates for each atom (lines starting with ATOM; only heavy atoms for X-ray structures) from the first residue to the last
• XYZ coordinates for any ligands (starting with HETATM) complexed to the bio-macromolecule
• O atoms of water molecules (starting with HETATM, normally at the last part of the xyz coordinate section)
• Usually, for X-ray structure, resolution is not high enough to locate H atoms: hence only heavy atoms are shown in the PDB file.
• For NMR structure, all atoms (including hydrogen atoms) are specified in the PDB file.
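The record types described above can be separated with a few lines of code. The following is a minimal sketch, assuming the legacy fixed-column PDB layout (atom name in columns 13-16, residue name in 18-20, coordinates in 31-54, element symbol in 77-78). The three sample records are invented for illustration and do not come from 1HUY or 1BGK; `parse_atom` is a hypothetical helper name.

```python
import math

# Illustrative records in the fixed-column PDB layout; the coordinates are
# invented and do not come from a real entry.
SAMPLE = [
    "ATOM      1  N   MET A   1       1.000   2.000   3.000  1.00 10.00           N",
    "ATOM      2  CA  MET A   1       2.000   2.000   3.000  1.00 10.00           C",
    "HETATM    3  O   HOH A 101       5.000   6.000   7.000  1.00 20.00           O",
]


def parse_atom(line):
    """Slice one ATOM/HETATM record into a dict (0-based Python slices)."""
    return {
        "record": line[0:6].strip(),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }


atoms = [parse_atom(l) for l in SAMPLE if l.startswith(("ATOM", "HETATM"))]
protein = [a for a in atoms if a["record"] == "ATOM"]
waters = [a for a in atoms if a["record"] == "HETATM" and a["res_name"] == "HOH"]
hydrogens = [a for a in atoms if a["element"] == "H"]

# Distance in angstrom between the first two protein atoms.
d = math.dist(protein[0]["xyz"], protein[1]["xyz"])
print(len(protein), len(waters), len(hydrogens), round(d, 3))
```

Counting atoms whose element is `H` is one simple way to check whether a file carries hydrogens, as NMR entries usually do and X-ray entries usually do not.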
![Page 14: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/14.jpg)
X-ray structure 1HUY
![Page 15: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/15.jpg)
NMR structure 1BGK
![Page 16: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/16.jpg)
2. Free Software for Protein Structure Visualization
• RASMOL: available for all platforms http://www.openrasmol.org
• Swiss PDB Viewer: from Swiss-Prot http://www.expasy.ch/spdbv/
• Chemscape Chime Plug-in: for PC and Mac http://www.mdl.com/downloads/downloadable/index.jsp
• YASARA: http://www.yasara.org/
• MOLMOL: MOLecule analysis and MOLecule display
http://129.132.45.141/wuthrich/software/molmol/index.html
![Page 17: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/17.jpg)
1HUY An Improved Yellow Variant Of Green Fluorescent Protein
From Tsien’s group J.Biol.Chem. 276 29188 (2001)
Ribbon representation by RasMol
![Page 18: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/18.jpg)
Ribbon representation by YASARA
![Page 19: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/19.jpg)
Ribbon representation by YASARA
![Page 20: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/20.jpg)
Ribbon representation by MOLMOL
![Page 21: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/21.jpg)
![Page 22: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/22.jpg)
An ensemble of 15 structures (NMR, toxin Bgk); hydrogen atoms are also included
15 backbone structures of the sea anemone toxin Bgk
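In the legacy PDB text format, each member of an NMR ensemble such as the one above is wrapped in a `MODEL` / `ENDMDL` pair. A minimal sketch for counting the members follows; the skeleton file is invented (coordinate lines omitted) and `count_models` is a hypothetical helper name.

```python
def count_models(lines):
    """Count MODEL records; a file without any MODEL record holds one structure."""
    n = sum(1 for line in lines if line.startswith("MODEL"))
    return n if n > 0 else 1


# Invented skeleton of a two-model file (coordinate lines omitted).
SKELETON = [
    "MODEL        1",
    "ENDMDL",
    "MODEL        2",
    "ENDMDL",
    "END",
]
print(count_models(SKELETON))
```

Applied to a real ensemble file, the same count would be expected to match the number of structures stated for the entry.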
![Page 23: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/23.jpg)
15 all-atom structures of the sea anemone toxin Bgk
Line representation
![Page 24: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/24.jpg)
Ribbon representation
![Page 25: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/25.jpg)
Space-filling representation
![Page 26: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/26.jpg)
• SCOP: Structural Classification of Proteins, University of Cambridge, UK
http://scop.mrc-lmb.cam.ac.uk/scop/
Mirror in Singapore: http://scop.bic.nus.edu.sg/
• CATH: Class, Architecture, Topology, Homologous superfamily, Sequence family; University College London, UK
http://www.biochem.ucl.ac.uk/bsm/cath/
3. Hierarchical classification of protein domains: SCOP & CATH
![Page 27: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/27.jpg)
• Proteins adopt a limited number of topologies: more than 50,000 sequences fold into ~1,000 unique folds.
• Homologous sequences have similar structures: usually, when sequence identity is above 30%, proteins adopt the same fold. Even in the absence of sequence homology, some folds are preferred by vastly different sequences.
• The "active site" is highly conserved: a subset of functionally critical residues is found to be conserved even when the folds vary.
Basis for protein classification
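The 30% rule of thumb above depends on how sequence identity is measured. The sketch below shows one common convention, assumed here for illustration: identical residues divided by the number of aligned columns in which neither sequence has a gap. The two aligned strings are invented toy data, not real protein sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Invented toy alignment, not real protein sequences.
identity = percent_identity("MKT-AYIAKQ", "MKSLAYLA-Q")
print(round(identity, 1), identity > 30.0)
```

Other denominators (alignment length, or length of the shorter sequence) are also in use and give different percentages, so a threshold should always be quoted together with the convention.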
![Page 28: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/28.jpg)
How many unique folds do organisms use to express functions?
Sequence space: > 50,000
Conformational space: ~1,000 ?
Many sequences form one unique fold
![Page 29: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/29.jpg)
[Figure: Growth of Protein Databases. Number of sequences (axis 0 to 90,000) and number of structures and folds (axis 0 to 12,000) by year, 1986 to 2000. Series: Sequences, Structures, Folds.]
![Page 30: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/30.jpg)
• University of Cambridge, UK: http://scop.mrc-lmb.cam.ac.uk/scop/
– mirrored in Singapore: http://scop.bic.nus.edu.sg/
– contains PDB entries grouped hierarchically (domain-based) by:
• Structural class
• Fold
• Superfamily
• Family
• Individual member
Structural Classification of Proteins SCOP
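The hierarchy above lends itself to a simple prefix-based representation. The sketch below assumes the dot-separated SCOP concise classification string (sccs), for example `a.1.1.2`, whose four fields name class, fold, superfamily and family in that order. The domain labels and the helper `split_sccs` are hypothetical and used only for illustration.

```python
from collections import defaultdict

LEVELS = ("class", "fold", "superfamily", "family")


def split_sccs(sccs):
    """Map an sccs string such as 'a.1.1.2' to the four upper SCOP levels."""
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected 4 dot-separated fields, got {sccs!r}")
    # Each level is identified by the prefix up to and including its field.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


# Hypothetical domain labels and sccs values, for illustration only.
DOMAINS = {"dom1": "a.1.1.2", "dom2": "a.1.1.1", "dom3": "a.4.5.1"}

by_fold = defaultdict(list)
for name, sccs in DOMAINS.items():
    by_fold[split_sccs(sccs)["fold"]].append(name)
print(dict(by_fold))
```

Grouping by a different key (`"superfamily"` or `"family"`) walks the same hierarchy one level further down.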
![Page 31: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/31.jpg)
• Family
• Proteins are clustered together into families on the basis of one of two criteria that imply their having a common evolutionary origin:
• All proteins that have residue identities of 30% and greater;
• Proteins with lower sequence identities but whose functions and structures are very similar
Example: globins with sequence identities of 15%.
![Page 32: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/32.jpg)
• Superfamily
• Families, whose proteins have low sequence identities but whose structures and, in many cases, functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies
• Example: actin, the ATPase domain of the heat-shock protein, and hexokinase
![Page 33: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/33.jpg)
• Fold
• Superfamilies and families are defined as having a common fold if their proteins have the same major secondary structures in the same arrangement with the same topological connections.
![Page 34: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/34.jpg)
• Class
– For convenience of users, the different folds have been grouped into classes. Most of the folds are assigned to one of a few structural classes on the basis of the secondary structures of which they are composed.
![Page 35: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/35.jpg)
![Page 36: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/36.jpg)
SCOP Class: All-α topologies
ferritin cytochrome b-562
![Page 37: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/37.jpg)
SCOP Class: All- topologies
![Page 38: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/38.jpg)
SCOP Class: All- topologies
![Page 39: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/39.jpg)
SCOP Class: All-β topologies
β-sandwiches, β-barrels
![Page 40: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/40.jpg)
SCOP Class: All- topologies
![Page 41: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/41.jpg)
SCOP Class: α/β topologies
horseshoe
![Page 42: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/42.jpg)
barrels
SCOP Class: α/β topologies
![Page 43: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/43.jpg)
SCOP Class: Topologies
![Page 44: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/44.jpg)
SCOP Class: Alpha+Beta Topologies
![Page 45: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/45.jpg)
SCOP Class: Alpha+Beta Topologies
![Page 46: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/46.jpg)
![Page 47: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/47.jpg)
Ubiquitin
1ubi
![Page 48: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/48.jpg)
Ubiquitin
1ubi
![Page 49: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/49.jpg)
Ubiquitin
1ubi
![Page 50: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/50.jpg)
Ubiquitin
1ubi
![Page 51: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/51.jpg)
CATH database
Orengo et al. CATH-a hierarchical classification of protein domain structures (1997) Structure 5, 1093-1108
Sequence identity >30%: the same overall fold
Sequence identity >70%: the same overall fold + similar function
CATH: Class, Architecture, Topology, Homologous superfamily, Sequence family
http://www.biochem.ucl.ac.uk/bsm/cath/
![Page 52: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/52.jpg)
CATH database
• Class: derived from secondary structure content; assigned automatically for more than 90% of protein structures.
• Architecture: describes the gross orientation of secondary structures, independent of connectivities; currently assigned manually.
• Topology: clusters structures according to their topological connections and numbers of secondary structures.
• Homologous superfamilies: cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.
• Sequence families: structures within each H-level are further clustered on sequence identity. Domains clustered in the same sequence family have sequence identities >35%.
Non-identical sequence domains, Identical sequence domains, Domains
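To make the sequence-family step concrete, here is a minimal sketch that groups domains whenever a pair exceeds the 35% identity threshold. It uses single-linkage clustering with a small union-find, which is an assumption made for illustration and not a description of CATH's actual procedure. The domain names and identity values are hypothetical.

```python
def cluster_by_identity(names, identities, threshold=35.0):
    """Single-linkage clusters: link two domains if identity > threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), pct in identities.items():
        if pct > threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical domains and pairwise sequence identities (%).
NAMES = ["d1", "d2", "d3", "d4"]
IDENT = {("d1", "d2"): 62.0, ("d2", "d3"): 41.0, ("d1", "d3"): 28.0, ("d3", "d4"): 12.0}
print(cluster_by_identity(NAMES, IDENT))
```

Note the design consequence of single linkage: `d1` and `d3` land in the same cluster through `d2` even though their direct identity is below the threshold. A stricter scheme (complete linkage) would keep them apart.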
![Page 53: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/53.jpg)
CATH database
![Page 54: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/54.jpg)
![Page 55: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/55.jpg)
The class (C), architecture (A) and topology (T) levels in the CATH database
Class
Architecture
Topology
![Page 56: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/56.jpg)
The class (C), architecture (A) and topology (T) levels in the CATH database
Homologous Superfamily
![Page 57: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/57.jpg)
CATH – architectures
![Page 58: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/58.jpg)
CATH – architectures (cont.)
![Page 59: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/59.jpg)
The protein structure universe in the PDB (1997) by a CATH wheel
The distribution of non-homologous structures (i.e. a single representative from each homologous superfamily at the H-level in CATH) amongst the different classes (C), architectures (A) and fold families (T) in the CATH database.
![Page 60: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/60.jpg)
SCOP / CATH -> DALI
SCOP & CATH
• Hierarchical and based on abstractions
• Include some manual aspects and are curated by experts in the field of protein structure
• Presentation of results of computer classification, where the methods that underlie the classification remain internal
Structure comparison
Dali
![Page 61: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/61.jpg)
DALI
anti parallel barrel
meander
More information about DALI
Touring protein fold space with Dali/FSSP: Liisa Holm and Chris Sander
Comparing protein structures in 3D
![Page 62: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/62.jpg)
Compare 3D protein structures by Dali http://www.ebi.ac.uk/dali/
![Page 63: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/63.jpg)
• The FSSP database (Fold classification based on Structure-Structure alignment of Proteins) is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB).
• The classification and alignments are automatically maintained and continuously updated using the Dali search engine.
Dali Domain Dictionary
• Structural domains are delineated automatically using the criteria of recurrence and compactness. Each domain is assigned a Domain Classification number DC_l_m_n_p , where:
l - fold space attractor region
m - globular folding topology
n - functional family
p - sequence family
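The Domain Classification number above can be unpacked into its four fields. This is a minimal sketch that assumes each field is an integer and that the label is written literally as `DC_l_m_n_p`. The example labels and the names `DomainClass` and `parse_dc` are hypothetical.

```python
from typing import NamedTuple


class DomainClass(NamedTuple):
    attractor: int   # l - fold space attractor region
    topology: int    # m - globular folding topology
    family: int      # n - functional family
    seq_family: int  # p - sequence family


def parse_dc(label):
    """Parse a label of the form 'DC_l_m_n_p' into its four integer fields."""
    prefix, *fields = label.split("_")
    if prefix != "DC" or len(fields) != 4:
        raise ValueError(f"not a DC_l_m_n_p label: {label!r}")
    return DomainClass(*(int(f) for f in fields))


# Hypothetical labels, for illustration only.
a = parse_dc("DC_3_12_1_4")
b = parse_dc("DC_3_12_2_1")
same_topology = (a.attractor, a.topology) == (b.attractor, b.topology)
print(a, same_topology)
```

Comparing leading fields, as in `same_topology`, tells whether two domains share the coarser levels while differing in functional or sequence family.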
![Page 64: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/64.jpg)
Functional families
• Evolutionary relationships from strong structural similarities which are accompanied by functional or sequence similarities.
• Functional families are branches of the fold dendrogram where all pairs have a high average neural network prediction for being homologous.
Sequence families
• Representative subset of the Protein Data Bank extracted using a 25 % sequence identity threshold.
• All-against-all structure comparison was carried out within the set of representatives.
• Homologues are only shown aligned to their representative.
![Page 65: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/65.jpg)
Fold types
• Fold types are defined as clusters of structural neighbors in fold space with average pairwise Z-scores (by Dali) above 2.
Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements
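The fold-type criterion stated above (average pairwise Z-score above 2) can be checked directly once pairwise scores are available. The sketch below assumes the scores are stored per unordered pair; the Z-values and the labels A to D are hypothetical placeholders, not real Dali output.

```python
from itertools import combinations


def mean_pairwise_z(members, z):
    """Average Z-score over all unordered pairs of cluster members."""
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("need at least two members")
    return sum(z[frozenset(p)] for p in pairs) / len(pairs)


def is_fold_type(members, z, cutoff=2.0):
    """True if the cluster's average pairwise Z-score exceeds the cutoff."""
    return mean_pairwise_z(members, z) > cutoff


# Hypothetical Z-scores keyed by unordered pair.
Z = {
    frozenset(("A", "B")): 4.0,
    frozenset(("A", "C")): 3.0,
    frozenset(("B", "C")): 2.0,
    frozenset(("A", "D")): 0.5,
    frozenset(("B", "D")): 1.0,
    frozenset(("C", "D")): 0.5,
}
print(is_fold_type(["A", "B", "C"], Z), is_fold_type(["A", "B", "C", "D"], Z))
```

In this toy data, adding the weakly related member D drags the cluster average below the cutoff, which is the kind of boundary the criterion is meant to detect.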
![Page 66: LSM2104/CZ2251 Essential Bioinformatics and Biocomputing Essential Bioinformatics and Biocomputing Protein Structure and Visualization (2) Chen Yu Zong.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649c9d5503460f9495c2cd/html5/thumbnails/66.jpg)
Summary
Protein structure database (PDB)
Protein structure visualization software
Structural classification, databases and servers