CS177 Lecture 6 Computational Aspects of Protein Structure II Tom Madej 10.17.05.
CS177 Lecture 6
Computational Aspects of Protein Structure II
Tom Madej 10.17.05
Research news (Nature 10.21.04)
• Another milestone for the Human Genome Project.
– Fills in approx. 99% of the “gene rich” portion of the genome (10% more than the 2001 drafts).
– Only 341 remaining gaps, formerly hundreds of thousands.
– New estimate of the number of genes: 20,000-25,000.
• Megabase deletions result in viable mice!
– Researchers deleted 1.5 Mb and 0.8 Mb portions of the mouse genome, both non-coding regions, and the mice seem to be fine!
Nature Oct. 21, 2004, 931-945
Overview of lecture
• Protein structure
– General principles
– Structure hierarchy
– Supersecondary structures
– Superfolds and examples: TIM barrels, OB fold
• Protein structure comparison algorithms
– VAST (Vector Alignment Search Tool)
– CE (Combinatorial Extension)
• Protein fold classification databases
– SCOP (Structural Classification of Proteins)
– CATH (Class, Architecture, Topology, Homologous superfamily)
General principles
• Most protein structures are composed of two types of regular structural elements interconnected by less well-structured regions.
• Regular secondary structure elements (SSEs): α-helices and β-strands.
• Irregular regions: loops or coil.
• A pair of SSEs positioned next to each other in space may be parallel or anti-parallel.
General principles (cont.)
• Helices are stabilized by “internal” hydrogen bonds.
• Hydrogen bonds will form between an adjacent pair of strands.
• Strands will form larger structures such as β-sheets or β-barrels.
• Due to the residue side chains, there are favored packing angles between helices/helices, helices/sheets, and sheets/sheets.
Examples of protein architecture
β-sheet with all pairs of strands parallel
β-sheet with all pairs of strands anti-parallel
Architecture refers to the arrangement and orientation of SSEs, but not to the connectivity.
Examples of protein topology
Topology refers to the manner in which the SSEs are connected.
Two β-sheets (all parallel) with different topologies.
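The distinction between architecture and topology can be made concrete with a small sketch. It assumes strands are numbered by their order in the sequence and that a sheet is described by listing those numbers as they sit side by side in space. The function name `connection_steps` and the two example sheets are hypothetical, chosen only for illustration; the sketch ignores strand direction and crossover handedness.

```python
from typing import List


def connection_steps(spatial_order: List[int]) -> List[int]:
    """Signed shift in sheet position for each consecutive pair of strands.

    spatial_order lists strand numbers (1 = first strand in the sequence)
    in the order they appear side by side across the sheet.
    """
    # Map each strand number to its position in the sheet.
    position = {strand: i for i, strand in enumerate(spatial_order)}
    n = len(spatial_order)
    # For strand s -> s+1 in the sequence, record how far the chain
    # moves across the sheet (negative = towards the left).
    return [position[s + 1] - position[s] for s in range(1, n)]


# Same architecture (four parallel strands in one sheet),
# but the chain visits the positions in a different order.
sheet_a = [1, 2, 3, 4]  # hypothetical example
sheet_b = [2, 1, 3, 4]  # hypothetical example

print(connection_steps(sheet_a))  # [1, 1, 1]
print(connection_steps(sheet_b))  # [-1, 2, 1]
```

Both sheets have the same architecture, yet the step lists differ, which is what "different topologies" means here.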
Exercise
• Take a look at 1r7sA in Cn3D.
• Draw a topology diagram showing the way the strands are connected.
Angles between SSEs in contact
• The data on the next 3 slides give the cosine of the angle between each pair of SSE vectors.
• The SSEs were required to be “in contact”, i.e. within 10 Å of each other.
• Note: The SSEs are not necessarily consecutive in the sequence!
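A minimal sketch of the quantity plotted on those slides, assuming each SSE is represented by a vector with a start and an end point in Å. The slides do not say how the 10 Å contact distance was measured, so the midpoint-to-midpoint distance used in `in_contact` is an assumption, and all function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cos_angle(start1: Point, end1: Point, start2: Point, end2: Point) -> float:
    """Cosine of the angle between two SSE vectors (+1 parallel, -1 anti-parallel)."""
    d1 = _sub(end1, start1)
    d2 = _sub(end2, start2)
    return _dot(d1, d2) / (math.sqrt(_dot(d1, d1)) * math.sqrt(_dot(d2, d2)))


def in_contact(start1: Point, end1: Point, start2: Point, end2: Point,
               cutoff: float = 10.0) -> bool:
    """Assumed contact test: SSE midpoints lie within `cutoff` angstroms."""
    mid1 = tuple((s + e) / 2.0 for s, e in zip(start1, end1))
    mid2 = tuple((s + e) / 2.0 for s, e in zip(start2, end2))
    diff = _sub(mid1, mid2)
    return math.sqrt(_dot(diff, diff)) <= cutoff


# Two made-up strands running in opposite directions, 5 angstroms apart.
a_start, a_end = (0.0, 0.0, 0.0), (0.0, 0.0, 10.0)
b_start, b_end = (5.0, 0.0, 10.0), (5.0, 0.0, 0.0)

print(cos_angle(a_start, a_end, b_start, b_end))   # -1.0
print(in_contact(a_start, a_end, b_start, b_end))  # True
```

A cosine near +1 corresponds to a parallel pair and a cosine near -1 to an anti-parallel pair.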
General packing of SSEs…
• SSEs tend to be oriented either parallel or anti-parallel to each other.
• For strand-strand packing, the tendency to be parallel or anti-parallel is stronger than for helix-helix packing.
• For helix-strand packing there is a strong tendency to be anti-parallel.
• This applies to SSEs that are relatively close to each other.
Examples of structures formed by β-strands
• Triosephosphate isomerase 7timA
• Retinol binding protein 1rbp
• Porin 1oh2P
Higher level organization
• A single protein may consist of multiple domains. Examples: 1liy A, 1bgc A. The domains may or may not perform different functions.
• Proteins may form higher-level assemblies. Useful for complicated biochemical processes that require several steps, e.g. processing/synthesis of a molecule. Example: 1l1o chains A, B, C.
Example: Replication Protein A
E. Bochkareva et al. The EMBO Journal (2002) 21 1855-1863
RPA binds to ssDNA and is involved in recombination, replication, and repair. It is a heterotrimer, consisting of three subunit proteins that bind together. See structure 1l1o.
Supersecondary structures
• β-hairpin
• α-hairpin
• βαβ-unit
• β4 Greek key
• βα Greek key
Supersecondary structure: simple units
G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981
Supersecondary structure: Greek key motifs
G.M. Salem et al. J. Mol. Biol. (1999) 287 969-981
Examples of β4 Greek key motif
• 1hk0 Human Gamma-D Crystallin; residues 32 thru 64 in domain 1.
• OB fold (we’ll see this fold later).
Examples of βα Greek key motif
• 1bgw Topoisomerase; residues 487 thru 540 in domain 5.
• 1ris Ribosomal protein S6.
Protein folds
• There is a continuum of similarity!
• Fold definition: two folds are similar if they have a similar arrangement of SSEs (architecture) and connectivity (topology). Sometimes a few SSEs may be missing.
• Fold classification: To get an idea of the variety of different folds, one must adjust for sequence redundancy and also try to correctly assign homologs that have low sequence identity (e.g. below 25%).
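One simple way to adjust for sequence redundancy is a greedy filter that keeps a sequence only if it is not too similar to any representative already kept. The sketch below is illustrative and not a method taken from the lecture. It assumes the sequences are already aligned to equal length with "-" for gaps; the 80% cutoff and the toy sequences are made up. Assigning homologs below about 25% identity needs structure or profile comparison and is outside this sketch.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    return identical / compared if compared else 0.0


def nonredundant(aligned: List[str], cutoff: float = 0.8) -> List[str]:
    """Greedy filter: keep a sequence only if it is below `cutoff`
    identity to every representative kept so far."""
    kept: List[str] = []
    for seq in aligned:
        if all(percent_identity(seq, rep) < cutoff for rep in kept):
            kept.append(seq)
    return kept


# Toy pre-aligned sequences (made up for illustration).
seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(nonredundant(seqs))  # ['ACDEFGHIKL', 'MNPQRSTVWY']
```

The result depends on input order, since the first member of each similar group becomes its representative.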
Superfolds (Orengo, Jones, Thornton)
• Distribution of fold types is highly non-uniform.
• There are about 10 types of folds, the superfolds, to which about 30% of the other folds are similar. There are many examples of “isolated” fold types.
• Superfolds are characterized by wide sequence diversity and span a range of dissimilar functions.
• It is a research question as to the evolutionary relationships of the superfolds, i.e. do they arise by divergent or convergent evolution?
Superfolds and examples
• Globin 1hlm sea cucumber hemoglobin; 1cpcA phycocyanin; 1colA colicin
• α-up-down 2hmqA hemerythrin; 256bA cytochrome B562; 1lpe apolipoprotein E3
• Trefoil 1i1b interleukin-1β; 1aaiB ricin; 1tie erythrina trypsin inhibitor
• TIM barrel 1timA triosephosphate isomerase; 1ald aldolase; 5rubA rubisco
• OB fold 1quqA replication protein A 32kDa subunit; 1mjc major cold-shock protein; 1bcpD pertussis toxin S5 subunit
• α/β doubly-wound 5p21 Ras p21; 4fxn flavodoxin; 3chy CheY
• Immunoglobulin 2rhe Bence-Jones protein; 2cd4 CD4; 1ten tenascin
• UB αβ roll 1ubq ubiquitin; 1fxiA ferredoxin; 1pgx protein G
• Jelly roll 2stv tobacco necrosis virus; 1tnfA tumor necrosis factor; 2ltnA pea lectin
• Plaitfold (Split αβ sandwich) 1aps acylphosphatase; 1fxd ferredoxin; 2hpr histidine-containing phosphocarrier
TIM barrels
• Classified into 21 families in the CATH database.
• Mostly enzymes, but participate in a diverse collection of different biochemical reactions.
• There are intriguing common features across the families, e.g. the active site is always located at the C-terminal end of the barrel.
N. Nagano et al. J. Mol. Biol. (2002) 321 741-785
TIM barrel evolutionary relationships (Nagano, Orengo, Thornton)
• Sequence analysis with advanced programs such as PSI-BLAST and IMPALA has identified further relationships among the families.
• Further interesting similarities observed from careful comparison of structures, e.g. a phosphate binding site commonly formed by loops 7, 8 and a small helix.
• In summary, there is evidence for evolutionary relationships between 17 of the 21 families.
OB (oligonucleotide/oligosaccharide-binding) fold
• 5-stranded β-barrel with Greek key topology.
• All OB folds have the same binding face that is involved in their biochemistry.
V. Arcus Curr. Opinion Struct. Biol. (2002) 12 794-801
OB evolutionary relationships
• SCOP lists 9 superfamilies.
• Bacterial enterotoxin superfamily consists of two families, almost certainly evolutionarily related.
• Nucleic acid-binding superfamily has 11 families; if they are evolutionarily related, the ancestral protein would come from the LUCA (Last Universal Common Ancestor).
• Evidence for common ancestry of all OB folds is probably weaker than for TIM barrels.
Protein structure comparison
• How to compare 3D protein structures?
• Analogous computational considerations to sequence comparison, e.g. accuracy, efficiency for database searches, statistical significance of results, etc.
• Additional complication: working with atomic coordinates in 3D space!
Some protein structure comparison methods
• VAST (Vector Alignment Search Tool, NCBI)
• CE (Combinatorial Extension, RCSB/PDB)
• DALI (EBI)
VAST outline
1. Parse protein structures into SSEs (helices and strands).
2. Fit vectors to SSEs.
3. To compare a pair of proteins attempt to superpose as many vectors as possible, subject to constraints.
4. Evaluate the vector alignment for statistical significance (compute an E-value).
5. If the vector alignment is significant then proceed to a more detailed residue-to-residue alignment (“refined alignment”).
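Step 2 of the outline can be sketched as a line fit through the Cα atoms of one SSE. This is a generic least-squares fit (principal axis of the coordinates, found by power iteration), not necessarily the procedure VAST itself uses. The function name `fit_sse_vector` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def fit_sse_vector(ca_coords: List[Point], iterations: int = 50) -> Tuple[Point, Point]:
    """Return (centroid, unit direction) of the best-fit line through the
    C-alpha coordinates, with the direction pointing from N- to C-terminus."""
    n = len(ca_coords)
    if n < 2:
        raise ValueError("need at least two coordinates")
    centroid = tuple(sum(p[k] for p in ca_coords) / n for k in range(3))
    centered = [tuple(p[k] - centroid[k] for k in range(3)) for p in ca_coords]
    # 3x3 scatter matrix of the centered coordinates.
    cov = [[sum(p[i] * p[j] for p in centered) for j in range(3)] for i in range(3)]
    # Start the power iteration from the end-to-end vector.
    end_to_end = [ca_coords[-1][k] - ca_coords[0][k] for k in range(3)]
    length = math.sqrt(sum(x * x for x in end_to_end))
    if length == 0.0:
        raise ValueError("first and last coordinates coincide")
    v = [x / length for x in end_to_end]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        w_len = math.sqrt(sum(x * x for x in w))
        if w_len == 0.0:
            break
        v = [x / w_len for x in w]
    # Keep the N-to-C orientation.
    if sum(v[k] * end_to_end[k] for k in range(3)) < 0.0:
        v = [-x for x in v]
    return centroid, (v[0], v[1], v[2])


# Toy zigzag of five points running along the z axis (made up).
strand = [(0.1, 0.0, 0.0), (-0.1, 0.0, 1.5), (0.1, 0.0, 3.0),
          (-0.1, 0.0, 4.5), (0.1, 0.0, 6.0)]
centroid, direction = fit_sse_vector(strand)
print(centroid, direction)
```

Representing each SSE by one vector reduces a protein to a handful of vectors, which is what makes the superposition search in step 3 tractable.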
[Figure: Two proteins, 3chy and 1ipfA, with vectors assigned to SSEs]
[Figure: VAST comparison of 3chy and 1ipfA. Panels: vector superposition; refined alignment]
SCOP (Structural Classification of Proteins)
• http://scop.mrc-lmb.cam.ac.uk/scop/
• Levels of the SCOP hierarchy:
– Family: clear evolutionary relationship
– Superfamily: probable common evolutionary origin
– Fold: major structural similarity
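The nesting of these levels can be illustrated with a small sketch that reports the most specific level two domains share. The type `ScopLabel`, the function `closest_shared_level`, and the family names are hypothetical. The fold and superfamily names are the OB fold examples from earlier in the lecture, and nothing here reflects SCOP's actual file format or identifiers.

```python
from typing import NamedTuple, Optional


class ScopLabel(NamedTuple):
    fold: str
    superfamily: str
    family: str


def closest_shared_level(a: ScopLabel, b: ScopLabel) -> Optional[str]:
    """Most specific level shared by two domains, or None if the folds differ."""
    if a.fold != b.fold:
        return None
    if a.superfamily != b.superfamily:
        return "fold"         # major structural similarity only
    if a.family != b.family:
        return "superfamily"  # probable common evolutionary origin
    return "family"           # clear evolutionary relationship


# Family names below are placeholders, not real SCOP families.
d1 = ScopLabel("OB fold", "Nucleic acid-binding", "family A")
d2 = ScopLabel("OB fold", "Nucleic acid-binding", "family B")
d3 = ScopLabel("OB fold", "Bacterial enterotoxin", "family C")

print(closest_shared_level(d1, d2))  # superfamily
print(closest_shared_level(d1, d3))  # fold
```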
CATH (Class, Architecture, Topology, Homologous superfamily)
• http://www.biochem.ucl.ac.uk/bsm/cath/