Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them
![Page 1: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/1.jpg)
Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them
Maya Schushan
The Ben-Tal lab
![Page 2: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/2.jpg)
![Page 3: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/3.jpg)
OUTLINE
• Sequence: Databases, domains, motifs & annotations
• Structure: Secondary structure, structure databases, visualization and identification of functional sites
![Page 4: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/4.jpg)
UniProt
• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).
• In 2002, the three institutes decided to pool their resources and expertise and formed the UniProt Consortium.
Sequences, domains, motifs & annotations
![Page 5: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/5.jpg)
UniProt
• The world's most comprehensive catalog of information on proteins
• Sequence, function & more…
• Comprised mainly of two databases:
– SwissProt: 366,226 protein entries last year, 412,525 now. High-quality annotation, non-redundant & cross-referenced to many other databases.
– TrEMBL: 5,708,298 protein entries last year, 7,341,751 now. Computer translation of the genetic information from the EMBL Nucleotide Sequence Database; many proteins are poorly annotated since only automatic annotation is generated.
![Page 6: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/6.jpg)
UniProt
• Annotation description includes:
– Function(s) of the protein;
– Posttranslational modification(s) such as carbohydrates, phosphorylation, acetylation and GPI-anchor;
– Domains and sites, for example, calcium-binding regions, ATP-binding sites, zinc fingers, homeoboxes;
– Secondary structure, e.g. alpha helix, beta sheet;
– Quaternary structure, e.g. homodimer, heterotrimer, etc.;
– Similarities to other proteins;
– Disease(s) associated with any number of deficiencies in the protein;
– Sequence conflicts, variants, etc.
![Page 7: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/7.jpg)
UniProt
• Connected to many other databases (e.g. Pfam, PROSITE, EC, GO, PDBsum, PDB (to be discussed…))
• Each sequence has a unique 6-character accession made of letters and digits (e.g. P05102)
• Entries in SwissProt also have IDs, which are usually meaningful (e.g. CADH1_HUMAN for a human cadherin)
• Download sequence in FASTA format
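The difference between an accession and an entry ID can be sketched in a few lines of Python. This is only an illustration: the regular expression is the accession pattern as given in the UniProt help pages (it also accepts the longer 10-character accessions, which go beyond what the slide describes), and the helper names `is_accession` and `split_entry_name` are made up for this example.

```python
import re

# Accession pattern as documented in the UniProt help pages (assumption:
# reproduced from memory; it covers 6- and 10-character accessions).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text):
    """Return True if text looks like a UniProt accession such as P05102."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(entry_name):
    """Split an ID such as CADH1_HUMAN into (protein code, species code)."""
    protein, _, species = entry_name.partition("_")
    if not protein or not species:
        raise ValueError(f"not an entry name: {entry_name!r}")
    return protein, species


print(is_accession("P05102"))           # an accession
print(is_accession("MTH1_HAEPH"))       # an entry ID, not an accession
print(split_entry_name("CADH1_HUMAN"))  # protein code and species code
```

The accession is the stable key to use when cross-referencing other databases; the entry ID is easier to read but is not checked by this pattern.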
![Page 8: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/8.jpg)
UniProt: http://www.uniprot.org/
Type accession: P05102
Or ID: MTH1_HAEPH
![Page 9: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/9.jpg)
![Page 10: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/10.jpg)
General data: name, origin, EC (enzymatic reaction)…
![Page 11: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/11.jpg)
Scroll down to find the sequence & download the FASTA
Functional data, including the GO annotations
![Page 12: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/12.jpg)
Known sites, predicted/known secondary structures, natural variation or mutagenesis
![Page 13: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/13.jpg)
The protein’s sequence in FASTA format
Download
Send to BLAST
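Once the FASTA file is downloaded, reading it takes only a few lines. The sketch below is a minimal parser, not a replacement for a library such as Biopython. It assumes the usual UniProt header layout `>db|accession|entry_name description`; the function names are hypothetical and the sequence in `EXAMPLE` is a made-up placeholder, not the real sequence of P05102.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uniprot_header_fields(header):
    """Split the 'db|accession|entry_name' token at the start of a header."""
    identifier = header.split(None, 1)[0]
    db, accession, entry_name = identifier.split("|")
    return {"db": db, "accession": accession, "entry_name": entry_name}


# Placeholder record: the residues below are invented for illustration.
EXAMPLE = """>sp|P05102|MTH1_HAEPH example header (placeholder sequence)
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(EXAMPLE)
header, sequence = records[0]
print(uniprot_header_fields(header)["accession"], len(sequence))
```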
![Page 14: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/14.jpg)
References for all the information on the page: important to take a look…
![Page 15: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/15.jpg)
Connections to other databases
Other sequence databases, e.g. GenBank
Related structures in the PDB (if available)
Model structure in the ModBase database (automatically derived!)
All sorts of domain/motif databases: the family related to the entry
![Page 16: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/16.jpg)
Pfam- domain database
• Proteins are generally composed of one or more functional regions, commonly termed domains.
• Different combinations of domains give rise to the diverse range of proteins found in nature.
• The identification of domains that occur within proteins can therefore provide insights into their function.
![Page 17: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/17.jpg)
Pfam- domain database
• The Pfam database is a large collection of protein domain families.
• Each family is represented by multiple sequence alignments and hidden Markov models (HMMs).
• Pfam entries are classified in one of four ways:
– Family: A collection of related proteins
– Domain: A structural unit which can be found in multiple protein contexts
– Repeat: A short unit which is unstable in isolation but forms a stable structure when multiple copies are present
– Motif: A short unit found outside globular domains
![Page 18: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/18.jpg)
Pfam- domain database
There are two components to Pfam:
• Pfam-A entries are high-quality, manually curated families. These Pfam-A entries cover a large proportion of the sequences in the sequence database.
• Pfam-B entries are automatically generated. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found.
• Pfam also generates higher-level groupings of related families, known as clans. A clan is a collection of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM.
![Page 19: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/19.jpg)
Pfam- domain database
http://pfam.sanger.ac.uk/ allows you to:
• Analyze your protein sequence for Pfam matches
• View Pfam family annotation and alignments
• See groups of related families
• Look at the domain organization of a protein sequence
• Find the domains on a PDB structure
• Query Pfam by keyword
![Page 20: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/20.jpg)
Pfam- domain database
Searching for a certain protein accession
![Page 21: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/21.jpg)
![Page 22: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/22.jpg)
![Page 23: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/23.jpg)
![Page 24: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/24.jpg)
Classifying protein function
• Each protein performs one (or more…) specific functions. This can be, e.g., catalysis of a specific enzymatic reaction, transport of an ion, interaction with a DNA molecule, etc.
• To make specific functions easy to refer to, attempts have been made to enumerate and classify the various functions performed by proteins.
![Page 25: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/25.jpg)
Classifying protein function
Example: some of the diverse functions exhibited by membrane proteins.
![Page 26: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/26.jpg)
Enzyme Commission number (EC number)
• A numerical classification scheme for enzymes, based on the chemical reactions they catalyze
• EC numbers do not specify enzymes, but enzyme-catalyzed reactions. If different enzymes (for instance from different organisms) catalyze the same reaction, then they receive the same EC number.
• By contrast, the UniProt database identifiers uniquely specify a protein by its amino acid sequence.
![Page 27: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/27.jpg)
Enzyme Commission number (EC number)
• Every enzyme code consists of the letters "EC" followed by four numbers separated by periods. Those numbers represent a progressively finer classification of the enzyme.
• For example, the tripeptide aminopeptidases have the code "EC 3.4.11.4":
– EC 3 enzymes are hydrolases (enzymes that use water to break up some other molecule)
– EC 3.4 are hydrolases that act on peptide bonds
– EC 3.4.11 are those hydrolases that cleave off the amino-terminal amino acid from a polypeptide
– EC 3.4.11.4 are those that cleave off the amino-terminal end from a tripeptide
![Page 28: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/28.jpg)
Enzyme Commission number (EC number)
• For example, the tripeptide aminopeptidases have the code "EC 3.4.11.4", as shown for an enzyme from Lactobacillus helveticus in BRENDA (the Comprehensive Enzyme Information System):
![Page 29: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/29.jpg)
Enzyme Commission number (EC number)
• EC 1 - Oxidoreductases
• EC 2 - Transferases
• EC 3 - Hydrolases
• EC 4 - Lyases
• EC 5 - Isomerases
• EC 6 - Ligases
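Because each number in an EC code refines the one before it, the hierarchy can be read straight off the string. The sketch below is illustrative only: the function names are invented for this example, and the class table contains just the six top-level classes listed on this slide.

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Return the four fields of an EC number such as 'EC 3.4.11.4'."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    return parts


def ec_lineage(ec):
    """List the progressively finer classifications, broadest first."""
    parts = parse_ec(ec)
    return ["EC " + ".".join(parts[:depth]) for depth in range(1, 5)]


def ec_class_name(ec):
    """Name of the top-level class, or 'unknown class' if not in the table."""
    return EC_CLASSES.get(int(parse_ec(ec)[0]), "unknown class")


print(ec_class_name("EC 3.4.11.4"))
for level in ec_lineage("EC 3.4.11.4"):
    print(level)
```

Two enzymes from different organisms that catalyze the same reaction would produce the same lineage here, which is exactly why an EC number identifies a reaction rather than a protein.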
![Page 30: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/30.jpg)
Gene Ontology
• A collaborative effort to address the need for consistent descriptions of gene products in different databases
• The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
• The use of GO terms by collaborating databases facilitates uniform queries across them. The controlled vocabularies are structured so that they can be queried at different levels.
![Page 31: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/31.jpg)
Gene Ontology
Cellular component
A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).
![Page 32: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/32.jpg)
![Page 33: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/33.jpg)
Gene Ontology
Biological process
A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions.
Examples of biological process terms are signal transduction or pyrimidine metabolism.
It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.
![Page 34: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/34.jpg)
Gene Ontology
Molecular function
Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level.
Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products.
Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.
![Page 35: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/35.jpg)
Gene Ontology
Topology
The ontologies are in the form of directed acyclic graphs (DAGs), with the graph nodes being GO terms.
The ontologies are hierarchically structured: a more specialized term (child) can be related to more than one less specialized term (parent).
E.g. the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic process, because a biosynthetic process is a type of metabolic process and a hexose is a type of monosaccharide. When any gene is annotated to hexose biosynthetic process, it is automatically annotated to both hexose metabolic process and monosaccharide biosynthetic process.
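The automatic annotation to parent terms amounts to walking up the DAG. The sketch below uses a toy graph: the two parents of hexose biosynthetic process come from the slide, while the single shared grandparent is a simplifying assumption added so the example has more than one level. The names `PARENTS`, `ancestors` and `propagate` are hypothetical.

```python
# Toy fragment of the biological process ontology (child -> parents).
# The first entry follows the slide; the upper level is a simplification.
PARENTS = {
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "monosaccharide biosynthetic process": ["monosaccharide metabolic process"],
    "monosaccharide metabolic process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from term."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct_terms, parents=PARENTS):
    """Expand a gene's direct annotations with all implied ancestor terms."""
    full = set(direct_terms)
    for term in direct_terms:
        full |= ancestors(term, parents)
    return full


for term in sorted(propagate({"hexose biosynthetic process"})):
    print(term)
```

Because a term can have several parents, a shared ancestor may be reached along more than one path; the `seen` set ensures it is reported only once.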
![Page 36: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/36.jpg)
Gene Ontology Example
![Page 37: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/37.jpg)
Gene Ontology Interface
Search by gene or protein accession
http://www.geneontology.org/
![Page 38: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/38.jpg)
Summary of the first part- protein sequence databases and tools
• UniProt- the most comprehensive protein sequence database. Connected to many other databases and resources.
• Pfam- domain database. Many others: InterPro, PROSITE, BLOCKS etc.
• EC and GO classifications of protein function
![Page 39: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/39.jpg)
OUTLINE
• Sequence: Databases, domains, motifs & annotations
• Structure: Secondary structure, structure databases, visualization and identification of functional sites
![Page 40: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/40.jpg)
From Sequence to Structure
• All information about the native structure of a protein is encoded in the amino acid sequence + its native solution environment.
• Many conformations are possible, yet each protein exhibits only one or a few native folds (Levinthal’s paradox)
• Protein folding is driven by various forces:
– Ionic forces
– Hydrogen bonds
– The hydrophobic effect
– . . .
Investigating & visualizing protein structures
![Page 41: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/41.jpg)
Secondary Structure Prediction
Why predict secondary structures of proteins?
1) When the structure of the protein is still unknown. This can serve as the first step for structure prediction: first predict the secondary structures, then how they are arranged together.
2) For calculating better multiple sequence alignments or pairwise alignments.
![Page 42: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/42.jpg)
Predicting 2° Structure
Each amino acid has a different propensity for being in each 2° structure.
For example, proline causes a kink which destroys the helix structure. Thus, proline is usually found only at the helix end.
The different structures also have typical lengths.
![Page 43: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/43.jpg)
http://www.predictprotein.org/
Predicting 2° Structure
![Page 44: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/44.jpg)
Predicting 2° Structure
All these and more…
![Page 45: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/45.jpg)
Input: Sequence
Output: Secondary structure prediction, globular regions, coiled-coil regions, transmembrane helices, PROSITE motifs, bound cysteines…
The Meta Predict Protein server now allows many other options…
http://www.predictprotein.org/meta.php
Predicting 2° Structure
![Page 46: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/46.jpg)
A common measure is Q3 = the % of amino acids whose secondary structure state (helix, strand or coil) was predicted correctly.
Today, Q3 is about 75-78% (as determined objectively by CASP)
The theoretical limit is thought to be about 90%

| Authors | Year | % accuracy | Method |
| --- | --- | --- | --- |
| Chou-Fasman | 1974 | 50% | propensities of aa's in 2nd structures |
| Garnier | 1978 | 62% | interactions between aa's |
| Levin | 1993 | 69% | multiple seq. alignments (MSA) |
| Rost & Sander | 1994 | 72% | neural networks + MSA |
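
Q3 is simple to compute once the predicted and observed states are written as strings of equal length. The sketch below assumes the common three-letter alphabet H (helix), E (strand) and C (coil); the two strings in the usage lines are invented for illustration and do not come from any real protein.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state assignment matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up example: 10 residues, 2 of them mispredicted.
observed = "HHHHEEEECC"
predicted = "HHHCEEEECH"
print(q3(predicted, observed))  # 80.0
```

Q3 treats every residue equally, so a predictor can score well on a mostly helical protein while still misplacing the ends of individual helices.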
Predicting 2° Structure
![Page 47: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/47.jpg)
Predicting 2° Structure
E.g. PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
• A simple and accurate secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST.
• Evaluated with a very stringent cross-validation method, a recent version of PSIPRED achieves an average Q3 score of 80.7%.
![Page 48: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/48.jpg)
Protein 3D Structures
A protein’s structure has a critical effect on its function:
1. Binding pockets
PDB ID 1nw7
![Page 49: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/49.jpg)
A protein’s structure has a critical effect on its function:
2. Areas of specific chemical/electrical properties
Protein 3D Structures
![Page 50: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/50.jpg)
A protein’s structure has a critical effect on its function:
3. Importance of the global fold for function
Protein 3D Structures
![Page 51: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/51.jpg)
Tertiary structure = protein fold
Complete 3-dimensional structure
Why is it interesting? Isn’t the sequence enough?
A key to understanding protein function
Structure-based drug design
Detection of distant evolutionary relationships
Structure is more conserved than sequence
![Page 52: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/52.jpg)
RCSB- the Protein Data Bank
• The main, comprehensive database for biological macromolecular structures
• Each structure receives a PDB ID: a unique 4-character identifier
• Search by author, PDB ID or any keyword.
• Download structures
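A PDB ID can be checked and turned into a download address with a short helper. This is a sketch under two assumptions that are not stated on the slide: that classic PDB IDs consist of a digit from 1 to 9 followed by three letters or digits, and that RCSB serves PDB-format files under `https://files.rcsb.org/download/`. The function names are hypothetical, and nothing is fetched over the network here.

```python
import re

# Assumed shape of a classic PDB ID: one digit (1-9) plus three alphanumerics.
PDB_ID_RE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(text):
    """Return True if text has the shape of a 4-character PDB ID."""
    return PDB_ID_RE.fullmatch(text) is not None


def download_url(pdb_id):
    """Build the (assumed) RCSB download address for a PDB-format file."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


print(is_pdb_id("3mht"))     # PDB ID used in these slides
print(is_pdb_id("P05102"))   # a UniProt accession, not a PDB ID
print(download_url("3mht"))
```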
![Page 53: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/53.jpg)
RCSB- Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
PDB ID: 3mht
![Page 54: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/54.jpg)
RCSB- The Protein Data Bank
Download structure
Display structure
Data concerning the structure: resolution, R-value…
The paper describing the structure
![Page 55: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/55.jpg)
RCSB- The Protein Data Bank
PDB files have a specific format:
• TITLE
• REMARK
• COMPND
• JRNL- reference
• SEQRES- the original sequence
• HELIX, SHEET- secondary structure
• ATOM- the actual protein/DNA/RNA chain
• HETATM- additional atoms such as ligands, water etc.
• …
![Page 56: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/56.jpg)
RCSB – The Protein Data Bank
PDB files have a specific format:
ATOM 7 SD MET A 1 -29.059 28.614 71.539 1.00 26.90 S
ATOM 8 CE MET A 1 -27.535 29.074 70.866 1.00 16.57 C
ATOM 9 N ILE A 2 -29.656 32.903 69.094 1.00 25.93 N
ATOM 10 CA ILE A 2 -30.077 33.171 67.730 1.00 25.49 C
HETATM 3139 C6 SAH 328 -11.642 26.514 89.489 1.00 17.97 C
HETATM 3140 N6 SAH 328 -10.474 26.661 90.103 1.00 14.50 N
HETATM 3141 N1 SAH 328 -11.895 25.334 88.899 1.00 23.10 N
HETATM 3142 C2 SAH 328 -13.079 25.090 88.350 1.00 16.93 C
HETATM 3143 N3 SAH 328 -14.120 25.887 88.278 1.00 16.05 N
HETATM 3144 C4 SAH 328 -13.832 27.092 88.861 1.00 14.31 C
HETATM 3145 O HOH 329 -29.525 42.890 90.934 1.00 24.84 O
HETATM 3146 O HOH 330 -28.213 42.867 93.588 1.00 8.11 O
HETATM 3147 O HOH 331 -24.619 35.287 96.173 1.00 17.96 O
Labels on the slide: Atom, residue or molecule; Chain (if it exists); Numbering; Coordinates: X, Y, Z
http://www.wwpdb.org/documentation/format3.1-20080211.pdf
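In a real PDB file the ATOM and HETATM fields sit in fixed columns, so they are read by position rather than by splitting on spaces (the chain column may be blank, as in the HETATM lines of the slide). The sketch below assumes the column layout of the wwPDB format document linked above, reproduced from memory; `format_record` is a simplified demo helper that only exists to build correctly aligned input lines from the values shown on the slide, and both function names are hypothetical.

```python
def format_record(record, serial, name, res_name, chain, res_seq,
                  x, y, z, occupancy, b_factor, element):
    """Build one fixed-width ATOM/HETATM line (simplified demo helper)."""
    return (
        f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
        f"{'':10}{element:>2}"
    )


def parse_record(line):
    """Read an ATOM/HETATM line by column position; None for other records."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None
    return {
        "record": record,
        "serial": int(line[6:11]),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21].strip(),          # empty string if no chain
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# Values taken from the first ATOM line and first water line on the slide.
atom_line = format_record("ATOM", 7, " SD", "MET", "A", 1,
                          -29.059, 28.614, 71.539, 1.00, 26.90, "S")
water_line = format_record("HETATM", 3145, " O", "HOH", "", 329,
                           -29.525, 42.890, 90.934, 1.00, 24.84, "O")

print(parse_record(atom_line))
print(parse_record(water_line))
```

Reading by column is what makes the blank chain field harmless; a whitespace split would shift every later field by one position for the HETATM lines.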
![Page 57: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/57.jpg)
More Sequences Than Structures
Discrepancy between the number of known sequences and solved structures:
5,047,807 UniRef90 entries vs. 19,988 structures at 90% non-redundancy (roughly 250 sequences per structure)
Computational methods are needed to obtain more structures
RCSB – The Protein Data Bank
![Page 58: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/58.jpg)
Fold classification
Classification: clustering proteins into structural families
Motivation?
Profound analysis of evolutionary mechanisms
Constraints on secondary structure packing?
Classification at domain level
![Page 59: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/59.jpg)
Fold classification
http://scop.berkeley.edu
• The SCOP database aims to provide a description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.
• The SCOP classification of proteins has been constructed manually, but with the assistance of tools to make the task manageable and help provide generality.
![Page 60: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/60.jpg)
Fold classification
1. Family: Clear evolutionary relationship
Generally, this means that pairwise residue identities between the proteins are 30% and greater.
2. Superfamily: Probable common evolutionary origin
Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
![Page 61: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/61.jpg)
Fold classification
3. Fold: Major structural similarity
Same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure.
Proteins of the same fold category may not have a common evolutionary origin: the structural similarities could arise from convergent evolution.
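The 30% pairwise identity guideline for the family level can be checked on any two aligned sequences. The sketch below is one possible definition, not the one SCOP curators use: it counts identical residues over columns where neither sequence has a gap (other conventions divide by the alignment length or the shorter sequence). The function name and the two short aligned strings are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up alignment: 8 ungapped columns, 6 of them identical.
seq_a = "ACDEF-HIK"
seq_b = "ACDQFGHLK"
identity = percent_identity(seq_a, seq_b)
print(identity)           # 75.0
print(identity >= 30.0)   # above the family-level guideline
```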
![Page 62: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/62.jpg)
[Figure: Growth of unique folds as defined by SCOP; axes: Year, Number]
![Page 63: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/63.jpg)
Hierarchical classification of protein domain structures in the PDB.
Domains are clustered at five major levels:
Class
Architecture
Topology
Homologous superfamily
Sequence family
Fold classification
![Page 64: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/64.jpg)
Fold classification
• Class [C] - derived from secondary structure content (automatic): mainly alpha, mainly beta, alpha and beta, few secondary structures.
• Architecture [A] - derived from orientation of secondary structures (manual)
• Topology [T] - derived from the topological connections of secondary structures (by automated structural alignment)
• Homologous Superfamily [H]/sequence family- clusters of similar structures & functions.
![Page 65: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/65.jpg)
Investigating & visualizing protein structures
SCOP Vs. CATH
Same SCOP family, different CATH topologies: d1rh6b (a.6.1.7) / 1rh6B00 (1.10.1660.20) vs. d1g4da (a.6.1.7) / 1g4dA00 (1.10.10.10)
Different SCOP classes, same CATH homologous superfamily: d1bbxd (b.34.13.1) / 1bbxD00 (2.40.50.40) vs. d1rhpa (d.9.1.1) / 1rhpA00 (2.40.50.40)
Csaba et al., 2009
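The two comparisons above can be checked mechanically by counting how many leading levels two dotted classification codes share. The following is a minimal sketch, assuming the codes are plain dot-separated strings (SCOP: class.fold.superfamily.family; CATH: Class.Architecture.Topology.Homologous superfamily). The helper name `shared_levels` is made up for this illustration and is not part of any SCOP or CATH tool.

```python
def shared_levels(code_a: str, code_b: str) -> int:
    """Count how many leading hierarchy levels two dotted codes have in common."""
    shared = 0
    for level_a, level_b in zip(code_a.split("."), code_b.split(".")):
        if level_a != level_b:
            break
        shared += 1
    return shared


# Codes copied from the slide (SCOP code, CATH code) for each domain.
domains = {
    "d1rh6b": ("a.6.1.7", "1.10.1660.20"),
    "d1g4da": ("a.6.1.7", "1.10.10.10"),
    "d1bbxd": ("b.34.13.1", "2.40.50.40"),
    "d1rhpa": ("d.9.1.1", "2.40.50.40"),
}

for first, second in [("d1rh6b", "d1g4da"), ("d1bbxd", "d1rhpa")]:
    scop = shared_levels(domains[first][0], domains[second][0])
    cath = shared_levels(domains[first][1], domains[second][1])
    print(f"{first} vs {second}: {scop} shared SCOP levels, {cath} shared CATH levels")
```

For the first pair all four SCOP levels agree while CATH agrees only on class and architecture; for the second pair no SCOP level agrees while all four CATH levels do.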
![Page 66: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/66.jpg)
Investigating & visualizing protein structures
SCOP Vs. CATH
| SCOP | CATH |
| --- | --- |
| class | class |
| | architecture |
| fold | topology |
| superfamily | homologous superfamily |
| family | sequence family |

CATH is more directed toward structural classification; SCOP pays more attention to evolutionary relationships
![Page 67: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/67.jpg)
PdbSum
• A database providing an overview of all biological macromolecular structures
• Connected to UniProt: find the sequence accession of a known PDB ID
• Detailed description of many structure properties, e.g.:
– EC number
– Chains & ligands and their interactions
– Clefts
– Secondary structure
– FASTA sequence of the structure
– …
Investigating & visualizing protein structures
![Page 68: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/68.jpg)
PdbSum
PDB ID
Free text
Search by sequence
http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/
Investigating & visualizing protein structures
![Page 69: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/69.jpg)
PdbSum
Useful tabs
UniProt accession
Chains & ligands
Investigating & visualizing protein structures
![Page 70: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/70.jpg)
PdbSum
Highlights from the related paper
EC and reaction
GO annotation
Investigating & visualizing protein structures
![Page 71: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/71.jpg)
PdbSum
Protein tab
Secondary structure, from the PDB
Investigating & visualizing protein structures
![Page 72: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/72.jpg)
PdbSum
Ligand tab
LigPlot: shows the residues that bind the ligand
The ligand’s structure
Investigating & visualizing protein structures
![Page 73: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/73.jpg)
Before the invention of computer graphics, trained artists were employed to hand-draw understandable pictures of proteins
Irving Geis (1908 – 1997)
Investigating & visualizing protein structures
![Page 74: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/74.jpg)
PyMol Viewer
Features:
• Viewing 3D Structures
• Rendering Figures
• Giving Presentations
• Animating Molecules
• Sharing Visualizations
• Exporting Geometry
Investigating & visualizing protein structures
![Page 75: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/75.jpg)
PyMol Viewer: potassium channel (KcsA) from Streptomyces lividans, PDB ID 1bl8
Doyle et al., 1998
Investigating & visualizing protein structures
![Page 76: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/76.jpg)
• Identify the different parts of the screen:
– the external GUI window
– the internal GUI window.
• The internal window contains the viewer, which displays the molecule, and the command line.
View Manipulation
Investigating & visualizing protein structures
![Page 77: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/77.jpg)
View Manipulation
To manipulate an object, we use the letter icons near its name:
– A: Action
– S: Show
– H: Hide
– L: Label
– C: Color
Investigating & visualizing protein structures
![Page 78: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/78.jpg)
View Manipulation
Change the representation of the object to “Cartoon” using: S (Show) → As → Cartoon
Investigating & visualizing protein structures
![Page 79: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/79.jpg)
View Manipulation
Other protein representations under “S” → “As”:
• Lines
• Ribbons
• Sticks
• Dots
• Spheres
• Surface
Investigating & visualizing protein structures
![Page 80: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/80.jpg)
Part 1: View Manipulation
Color by chain: C (Color) → By Chain
Investigating & visualizing protein structures
![Page 81: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/81.jpg)
View Manipulation
Other coloring options:
• Color by spectrum: b-factor, rainbow
• Color by secondary structure (“SS”)
• Color by element
• Many colors are available; others can be defined in the external GUI: “Settings” → “Colors…” → “New”
Investigating & visualizing protein structures
![Page 82: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/82.jpg)
• Select specific amino acids by clicking on them.
• Select a range in the sequence by clicking the first residue, and then “shift+click” on the last residue.
• The selection will be indicated on the structure (in pink dots).
Selecting and manipulating specific parts of the molecule
Investigating & visualizing protein structures
![Page 83: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/83.jpg)
Selecting and manipulating specific parts of the molecule
• In the object list, a new object “(sele)” was added.
• This object represents the current selection
• You can manipulate it with the buttons next to the object. For example, change its representation to sticks (“S” → “As” → “Sticks”)
Investigating & visualizing protein structures
![Page 84: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/84.jpg)
Selecting and manipulating specific parts of the molecule
• Give a different name to the selection, so you can easily manipulate it later.
• Select the first chain again (using the sequence) and change its name to “chain1” by pressing “Action” → “Rename Selection” and typing “chain1”.
Investigating & visualizing protein structures
![Page 85: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/85.jpg)
Making high-quality photos
1. Change the background color to white, with “Display” → “Background” → “White” on the external GUI menu
Investigating & visualizing protein structures
![Page 86: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/86.jpg)
Making high-quality photos
2. Type in the command line: “ray [x], [y]” … and wait…
3. Save the image with: “File” → “Save Image”
Take care not to click on the image before saving!
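The menu steps from the last few slides can also be collected into a PyMol command script, which makes a figure reproducible. The sketch below only builds the script text in plain Python and does not require PyMol itself. The image size (1200 x 900) and the output file names are illustrative assumptions; `fetch`, `hide`, `show`, `util.cbc`, `bg_color`, `ray` and `png` are standard PyMol commands, and `fetch` needs network access when the script is run.

```python
def build_pymol_script(pdb_id: str, width: int, height: int, out_png: str) -> str:
    """Return PyMol command-line text that mirrors the GUI steps described above."""
    commands = [
        f"fetch {pdb_id}",         # download the structure from the PDB
        "hide everything",         # start from an empty view
        "show cartoon",            # comparable to S -> As -> Cartoon
        "util.cbc",                # color by chain, comparable to C -> By Chain
        "bg_color white",          # comparable to Display -> Background -> White
        f"ray {width}, {height}",  # ray-trace at the requested pixel size
        f"png {out_png}",          # save the ray-traced image
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and image size, chosen only for this example.
    script = build_pymol_script("1bl8", 1200, 900, "kcsa.png")
    with open("make_figure.pml", "w") as handle:
        handle.write(script)
    print(script)
```

The resulting `make_figure.pml` can be loaded from the PyMol command line with `@make_figure.pml`.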
Investigating & visualizing protein structures
![Page 87: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/87.jpg)
Making high-quality photos
Investigating & visualizing protein structures
![Page 88: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/88.jpg)
Making high-quality photos
Investigating & visualizing protein structures
![Page 89: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/89.jpg)
ConSurf
The goal: identification of functionally important amino acids that mediate the interaction of a query protein with ligands, DNA/RNA, other proteins etc.
Approach: functionally important amino acid sites are often evolutionarily conserved
Investigating & visualizing protein structures
![Page 90: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/90.jpg)
Beta Class N6-Adenine DNA Methyltransferase
Investigating & visualizing protein structures Consurf
![Page 91: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/91.jpg)
The 3D structure of Beta Class N6-Adenine DNA Methyltransferase has already been solved:
PDB id : 1nw7
Investigating & visualizing protein structures ConSurf
![Page 92: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/92.jpg)
• The ConSurf webserver calculates the evolutionary rate for each position in the protein
• The results, mapped on the structure, reveal residues crucial for function and structure stability
• In this case, the ligand is bound in a highly conserved cluster of residues
http://consurf.tau.ac.il/
Investigating & visualizing protein structures Consurf
![Page 93: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/93.jpg)
The consensus sequence approach:
..W..
..W..
..W..
..W..
.. E..
.. G..
Investigating & visualizing protein structures Consurf
![Page 94: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/94.jpg)
However, some sequences might be close homologues of each other
primates
..W..
..W..
..W..
..W..
.. E..
.. G..
Investigating & visualizing protein structures Consurf
Conclusion: assessing conservation without taking the phylogenetic relations into account may give misleading results, because sequence space is sampled unevenly
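The sampling problem can be illustrated with the alignment column from the slide (W, W, W, W, E, G). The sketch below compares the naive residue frequency with a crude correction in which every clade contributes the same total weight. This is only an illustration of the bias; it is not the method ConSurf uses (which relies on a phylogenetic tree via Rate4Site), and the clade labels other than "primates" are hypothetical.

```python
from collections import Counter


def naive_frequency(column, residue):
    """Fraction of sequences that carry `residue` at this alignment position."""
    return Counter(column)[residue] / len(column)


def clade_weighted_frequency(column, clades, residue):
    """Each clade gets total weight 1, split equally among its sequences."""
    clade_sizes = Counter(clades)
    weight = sum(
        1.0 / clade_sizes[clade]
        for amino_acid, clade in zip(column, clades)
        if amino_acid == residue
    )
    return weight / len(clade_sizes)


column = ["W", "W", "W", "W", "E", "G"]
# "clade_B" and "clade_C" are made-up labels for the two non-primate sequences.
clades = ["primates"] * 4 + ["clade_B", "clade_C"]

print(naive_frequency(column, "W"))                   # 4 of 6 sequences
print(clade_weighted_frequency(column, clades, "W"))  # 1 of 3 clades
```

With the four primate sequences counted separately, W looks well conserved; once they are treated as one closely related group, W is supported by only one of three lineages.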
![Page 95: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/95.jpg)
Phylogenetic reconstruction may be used to distinguish between two possible cases:
1. Structural/functional constraints that truly result in sequence conservation as a result of evolutionary pressure.
2. Short evolutionary time that may be mistaken as sequence conservation, while no evolutionary pressure affects the examined position.
Investigating & visualizing protein structures Consurf
![Page 96: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/96.jpg)
Rate4Site: an algorithm for calculating the evolutionary rate at each amino acid site
Conserved sites evolve slowly; variable sites evolve rapidly
Definition: Evolutionary rate = number of AA replacements/(site*year)
Pupko et al., 2002; Mayrose et al., 2005
Investigating & visualizing protein structures Consurf
![Page 97: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/97.jpg)
Landau et al., 2005
Web-Server: http://consurf.tau.ac.il/
Investigating & visualizing protein structures Consurf
![Page 98: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/98.jpg)
The Rate4Site conservation scores are continuous values rather than integers.
Such scores are hard to display directly on a structure.
Hence, the ConSurf webserver divides them into 9 bins: 1 for the most variable, 9 for the most conserved
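The idea of turning continuous rates into nine discrete grades can be sketched as follows. This is a simplified illustration using equal-width intervals between the lowest and highest rate; the exact binning scheme used by the ConSurf server may differ. The function name and the example rates are invented for the sketch. A low evolutionary rate means a conserved site, so the slowest sites receive grade 9.

```python
def conservation_grades(rates, n_bins=9):
    """Map continuous evolutionary rates to grades 1 (variable) .. n_bins (conserved)."""
    lowest, highest = min(rates), max(rates)
    if highest == lowest:
        # All sites evolve at the same rate: give every site the middle grade.
        return [(n_bins + 1) // 2] * len(rates)
    width = (highest - lowest) / n_bins
    grades = []
    for rate in rates:
        # Interval index 0 holds the slowest sites; clamp the maximum into the last bin.
        index = min(int((rate - lowest) / width), n_bins - 1)
        grades.append(n_bins - index)
    return grades


# Hypothetical per-site rates (low = slowly evolving = conserved).
example_rates = [-1.0, -0.5, 0.1, 0.8, 2.6]
print(conservation_grades(example_rates))
```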
Investigating & visualizing protein structures
Consurf coloring bar
![Page 99: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/99.jpg)
The ConSurf webserver
Essential input - MSA and tree constructed by ConSurf through “advanced options”:
1. PDB ID / PDB file / model structure, and chain
Essential and optional input:
1. PDB ID / PDB file / model structure, and chain
2. Constructed MSA, with the query sequence included
3. Phylogenetic tree
Investigating & visualizing protein structures Consurf
![Page 100: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/100.jpg)
http://consurf.tau.ac.il/index.html
![Page 101: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/101.jpg)
Bayesian / Max Likelihood
1NW7
Check in the PDBsum…
http://consurf.tau.ac.il/index.html
Essential and Optional input:
MSA
Sequence name in the MSA
Tree
![Page 102: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/102.jpg)
Essential input:
http://consurf.tau.ac.il/index.html
1NW7
Check in the PDBsum…
![Page 103: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/103.jpg)
Essential input:
http://consurf.tau.ac.il/index.html
SWISS-PROT / UniProt
Alignment method
Additional BLAST options
![Page 104: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/104.jpg)
Calculation Finished:
Easy web-based viewer
View scores
Produced or input MSA
View phylogenetic tree
Script for coloring in RasTop*
Instructions for PyMol*
Viewer for producing medium-quality images*
![Page 105: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/105.jpg)
Jmol- Easy web-based viewer
Investigating & visualizing protein structures Consurf
![Page 106: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/106.jpg)
Summary - MSA Quality
• ConSurf is dependent on the quality of the MSA.
• When an MSA is not given by the user, sequences are automatically gathered by PSI-BLAST and aligned by CLUSTALW with default parameters.
• Even though these alignments are usually good, it is highly recommended to inspect the alignment manually and with other tools in order to improve the quality of the evolutionary data.
Investigating & visualizing protein structures Consurf
![Page 107: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/107.jpg)
A caveat: in some cases the functionally important region may not be conserved at all
The peptide-binding groove of the MHC class I heavy chain.
PDB id : 2vaa
Investigating & visualizing protein structures Consurf
![Page 108: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/108.jpg)
PatchFinder: identification of functional sites
Patch: a spatially continuous cluster of surface residues.
Problems:
– Subjectivity of boundaries.
– Difficult to apply to large datasets
Investigating & visualizing protein structures
![Page 109: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/109.jpg)
PatchFinder
Input: 1. Protein structure 2. Multiple sequence alignment (MSA)
(1) Assignment of conservation scores (Rate4Site, ref. 3)
(2) Identification of exposed residues
(3) Extraction of the surface patch of conserved residues with the highest statistical significance (ML-patch).
(4) Identification of non-overlapping secondary patches
References: 1. Nimrod et al., 2005; 2. Nimrod et al., 2008; 3. Mayrose et al., 2004
Investigating & visualizing protein structures
![Page 110: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/110.jpg)
PatchFinder- http://patchfinder.tau.ac.il/
Investigating & visualizing protein structures
![Page 111: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/111.jpg)
Investigating & visualizing protein structures
Summary of structure-related databases & tools
• Secondary structure prediction: PredictProtein, Meta PredictProtein and PSIPRED.
• PDB, SCOP and CATH: collection and classification of experimentally determined structures.
• Structure visualization: PyMol
• Conservation analysis: ConSurf and PatchFinder
![Page 112: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/112.jpg)
Protein structure prediction
![Page 113: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/113.jpg)
Structure Prediction Approaches
1. Homology (Comparative) Modeling
Based on sequence similarity with a protein for which a structure has been solved.
2. Threading (Fold Recognition)
Requires a structure similar to a known structure
3. Ab-initio fold prediction
Not based on similarity to a sequence or structure
![Page 114: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/114.jpg)
Ab-initio
Structure prediction from “first principles”: given only the sequence, try to predict the structure based on physico-chemical properties (energy, hydrophobicity etc.)
• When all else fails: works for novel folds
• Shows that we understand the process
![Page 115: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/115.jpg)
The Force Field (energy function)
A group of mathematical expressions describing the potential energy of a molecular system
• Each expression describes a different type of physico-chemical interaction between atoms in the system:
• Covalent bonds
• Non-bonded terms: Van der Waals forces, hydrogen bonds, charges, hydrophobic effects
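To make the idea of "a group of mathematical expressions" concrete, here is a minimal sketch of three textbook terms: a harmonic covalent bond, a Lennard-Jones term for Van der Waals interactions, and a Coulomb term for charges. These functional forms are common in molecular mechanics but are an assumption here, since the slide does not name a specific force field; hydrogen-bond and hydrophobic terms are omitted, and all parameter values below are illustrative only.

```python
# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2), a common MD convention.
COULOMB_CONSTANT = 332.06


def bond_energy(r, r0, k):
    """Harmonic covalent-bond term: penalizes deviation from the ideal length r0."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term: repulsive at short range, weakly attractive further out."""
    ratio6 = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic term between two point charges (in units of e) at distance r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def total_energy(bonds, nonbonded_pairs):
    """Sum bonded and non-bonded contributions of a tiny system."""
    energy = sum(bond_energy(r, r0, k) for r, r0, k in bonds)
    for r, epsilon, sigma, q1, q2 in nonbonded_pairs:
        energy += lennard_jones(r, epsilon, sigma) + coulomb(r, q1, q2)
    return energy


if __name__ == "__main__":
    # Hypothetical numbers: one slightly stretched bond and one non-bonded atom pair.
    bonds = [(1.55, 1.53, 300.0)]
    pairs = [(3.8, 0.1, 3.4, 0.2, -0.2)]
    print(total_energy(bonds, pairs))
```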
![Page 116: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/116.jpg)
Approaches to Ab-initio Prediction
1. Molecular Dynamics
• Simulates the forces that govern the protein within water.
• Since proteins usually fold spontaneously, this would lead to the native protein structure.
Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
Feasible only for very small proteins
![Page 117: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/117.jpg)
Approaches to Ab-initio Prediction
2. Minimal Energy
Assumption: the folded form is the minimal energy conformation of a protein
Main principles:
• Define an energy function.
• Search for the 3D conformation that minimizes the energy.
![Page 118: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/118.jpg)
Ab-initio
2. Minimal Energy
• Use of simplified energy function
• Search methods for minimal energy conformation:
– Greedy search
– Simulated annealing
– …
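The search step can be sketched with simulated annealing on a toy problem. The one-dimensional "energy landscape" below is a made-up stand-in for a simplified protein energy function, and the cooling schedule, step size and step count are arbitrary choices for this illustration. A greedy search corresponds to accepting only moves that lower the energy; simulated annealing also accepts some uphill moves, with a probability that shrinks as the temperature drops, which helps it escape local minima.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1D landscape with several local minima."""
    return x ** 2 + 2.0 * math.sin(5.0 * x)


def simulated_annealing(energy, start, steps=5000, t_start=2.0, t_end=0.01,
                        step_size=0.5, seed=0):
    """Return the lowest-energy point visited and its energy."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


if __name__ == "__main__":
    x, e = simulated_annealing(toy_energy, start=3.0)
    print(f"lowest energy found: {e:.3f} at x = {x:.3f}")
```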
![Page 119: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/119.jpg)
• Current methods (e.g. Rosetta) primarily utilize the fact that although we are far from observing all protein folds, we probably have seen nearly all sub-structures:
Ab-initio
Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
Local sequence-structure relationships:
• A library of known sub-structures (fragments of fewer than 10 residues) is created.
• A range of possible conformations is selected for each fragment in the query protein.
![Page 120: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/120.jpg)
Ab-initio
Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
Non-local sequence-structure relationships:
• The primary nonlocal interactions considered are hydrophobic burial, electrostatics, main-chain hydrogen bonding etc.
Structures that are consistent with both the local and non-local interactions are generated by minimizing the non-local interaction energy in the space defined by the local structure distributions.
![Page 121: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/121.jpg)
Ab-initio - Example
Moult J. Philos. Trans. R. Soc. B. 361:453–458 (2006)
![Page 122: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/122.jpg)
Fold Recognition (Threading)
Given a sequence and a library of folds, thread the sequence through each fold. Take the one with the highest score.
• The method will fail if the new protein does not belong to any fold in the library.
• The score of the threading is computed based on known physical chemistry properties and statistics of amino acids.
![Page 123: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/123.jpg)
Threading: example
• structural template
• neighbor definition
• energy function

Pair energies $E_{ab}$:

| | A | C | D | E | … |
| --- | --- | --- | --- | --- | --- |
| A | -3 | -1 | 0 | 0 | … |
| C | -1 | -4 | 1 | 2 | … |
| D | 0 | 1 | 5 | 6 | … |
| E | 0 | 2 | 6 | 7 | … |
| … | … | … | … | … | … |

Sequence ACCECADAAC threaded onto template positions 1 to 10 (A, C, C, E, C, A, D, A, A, C):

$$E = \sum_{i,j\ \text{positions}} E_{a_i b_j} = -3-1-4-4-1-4-3-3 = -23$$
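The threading score in this example is a sum of pair energies over neighboring template positions. The sketch below reproduces the total of -23 using the pair energies from the slide. The slide's template picture is not recoverable from the transcript, so the neighbor list used here is a hypothetical one, chosen so that its eight contacts give the same terms as the slide (-3, -1, -4, -4, -1, -4, -3, -3).

```python
# Symmetric pair energies from the slide, keyed by alphabetically sorted residue pair.
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

# Hypothetical neighbor definition for the template (1-based position pairs).
NEIGHBORS = [(1, 6), (1, 2), (2, 3), (3, 5), (5, 6), (5, 10), (6, 8), (8, 9)]


def pair_energy(a, b):
    """Look up the energy of two residue types, independent of their order."""
    return PAIR_ENERGY[tuple(sorted((a, b)))]


def threading_energy(sequence, neighbors):
    """Sum pair energies over all neighboring template positions."""
    return sum(
        pair_energy(sequence[i - 1], sequence[j - 1])
        for i, j in neighbors
    )


print(threading_energy("ACCECADAAC", NEIGHBORS))  # -23
```

Threading the same sequence onto a different template only changes the neighbor list, which is why one energy table can be used to score a whole fold library.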
![Page 124: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/124.jpg)
Find best fold for a protein sequence: Fold recognition (threading)
Query sequence: MAHFPGFGQSLLFGYPVYVFGD...
Potential folds: 1) ... 56) ... n)
Scores: -10 ... -123 ... 20.5
![Page 125: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/125.jpg)
GenTHREADER
• Align the query sequence with each template (requires some sequence homology!)
• Assess the alignment by:
– Sequence alignment score
– Pairwise potentials
– Solvation function
• Record lengths of: alignment, query, template
• The overall score is computed using a neural network.
Jones DT et al. J. Mol. Biol. 287: 797-815 (1999)
![Page 126: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/126.jpg)
GenTHREADER
Jones DT et al. J. Mol. Biol. 287: 797-815(1999)
![Page 127: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/127.jpg)
I-TASSER- Hybrid Approach
• In CASP7, a recent community-wide blind experiment, I-TASSER generated the best 3D structure predictions among all automated servers.
• Based on secondary-structure threading and the iterative implementation of the Threading ASSEmbly Refinement (TASSER) program.
• For predicting the biological function of the protein, the I-TASSER server matches the predicted 3D models to the proteins in 3 independent libraries, which consist of proteins of known enzyme classification (EC) number, gene ontology (GO) vocabulary, and ligand-binding sites.
![Page 128: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/128.jpg)
I-TASSER
![Page 129: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/129.jpg)
Test Case: Rosetta vs. TASSER
Grey: Crystal structure of Beta-nnnn:
Purple: Rosetta prediction, starting from homology modeling
Green: TASSER prediction
![Page 130: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/130.jpg)
Homology Modeling – Basic Idea
Triosephosphate isomerases: 44.7% sequence identity, 0.95 Å RMSD
1. A protein structure is defined by its amino acid sequence.
2. Closely related sequences adopt highly similar structures; distantly related sequences may still fold into similar structures.
3. The three-dimensional structure of proteins from the same family is more conserved than their primary sequences.
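The two numbers quoted for the triosephosphate isomerase pair, sequence identity and RMSD, can be computed as in the sketch below. Two assumptions apply: identity is taken over aligned columns without gaps (other denominators are also in use), and the RMSD function expects coordinates that are already superimposed and paired atom by atom. The example sequences and coordinates are invented.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, pre-superimposed 3D coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy aligned sequences and C-alpha coordinates, for illustration only.
    print(percent_identity("ACD-EF", "ACDGQF"))
    print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]))
```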
![Page 131: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/131.jpg)
General Scheme
1. Searching for structures related to the query sequence
2. Selecting templates
3. Aligning query sequence with template structures
4. Building a model for the query using information from the template structures
5. Evaluating the model
Fiser A et al. Methods in Enzymology 374: 461-491(2004)
![Page 132: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/132.jpg)
General Scheme
![Page 133: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/133.jpg)
Homology modeling requires handling structures & sequences
• Query: only the protein sequence is available, usually found in the UniProt database
• Template: after identification, both structural and sequence-related data should be found in UniProt (or NCBI databases), RCSB and PDBsum
![Page 134: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/134.jpg)
Homology modeling- query-template alignment
Different levels of similarity between the template & query initiate various computational approaches:
![Page 135: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/135.jpg)
Evolutionary Conservation
Homology modeling- model evaluation
http://consurf.tau.ac.il
![Page 136: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/136.jpg)
http://consurf.tau.ac.il
Homology modeling- model evaluation
Evolutionary Conservation
![Page 137: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/137.jpg)
http://consurf.tau.ac.il
Homology modeling- model evaluation
Evolutionary Conservation
![Page 138: Bioinformatics of proteins: Sequence, structure and the ‘symbiosis’ between them](https://reader031.fdocuments.in/reader031/viewer/2022013012/5681440f550346895db0abbc/html5/thumbnails/138.jpg)
Homology Modeling
• The accuracy of the model depends on its sequence identity with the template: