Protein 3D structure and classification database

Protein structure and modeling, Preston University, Islamabad

1

Introduction • Protein • Function of proteins • Enzymes • Structures • Catalysts • Transportation • Regulation • Signaling

2

Amino acids • Amino acids are the basic units of proteins • Chiral carbon • Side chain • Hydrogen atom • Amino group • Carboxyl group

3

Protein sequences: amino acids

4

Codes for amino acids
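
The one-letter and three-letter codes for the 20 standard amino acids can be written down as a small lookup table. The sketch below is only an illustration; the helper names `to_three_letter` and `to_one_letter` are hypothetical and not part of the original slides.

```python
# One-letter -> three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse lookup: three-letter -> one-letter.
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(sequence):
    """Convert a one-letter sequence string into a list of three-letter codes."""
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


def to_one_letter(codes):
    """Convert a list of three-letter codes (any case) into a one-letter string."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes)
```

For example, `to_three_letter("MKV")` gives the codes for methionine, lysine and valine.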

8

Protein

• Primary structure • Secondary structure • Tertiary structure • Quaternary structure

9

Secondary structures • Structures formed by hydrogen bonding within the linear polypeptide chain • Alpha helices • Beta sheets

10

Alpha helices

• Right-handed coiled or spiral conformation (helix) in which every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier (i+4 → i hydrogen bonding).

• Among types of local structure in proteins, the α-helix is the most regular and the most predictable from sequence, as well as the most prevalent.

11

Beta sheets • The β sheet (also β-pleated sheet) is the second form of regular secondary structure in proteins. It is less common than the alpha helix.

• Beta sheets consist of beta strands connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet.

• A beta strand (also β strand) is a stretch of polypeptide chain typically 3 to 10 amino acids long with backbone in an almost fully extended conformation.

13

Parallel beta sheet

Antiparallel beta sheet

14

Tertiary structure • The term protein tertiary structure refers to a protein's three-dimensional geometric shape. The tertiary structure has a single polypeptide chain "backbone" with one or more protein secondary structures, organized into protein domains.

15

Conformational parameters for secondary structure of a protein • Dihedral angles: in proteins the bonds joining consecutive amino acids have a restricted orientation, and this orientation determines the conformation of the protein.

• The conformation of the protein can therefore be described by the angles of the main chain (backbone) and not those of the side chains.

• The angle phi (φ) is the rotation about the bond between the nitrogen of the amino group and the C alpha (N-Cα).

• The angle psi (ψ) is the rotation about the bond between the C alpha and the carbon of the carboxyl group (Cα-C).

• The angles phi and psi are both taken as 180 degrees when the polypeptide is in the fully extended conformation.

17
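
As a minimal sketch of how these angles are obtained from atomic coordinates, the function below computes a signed dihedral angle from four points. For phi the four atoms are usually taken as C(i-1), N(i), Cα(i), C(i), and for psi as N(i), Cα(i), C(i), N(i+1). The function name `dihedral` and the tuple-based coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points.

    phi: p0..p3 = C(i-1), N(i), CA(i), C(i)
    psi: p0..p3 = N(i), CA(i), C(i), N(i+1)
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)                       # normal of the second plane
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)              # dot product of the two plane normals
    return math.degrees(math.atan2(y, x))
```

With this convention a planar zigzag (fully extended) arrangement of the four points gives 180 degrees and an eclipsed arrangement gives 0 degrees.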

Ramachandran plot

18

• There are certain permitted values for these angles.

• If the values are not appropriate, steric hindrance can occur and the conformation may be distorted.

• The protein may also become non-functional.

• A Ramachandran plot can be used in two different ways.

• One is to show in theory which values, or conformations, of the ψ and φ angles are possible for an amino-acid residue in a protein.

• A second is to show the empirical distribution of data points observed in a single structure, as used for structure validation, or else in a database of many structures.

• In either case, the points are usually shown against outlines of the theoretically favored regions.
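
The validation use of the plot can be sketched as a lookup of each (φ, ψ) pair against favored regions. The rectangular boxes below are coarse assumptions chosen only for illustration (a right-handed helix near φ ≈ -60°, ψ ≈ -45° and a β region near φ ≈ -120°, ψ ≈ +130°); real validation tools use empirically derived contours, not boxes.

```python
# Coarse, illustrative (phi, psi) boxes in degrees.
# These bounds are assumptions for this sketch, not validated contours.
REGIONS = {
    "right-handed alpha helix": ((-100, -30), (-80, -10)),
    "beta sheet": ((-180, -45), (90, 180)),
    "left-handed helix": ((30, 90), (0, 90)),
}


def classify(phi, psi):
    """Return the name of the first box containing (phi, psi), else 'other'."""
    for name, ((phi_lo, phi_hi), (psi_lo, psi_hi)) in REGIONS.items():
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return "other"
```

Residues that fall in "other" are not necessarily wrong; in this sketch they are simply the points that would deserve a closer look.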

19

Ramachandran Plot

20

Hydropathy plot • A hydropathy plot is a quantitative analysis of the degree of hydrophobicity or hydrophilicity of the amino acids of a protein.

• It is used to characterize or identify possible structures or domains of a protein.

• If the plot shows long stretches of hydrophobic residues, the protein is likely to be a transmembrane protein; the hydrophobic stretches correspond to the domains lying inside the membrane, and several such stretches suggest that the chain spans the membrane multiple times.

21

• The plot has the amino acid sequence of a protein on its x-axis.

• The degree of hydrophobicity or hydrophilicity is on its y-axis.

• There are a number of methods to measure the degree of interaction of polar solvents such as water with specific amino acids.

• For instance, the Kyte-Doolittle scale indicates hydrophobic amino acids, whereas the Hopp-Woods scale measures hydrophilic residues.

22

• Analyzing the shape of the plot gives information about partial structure of the protein.

• For instance, if a stretch of about 20 amino acids shows positive hydrophobicity values, these amino acids may be part of an alpha helix spanning a lipid bilayer, whose interior is composed of hydrophobic fatty acid chains.

• Conversely, amino acids with high hydrophilicity indicate that these residues are in contact with the solvent, or water, and that they are therefore likely to reside on the outer surface of the protein.

• Expasy ProtScale can be used to construct a hydropathy plot quickly.
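
The plot described above can be sketched as a sliding-window average over the Kyte-Doolittle scale. This is a minimal illustration, not the ProtScale implementation; the function names are hypothetical, and the window of 19 residues with a cutoff of 1.6 is a commonly quoted rule of thumb for transmembrane segments that should be treated as an assumption.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every window of the given length along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score > threshold]
```

Plotting the list returned by `hydropathy_profile` against the window position reproduces the x-axis and y-axis described on the slides.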

23

Expasy – protscale

24

28

aquaporin

Methods of protein structure and modeling

Threading or fold recognition

Ab initio / de novo method

29

1) Threading • Two proteins may be structurally similar even when they share less than about ten percent sequence similarity.

• When sequence-based comparison methods are not efficient enough to recognize the folds and domains in the target sequence, we proceed with threading.

• Threading is the method by which a library of unique structures is searched for structural analogues of the target sequence; it is based on the theory that there may be only a limited number of distinct folds.

30

Basic components of folding • Representation of the query sequence • Representation of the protein structural models • Objective function • Aligning a sequence to a model • Selecting a model from a library

31

Representation of the query sequence

• Similar protein sequences lead to similar protein structures.

• Sequences similar to the query sequence carry information about the 3D structure of the query sequence.

• Algorithms are also available to build different representations of the query sequence.

32

Representation of the protein structural models

• Protein structure is determined by all the non-hydrogen atoms in their 3D conformation.

• The 3D coordinates used by threading software are well suited to an abstract representation of the protein structure, and give a view that is close to the original 3D protein structure.

33

Objective function

• The 3D data deposited in databases such as the PDB are analyzed with different statistical protocols.

• The results of these analyses are referred to as knowledge-based potentials or empirical potentials.

• In the case of non-linear models they are also called contact potentials, among other names.
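
One common way to turn such statistics into a potential is the inverse Boltzmann relation, in which the score of a residue pair is minus the logarithm of its observed contact frequency divided by the frequency expected from composition alone. The sketch below assumes that form; the function name `contact_potential`, the pseudocount and the reference state are illustrative assumptions, and real potentials are derived from many PDB structures.

```python
import math
from collections import Counter


def contact_potential(contacts, pseudocount=1.0):
    """Inverse-Boltzmann style contact scores (in units of kT).

    contacts: iterable of (residue_a, residue_b) pairs observed in contact.
    Returns {(a, b): score} with a <= b; negative means favored contact.
    """
    contacts = list(contacts)
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    residue_counts = Counter(res for pair in contacts for res in pair)
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    potential = {}
    types = sorted(residue_counts)
    for i, a in enumerate(types):
        for b in types[i:]:
            freq_a = residue_counts[a] / total_residues
            freq_b = residue_counts[b] / total_residues
            # Expected count if residues paired at random (2x for unlike pairs).
            expected = total_pairs * freq_a * freq_b * (1 if a == b else 2)
            observed = pair_counts[(a, b)]
            potential[(a, b)] = -math.log(
                (observed + pseudocount) / (expected + pseudocount))
    return potential
```

Pairs seen more often than expected receive negative scores, so a lower total score marks a more plausible sequence-structure match.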

34

Aligning a sequence to a model

• The goal of a threading alignment algorithm is to find an optimal match between the query sequence and the best-suited model protein.

• Sequence-structure alignment algorithms can be used to find the best-suited match.

35

Selecting a model from a library

• Aligning the query sequence against the different structural models in the library gives multiple results.

• The result with the highest score is selected to model the protein structure.
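
The alignment and selection steps can be sketched together on a toy problem. Here each library model is reduced to a string of environments ('B' for buried, 'E' for exposed), the objective function rewards hydrophobic residues in buried positions, and the alignment uses Needleman-Wunsch dynamic programming. The scoring values, the hydrophobic set and the two-fold library are all assumptions for illustration; real threading programs use far richer models and potentials.

```python
HYDROPHOBIC = set("AVLIMFWC")  # simplified grouping, assumed for this sketch


def match_score(residue, environment):
    """+1 if the residue suits the environment ('B' buried / 'E' exposed), else -1."""
    fits = (residue in HYDROPHOBIC) == (environment == "B")
    return 1 if fits else -1


def thread_score(sequence, template, gap=-2):
    """Best global alignment score of a sequence onto a template environment string."""
    n, m = len(sequence), len(template)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + match_score(sequence[i - 1], template[j - 1]),
                score[i - 1][j] + gap,   # gap in the template
                score[i][j - 1] + gap,   # gap in the sequence
            )
    return score[n][m]


def select_template(sequence, library):
    """Score the sequence against every model and return (best_name, all_scores)."""
    scores = {name: thread_score(sequence, env) for name, env in library.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `select_template("LLVIKDEK", {"fold_A": "BBBBEEEE", "fold_B": "EEEEBBBB"})` picks the model whose buried positions line up with the hydrophobic residues.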

36

2) Ab initio method • Ab initio structure prediction determines the protein structure from the protein sequence alone.

• The free energy of all the amino acids present in the sequence of the protein is also estimated independently.

• The two key components of de novo methods are the procedure for efficiently carrying out the conformational search and the free energy estimation function used for evaluating the possible conformations.
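
These two components can be illustrated with the 2D HP lattice model, a classic toy model that is swapped in here for real ab initio methods. Residues are either hydrophobic (H) or polar (P), conformations are self-avoiding walks on a square grid, the energy function counts non-bonded H-H contacts, and the conformational search is plain random sampling. All names and parameters are assumptions for this sketch.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_walk(length, rng):
    """Return a random self-avoiding walk on a 2D lattice, or None if it gets stuck."""
    path = [(0, 0)]
    occupied = {(0, 0)}
    for _ in range(length - 1):
        x, y = path[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return None
        nxt = rng.choice(free)
        path.append(nxt)
        occupied.add(nxt)
    return path


def energy(hp_sequence, path):
    """Energy function: -1 per pair of H residues adjacent on the lattice
    but not consecutive in the chain."""
    index_at = {pos: i for i, pos in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if hp_sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index_at.get((x + dx, y + dy))
            if j is not None and j > i + 1 and hp_sequence[j] == "H":
                total -= 1
    return total


def search(hp_sequence, trials=2000, seed=0):
    """Conformational search: sample random walks and keep the lowest energy."""
    rng = random.Random(seed)
    best_path, best_energy = None, None
    for _ in range(trials):
        path = random_walk(len(hp_sequence), rng)
        if path is None:
            continue
        e = energy(hp_sequence, path)
        if best_energy is None or e < best_energy:
            best_path, best_energy = path, e
    return best_path, best_energy
```

Even in this toy setting the number of conformations grows very quickly with chain length, which is one reason why the search procedure matters as much as the energy function.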

37

Ab-initio method

Advantages • The ab initio approach can be applied to model any sequence

Disadvantages • Low-resolution models • Only a limited number of residues, fewer than 100 amino acids, can be modeled

38

Thanks

39

Signal peptide prediction • A signal peptide, sometimes also referred to as a signal sequence, leader sequence or leader peptide, is a short peptide, 5-30 amino acids long, present at the N-terminus of the majority of newly synthesized proteins that are destined towards the secretory pathway.

• These proteins include those that reside inside certain organelles (the endoplasmic reticulum, Golgi or endosomes), are secreted from the cell, or are inserted into most cellular membranes.

• SignalP version 4 has been used to detect the presence of signal peptides.

40
