Protein Secondary Structure Prediction 1

Protein Secondary Structure Prediction. K. Akila, Lecturer, Department of Bioinformatics, Jamal Mohamed College, Trichy.

description

It is very useful for biology students.

Transcript of Protein Secondary Structure Prediction 1

Page 1: Protein Secondary Structure Prediction 1

Protein Secondary Structure Prediction

K. Akila, Lecturer,

Department of Bioinformatics, Jamal Mohamed College,

Trichy.

Page 2: Protein Secondary Structure Prediction 1

Proteins

• Proteins play a crucial role in virtually all biological processes, with a broad range of functions.
• The activity of an enzyme or the function of a protein is governed by its three-dimensional structure.

H11_MOUSE: histocompatibility antigen

VE2_BPV1: bovine DNA-binding domain

Page 3: Protein Secondary Structure Prediction 1

20 amino acids - the building blocks

Clickable map at: http://www.russell.embl-heidelberg.de/aas/

Page 4: Protein Secondary Structure Prediction 1

Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure

Page 5: Protein Secondary Structure Prediction 1

Reasons for Predicting Secondary Structure

• Starting point for prediction of tertiary and quaternary structure.
• Insight into the biological function of a protein.
• Facilitates alignment for homology modeling of distantly related proteins.
• Insight for data analysis/mutagenesis experiments when the structure is not known.
• Since secondary structure is local, only the amino acid sequence is needed.

Page 6: Protein Secondary Structure Prediction 1

Use of amino acid properties in prediction schemes

Prediction function: sequence + other inputs → prediction

Propensity function: sequence + other inputs → vector of propensities

Page 7: Protein Secondary Structure Prediction 1

Primary structure

Primary structure refers to the "linear" sequence of amino acids.

Page 8: Protein Secondary Structure Prediction 1

Types of Secondary Structures

• α Helices
• β Sheets
• Loops
• Coils

Page 9: Protein Secondary Structure Prediction 1

α Helix

• Most abundant secondary structure
• 3.6 amino acids per turn
• Hydrogen bond formed between every fourth residue (residue i to residue i + 4)
• Average length: 10 amino acids, or 3 turns
• Varies from 5 to 40 amino acids

Page 10: Protein Secondary Structure Prediction 1

Contd.

• Normally found on the surface of protein cores.
• Interact with aqueous environment
  – Inner-facing side has hydrophobic amino acids.
  – Outer-facing side has hydrophilic amino acids.
• Every third to fourth amino acid tends to be hydrophobic (3.6 residues per turn).
• Pattern can be detected computationally.
• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M).
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S).
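One way the hydrophobic periodicity of a helix might be detected computationally is a hydrophobic-moment scan. The sketch below is only an illustration: the Kyte-Doolittle hydropathy scale, the 18-residue window, and the function names are assumptions, since the slides name no particular scale or algorithm. The 100° step per residue follows from 3.6 residues per turn (360 / 3.6).

```python
import math

# Kyte-Doolittle hydropathy scale (an assumption: the slides name no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0):
    """Mean hydrophobic moment of a segment.

    Each residue is placed angle_deg further around the helix axis than
    the previous one; hydropathy values are summed as 2D vectors.
    """
    x = y = 0.0
    for k, residue in enumerate(segment):
        h = KYTE_DOOLITTLE[residue]
        theta = math.radians(angle_deg * k)
        x += h * math.cos(theta)
        y += h * math.sin(theta)
    return math.hypot(x, y) / len(segment)


def best_helical_window(sequence, width=18):
    """Return (start, moment) for the window with the largest moment."""
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the window")
    scores = [
        (hydrophobic_moment(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    moment, start = max(scores)
    return start, moment
```

A window with one hydrophobic face and one hydrophilic face gives a large moment, while a uniformly hydrophobic window gives a moment close to zero because its contributions cancel around the helix.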

Page 11: Protein Secondary Structure Prediction 1

β Sheet

• Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain
• Interacting regions may be adjacent with a short loop, or far apart with other structures in between.
• Slight counterclockwise rotation
  – Alpha carbons (as well as R side groups) alternate above and below the sheet
  – Prediction difficult, due to wide range of φ and ψ angles.

Page 12: Protein Secondary Structure Prediction 1

Loop

• Regions between α helices and β sheets.
• Various lengths and three-dimensional configurations.
• Located on surface of the structure.
• Hairpin loops: complete turn in the polypeptide chain (anti-parallel β sheets).
• More variable sequence structure.
• Tend to have charged and polar amino acids.
• Frequently a component of active sites.

Page 13: Protein Secondary Structure Prediction 1

Secondary structure prediction

• Historically, the first structure prediction methods predicted secondary structure

• Can be used to improve alignment accuracy

• Can be used to detect domain boundaries within proteins with remote sequence homology

• Often the first step towards 3D structure prediction

• Informative for mutagenesis studies

Page 14: Protein Secondary Structure Prediction 1

Contd.

• In either case, amino acid propensities should be useful for predicting secondary structure
• Two classical methods that use previously determined propensities:
  – Chou-Fasman
  – Garnier-Osguthorpe-Robson

Page 15: Protein Secondary Structure Prediction 1

Chou-Fasman method

• Uses table of conformational parameters (propensities) determined primarily from measurements of secondary structure by CD spectroscopy
• Table consists of one “likelihood” for each structure for each amino acid

Page 16: Protein Secondary Structure Prediction 1

Chou-Fasman propensities (partial table)

Amino Acid   Pα     Pβ     Pt
Glu          1.51   0.37   0.74
Met          1.45   1.05   0.60
Ala          1.42   0.83   0.66
Val          1.06   1.70   0.50
Ile          1.08   1.60   0.50
Tyr          0.69   1.47   1.14
Pro          0.57   0.55   1.52
Gly          0.57   0.75   1.56
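As a rough illustration of how such a table might be used, the partial table can be stored as a lookup from residue to its three propensities and averaged over a short segment. The names `CHOU_FASMAN_PARTIAL`, `mean_propensities`, and `favoured_structure` are hypothetical, and only the eight residues listed on the slide are covered.

```python
# Partial Chou-Fasman table transcribed from the slide:
# one-letter residue code -> (P_alpha, P_beta, P_turn).
CHOU_FASMAN_PARTIAL = {
    "E": (1.51, 0.37, 0.74),  # Glu
    "M": (1.45, 1.05, 0.60),  # Met
    "A": (1.42, 0.83, 0.66),  # Ala
    "V": (1.06, 1.70, 0.50),  # Val
    "I": (1.08, 1.60, 0.50),  # Ile
    "Y": (0.69, 1.47, 1.14),  # Tyr
    "P": (0.57, 0.55, 1.52),  # Pro
    "G": (0.57, 0.75, 1.56),  # Gly
}

STRUCTURES = ("helix", "sheet", "turn")


def mean_propensities(segment):
    """Average each propensity column over the residues of a segment.

    Raises KeyError for residues missing from the partial table.
    """
    rows = [CHOU_FASMAN_PARTIAL[residue] for residue in segment]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def favoured_structure(segment):
    """Name the structure with the highest mean propensity."""
    means = mean_propensities(segment)
    return STRUCTURES[means.index(max(means))]
```

For example, `favoured_structure("EMA")` favours helix, while `favoured_structure("PG")` favours turn, in line with the residue preferences in the table.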

Page 17: Protein Secondary Structure Prediction 1

Chou-Fasman method

• A prediction is made for each type of structure for each amino acid
  – Can result in ambiguity if a region has high propensities for both helix and sheet (higher value usually chosen, with exceptions)

Page 18: Protein Secondary Structure Prediction 1

Chou-Fasman method

• Calculation rules are somewhat ad hoc
• Example: Method for helix
  – Search for nucleating region where 4 out of 6 a.a. have Pα > 1.03
  – Extend until 4 consecutive a.a. have an average Pα < 1.00
  – If region is at least 6 a.a. long, has an average Pα > 1.03, and average Pα > average Pβ, consider region to be helix
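The helix rules above can be sketched roughly as follows. This is one possible reading of the three steps, not the published algorithm: the way the extension window slides, the half-open `(start, end)` output, and the name `find_helices` are assumptions, and the table holds only the eight residues from the partial table on the earlier slide.

```python
from statistics import mean

# Partial table from the slides: residue -> (P_alpha, P_beta).
TABLE = {
    "E": (1.51, 0.37), "M": (1.45, 1.05), "A": (1.42, 0.83),
    "V": (1.06, 1.70), "I": (1.08, 1.60), "Y": (0.69, 1.47),
    "P": (0.57, 0.55), "G": (0.57, 0.75),
}


def find_helices(sequence):
    """Return half-open (start, end) index pairs of predicted helices."""
    p_alpha = [TABLE[residue][0] for residue in sequence]
    p_beta = [TABLE[residue][1] for residue in sequence]
    n = len(sequence)
    helices = []
    floor = 0  # leftward extension may not cross an earlier region
    i = 0
    while i + 6 <= n:
        # Step 1: nucleus = 6 residues, at least 4 with P_alpha > 1.03.
        if sum(p > 1.03 for p in p_alpha[i:i + 6]) < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Step 2: extend one residue at a time; stop when the 4 residues
        # at the growing edge average below 1.00.
        while end < n and mean(p_alpha[end - 3:end + 1]) >= 1.00:
            end += 1
        while start > floor and mean(p_alpha[start - 1:start + 3]) >= 1.00:
            start -= 1
        # Step 3: accept only if long enough and helix-favouring overall.
        avg_alpha = mean(p_alpha[start:end])
        avg_beta = mean(p_beta[start:end])
        if end - start >= 6 and avg_alpha > 1.03 and avg_alpha > avg_beta:
            helices.append((start, end))
        floor = end
        i = end
    return helices
```

With the toy sequence `"GPGPEEAAMMEEAAGPGPGP"`, the Glu/Ala/Met stretch is reported as a single helix, while `"VIYVIYVIY"` nucleates but is rejected because its average Pα falls below 1.03.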

Page 19: Protein Secondary Structure Prediction 1

Accuracy of Chou-Fasman predictions

• Sequences whose 3D structures are known are processed so that each residue is “assigned” to a given secondary structure class by looking at the backbone angles
• Three classes most often used (helix=H, sheet=E, turn=C) but sometimes use four classes (helix, sheet, turn, loop)
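Once each residue has an assigned class, a prediction can be scored residue by residue; with three classes this percentage is often called Q3. The sketch below assumes the assigned and predicted classes are given as equal-length strings over H, E, and C. The function names and the example strings are hypothetical, not data from the slides.

```python
def per_residue_accuracy(assigned, predicted):
    """Percentage of residues whose predicted class equals the assigned one."""
    if len(assigned) != len(predicted):
        raise ValueError("strings must have the same length")
    matches = sum(a == p for a, p in zip(assigned, predicted))
    return 100.0 * matches / len(assigned)


def per_class_accuracy(assigned, predicted, classes="HEC"):
    """For each class, percentage of its assigned residues predicted correctly."""
    result = {}
    for cls in classes:
        pairs = [(a, p) for a, p in zip(assigned, predicted) if a == cls]
        if pairs:  # skip classes that never occur in the assignment
            correct = sum(a == p for a, p in pairs)
            result[cls] = 100.0 * correct / len(pairs)
    return result


# Hypothetical example strings for illustration only.
assigned = "HHHHEEEECC"
predicted = "HHHCEEHHCC"
overall = per_residue_accuracy(assigned, predicted)
by_class = per_class_accuracy(assigned, predicted)
```

Reporting the per-class figures alongside the overall percentage shows whether a method is weak on one class, such as sheets, even when the overall number looks acceptable.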

Page 20: Protein Secondary Structure Prediction 1

Confusion matrix for Chou-Fasman method on 78 proteins

        Predicted
True    H      E      C      Unknown
H       47.5   3.0    4.3    45.2
E       20.8   16.8   7.1    55.4
C       6.4    3.6    38.0   52.0

Data from Z-Y Zhu, Protein Engineering 8:103-109, 1995

Average accuracy = 54.4

Page 21: Protein Secondary Structure Prediction 1

Garnier-Osguthorpe-Robson

• Uses table of propensities calculated primarily from structures determined by X-ray crystallography
• Table consists of one “likelihood” for each structure for each amino acid for each position in a 17 amino acid window

Page 22: Protein Secondary Structure Prediction 1

Garnier-Osguthorpe-Robson

• Analogous to searching for “features” with a 17 amino acid wide frequency matrix
• One matrix for each “feature”
  – α-helix
  – β-sheet
  – turn
  – coil
• Highest scoring “feature” is found at each location
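The windowed scoring can be sketched as below: one 17-position matrix per structure, each position mapping amino acids to a score, with the highest-scoring structure chosen at every residue. The matrices here are placeholder values built by a hypothetical `make_toy_tables` helper, not real GOR parameters, and the way window positions past the ends of the sequence are ignored is also an assumption.

```python
WINDOW = 17
HALF = WINDOW // 2  # 8 residues on either side of the central residue
STATES = ("helix", "sheet", "turn", "coil")


def make_toy_tables():
    """Build placeholder matrices (NOT real GOR values).

    tables[state][offset] maps an amino acid to its score at that
    window position; amino acids not listed score 0.
    """
    tables = {state: [dict() for _ in range(WINDOW)] for state in STATES}
    for offset in range(WINDOW):
        tables["helix"][offset]["A"] = 1.0
        tables["sheet"][offset]["V"] = 1.0
        tables["turn"][offset]["G"] = 1.0
        tables["coil"][offset]["S"] = 1.0
    return tables


def score(tables, sequence, center, state):
    """Sum the matrix entries for the window centered on one residue."""
    total = 0.0
    for offset in range(-HALF, HALF + 1):
        position = center + offset
        if 0 <= position < len(sequence):  # ignore positions off the ends
            total += tables[state][offset + HALF].get(sequence[position], 0.0)
    return total


def predict(tables, sequence):
    """Pick the highest-scoring structure at every residue."""
    return [
        max(STATES, key=lambda state: score(tables, sequence, i, state))
        for i in range(len(sequence))
    ]
```

With the toy matrices, a run of alanines followed by a run of valines is labelled helix and then sheet, switching at the residue where valines first outnumber alanines in the window.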

Page 23: Protein Secondary Structure Prediction 1

Accuracy of predictions

• GOR much better at recognizing β-sheets
• Both methods are only about 55-65% accurate.
