Chou-Fasman using MATLAB

MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. The Chou-Fasman method is an empirical technique for the prediction of secondary structures in proteins, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman.

Experiment 7

Aim: Write a program to implement the Chou-Fasman algorithm.

Equipment Required: A computer with an internet connection and MATLAB installed.

Learning Objectives: To acquaint students with the programming skills needed to write the program.

Theory: The Chou-Fasman method is an empirical technique for the prediction of secondary structures in proteins, originally developed in the 1970s by Peter Y. Chou and Gerald D. Fasman. The method is based on analyses of the relative frequencies of each amino acid in alpha helices, beta sheets, and turns in known protein structures solved with X-ray crystallography. From these frequencies a set of probability parameters was derived for the appearance of each amino acid in each secondary structure type, and these parameters are used to predict the probability that a given sequence of amino acids would form a helix, a beta strand, or a turn in a protein. The method is at most about 50-60% accurate in identifying correct secondary structures, which is significantly less accurate than modern machine-learning-based techniques.

The Chou-Fasman method takes into account only the probability that each individual amino acid will appear in a helix, strand, or turn.

Algorithm: The Chou-Fasman method predicts helices and strands in a similar fashion, first searching linearly through the sequence for a "nucleation" region of high helix or strand probability and then extending the region until a subsequent four-residue window carries a probability of less than 1. As originally described, four out of any six contiguous amino acids were sufficient to nucleate a helix, and three out of any contiguous five were sufficient for a sheet. The probability thresholds for helix and strand nucleations are constant but not necessarily equal; originally 1.03 was set as the helix cutoff and 1.00 as the strand cutoff.
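The nucleation and extension steps can be sketched in MATLAB as follows. This is only one reading of the rule above: a six-residue window nucleates a helix when at least four of its residues exceed the 1.03 cutoff, and the region then grows one residue at a time while the four-residue window that includes the new residue has a mean of at least 1. The vector Pa and all variable names are made up for illustration and do not come from a real parameter table.

```matlab
% Made-up helix propensities for a 12-residue stretch (illustration only).
Pa = [0.57 0.70 1.42 1.21 0.98 1.45 1.14 1.08 0.60 0.57 0.57 0.70];

helixCutoff = 1.03;  % helix nucleation cutoff quoted in the text
nucWin      = 6;     % nucleation window length for a helix
minFormers  = 4;     % residues in the window that must exceed the cutoff
extWin      = 4;     % extension window length
n           = numel(Pa);

% Nucleation: first six-residue window with at least four residues above the cutoff.
nucStart = 0;
for i = 1:(n - nucWin + 1)
    if sum(Pa(i:i+nucWin-1) > helixCutoff) >= minFormers
        nucStart = i;
        break
    end
end

if nucStart == 0
    disp('No helix nucleation site found');
else
    helixStart = nucStart;
    helixEnd   = nucStart + nucWin - 1;
    % Extend to the right while the four-residue window ending at the
    % next residue still has a mean of at least 1.
    while helixEnd < n && mean(Pa(helixEnd-extWin+2 : helixEnd+1)) >= 1
        helixEnd = helixEnd + 1;
    end
    % Extend to the left in the same way.
    while helixStart > 1 && mean(Pa(helixStart-1 : helixStart+extWin-2)) >= 1
        helixStart = helixStart - 1;
    end
    fprintf('Helix predicted for residues %d to %d\n', helixStart, helixEnd);
end
```

With these made-up values the nucleus is found at residues 2 to 7, and extension carries the helix on to residue 9 before the window mean drops below 1. A sheet search would follow the same pattern with a five-residue window, three formers, and the 1.00 cutoff.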

Turns are also evaluated in four-residue windows, but are calculated using a multi-step procedure because many turn regions contain amino acids that could also appear in helix or sheet regions. Four-residue turns also have their own characteristic amino acids; proline and glycine are both common in turns. A turn is predicted only if the turn probability is greater than both the helix and sheet probabilities and a probability value based on the positions of particular amino acids in the turn exceeds a predetermined threshold. The turn probability p(t) is determined as:

p(t) = p_t(j) × p_t(j+1) × p_t(j+2) × p_t(j+3)

where j is the position of the amino acid in the four-residue window. If p(t) exceeds an arbitrary cutoff value (originally 7.5e-3), the mean of the p(j)'s exceeds 1, and p(t) exceeds the alpha helix and beta sheet probabilities for that window, then a turn is predicted. If the first two conditions are met but the probability of a beta sheet p(b) exceeds p(t), then a sheet is predicted instead.

Procedure:

1. Make a table of Chou-Fasman parameters, i.e. the derived probability parameters of each amino acid residue.
2. Save the table under the name P_table.
3. Run MATLAB.
4. Load the table by typing its name at the prompt: >> P_table
5. Enter the sequence: seq = 'ala,arg,pro,val,iso,leu,lys,met'
6. Run the program written to find the structure.
7. Analyse the result.
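The turn rule from the Algorithm section can be tried on a single four-residue window with a few lines of MATLAB. The text does not spell out which quantities enter each comparison, so this sketch makes an assumption: the product p(t) is taken over position-specific turn values (f), while the mean test and the comparison with helix and sheet use per-residue parameters (Pturn, Pa, Pb). All numbers and variable names are hypothetical.

```matlab
% Hypothetical values for one four-residue window (not from a real table).
f     = [0.30 0.40 0.30 0.30];  % assumed position-specific turn values
Pturn = [1.56 1.52 1.56 0.95];  % assumed turn parameters of the four residues
Pa    = [0.57 0.57 0.57 0.98];  % assumed helix parameters
Pb    = [0.75 0.55 0.75 0.93];  % assumed sheet parameters

cutoff = 7.5e-3;                % turn cutoff quoted in the text

pt = prod(f);                   % product over the four window positions
passesCutoff = pt > cutoff;
passesMean   = mean(Pturn) > 1;

if passesCutoff && passesMean && mean(Pturn) > mean(Pa) && mean(Pturn) > mean(Pb)
    prediction = 'turn';
elseif passesCutoff && passesMean && mean(Pb) > mean(Pturn)
    prediction = 'sheet';       % sheet wins when it outscores the turn
else
    prediction = 'none';
end
disp(prediction)
```

Here the product is 0.0108, which clears the cutoff, and the mean turn parameter exceeds both the helix and sheet means, so the window is called a turn.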

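Program:

% Sequence under study (kept for reference; the windows are written out by hand below)
seq = 'ala,arg,pro,val,iso,leu,lys,met'

% Element 1 of each residue vector from P_table is the helix parameter.
% s1..s3 sum it over the three six-residue windows; d1..d3 are the window means.
s1 = Alanine(1)+Arginine(1)+Proline(1)+Valine(1)+Isoleucine(1)+Leucine(1)
d1 = s1/6
s2 = Arginine(1)+Proline(1)+Valine(1)+Isoleucine(1)+Leucine(1)+Lysine(1)
d2 = s2/6
s3 = Proline(1)+Valine(1)+Isoleucine(1)+Leucine(1)+Lysine(1)+Methionine(1)
d3 = s3/6

% Element 2 is the sheet parameter; s4..s6 and d4..d6 repeat the calculation.
s4 = Alanine(2)+Arginine(2)+Proline(2)+Valine(2)+Isoleucine(2)+Leucine(2)
d4 = s4/6
s5 = Arginine(2)+Proline(2)+Valine(2)+Isoleucine(2)+Leucine(2)+Lysine(2)
d5 = s5/6
s6 = Proline(2)+Valine(2)+Isoleucine(2)+Leucine(2)+Lysine(2)+Methionine(2)
d6 = s6/6

% Helix if any window mean exceeds 1.03; otherwise sheet if any exceeds 1.00.
% Each mean must be compared with the cutoff separately.
if (d1 > 1.03 || d2 > 1.03 || d3 > 1.03)
    Predicted_structure = 'helix'
elseif (d4 > 1.00 || d5 > 1.00 || d6 > 1.00)
    Predicted_structure = 'sheet'
else
    Predicted_structure = 'turns'
end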
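The program above writes out each six-residue window by hand. A minimal sketch of the same calculation with a loop is given below, so that it works for a sequence of any length. The lookup structure P, its field names, and the numbers in it are assumptions made for this example: they are placeholders to be replaced by the values from your own P_table, so the window means need not match those listed under Required Results.

```matlab
% Lookup table: field name = three-letter code as used in seq,
% value = [helix parameter, sheet parameter].
% Placeholder numbers (assumed); substitute the values from your own P_table.
P.ala = [1.42 0.83];
P.arg = [0.98 0.93];
P.pro = [0.57 0.55];
P.val = [1.06 1.70];
P.iso = [1.08 1.60];
P.leu = [1.21 1.30];
P.lys = [1.14 0.74];
P.met = [1.45 1.05];

seq = 'ala,arg,pro,val,iso,leu,lys,met';
residues = strsplit(seq, ',');
n = numel(residues);

% One row per residue: column 1 = helix, column 2 = sheet.
props = zeros(n, 2);
for k = 1:n
    props(k, :) = P.(residues{k});
end

% Mean helix and sheet parameter of every six-residue window.
win  = 6;
nWin = n - win + 1;
helixMean = zeros(1, nWin);
sheetMean = zeros(1, nWin);
for i = 1:nWin
    helixMean(i) = mean(props(i:i+win-1, 1));
    sheetMean(i) = mean(props(i:i+win-1, 2));
end

% Same decision rule as the hand-written program.
if any(helixMean > 1.03)
    Predicted_structure = 'helix';
elseif any(sheetMean > 1.00)
    Predicted_structure = 'sheet';
else
    Predicted_structure = 'turns';
end
disp(Predicted_structure)
```

Using any() makes the comparison explicit for every window, and changing seq or win requires no further edits to the script.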

Required Results:

1. The program is written.
2. The program is successfully executed.

seq =

ala,arg,pro,val,iso,leu,lys,met

s1 =

6.5200

d1 =

1.0867

s2 =

6.2400

d2 =

1.0400

s3 =

6.7100

d3 =

1.1183

s4 =

6.9100

d4 =

1.1517

s5 =

6.8200

d5 =

1.1367

s6 =

6.9400

d6 =

1.1567

Predicted_structure =

helix

Learning outcomes: