Protein Structure Prediction Using Neural Networks · Protein Structure Prediction Using Neural...

Protein Structure PredictionUsing Neural Networks

Martha MercaldiKasia Wilamowska

Literature ReviewDecember 16, 2003

The Protein Folding Problem

Evolution of Neural Networks

• Neural networks originally designed to approximate connections betweenneurons in the brain

Evolution of Neural Networks

Why use Neural Nets for ProteinFolding?

• Successful applications in:– Secondary structure prediction

– Solvent access

• No “inherent shortcoming” yet found

• Can incorporate evolutionary informationvia multiple alignments

• Detect previous misclassifications

Protein Secondary Structure PredictionBased on Denoeux Belief Neural Network

• Purpose– Using neural nets, effectively predict the

secondary structure of proteins.

• Current best for secondary structureprediction is SSpro8 with accuracy in therange of 62-63%


• Input to the system– Can choose to use DNA or

amino acid sequences

– SSpro8 uses amino acidsequences

– The authors’ system,UTMPred, uses DNA

• Output - forms consistingof alpha helices, betasheets and loopsexpanded to eightstructure forms

Protein Secondary Structure Forms

Protein Secondary Structure Prediction Based on Denoeux Belief Neural Network

x - input tothe network


y – numberof inputnodes


p – prototype,representing a number ofk nearest previouslytrained input to the currenttested input


i – number of prototypesin y-dimensional featurespace


s – the activation function


M – number of outputclasses


ujq – a weight which

represents the degree ofmembership forprototype j to outputclass q


mjq – the BAA mass,

product of weight mjq and

activation function sj


hjq – the conjunctive

combination of the BBA’s



• The DBNN input are DNAsequences converted tobinary format prior to use

• The sequences are:– 88 Escheichia coli proteins

– 25 yeast Saccharomycescerevisiae proteins

– 166 mammalian proteins(80 of which are human)


• The input window sizefor UTMPred is set to7 codons, whichresults in 84 inputnodes and 8 outputnotes which representthe expandedstructural forms.


• UTMPred used 200 prototypes and after the training wascompleted, the system was able to predict H and Eforms with accuracy above 75%. At the same time, thesystem had difficulty predicting form I, due to a smallamount of data in the training samples.

Assignment of Protein Sequence to Functional FamilyUsing Neural Network and Dempster-Shafer Theory

• Purpose– Using neural networks, efficiently predict

protein function

• Using databases such as Prosite, Pfam,and Prints, either query the databases formotifs within a protein in question, orquery for an absence or presence ofarbitrary combinations of motifs.


• Given a training set,induce a classifier able toassign novel proteinsequences to one of theprotein familiesrepresented in thetraining set

• Once trained, theclassifier will be able topredict novel proteins intospecific functionalfamilies based on itsknowledge of the trainingset


• Input data– From the Prosite database containing over 1100

entries. Each entry describes a function shared bysome proteins. In the experiment one Prositedocumentation entry corresponded to a protein class,and each protein class could, in turn, be characterizedby one or more motif patterns/profiles. Only motifsconsidered significant matches by profileScan werechosen.

• DBNN was used as the classifier.


• 585 proteins belonging toone of ten classes wereused, out of whichsubsets of varying sizewere picked randomly tobecome the training set.

• Once the DBNN wastrained, all 585 proteinswere used as the test setto determine accuracy

With only 10% of the total trainingsamples, DBNN could be constructed toclassify proteins with a 95% accuracy.


• The number of falsepositives generated byDBNN were significantlylower than those resultingfrom a Prosite search.

• As the size of the data setapproaches 100%, thefalse positives discoveredby DBNN approacheszero.

The number of false positives resultingfrom the use of the DBNN trained usingtraining sets of different sizes.


• A second data set of 73protein sequences drawn fromfive classes were used to builda DBNN classifier

• Using the DBNN classifier builtby random sized datasets, theoutput exceeded 96%accuracy when the training setwas greater or equal to 22

• Once the input contained morethan 80% (58 or moresequences) of the dataset, allsequences were correctlypredicted Result of classifying proteins

containing common motifs

Future Work

• Ultimate solution to “protein folding” will probablybe a hybrid

• Neural networks likely to be included due to theirsuccessful application to related problems– Secondary structure– Solvent access– Distance between residues in final structure– Protein interface recognition

• In addition, neural nets can combine knowledgefrom multiple sources

Bibliography

• B. Rost. “Neural networks for protein structure prediction: hype orhit?” Artificial intelligence and heuristic methods for bioinformatics(2003): 34-50.

• S.N.V. Arjunan, S. Deris, R.M. Illias. “Protein Secondary StructurePrediction Based on Denoeux Belief Neural Network.” ICAIETProceedings (2002): 554-560.

• N.M.Zaki, S. Deris, S.N.V. Arjunan. “Assignment of ProteinSequence to Functional Family Using Neural Network andDempster-Shafer Theory” Journal of Theoretics 5-1 (2003).

Background information

• S.N.V Arkimam, S. Deris, R.M.Illias. “Prediction of ProteinSecondary Structure” Jurnal Teknologi 35(C) (2001): 81-90.

• T.Wessels, C.W. Omlin.”Refining Hidden Markov Models withRecurrent Neural Networks”.

Protein Structure Prediction Using Neural Networks · Protein Structure Prediction Using Neural...

Documents

Transcript of Protein Structure Prediction Using Neural Networks · Protein Structure Prediction Using Neural...