Research Article An Alternative Approach to Protein...

11
Hindawi Publishing Corporation BioMed Research International Volume 2013, Article ID 583045, 10 pages http://dx.doi.org/10.1155/2013/583045 Research Article An Alternative Approach to Protein Folding Yeona Kang and Charles M. Fortmann Department of Materials Science and Engineering, Stony Brook University, Stony Brook, NY 11794-2275, USA Correspondence should be addressed to Charles M. Fortmann; [email protected] Received 5 April 2013; Revised 20 June 2013; Accepted 31 July 2013 Academic Editor: Rita Casadio Copyright © 2013 Y. Kang and C. M. Fortmann. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. A diffusion theory-based, all-physical ab initio protein folding simulation is described and applied. e model is based upon the driſt and diffusion of protein substructures relative to one another in the multiple energy fields present. Without templates or statistical inputs, the simulations were run at physiologic and ambient temperatures (including pH). Around 100 protein secondary structures were surveyed, and twenty tertiary structures were determined. Greater than 70% of the secondary core structures with over 80% alpha helices were correctly identified on protein ranging from 30 to 200 amino-acid sequence. e driſt-diffusion model predicted tertiary structures with RMSD values in the 3–5 Angstroms range for proteins ranging 30 to 150 amino acids. ese predictions are among the best for an all ab initio protein simulation. Simulations could be run entirely on a desktop computer in minutes; however, more accurate tertiary structures were obtained using molecular dynamic energy relaxation. e driſt-diffusion model generated realistic energy versus time traces. Rapid secondary structures followed by a slow compacting towards lower energy tertiary structures occurred aſter an initial incubation period in agreement with observations. 1. Introduction is work introduces an ab initio physical driſt and diffusion- based protein structure prediction simulation that runs on a desktop PC. e protein folding dynamic is one of the most important problems in biology (e.g., see [1, 2]). Given the numerous manuscripts, journals, and volumes that have been dedicated to the techniques, the progress, and the importance of this work, it is not possible to give fair review here. at being said, the authors recognize the fantastic progress made in statistical and homolog based approaches and the advances these have contributed to all aspects of protein science. However, there are many cases where appropriate homologs are not available and/or where protein structure must be predicted in environments that differ markedly from those used to obtain homolog experimental structures. Also, many protein folding simulations such as molecular dynam- ics (MD) require large amounts of CPU time for protein structure folding and/or prediction and require templates (or homologs) for initiation [3]. For cases without appropriate homologs, an ab initio model with reasonable accuracy is valuable. Several methods to assess the performance of protein structure prediction have evolved. One is the testing of a sufficient number of cases to demonstrate the performance of a given approach. Another is the (CASP) Critical Assess- ment of Techniques for Protein Structure Prediction com- petition [4]. CASP scores the various models by comparing experimentally discovered structures to those obtained by the competition organizers. is work employed CASP9 as a template-free or ab initio modeling [4] and was ranked among the middle scoring groups. Here, a protein folding and structure prediction model based on the first principle forces (energy gradients) and physical kinetics including the driſt and diffusion of residues and/or protein substructures relative to one another, is described. 2. Theoretical Backgrounds 2.1. Multiple Energy Considerations. Physical kinetics and the near equilibrium descriptions are fundamental to chemical engineering and materials science (see, e.g., Barratt et al. [5] and Moore and Pearson [6]). Here, these principles are applied to the protein folding dynamic and the protein structure prediction. e underlying assumption being that the change from one ambient to another is carried out slowly

Transcript of Research Article An Alternative Approach to Protein...

Page 1: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

Hindawi Publishing CorporationBioMed Research InternationalVolume 2013 Article ID 583045 10 pageshttpdxdoiorg1011552013583045

Research ArticleAn Alternative Approach to Protein Folding

Yeona Kang and Charles M Fortmann

Department of Materials Science and Engineering Stony Brook University Stony Brook NY 11794-2275 USA

Correspondence should be addressed to Charles M Fortmann charlesfortmannstonybrookedu

Received 5 April 2013 Revised 20 June 2013 Accepted 31 July 2013

Academic Editor Rita Casadio

Copyright copy 2013 Y Kang and C M FortmannThis is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in anymedium provided the originalwork is properly cited

A diffusion theory-based all-physical ab initio protein folding simulation is described and appliedThemodel is based upon the driftand diffusion of protein substructures relative to one another in the multiple energy fields present Without templates or statisticalinputs the simulations were run at physiologic and ambient temperatures (including pH) Around 100 protein secondary structureswere surveyed and twenty tertiary structures were determined Greater than 70 of the secondary core structures with over 80alpha helices were correctly identified on protein ranging from 30 to 200 amino-acid sequenceThe drift-diffusionmodel predictedtertiary structures with RMSD values in the 3ndash5 Angstroms range for proteins ranging 30 to 150 amino acids These predictionsare among the best for an all ab initio protein simulation Simulations could be run entirely on a desktop computer in minuteshowever more accurate tertiary structures were obtained using molecular dynamic energy relaxation The drift-diffusion modelgenerated realistic energy versus time traces Rapid secondary structures followed by a slow compacting towards lower energytertiary structures occurred after an initial incubation period in agreement with observations

1 Introduction

This work introduces an ab initio physical drift and diffusion-based protein structure prediction simulation that runs on adesktop PC The protein folding dynamic is one of the mostimportant problems in biology (eg see [1 2]) Given thenumerousmanuscripts journals and volumes that have beendedicated to the techniques the progress and the importanceof this work it is not possible to give fair review hereThat being said the authors recognize the fantastic progressmade in statistical and homolog based approaches and theadvances these have contributed to all aspects of proteinscience However there are many cases where appropriatehomologs are not available andor where protein structuremust be predicted in environments that differ markedly fromthose used to obtain homolog experimental structures Alsomany protein folding simulations such as molecular dynam-ics (MD) require large amounts of CPU time for proteinstructure folding andor prediction and require templates (orhomologs) for initiation [3] For cases without appropriatehomologs an ab initio model with reasonable accuracy isvaluable

Several methods to assess the performance of proteinstructure prediction have evolved One is the testing of

a sufficient number of cases to demonstrate the performanceof a given approach Another is the (CASP) Critical Assess-ment of Techniques for Protein Structure Prediction com-petition [4] CASP scores the various models by comparingexperimentally discovered structures to those obtained bythe competition organizers This work employed CASP9 asa template-free or ab initio modeling [4] and was rankedamong the middle scoring groups

Here a protein folding and structure prediction modelbased on the first principle forces (energy gradients) andphysical kinetics including the drift and diffusion of residuesandor protein substructures relative to one another isdescribed

2 Theoretical Backgrounds

21 Multiple Energy Considerations Physical kinetics and thenear equilibrium descriptions are fundamental to chemicalengineering and materials science (see eg Barratt et al[5] and Moore and Pearson [6]) Here these principlesare applied to the protein folding dynamic and the proteinstructure prediction The underlying assumption being thatthe change from one ambient to another is carried out slowly

2 BioMed Research International

enough to justify near equilibrium mechanics Challengesinclude relative subprotein structures motion in two or moreenergy fields

In steady state and near equilibrium the diffusion-drivendispersion and force-driven drift must balance to precludeunnatural energy accumulations In protein structure twoor more energy fields almost always simultaneously actResolving particle motion in two or more energy fieldswas previously described [7] The work showed that relativemotion induced in global protein entropy change could beincorporated into electric field mobility [7]

120583 =119863nabla ln (119899119901)

nabla 119881119875 minus 119878119879 + sum119895 120601119871119895119899119895 + 119896119879nabla ln (119899119901)

(1)

where 119863 is the diffusivity 119896 is the Boltzmann constantand 119879 is temperature in Kelvin It was found that only thelargest energy changes (eg the product of global entropyand temperature) made significant contribution to thismobility where 119899119901 is an identifiable protein substructuralspecies or atom concentration and the summation is overall species 119899119895 having a related (nonelectrostatic) energy (egglobal entropy-temperature product or stain energy) 120601119871119895 subject to change upon folding

As ambient conditions change the protein energy (or freeenergy) becomes greater than that at equilibrium Relativeenergies and forces (gradients of the relative energy) act oneach member of the protein These forces in turn result in adrift speed defined by

drift speed = 120583119865 (2)

where 119865 is the force on a particular atom or site and 120583 is thedrift mobility (1)

22 Force Considerations While global entropy change wasincorporated intomobility it is necessary to consider the sev-eral other energy gradients (force) inducing motion Forceswere summed and subprotein structures were allowed tomove relative to one another In this work the net charge andhydrophobicity of each side chain of residues were assignedto the nearest backbone atom (see Figure 4)

Four explicit forces were considered the electrostatic(Coulombrsquos) force (applied to charges and dipoles) elec-trostatic displacement force (as defined below) generalizeddiffusive force (a thermal force as defined below) and globalentropy change Summed forces about each alpha carbon (inpivot bond) were cataloged and the greatest was chosen toproduce torque andmotion about the identified alpha carbonatom After the drift-diffusion determination of tertiarystructure the molecular dynamics (MD) model was used forfinal structure determination via energy relaxation

The electrostatic force between two charged bodies(1199021 and 1199022) is easily determined from Coulombrsquos law usingan appropriate dielectric constant (3) This research foundthat the dielectric constant of water 120598119908 yielded good results

where 120598119908 is sim781205980 (where 1205980 is the permittivity of vacuum)Therefore the Coulomb force is

119865elect =11990211199022

41205871205981199081199032 (3)

Kang et al previously described the electrostatic displace-ment force in the context as used here [7] and as applied inneural ionic transport where the electrostatic displacementforce arises from the attraction between mobile (eg liquid)polar media and an electric field [8] The polar media aredrawn towards an increasing electric field thereby producinga force that acts to sweep mobile uncharged nonpolar media(eg fixed ion electric field liquid water and a nonchargedresidue) producing attraction The attraction of polar mediaproduces a corresponding displacement force that alwaysmoves nonpolar protein regions towards a lower electricfield region (ie the nonpolar regions are pushed away fromstrong electric field by the inrush of water or other polarmedia toward strong field)

Simplified models for interaction of water nonpolarmedia and electric field with respect to protein structurehave been developed For example the generalized born (GB)model [9] has been used to track the electric field energyandor solvation energy with respect to protein charge andpartial charge Since this work follows a folding protein thevarious protein charge and partial charge move with respectto one another Therefore of the cases examined by Bashfordand Case [9] the case in which two proteins moved withrespect to one anothermost closely relatesThe kinetic model(KM) approach enabled a simple physical approach to theseconsiderations (see Kang et al [7])

Water dielectric constant is reasonable for distant chargedspecies since the predominant media separating these iswater However in the case of secondary structure andcollapsed protein structure the small distances betweencharged species are likely to have significant protein contenthaving a low dielectric constant Nonetheless the simulationproduces secondary structure with a high degree of accuracyIn the secondary structure generating subroutine describedin the next section parameters are fine-tuned to produce thebest agreement with experiment and even higher accuracyis obtained Typically the highest accuracy results for thetertiary structure determinationwere obtained using the fine-tuned secondary structure subroutine to identify secondarystructure location followed by a KM simulation collapse ofthe denatured protein with secondary structures and finallya full molecular dynamic relaxation (based on AMBER10)of the 3D structure in which local dielectric property isgenerated and used As noted elsewhere the use of AMBER10(and its more accurate dielectric consideration) increasedaccuracy by about 3

There is an energy associated with the electric fieldimbedded in a dielectricmedia Forces can be generatedwhencharge or nonpolar structure motion induces a reduction inthis energyTherefore it is necessary to define a displacementenergy and a displacement force described in terms ofthe electric field energy 119882 for water-filled regions (withpermittivity 120598119908) relative to the region filled with a nonpolarstructure (assumed to have the permittivity of vacuum 1205980)

BioMed Research International 3

2

0

minus2

minus4

minus6

minus8

minus10

minus12

minus14

minus16

minus18

minus200 05 1 15

Distance (nm)

Forc

e (10

minus1

eVc

m)

(A) Blue electrostatic force(B) First step area bubble force 1(C) Second step area bubble force 2

(A)

(B)(C)

Figure 1 Displacement force the electrostatic force (A) the forcemap representing direct contact of two nonpolar residues (B) andextension of displacement force (C) corresponding to conveyanceof the displacement force via equilibrium nonpolar interaction

Following Jackson [10] this energy consideration can bequantified

119882 = minus1

2int1198811

(120598119908 minus 1205980) ∙ 0119889119881 (4)

asymp minus1

2(120598119908 minus 1205980) ∙ 0Δ119881 (5)

asymp minus120573100381610038161003816100381611990210038161003816100381610038162

1199034 (6)

Force = minus120597119882120597119903

= minus4120573100381610038161003816100381611990210038161003816100381610038162

1199035 (7)

where the integration is over the volume of the nonpolarprotein regionThe nonpolar volume and the other constantsare collected into single term 120573 (as seen in (6) and (7))without loss of accuracy The magnitude of the displacementforces acting upon a neutral alpha helix-sized object underthe influence of an electronic point charge at sim01 nm iscomparable to theCoulomb force between twoopposite pointcharges at a similar distance Figure 1 shows the displacementforce values based on the distance between two nonpolarregions relative to electrostatic force

The third force considered here is thermal force Ther-mal force is related to the average motion or speed ofone part of the protein relative to another arising fromdiffusion There are several possible formulations of thethermal force The thermal diffusion speed is ] = radic119863120591 rarr

radic120583119896119879120591 (since 120583 = 119863119896119879) The force needed to produce

the same speed through drift is radic119896119879120583120591 where drift speed =120583nabla120601eff and nabla120601eff is an effective force

The concept of thermal force can be understood inthe context of generalized force (eg see Glicksman [11])whereby a concentration gradient and the correspondingdiffusive currents offset an applied force This principal iswidely applied for example in the separation of isotopeswhere the inverse the force needed to halt a diffusion-drivenconcentration change is considered

The forth force global entropy change was consideredHere the global entropy is taken to be the aggregate by allprotein entropy terms that change under folding A changein global entropy is estimated by considering a proposedchange in protein structure The global entropy values usedhere (generally decreasing on compacting) were determinedby fitting to observation (see Kang et al [7]) The estimatedentropy change was in turn used to determine a newmobilityA proposed protein change was computed by allowing a forceto act producing a speed consistent with the new mobilityThe resultant speed was allowed to act for one time step

Total force in protein structure can be defined by the sum-mation of three forces (8) The forth force global entropy-temperature product is incorporated into the mobility Thebest structure predictions were obtained when the globalentropy diminished on folding For example at room tem-perature an entropy-temperature gradient (2nd term in thedenominator of (1)) of asymp104 eV(Kcm) was typical

The remaining three forces can be expressed as

119865 = (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + radic

119896119879

120583120591 (8)

997888rarr (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + 120574radic119896119879 (9)

where defining the quantity radic1120583120591 rarr 120574 helps clean up theequations

The relation among charge water and hydrophobicresidue association is quantified in (9) The first term of(9) (Coulomb) quantifies the force between charges andorcharge and polar structure

It is well known that hydrophobic residues aggregationand water surface reduction relate to alpha helix formationThese hydrophobic interactions are characterized in terms ofthe aforementioned displacement force However in carefulexaminations of natural protein structures there exist manyprotein regions containing large numbers of hydrophobic(nonpolar) residues which do not bond (aggregation) anddo not form alpha helices This is understood in terms ofthe displacement force whereby charge attracts polar waterand in turn the polar water pushes nonpolar hydrophobicresidues apart

The repulsive force (second term of (9)) is pushing un-charged nonpolar structures away from the charged regions(by incoming polar water) causing the hydrophobic residuesto remain separated Therefore when sufficient charge existsin the intervening protein segments (eg charged residues)the repulsive component (second term of (9)) generates

4 BioMed Research International

1

09

08

07

06

05

04

03

02

01

008070605040302010

Radius (nm)

Prob

abili

ty d

ensit

y

1 bar100 bar

1000 barElectrostatic case

Figure 2The probability of a nonpolar region in water as a functionof pressure and electric field (as indicated) is shown Increasingelectric field significantly reduces the nonpolar region probability

a repulsive force sufficient to block the more outlying hydro-phobic residues fromapproaching one another Figure 2 illus-trates this electric fieldwith induced blocking Additionally itwas found that extremely large residue sizes could also inhibitalpha helix formation

The thermal force tends to randomize the net force ortorque acting on any given backbone atom pair Howeverearly in the folding the strong force between nearly spacedhydrophobic and electrostatic pairs overwhelms the thermalforce Consequently the simulation tends to hang up (repeat-edly returns to move on) a particular strong force generatingpair After a particular atom pair generates the largest torquefor five sequential time steps the simulation freezes themotion about this particular backbone atompair by removingits force (or torque) from the next time step query Therebythe simulation preserves the structure generated in five timesteps relative to the torques generated by the nearly spacedhydrophobic andor electrostatic pairs

It was found that almost all of the frozen backboneatomic pairs were part of secondary structures and that theseformed early in the folding process In this regard thesestructures are related to autonomous folding units [12] Inturn the relationship between strong forces on nearly spacedhydrophobic and electrostatic pairs can be directly correlatedto secondary structure as considered in the next section

Equation (9) is useful on all size scales On the smallestsize scale (9) can be the potential to generate secondarystructure On larger size scales the expression guides the gen-eration of tertiary structures by describing the force betweencharged regions andhydrophobic secondary structures Fromthe insights provided by this description a set of rules gov-erning secondary structure formation were developed

One of the attributes of the presented physical modelis that the workings that produce structure are open toinspection and understanding Accordingly examination ofsecondary structure in some cases time step by time steprevealed the nearly balanced interplay of the hydrophobicversus electrostatic elements of (9) Consequently two pathsfor improved secondary structure were presented First theconceptually simpler is the fine-tuning of the force equa-tions The second is that the recognition of nearly balancedhydrophobic and electrostatic forces imply secondary struc-ture formation Therefore an algorithm was developed tosearch for large nearly balanced force In this writing thesecond approach (described in the following section) is themore accurate Using the second approach requires the use ofa graphic program (such as Pymol) to insert the appropriatestructure into the 3D structure

3 A Streamlined Secondary StructurePrediction Model

Historically secondary structure prediction has advancedover the past five decades Secondary structure predictionintroduced in the 1960s and early 1970s [13ndash15] focused onidentifying alpha helices Beta sheet identification also relyingon statistical analysis [14 15] began in the 1970s Evolutionaryconservationmethods exploited the simultaneous assessmentof many homologous sequences to determine probabilisticrelation between protein sequence and secondary structure[16 17] Larger experimental structure databases and mod-ern machine learning methods have achieved sim80 overallsecondary structure prediction accuracy in globular proteins[18]

In the presented model the relative location of residuesand the hydrophobicpolar and charge characteristics ofeach residue are the only input elements The hydrophobiccharacter as a function of ambient can be found in theliterature (see eg [19 page 14 Table 12]) Also amino acidcharge and partial charge can be determined using popularsoftware suites including the (MOE) Molecular OperatingEnvironment software package which employs the standardAMBER 10 parameter set to calculate the force field [20] Insome cases (not reported here) extreme ambients such asvery low pH require appeal tomore sophisticated commercialsoftware suites

The algorithm for secondary structure prediction sys-tematically steps through an arbitrary amino acid sequenceWhen hydrophilic amino acids are encountered a mechan-ical set of queries determines if secondary structure ispresent These queries relate to the hydrophilic character ofthe following neighboring residues The simulation used bythis work employs secondary structure search beginning atthe occurrence of a hydrophilic residue after hydrophobicresidue and ending at the next hydrophobic residue Thecharge size and polarity of the intervening amino acidresidues determine both a secondary structure region and itstype A delicate balance of charge polarity and hydrophobiccharacter determines the secondary structures of protein

Upon encountering a hydrophilic amino acid (119899119894)a scan bracket is opened The following in the sequence

BioMed Research International 5

Table 1 Physical conditions from alpha helix identification

Case sum119899

119894=1119902119894 Π

119898

119895=119896119902119895 Other 119902119894 sum

119899

119894=1ℎ119894

119899 = 1

0 lt 119886 lt 02 gt0lt minus05 =0

1199021 gt 09 lt minus031 lt 119886 lt 05 lt0

119899 = 3gt1 1199022 = 1199023

|119886| lt 05 lt minus60

119899 = 5

03 lt 119886 lt 05

|119886| gt 10

119902119894 gt 0

|119902119894| gt 06

119902119894 partial charge of 119905th residue ℎ119894 hydrophobicity of 119894th residue119899 number of residue

(119899119894+1 119899119894+6) are queried unless the finding of a hydrophilicresidue ends the search (ie the end point (119899119894+119895) occurs atthe next hydrophilic residue) If no hydrophilic residues areencountered within the 119894 + 6 nearest neighboring residuesthe algorithms denote the region as unstructured and thenmove on to the next hydrophilic residue in the sequencewhere a new search begins Importantly in all cases wherea second hydrophilic residue (thus ending a search) isfound within the next six nearest neighbors a secondarystructure alpha helix will form in accordance with the rulessummarized in Table 1

Determination of the secondary structure type withina given sequence meeting the six nearest neighbors rulefollows a simple hierarchy First if the stringent conditionsfor hydrophobic collapse are unopposed by (sufficiently large)dielectric displacement forces an alpha helix will formSecond whenever alpha helix formation is blocked (egby the dielectric displacement force induced by interveningcharged residue) a beta sheet region will form Thereforealpha helix regions can be transformed into a beta sheet bymutations resulting in one or more charged residue(s) in thecritical areas between hydrophobic residues The formationof fibrils in Alzheimerrsquos patients is an example of a case wherea small mutation causes an alpha helix collapse [21]

The summation of charges (Table 1 column 2) deter-mines the overall charge and therefore the magnitudeof the electric field exerts repulsive force on hydrophobicresidues Theis force opposes hydrophobic residue aggrega-tion and therefore alpha helix cannot form The product ofcharge column 3 of Table 1 combined with the summationof charge relates to the magnitude of electrostatic attrac-tive (or repulsive) force The summation of hydrophobiccharacter (column 5 of Table 1 sum119899

1ℎ119894) is an indication of

the net hydrophobic attractive force operating within theregion Using the equalities and inequalities shown in Table 1very good predictions of secondary structure were obtained(Table 2) It is apparent that the algorithm is sensitive tovery small changes in charge andhydrophobic characteristicsThe above procedure for determining an alpha helix can beused to determine whether there is an alpha helix region inany portion of interest on the given amino acid sequence

In order to obtain all of the alpha helix regions the procedurecan be repeated by scanning through the entire amino acidsequenceThereafter the determination of a beta sheet regioncan be initiated

For determining beta sheet a residue on the aminoacid sequence is first selected If the residue is denotedunstructured (ie it is not previously determined to belongto an alpha helix or beta sheet region) the next residue isselected The procedure is repeated until an unstructuredresidue 119899119894 is encountered A scanning bracket is openedusing this unstructured residue as a starting residue and itsnext 4 consecutive residues (119899119894+1 through 119899119894+4) are queriedto determine if they are all unstructured If the answer isno the procedure is stopped Otherwise a scanning bracketof residues 119899119894 through 119899119894+4 is established Beta sheet deter-mination is then performed based on the summation of themagnitude of charges and the summation of the hydrophobiccharacter of each residue in the 5-residue bracket Forexample such a 5-residue bracket is determined to be a betasheet when sum119894+4

119894|119902119894| minus sum

119894+1

119894ℎ119894 lt 03 and sum119894+1

119894ℎ119894 gt 01 The

above procedure for beta sheet determination can be repeatedfor the entire amino acid sequence to obtain all of the betasheet structures on the sequence

Figure 3 shows secondary structure prediction to experi-ment for a wide cross-section of proteins The experimentalstructure was obtained from the Rost and Sander result[22] The algorithm found essentially all of the secondarystructures and correctly determined their character Theaccuracy of this procedure has been tested on hundreds ofproteins producing accuracies of sim70 plusmn 9 for alpha helixand sim66 plusmn 14 for beta sheet identification (residue-by-residue comparison between model and experimental forproper secondary structure assignment) Furthermore coresecondary structure identificationwas even greatersim75plusmn 7and sim70 plusmn 12 for helices and beta sheet respectively Theseare among themost accurate secondary structure predictionsto date

Table 2 compares the accuracy of the KM model (thekinetic model described here) relative to other popularmodels described in the literatureThe kinetic model demon-strated state-of-the-art accuracy in overall secondary struc-ture prediction and excellent alpha helix prediction Thecommercial PSIPRED [23] (is based on a statistical approach)also predicted secondary structure regionsrsquo size and locationextremely well in some cases but in others failed to identifythe presence of secondary structure The kinetic model gen-erally identified the presence of almost all of the structures butthe start and end points varied from those of the experiment

4 Tertiary Structure

While various forms of backbone atom tagging have beendone previously for example in coarse grain models theseapproaches differ from that used here As stated above thetagging used here involves assigning residue hydrophobicityand charge to the backbone atom bonded to the residue

In cases where the physical conditions are stronglyindicative of a particular secondary structure (eg alphahelix) the program directly inserts this secondary structure

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

2 BioMed Research International

enough to justify near equilibrium mechanics Challengesinclude relative subprotein structures motion in two or moreenergy fields

In steady state and near equilibrium the diffusion-drivendispersion and force-driven drift must balance to precludeunnatural energy accumulations In protein structure twoor more energy fields almost always simultaneously actResolving particle motion in two or more energy fieldswas previously described [7] The work showed that relativemotion induced in global protein entropy change could beincorporated into electric field mobility [7]

120583 =119863nabla ln (119899119901)

nabla 119881119875 minus 119878119879 + sum119895 120601119871119895119899119895 + 119896119879nabla ln (119899119901)

(1)

where 119863 is the diffusivity 119896 is the Boltzmann constantand 119879 is temperature in Kelvin It was found that only thelargest energy changes (eg the product of global entropyand temperature) made significant contribution to thismobility where 119899119901 is an identifiable protein substructuralspecies or atom concentration and the summation is overall species 119899119895 having a related (nonelectrostatic) energy (egglobal entropy-temperature product or stain energy) 120601119871119895 subject to change upon folding

As ambient conditions change the protein energy (or freeenergy) becomes greater than that at equilibrium Relativeenergies and forces (gradients of the relative energy) act oneach member of the protein These forces in turn result in adrift speed defined by

drift speed = 120583119865 (2)

where 119865 is the force on a particular atom or site and 120583 is thedrift mobility (1)

22 Force Considerations While global entropy change wasincorporated intomobility it is necessary to consider the sev-eral other energy gradients (force) inducing motion Forceswere summed and subprotein structures were allowed tomove relative to one another In this work the net charge andhydrophobicity of each side chain of residues were assignedto the nearest backbone atom (see Figure 4)

Four explicit forces were considered the electrostatic(Coulombrsquos) force (applied to charges and dipoles) elec-trostatic displacement force (as defined below) generalizeddiffusive force (a thermal force as defined below) and globalentropy change Summed forces about each alpha carbon (inpivot bond) were cataloged and the greatest was chosen toproduce torque andmotion about the identified alpha carbonatom After the drift-diffusion determination of tertiarystructure the molecular dynamics (MD) model was used forfinal structure determination via energy relaxation

The electrostatic force between two charged bodies(1199021 and 1199022) is easily determined from Coulombrsquos law usingan appropriate dielectric constant (3) This research foundthat the dielectric constant of water 120598119908 yielded good results

where 120598119908 is sim781205980 (where 1205980 is the permittivity of vacuum)Therefore the Coulomb force is

119865elect =11990211199022

41205871205981199081199032 (3)

Kang et al previously described the electrostatic displace-ment force in the context as used here [7] and as applied inneural ionic transport where the electrostatic displacementforce arises from the attraction between mobile (eg liquid)polar media and an electric field [8] The polar media aredrawn towards an increasing electric field thereby producinga force that acts to sweep mobile uncharged nonpolar media(eg fixed ion electric field liquid water and a nonchargedresidue) producing attraction The attraction of polar mediaproduces a corresponding displacement force that alwaysmoves nonpolar protein regions towards a lower electricfield region (ie the nonpolar regions are pushed away fromstrong electric field by the inrush of water or other polarmedia toward strong field)

Simplified models for interaction of water nonpolarmedia and electric field with respect to protein structurehave been developed For example the generalized born (GB)model [9] has been used to track the electric field energyandor solvation energy with respect to protein charge andpartial charge Since this work follows a folding protein thevarious protein charge and partial charge move with respectto one another Therefore of the cases examined by Bashfordand Case [9] the case in which two proteins moved withrespect to one anothermost closely relatesThe kinetic model(KM) approach enabled a simple physical approach to theseconsiderations (see Kang et al [7])

Water dielectric constant is reasonable for distant chargedspecies since the predominant media separating these iswater However in the case of secondary structure andcollapsed protein structure the small distances betweencharged species are likely to have significant protein contenthaving a low dielectric constant Nonetheless the simulationproduces secondary structure with a high degree of accuracyIn the secondary structure generating subroutine describedin the next section parameters are fine-tuned to produce thebest agreement with experiment and even higher accuracyis obtained Typically the highest accuracy results for thetertiary structure determinationwere obtained using the fine-tuned secondary structure subroutine to identify secondarystructure location followed by a KM simulation collapse ofthe denatured protein with secondary structures and finallya full molecular dynamic relaxation (based on AMBER10)of the 3D structure in which local dielectric property isgenerated and used As noted elsewhere the use of AMBER10(and its more accurate dielectric consideration) increasedaccuracy by about 3

There is an energy associated with the electric fieldimbedded in a dielectricmedia Forces can be generatedwhencharge or nonpolar structure motion induces a reduction inthis energyTherefore it is necessary to define a displacementenergy and a displacement force described in terms ofthe electric field energy 119882 for water-filled regions (withpermittivity 120598119908) relative to the region filled with a nonpolarstructure (assumed to have the permittivity of vacuum 1205980)

BioMed Research International 3

2

0

minus2

minus4

minus6

minus8

minus10

minus12

minus14

minus16

minus18

minus200 05 1 15

Distance (nm)

Forc

e (10

minus1

eVc

m)

(A) Blue electrostatic force(B) First step area bubble force 1(C) Second step area bubble force 2

(A)

(B)(C)

Figure 1 Displacement force the electrostatic force (A) the forcemap representing direct contact of two nonpolar residues (B) andextension of displacement force (C) corresponding to conveyanceof the displacement force via equilibrium nonpolar interaction

Following Jackson [10] this energy consideration can bequantified

119882 = minus1

2int1198811

(120598119908 minus 1205980) ∙ 0119889119881 (4)

asymp minus1

2(120598119908 minus 1205980) ∙ 0Δ119881 (5)

asymp minus120573100381610038161003816100381611990210038161003816100381610038162

1199034 (6)

Force = minus120597119882120597119903

= minus4120573100381610038161003816100381611990210038161003816100381610038162

1199035 (7)

where the integration is over the volume of the nonpolarprotein regionThe nonpolar volume and the other constantsare collected into single term 120573 (as seen in (6) and (7))without loss of accuracy The magnitude of the displacementforces acting upon a neutral alpha helix-sized object underthe influence of an electronic point charge at sim01 nm iscomparable to theCoulomb force between twoopposite pointcharges at a similar distance Figure 1 shows the displacementforce values based on the distance between two nonpolarregions relative to electrostatic force

The third force considered here is thermal force Ther-mal force is related to the average motion or speed ofone part of the protein relative to another arising fromdiffusion There are several possible formulations of thethermal force The thermal diffusion speed is ] = radic119863120591 rarr

radic120583119896119879120591 (since 120583 = 119863119896119879) The force needed to produce

the same speed through drift is radic119896119879120583120591 where drift speed =120583nabla120601eff and nabla120601eff is an effective force

The concept of thermal force can be understood inthe context of generalized force (eg see Glicksman [11])whereby a concentration gradient and the correspondingdiffusive currents offset an applied force This principal iswidely applied for example in the separation of isotopeswhere the inverse the force needed to halt a diffusion-drivenconcentration change is considered

The forth force global entropy change was consideredHere the global entropy is taken to be the aggregate by allprotein entropy terms that change under folding A changein global entropy is estimated by considering a proposedchange in protein structure The global entropy values usedhere (generally decreasing on compacting) were determinedby fitting to observation (see Kang et al [7]) The estimatedentropy change was in turn used to determine a newmobilityA proposed protein change was computed by allowing a forceto act producing a speed consistent with the new mobilityThe resultant speed was allowed to act for one time step

Total force in protein structure can be defined by the sum-mation of three forces (8) The forth force global entropy-temperature product is incorporated into the mobility Thebest structure predictions were obtained when the globalentropy diminished on folding For example at room tem-perature an entropy-temperature gradient (2nd term in thedenominator of (1)) of asymp104 eV(Kcm) was typical

The remaining three forces can be expressed as

119865 = (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + radic

119896119879

120583120591 (8)

997888rarr (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + 120574radic119896119879 (9)

where defining the quantity radic1120583120591 rarr 120574 helps clean up theequations

The relation among charge water and hydrophobicresidue association is quantified in (9) The first term of(9) (Coulomb) quantifies the force between charges andorcharge and polar structure

It is well known that hydrophobic residues aggregationand water surface reduction relate to alpha helix formationThese hydrophobic interactions are characterized in terms ofthe aforementioned displacement force However in carefulexaminations of natural protein structures there exist manyprotein regions containing large numbers of hydrophobic(nonpolar) residues which do not bond (aggregation) anddo not form alpha helices This is understood in terms ofthe displacement force whereby charge attracts polar waterand in turn the polar water pushes nonpolar hydrophobicresidues apart

The repulsive force (second term of (9)) is pushing un-charged nonpolar structures away from the charged regions(by incoming polar water) causing the hydrophobic residuesto remain separated Therefore when sufficient charge existsin the intervening protein segments (eg charged residues)the repulsive component (second term of (9)) generates

4 BioMed Research International

1

09

08

07

06

05

04

03

02

01

008070605040302010

Radius (nm)

Prob

abili

ty d

ensit

y

1 bar100 bar

1000 barElectrostatic case

Figure 2The probability of a nonpolar region in water as a functionof pressure and electric field (as indicated) is shown Increasingelectric field significantly reduces the nonpolar region probability

a repulsive force sufficient to block the more outlying hydro-phobic residues fromapproaching one another Figure 2 illus-trates this electric fieldwith induced blocking Additionally itwas found that extremely large residue sizes could also inhibitalpha helix formation

The thermal force tends to randomize the net force ortorque acting on any given backbone atom pair Howeverearly in the folding the strong force between nearly spacedhydrophobic and electrostatic pairs overwhelms the thermalforce Consequently the simulation tends to hang up (repeat-edly returns to move on) a particular strong force generatingpair After a particular atom pair generates the largest torquefor five sequential time steps the simulation freezes themotion about this particular backbone atompair by removingits force (or torque) from the next time step query Therebythe simulation preserves the structure generated in five timesteps relative to the torques generated by the nearly spacedhydrophobic andor electrostatic pairs

It was found that almost all of the frozen backboneatomic pairs were part of secondary structures and that theseformed early in the folding process In this regard thesestructures are related to autonomous folding units [12] Inturn the relationship between strong forces on nearly spacedhydrophobic and electrostatic pairs can be directly correlatedto secondary structure as considered in the next section

Equation (9) is useful on all size scales On the smallestsize scale (9) can be the potential to generate secondarystructure On larger size scales the expression guides the gen-eration of tertiary structures by describing the force betweencharged regions andhydrophobic secondary structures Fromthe insights provided by this description a set of rules gov-erning secondary structure formation were developed

One of the attributes of the presented physical modelis that the workings that produce structure are open toinspection and understanding Accordingly examination ofsecondary structure in some cases time step by time steprevealed the nearly balanced interplay of the hydrophobicversus electrostatic elements of (9) Consequently two pathsfor improved secondary structure were presented First theconceptually simpler is the fine-tuning of the force equa-tions The second is that the recognition of nearly balancedhydrophobic and electrostatic forces imply secondary struc-ture formation Therefore an algorithm was developed tosearch for large nearly balanced force In this writing thesecond approach (described in the following section) is themore accurate Using the second approach requires the use ofa graphic program (such as Pymol) to insert the appropriatestructure into the 3D structure

3 A Streamlined Secondary StructurePrediction Model

Historically secondary structure prediction has advancedover the past five decades Secondary structure predictionintroduced in the 1960s and early 1970s [13ndash15] focused onidentifying alpha helices Beta sheet identification also relyingon statistical analysis [14 15] began in the 1970s Evolutionaryconservationmethods exploited the simultaneous assessmentof many homologous sequences to determine probabilisticrelation between protein sequence and secondary structure[16 17] Larger experimental structure databases and mod-ern machine learning methods have achieved sim80 overallsecondary structure prediction accuracy in globular proteins[18]

In the presented model the relative location of residuesand the hydrophobicpolar and charge characteristics ofeach residue are the only input elements The hydrophobiccharacter as a function of ambient can be found in theliterature (see eg [19 page 14 Table 12]) Also amino acidcharge and partial charge can be determined using popularsoftware suites including the (MOE) Molecular OperatingEnvironment software package which employs the standardAMBER 10 parameter set to calculate the force field [20] Insome cases (not reported here) extreme ambients such asvery low pH require appeal tomore sophisticated commercialsoftware suites

The algorithm for secondary structure prediction sys-tematically steps through an arbitrary amino acid sequenceWhen hydrophilic amino acids are encountered a mechan-ical set of queries determines if secondary structure ispresent These queries relate to the hydrophilic character ofthe following neighboring residues The simulation used bythis work employs secondary structure search beginning atthe occurrence of a hydrophilic residue after hydrophobicresidue and ending at the next hydrophobic residue Thecharge size and polarity of the intervening amino acidresidues determine both a secondary structure region and itstype A delicate balance of charge polarity and hydrophobiccharacter determines the secondary structures of protein

Upon encountering a hydrophilic amino acid (119899119894)a scan bracket is opened The following in the sequence

BioMed Research International 5

Table 1 Physical conditions from alpha helix identification

Case sum119899

119894=1119902119894 Π

119898

119895=119896119902119895 Other 119902119894 sum

119899

119894=1ℎ119894

119899 = 1

0 lt 119886 lt 02 gt0lt minus05 =0

1199021 gt 09 lt minus031 lt 119886 lt 05 lt0

119899 = 3gt1 1199022 = 1199023

|119886| lt 05 lt minus60

119899 = 5

03 lt 119886 lt 05

|119886| gt 10

119902119894 gt 0

|119902119894| gt 06

119902119894 partial charge of 119905th residue ℎ119894 hydrophobicity of 119894th residue119899 number of residue

(119899119894+1 119899119894+6) are queried unless the finding of a hydrophilicresidue ends the search (ie the end point (119899119894+119895) occurs atthe next hydrophilic residue) If no hydrophilic residues areencountered within the 119894 + 6 nearest neighboring residuesthe algorithms denote the region as unstructured and thenmove on to the next hydrophilic residue in the sequencewhere a new search begins Importantly in all cases wherea second hydrophilic residue (thus ending a search) isfound within the next six nearest neighbors a secondarystructure alpha helix will form in accordance with the rulessummarized in Table 1

Determination of the secondary structure type withina given sequence meeting the six nearest neighbors rulefollows a simple hierarchy First if the stringent conditionsfor hydrophobic collapse are unopposed by (sufficiently large)dielectric displacement forces an alpha helix will formSecond whenever alpha helix formation is blocked (egby the dielectric displacement force induced by interveningcharged residue) a beta sheet region will form Thereforealpha helix regions can be transformed into a beta sheet bymutations resulting in one or more charged residue(s) in thecritical areas between hydrophobic residues The formationof fibrils in Alzheimerrsquos patients is an example of a case wherea small mutation causes an alpha helix collapse [21]

The summation of charges (Table 1 column 2) deter-mines the overall charge and therefore the magnitudeof the electric field exerts repulsive force on hydrophobicresidues Theis force opposes hydrophobic residue aggrega-tion and therefore alpha helix cannot form The product ofcharge column 3 of Table 1 combined with the summationof charge relates to the magnitude of electrostatic attrac-tive (or repulsive) force The summation of hydrophobiccharacter (column 5 of Table 1 sum119899

1ℎ119894) is an indication of

the net hydrophobic attractive force operating within theregion Using the equalities and inequalities shown in Table 1very good predictions of secondary structure were obtained(Table 2) It is apparent that the algorithm is sensitive tovery small changes in charge andhydrophobic characteristicsThe above procedure for determining an alpha helix can beused to determine whether there is an alpha helix region inany portion of interest on the given amino acid sequence

In order to obtain all of the alpha helix regions the procedurecan be repeated by scanning through the entire amino acidsequenceThereafter the determination of a beta sheet regioncan be initiated

For determining beta sheet a residue on the aminoacid sequence is first selected If the residue is denotedunstructured (ie it is not previously determined to belongto an alpha helix or beta sheet region) the next residue isselected The procedure is repeated until an unstructuredresidue 119899119894 is encountered A scanning bracket is openedusing this unstructured residue as a starting residue and itsnext 4 consecutive residues (119899119894+1 through 119899119894+4) are queriedto determine if they are all unstructured If the answer isno the procedure is stopped Otherwise a scanning bracketof residues 119899119894 through 119899119894+4 is established Beta sheet deter-mination is then performed based on the summation of themagnitude of charges and the summation of the hydrophobiccharacter of each residue in the 5-residue bracket Forexample such a 5-residue bracket is determined to be a betasheet when sum119894+4

119894|119902119894| minus sum

119894+1

119894ℎ119894 lt 03 and sum119894+1

119894ℎ119894 gt 01 The

above procedure for beta sheet determination can be repeatedfor the entire amino acid sequence to obtain all of the betasheet structures on the sequence

Figure 3 shows secondary structure prediction to experi-ment for a wide cross-section of proteins The experimentalstructure was obtained from the Rost and Sander result[22] The algorithm found essentially all of the secondarystructures and correctly determined their character Theaccuracy of this procedure has been tested on hundreds ofproteins producing accuracies of sim70 plusmn 9 for alpha helixand sim66 plusmn 14 for beta sheet identification (residue-by-residue comparison between model and experimental forproper secondary structure assignment) Furthermore coresecondary structure identificationwas even greatersim75plusmn 7and sim70 plusmn 12 for helices and beta sheet respectively Theseare among themost accurate secondary structure predictionsto date

Table 2 compares the accuracy of the KM model (thekinetic model described here) relative to other popularmodels described in the literatureThe kinetic model demon-strated state-of-the-art accuracy in overall secondary struc-ture prediction and excellent alpha helix prediction Thecommercial PSIPRED [23] (is based on a statistical approach)also predicted secondary structure regionsrsquo size and locationextremely well in some cases but in others failed to identifythe presence of secondary structure The kinetic model gen-erally identified the presence of almost all of the structures butthe start and end points varied from those of the experiment

4 Tertiary Structure

While various forms of backbone atom tagging have beendone previously for example in coarse grain models theseapproaches differ from that used here As stated above thetagging used here involves assigning residue hydrophobicityand charge to the backbone atom bonded to the residue

In cases where the physical conditions are stronglyindicative of a particular secondary structure (eg alphahelix) the program directly inserts this secondary structure

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

BioMed Research International 3

2

0

minus2

minus4

minus6

minus8

minus10

minus12

minus14

minus16

minus18

minus200 05 1 15

Distance (nm)

Forc

e (10

minus1

eVc

m)

(A) Blue electrostatic force(B) First step area bubble force 1(C) Second step area bubble force 2

(A)

(B)(C)

Figure 1 Displacement force the electrostatic force (A) the forcemap representing direct contact of two nonpolar residues (B) andextension of displacement force (C) corresponding to conveyanceof the displacement force via equilibrium nonpolar interaction

Following Jackson [10] this energy consideration can bequantified

119882 = minus1

2int1198811

(120598119908 minus 1205980) ∙ 0119889119881 (4)

asymp minus1

2(120598119908 minus 1205980) ∙ 0Δ119881 (5)

asymp minus120573100381610038161003816100381611990210038161003816100381610038162

1199034 (6)

Force = minus120597119882120597119903

= minus4120573100381610038161003816100381611990210038161003816100381610038162

1199035 (7)

where the integration is over the volume of the nonpolarprotein regionThe nonpolar volume and the other constantsare collected into single term 120573 (as seen in (6) and (7))without loss of accuracy The magnitude of the displacementforces acting upon a neutral alpha helix-sized object underthe influence of an electronic point charge at sim01 nm iscomparable to theCoulomb force between twoopposite pointcharges at a similar distance Figure 1 shows the displacementforce values based on the distance between two nonpolarregions relative to electrostatic force

The third force considered here is thermal force Ther-mal force is related to the average motion or speed ofone part of the protein relative to another arising fromdiffusion There are several possible formulations of thethermal force The thermal diffusion speed is ] = radic119863120591 rarr

radic120583119896119879120591 (since 120583 = 119863119896119879) The force needed to produce

the same speed through drift is radic119896119879120583120591 where drift speed =120583nabla120601eff and nabla120601eff is an effective force

The concept of thermal force can be understood inthe context of generalized force (eg see Glicksman [11])whereby a concentration gradient and the correspondingdiffusive currents offset an applied force This principal iswidely applied for example in the separation of isotopeswhere the inverse the force needed to halt a diffusion-drivenconcentration change is considered

The forth force global entropy change was consideredHere the global entropy is taken to be the aggregate by allprotein entropy terms that change under folding A changein global entropy is estimated by considering a proposedchange in protein structure The global entropy values usedhere (generally decreasing on compacting) were determinedby fitting to observation (see Kang et al [7]) The estimatedentropy change was in turn used to determine a newmobilityA proposed protein change was computed by allowing a forceto act producing a speed consistent with the new mobilityThe resultant speed was allowed to act for one time step

Total force in protein structure can be defined by the sum-mation of three forces (8) The forth force global entropy-temperature product is incorporated into the mobility Thebest structure predictions were obtained when the globalentropy diminished on folding For example at room tem-perature an entropy-temperature gradient (2nd term in thedenominator of (1)) of asymp104 eV(Kcm) was typical

The remaining three forces can be expressed as

119865 = (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + radic

119896119879

120583120591 (8)

997888rarr (11990211199022

41205871205981199032) + (

120573100381610038161003816100381611990210038161003816100381610038162

1199035) + 120574radic119896119879 (9)

where defining the quantity radic1120583120591 rarr 120574 helps clean up theequations

The relation among charge water and hydrophobicresidue association is quantified in (9) The first term of(9) (Coulomb) quantifies the force between charges andorcharge and polar structure

It is well known that hydrophobic residues aggregationand water surface reduction relate to alpha helix formationThese hydrophobic interactions are characterized in terms ofthe aforementioned displacement force However in carefulexaminations of natural protein structures there exist manyprotein regions containing large numbers of hydrophobic(nonpolar) residues which do not bond (aggregation) anddo not form alpha helices This is understood in terms ofthe displacement force whereby charge attracts polar waterand in turn the polar water pushes nonpolar hydrophobicresidues apart

The repulsive force (second term of (9)) is pushing un-charged nonpolar structures away from the charged regions(by incoming polar water) causing the hydrophobic residuesto remain separated Therefore when sufficient charge existsin the intervening protein segments (eg charged residues)the repulsive component (second term of (9)) generates

4 BioMed Research International

1

09

08

07

06

05

04

03

02

01

008070605040302010

Radius (nm)

Prob

abili

ty d

ensit

y

1 bar100 bar

1000 barElectrostatic case

Figure 2The probability of a nonpolar region in water as a functionof pressure and electric field (as indicated) is shown Increasingelectric field significantly reduces the nonpolar region probability

a repulsive force sufficient to block the more outlying hydro-phobic residues fromapproaching one another Figure 2 illus-trates this electric fieldwith induced blocking Additionally itwas found that extremely large residue sizes could also inhibitalpha helix formation

The thermal force tends to randomize the net force ortorque acting on any given backbone atom pair Howeverearly in the folding the strong force between nearly spacedhydrophobic and electrostatic pairs overwhelms the thermalforce Consequently the simulation tends to hang up (repeat-edly returns to move on) a particular strong force generatingpair After a particular atom pair generates the largest torquefor five sequential time steps the simulation freezes themotion about this particular backbone atompair by removingits force (or torque) from the next time step query Therebythe simulation preserves the structure generated in five timesteps relative to the torques generated by the nearly spacedhydrophobic andor electrostatic pairs

It was found that almost all of the frozen backboneatomic pairs were part of secondary structures and that theseformed early in the folding process In this regard thesestructures are related to autonomous folding units [12] Inturn the relationship between strong forces on nearly spacedhydrophobic and electrostatic pairs can be directly correlatedto secondary structure as considered in the next section

Equation (9) is useful on all size scales On the smallestsize scale (9) can be the potential to generate secondarystructure On larger size scales the expression guides the gen-eration of tertiary structures by describing the force betweencharged regions andhydrophobic secondary structures Fromthe insights provided by this description a set of rules gov-erning secondary structure formation were developed

One of the attributes of the presented physical modelis that the workings that produce structure are open toinspection and understanding Accordingly examination ofsecondary structure in some cases time step by time steprevealed the nearly balanced interplay of the hydrophobicversus electrostatic elements of (9) Consequently two pathsfor improved secondary structure were presented First theconceptually simpler is the fine-tuning of the force equa-tions The second is that the recognition of nearly balancedhydrophobic and electrostatic forces imply secondary struc-ture formation Therefore an algorithm was developed tosearch for large nearly balanced force In this writing thesecond approach (described in the following section) is themore accurate Using the second approach requires the use ofa graphic program (such as Pymol) to insert the appropriatestructure into the 3D structure

3 A Streamlined Secondary StructurePrediction Model

Historically secondary structure prediction has advancedover the past five decades Secondary structure predictionintroduced in the 1960s and early 1970s [13ndash15] focused onidentifying alpha helices Beta sheet identification also relyingon statistical analysis [14 15] began in the 1970s Evolutionaryconservationmethods exploited the simultaneous assessmentof many homologous sequences to determine probabilisticrelation between protein sequence and secondary structure[16 17] Larger experimental structure databases and mod-ern machine learning methods have achieved sim80 overallsecondary structure prediction accuracy in globular proteins[18]

In the presented model the relative location of residuesand the hydrophobicpolar and charge characteristics ofeach residue are the only input elements The hydrophobiccharacter as a function of ambient can be found in theliterature (see eg [19 page 14 Table 12]) Also amino acidcharge and partial charge can be determined using popularsoftware suites including the (MOE) Molecular OperatingEnvironment software package which employs the standardAMBER 10 parameter set to calculate the force field [20] Insome cases (not reported here) extreme ambients such asvery low pH require appeal tomore sophisticated commercialsoftware suites

The algorithm for secondary structure prediction sys-tematically steps through an arbitrary amino acid sequenceWhen hydrophilic amino acids are encountered a mechan-ical set of queries determines if secondary structure ispresent These queries relate to the hydrophilic character ofthe following neighboring residues The simulation used bythis work employs secondary structure search beginning atthe occurrence of a hydrophilic residue after hydrophobicresidue and ending at the next hydrophobic residue Thecharge size and polarity of the intervening amino acidresidues determine both a secondary structure region and itstype A delicate balance of charge polarity and hydrophobiccharacter determines the secondary structures of protein

Upon encountering a hydrophilic amino acid (119899119894)a scan bracket is opened The following in the sequence

BioMed Research International 5

Table 1 Physical conditions from alpha helix identification

Case sum119899

119894=1119902119894 Π

119898

119895=119896119902119895 Other 119902119894 sum

119899

119894=1ℎ119894

119899 = 1

0 lt 119886 lt 02 gt0lt minus05 =0

1199021 gt 09 lt minus031 lt 119886 lt 05 lt0

119899 = 3gt1 1199022 = 1199023

|119886| lt 05 lt minus60

119899 = 5

03 lt 119886 lt 05

|119886| gt 10

119902119894 gt 0

|119902119894| gt 06

119902119894 partial charge of 119905th residue ℎ119894 hydrophobicity of 119894th residue119899 number of residue

(119899119894+1 119899119894+6) are queried unless the finding of a hydrophilicresidue ends the search (ie the end point (119899119894+119895) occurs atthe next hydrophilic residue) If no hydrophilic residues areencountered within the 119894 + 6 nearest neighboring residuesthe algorithms denote the region as unstructured and thenmove on to the next hydrophilic residue in the sequencewhere a new search begins Importantly in all cases wherea second hydrophilic residue (thus ending a search) isfound within the next six nearest neighbors a secondarystructure alpha helix will form in accordance with the rulessummarized in Table 1

Determination of the secondary structure type withina given sequence meeting the six nearest neighbors rulefollows a simple hierarchy First if the stringent conditionsfor hydrophobic collapse are unopposed by (sufficiently large)dielectric displacement forces an alpha helix will formSecond whenever alpha helix formation is blocked (egby the dielectric displacement force induced by interveningcharged residue) a beta sheet region will form Thereforealpha helix regions can be transformed into a beta sheet bymutations resulting in one or more charged residue(s) in thecritical areas between hydrophobic residues The formationof fibrils in Alzheimerrsquos patients is an example of a case wherea small mutation causes an alpha helix collapse [21]

The summation of charges (Table 1 column 2) deter-mines the overall charge and therefore the magnitudeof the electric field exerts repulsive force on hydrophobicresidues Theis force opposes hydrophobic residue aggrega-tion and therefore alpha helix cannot form The product ofcharge column 3 of Table 1 combined with the summationof charge relates to the magnitude of electrostatic attrac-tive (or repulsive) force The summation of hydrophobiccharacter (column 5 of Table 1 sum119899

1ℎ119894) is an indication of

the net hydrophobic attractive force operating within theregion Using the equalities and inequalities shown in Table 1very good predictions of secondary structure were obtained(Table 2) It is apparent that the algorithm is sensitive tovery small changes in charge andhydrophobic characteristicsThe above procedure for determining an alpha helix can beused to determine whether there is an alpha helix region inany portion of interest on the given amino acid sequence

In order to obtain all of the alpha helix regions the procedurecan be repeated by scanning through the entire amino acidsequenceThereafter the determination of a beta sheet regioncan be initiated

For determining beta sheet a residue on the aminoacid sequence is first selected If the residue is denotedunstructured (ie it is not previously determined to belongto an alpha helix or beta sheet region) the next residue isselected The procedure is repeated until an unstructuredresidue 119899119894 is encountered A scanning bracket is openedusing this unstructured residue as a starting residue and itsnext 4 consecutive residues (119899119894+1 through 119899119894+4) are queriedto determine if they are all unstructured If the answer isno the procedure is stopped Otherwise a scanning bracketof residues 119899119894 through 119899119894+4 is established Beta sheet deter-mination is then performed based on the summation of themagnitude of charges and the summation of the hydrophobiccharacter of each residue in the 5-residue bracket Forexample such a 5-residue bracket is determined to be a betasheet when sum119894+4

119894|119902119894| minus sum

119894+1

119894ℎ119894 lt 03 and sum119894+1

119894ℎ119894 gt 01 The

above procedure for beta sheet determination can be repeatedfor the entire amino acid sequence to obtain all of the betasheet structures on the sequence

Figure 3 shows secondary structure prediction to experi-ment for a wide cross-section of proteins The experimentalstructure was obtained from the Rost and Sander result[22] The algorithm found essentially all of the secondarystructures and correctly determined their character Theaccuracy of this procedure has been tested on hundreds ofproteins producing accuracies of sim70 plusmn 9 for alpha helixand sim66 plusmn 14 for beta sheet identification (residue-by-residue comparison between model and experimental forproper secondary structure assignment) Furthermore coresecondary structure identificationwas even greatersim75plusmn 7and sim70 plusmn 12 for helices and beta sheet respectively Theseare among themost accurate secondary structure predictionsto date

Table 2 compares the accuracy of the KM model (thekinetic model described here) relative to other popularmodels described in the literatureThe kinetic model demon-strated state-of-the-art accuracy in overall secondary struc-ture prediction and excellent alpha helix prediction Thecommercial PSIPRED [23] (is based on a statistical approach)also predicted secondary structure regionsrsquo size and locationextremely well in some cases but in others failed to identifythe presence of secondary structure The kinetic model gen-erally identified the presence of almost all of the structures butthe start and end points varied from those of the experiment

4 Tertiary Structure

While various forms of backbone atom tagging have beendone previously for example in coarse grain models theseapproaches differ from that used here As stated above thetagging used here involves assigning residue hydrophobicityand charge to the backbone atom bonded to the residue

In cases where the physical conditions are stronglyindicative of a particular secondary structure (eg alphahelix) the program directly inserts this secondary structure

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 4: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

4 BioMed Research International

1

09

08

07

06

05

04

03

02

01

008070605040302010

Radius (nm)

Prob

abili

ty d

ensit

y

1 bar100 bar

1000 barElectrostatic case

Figure 2The probability of a nonpolar region in water as a functionof pressure and electric field (as indicated) is shown Increasingelectric field significantly reduces the nonpolar region probability

a repulsive force sufficient to block the more outlying hydro-phobic residues fromapproaching one another Figure 2 illus-trates this electric fieldwith induced blocking Additionally itwas found that extremely large residue sizes could also inhibitalpha helix formation

The thermal force tends to randomize the net force ortorque acting on any given backbone atom pair Howeverearly in the folding the strong force between nearly spacedhydrophobic and electrostatic pairs overwhelms the thermalforce Consequently the simulation tends to hang up (repeat-edly returns to move on) a particular strong force generatingpair After a particular atom pair generates the largest torquefor five sequential time steps the simulation freezes themotion about this particular backbone atompair by removingits force (or torque) from the next time step query Therebythe simulation preserves the structure generated in five timesteps relative to the torques generated by the nearly spacedhydrophobic andor electrostatic pairs

It was found that almost all of the frozen backboneatomic pairs were part of secondary structures and that theseformed early in the folding process In this regard thesestructures are related to autonomous folding units [12] Inturn the relationship between strong forces on nearly spacedhydrophobic and electrostatic pairs can be directly correlatedto secondary structure as considered in the next section

Equation (9) is useful on all size scales On the smallestsize scale (9) can be the potential to generate secondarystructure On larger size scales the expression guides the gen-eration of tertiary structures by describing the force betweencharged regions andhydrophobic secondary structures Fromthe insights provided by this description a set of rules gov-erning secondary structure formation were developed

One of the attributes of the presented physical modelis that the workings that produce structure are open toinspection and understanding Accordingly examination ofsecondary structure in some cases time step by time steprevealed the nearly balanced interplay of the hydrophobicversus electrostatic elements of (9) Consequently two pathsfor improved secondary structure were presented First theconceptually simpler is the fine-tuning of the force equa-tions The second is that the recognition of nearly balancedhydrophobic and electrostatic forces imply secondary struc-ture formation Therefore an algorithm was developed tosearch for large nearly balanced force In this writing thesecond approach (described in the following section) is themore accurate Using the second approach requires the use ofa graphic program (such as Pymol) to insert the appropriatestructure into the 3D structure

3 A Streamlined Secondary StructurePrediction Model

Historically secondary structure prediction has advancedover the past five decades Secondary structure predictionintroduced in the 1960s and early 1970s [13ndash15] focused onidentifying alpha helices Beta sheet identification also relyingon statistical analysis [14 15] began in the 1970s Evolutionaryconservationmethods exploited the simultaneous assessmentof many homologous sequences to determine probabilisticrelation between protein sequence and secondary structure[16 17] Larger experimental structure databases and mod-ern machine learning methods have achieved sim80 overallsecondary structure prediction accuracy in globular proteins[18]

In the presented model the relative location of residuesand the hydrophobicpolar and charge characteristics ofeach residue are the only input elements The hydrophobiccharacter as a function of ambient can be found in theliterature (see eg [19 page 14 Table 12]) Also amino acidcharge and partial charge can be determined using popularsoftware suites including the (MOE) Molecular OperatingEnvironment software package which employs the standardAMBER 10 parameter set to calculate the force field [20] Insome cases (not reported here) extreme ambients such asvery low pH require appeal tomore sophisticated commercialsoftware suites

The algorithm for secondary structure prediction sys-tematically steps through an arbitrary amino acid sequenceWhen hydrophilic amino acids are encountered a mechan-ical set of queries determines if secondary structure ispresent These queries relate to the hydrophilic character ofthe following neighboring residues The simulation used bythis work employs secondary structure search beginning atthe occurrence of a hydrophilic residue after hydrophobicresidue and ending at the next hydrophobic residue Thecharge size and polarity of the intervening amino acidresidues determine both a secondary structure region and itstype A delicate balance of charge polarity and hydrophobiccharacter determines the secondary structures of protein

Upon encountering a hydrophilic amino acid (119899119894)a scan bracket is opened The following in the sequence

BioMed Research International 5

Table 1 Physical conditions from alpha helix identification

Case sum119899

119894=1119902119894 Π

119898

119895=119896119902119895 Other 119902119894 sum

119899

119894=1ℎ119894

119899 = 1

0 lt 119886 lt 02 gt0lt minus05 =0

1199021 gt 09 lt minus031 lt 119886 lt 05 lt0

119899 = 3gt1 1199022 = 1199023

|119886| lt 05 lt minus60

119899 = 5

03 lt 119886 lt 05

|119886| gt 10

119902119894 gt 0

|119902119894| gt 06

119902119894 partial charge of 119905th residue ℎ119894 hydrophobicity of 119894th residue119899 number of residue

(119899119894+1 119899119894+6) are queried unless the finding of a hydrophilicresidue ends the search (ie the end point (119899119894+119895) occurs atthe next hydrophilic residue) If no hydrophilic residues areencountered within the 119894 + 6 nearest neighboring residuesthe algorithms denote the region as unstructured and thenmove on to the next hydrophilic residue in the sequencewhere a new search begins Importantly in all cases wherea second hydrophilic residue (thus ending a search) isfound within the next six nearest neighbors a secondarystructure alpha helix will form in accordance with the rulessummarized in Table 1

Determination of the secondary structure type withina given sequence meeting the six nearest neighbors rulefollows a simple hierarchy First if the stringent conditionsfor hydrophobic collapse are unopposed by (sufficiently large)dielectric displacement forces an alpha helix will formSecond whenever alpha helix formation is blocked (egby the dielectric displacement force induced by interveningcharged residue) a beta sheet region will form Thereforealpha helix regions can be transformed into a beta sheet bymutations resulting in one or more charged residue(s) in thecritical areas between hydrophobic residues The formationof fibrils in Alzheimerrsquos patients is an example of a case wherea small mutation causes an alpha helix collapse [21]

The summation of charges (Table 1 column 2) deter-mines the overall charge and therefore the magnitudeof the electric field exerts repulsive force on hydrophobicresidues Theis force opposes hydrophobic residue aggrega-tion and therefore alpha helix cannot form The product ofcharge column 3 of Table 1 combined with the summationof charge relates to the magnitude of electrostatic attrac-tive (or repulsive) force The summation of hydrophobiccharacter (column 5 of Table 1 sum119899

1ℎ119894) is an indication of

the net hydrophobic attractive force operating within theregion Using the equalities and inequalities shown in Table 1very good predictions of secondary structure were obtained(Table 2) It is apparent that the algorithm is sensitive tovery small changes in charge andhydrophobic characteristicsThe above procedure for determining an alpha helix can beused to determine whether there is an alpha helix region inany portion of interest on the given amino acid sequence

In order to obtain all of the alpha helix regions the procedurecan be repeated by scanning through the entire amino acidsequenceThereafter the determination of a beta sheet regioncan be initiated

For determining beta sheet a residue on the aminoacid sequence is first selected If the residue is denotedunstructured (ie it is not previously determined to belongto an alpha helix or beta sheet region) the next residue isselected The procedure is repeated until an unstructuredresidue 119899119894 is encountered A scanning bracket is openedusing this unstructured residue as a starting residue and itsnext 4 consecutive residues (119899119894+1 through 119899119894+4) are queriedto determine if they are all unstructured If the answer isno the procedure is stopped Otherwise a scanning bracketof residues 119899119894 through 119899119894+4 is established Beta sheet deter-mination is then performed based on the summation of themagnitude of charges and the summation of the hydrophobiccharacter of each residue in the 5-residue bracket Forexample such a 5-residue bracket is determined to be a betasheet when sum119894+4

119894|119902119894| minus sum

119894+1

119894ℎ119894 lt 03 and sum119894+1

119894ℎ119894 gt 01 The

above procedure for beta sheet determination can be repeatedfor the entire amino acid sequence to obtain all of the betasheet structures on the sequence

Figure 3 shows secondary structure prediction to experi-ment for a wide cross-section of proteins The experimentalstructure was obtained from the Rost and Sander result[22] The algorithm found essentially all of the secondarystructures and correctly determined their character Theaccuracy of this procedure has been tested on hundreds ofproteins producing accuracies of sim70 plusmn 9 for alpha helixand sim66 plusmn 14 for beta sheet identification (residue-by-residue comparison between model and experimental forproper secondary structure assignment) Furthermore coresecondary structure identificationwas even greatersim75plusmn 7and sim70 plusmn 12 for helices and beta sheet respectively Theseare among themost accurate secondary structure predictionsto date

Table 2 compares the accuracy of the KM model (thekinetic model described here) relative to other popularmodels described in the literatureThe kinetic model demon-strated state-of-the-art accuracy in overall secondary struc-ture prediction and excellent alpha helix prediction Thecommercial PSIPRED [23] (is based on a statistical approach)also predicted secondary structure regionsrsquo size and locationextremely well in some cases but in others failed to identifythe presence of secondary structure The kinetic model gen-erally identified the presence of almost all of the structures butthe start and end points varied from those of the experiment

4 Tertiary Structure

While various forms of backbone atom tagging have beendone previously for example in coarse grain models theseapproaches differ from that used here As stated above thetagging used here involves assigning residue hydrophobicityand charge to the backbone atom bonded to the residue

In cases where the physical conditions are stronglyindicative of a particular secondary structure (eg alphahelix) the program directly inserts this secondary structure

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 5: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

BioMed Research International 5

Table 1 Physical conditions from alpha helix identification

Case sum119899

119894=1119902119894 Π

119898

119895=119896119902119895 Other 119902119894 sum

119899

119894=1ℎ119894

119899 = 1

0 lt 119886 lt 02 gt0lt minus05 =0

1199021 gt 09 lt minus031 lt 119886 lt 05 lt0

119899 = 3gt1 1199022 = 1199023

|119886| lt 05 lt minus60

119899 = 5

03 lt 119886 lt 05

|119886| gt 10

119902119894 gt 0

|119902119894| gt 06

119902119894 partial charge of 119905th residue ℎ119894 hydrophobicity of 119894th residue119899 number of residue

(119899119894+1 119899119894+6) are queried unless the finding of a hydrophilicresidue ends the search (ie the end point (119899119894+119895) occurs atthe next hydrophilic residue) If no hydrophilic residues areencountered within the 119894 + 6 nearest neighboring residuesthe algorithms denote the region as unstructured and thenmove on to the next hydrophilic residue in the sequencewhere a new search begins Importantly in all cases wherea second hydrophilic residue (thus ending a search) isfound within the next six nearest neighbors a secondarystructure alpha helix will form in accordance with the rulessummarized in Table 1

Determination of the secondary structure type withina given sequence meeting the six nearest neighbors rulefollows a simple hierarchy First if the stringent conditionsfor hydrophobic collapse are unopposed by (sufficiently large)dielectric displacement forces an alpha helix will formSecond whenever alpha helix formation is blocked (egby the dielectric displacement force induced by interveningcharged residue) a beta sheet region will form Thereforealpha helix regions can be transformed into a beta sheet bymutations resulting in one or more charged residue(s) in thecritical areas between hydrophobic residues The formationof fibrils in Alzheimerrsquos patients is an example of a case wherea small mutation causes an alpha helix collapse [21]

The summation of charges (Table 1 column 2) deter-mines the overall charge and therefore the magnitudeof the electric field exerts repulsive force on hydrophobicresidues Theis force opposes hydrophobic residue aggrega-tion and therefore alpha helix cannot form The product ofcharge column 3 of Table 1 combined with the summationof charge relates to the magnitude of electrostatic attrac-tive (or repulsive) force The summation of hydrophobiccharacter (column 5 of Table 1 sum119899

1ℎ119894) is an indication of

the net hydrophobic attractive force operating within theregion Using the equalities and inequalities shown in Table 1very good predictions of secondary structure were obtained(Table 2) It is apparent that the algorithm is sensitive tovery small changes in charge andhydrophobic characteristicsThe above procedure for determining an alpha helix can beused to determine whether there is an alpha helix region inany portion of interest on the given amino acid sequence

In order to obtain all of the alpha helix regions the procedurecan be repeated by scanning through the entire amino acidsequenceThereafter the determination of a beta sheet regioncan be initiated

For determining beta sheet a residue on the aminoacid sequence is first selected If the residue is denotedunstructured (ie it is not previously determined to belongto an alpha helix or beta sheet region) the next residue isselected The procedure is repeated until an unstructuredresidue 119899119894 is encountered A scanning bracket is openedusing this unstructured residue as a starting residue and itsnext 4 consecutive residues (119899119894+1 through 119899119894+4) are queriedto determine if they are all unstructured If the answer isno the procedure is stopped Otherwise a scanning bracketof residues 119899119894 through 119899119894+4 is established Beta sheet deter-mination is then performed based on the summation of themagnitude of charges and the summation of the hydrophobiccharacter of each residue in the 5-residue bracket Forexample such a 5-residue bracket is determined to be a betasheet when sum119894+4

119894|119902119894| minus sum

119894+1

119894ℎ119894 lt 03 and sum119894+1

119894ℎ119894 gt 01 The

above procedure for beta sheet determination can be repeatedfor the entire amino acid sequence to obtain all of the betasheet structures on the sequence

Figure 3 shows secondary structure prediction to experi-ment for a wide cross-section of proteins The experimentalstructure was obtained from the Rost and Sander result[22] The algorithm found essentially all of the secondarystructures and correctly determined their character Theaccuracy of this procedure has been tested on hundreds ofproteins producing accuracies of sim70 plusmn 9 for alpha helixand sim66 plusmn 14 for beta sheet identification (residue-by-residue comparison between model and experimental forproper secondary structure assignment) Furthermore coresecondary structure identificationwas even greatersim75plusmn 7and sim70 plusmn 12 for helices and beta sheet respectively Theseare among themost accurate secondary structure predictionsto date

Table 2 compares the accuracy of the KM model (thekinetic model described here) relative to other popularmodels described in the literatureThe kinetic model demon-strated state-of-the-art accuracy in overall secondary struc-ture prediction and excellent alpha helix prediction Thecommercial PSIPRED [23] (is based on a statistical approach)also predicted secondary structure regionsrsquo size and locationextremely well in some cases but in others failed to identifythe presence of secondary structure The kinetic model gen-erally identified the presence of almost all of the structures butthe start and end points varied from those of the experiment

4 Tertiary Structure

While various forms of backbone atom tagging have beendone previously for example in coarse grain models theseapproaches differ from that used here As stated above thetagging used here involves assigning residue hydrophobicityand charge to the backbone atom bonded to the residue

In cases where the physical conditions are stronglyindicative of a particular secondary structure (eg alphahelix) the program directly inserts this secondary structure

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

6 BioMed Research International

Table 2 Comparison between KM and PSIPREDKM KM KM PSI PSI PSI

PDB all 120573 120572 all 120573 1205721hdn 738 785 8226 89 88 931ubq 7632 7273 9411 82 73 771vii 722 non 7895 80 non 1002nmq 6418 6538 non 60 54 non1pba 7404 non 7333 68 non 701aps 847 80 90 80 83 871aey 673 778 non 44 35 non1coa 4531 7105 6153 81 60 921fkb 6161 7297 100 88 90 1001mjc 7101 8064 non 69 70 non1nyf 5973 40 70 62 54 01pks 65 5357 6475 59 48 01shg 632 6123 non 42 25 non1srl 6556 75 100 53 37 01ten 7444 68 non 80 85 non1ycc 634 5858 6001 51 0 532ci2 6626 6434 7984 47 28 0Avg 6759 6799 7998 6676 5533 5169KM means the developed model and PSI means PSIPRED which is developed by the University of London Other names mean the percentage of predictionof total structure alpha-helix and beta-sheet with respect to two different methods To compute prediction rate NMR structure of each protein was used

without wasting computational resource to move atoms tothe correct position one-by-one That is the secondarystructure program is used to identify the strong propensityfor secondary structure formation and where indicated thisstructure is directly generated by graphical software forinstance Pymol and inserted into the tertiary structureOnce a secondary structure is so generated and inserted itis assumed immutable For the purpose of continued tertiarystructure folding the inserted immutable secondary struc-ture is tagged with its appropriate summed partial chargesand hydrophobicity of each secondary structure region Theportion of the protein not belonging to any determined alphahelix or beta sheet that is the unstructured portion can bebuilt as a linear chain or in an arbitrary physically permissibleconformation During tertiary structure simulation deter-mined secondary structures inner residues were frozen Dur-ing tertiary structure folding simulation secondary structureregions are treated as one residue

As illustrated in Figure 4 the relative motion of one partof a protein relative to the other parts is determined byallowing the alpha carbon bond pair having the largest nettorque (sum of actual force and the random effective thermalforce multiplied by the appropriate lever arm length) to movein each time step

The protein was allowed to drift and diffuse toward alower energy in accordance with the procedures describedabove The Markov simulation used here is summarizedin Figure 5 Starting with linear amino acid sequences theprotein molecule is allowed to drift and diffuse via rotationabout the torque rotation angle of pivot bonds as seen inFigure 4 The simulation allows the strongest force (sum ofthe actual force and the effective thermal force) to operate on

the protein for a time period 119905 using a mobility consistentwith (1) In cases where the global energy involves density-dependent entropy directions that increase entropy proceedwith a higher mobility thanmoves in directions that decreaseentropy

Figure 6 shows a comparison between simulated 3Dstructure of Villin (1VII) and the experiment The kineticmodel in this example produced near state-of-the-art foldedstructure with RMSD values for the backbone atoms of testproteins while using less than a minute of desktop computerCPU time In Table 3 various protein structures rangingfrom 30 to 157 amino acids were determined using thekinetic model Average RMSD value for this series relative toexperiment was found to be 49 plusmn 108 A

The largest protein reported here was the human proteintyrosine phosphatome with 320 amino acids The kineticmodel produced a RMSD value of sim8 A This value isreasonable relative to other current ab initio methods for aprotein of this size

It was found that 2sim3 improved RMSD values couldbe attained using a molecular dynamics structure relaxationThe finalMD relaxation step employed no statistical methodsor templates This relaxation employed nominal AMBER10default parameters for dielectric constant and other param-eters There was no attempt to optimize these parametersTherefore these energy relaxations could be carried outquickly (running AMBER on a supercomputer)

The energy reduction acting during folding could easilybe determined by tracking the force and distance (average)that accompany all folds or motions throughout the sim-ulation Figure 7 shows a typical energy versus time traceThe energy was calculated by assuming an arbitrary initial

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

BioMed Research International 7

0

10

20

30

40

50

60

70

80

90

0 50 100 150 200 250 300 350 400 450

Total structure prediction values

ldquoTotalrdquo

Number of residue

Pred

ictio

n (

)

(a)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Alpha structure prediction

ldquoAlpha helixrdquo

Number of residue

Pred

ictio

n (

)

(b)

0

20

40

60

80

100

120

0 50 100 150 200 250 300 350 400 450

Beta sheet structure prediction

Beta sheet

Number of residue

Pred

ictio

n (

)

(c)

Figure 3 The accuracy of the drift-diffusion model secondary prediction accuracy (percent of proteins tested) versus the size in number ofresidues (a) shows the accuracy for total protein core structures versus protein size (b) shows the accuracy of core alpha helix versus proteinsize and (c) shows the accuracy versus protein size Also shown in each case is the least mean square best fit to the data sets Data base sizesim100 proteins

energy and subtracting the work done 119865 sdot 120579 sdot 1199030 in each timestep where 119865 is the force applied (the sum of hydrophobicelectrostatic and thermal forces) 120579 is the calculated anglechange and 1199030 is the average lever arm length of the proteinInitially the energy swings are large and random but as thefolding proceeds these energy fluctuations are diminishedSince the folding is conducted at room temperature completecessation of motion does not occur

5 Result

A highly accurate secondary structure method has beendescribed The presented physical model predicts secondary

structure at least as well as advanced statistical basedmethodsrequiring known template structures The extension of themodel to tertiary structure dynamics with near state-of-the-art accuracy has also been described It was found thatthermal and repulsive electrostatic forces are sufficient toprevent unrealistic protein collapse

This high-speed physical model provides folding tra-jectory in real time with sensitivity to the environmentThereby doors to faster identification of function and agreater understanding of biological pathways are openedWhile some statisticalmethods canmatch both the speed andaccuracy of the kineticmodel it is important to recognize thatthe relating of structure to function can only be guided by anaccurate physical description of the forces that shape proteins

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

8 BioMed Research International

Torque forces

Pivotbond

applied to lever arms

120579

R

120579 asymp120583120513120601energy 120591

r

Figure 4 Illustrated is the angle rotation (120579 or 119877) about analpha carbon backbone bond pair induced by the resolved torqueoperating on the bond In this figure each circle represents 119862120572 atomfor each amino acid and big circles illustrate side chains which havestrong charges andor large hydrophobic forces

Table 3 RMSD result for tertiary structures

Proteins No RES Score for 2D RMSD of 3D2MHU 30 733 31VII 35 722 481CBH 36 753 433RNT 54 6165 411DUR 55 7445 371OVO 56 539 581BW6 56 70 612NMQ 57 6418 582UTG 70 6285 381UBQ 76 7632 542PCY 99 611 45CYT 103 6894 477RSA 124 737 511PAB 127 7404 422CCY 128 8156 632SNS 149 5402 671AAQ 157 73 62AVG 8306 6994 494A tertiary structure prediction result based on the kinetic model RMSDvalues are based on each backbone atom position comparison for tertiarystructure prediction Also a score of 2D means the matching percentage forthe secondary structure determination

6 Discussion and Conclusion

The era where protein folding can be tracked as a functionof ambient condition without templates or other a prioriknowledge has begun The accuracy of the presented all-physical model rivals the best statistical methods in sec-ondary structure and secondary core structure prediction

The fine-tuning of a secondary structure algorithm can beimproved by controlling some environment conditions forinstance pH Tertiary structure predictions do not match thebest statistical andor the best molecular dynamics models(when provided with suitable templates) However the accu-racy is amongst the best ab initiomethods Further increasedaccuracy (2sim3) could be obtained using a finishing MD-based energy relaxation step

For example the Villin headpiece a well-studied mod-erately small protein has been studied as a function oftemperature Hansmann and coworkers [24ndash26] achievedvery small RSMD values in the range of sim30 A (sim18 A for thecore region and 37 A for the entire protein) when comparedwith the NMR determined structure for this protein Incomparison the all-physical model presented here achievedRMSD values only in the 37 A range

It is also important to recognize that no model canexceed the accuracy of the measurements used to determinethe experimental structure The Villin headpiece proteinexperiment has a sim18 A inherent uncertainty It is oftenchallenging to obtain all of the ambient conditions (includingtemperature pH and process steps that may alter residuecharge) used for a given experimental structure determi-nation On the other hand the model can be used totrack structure change due to small changes in ambient(eg pH or temperature) Some MD-based modelings avoidthis problem by using a seed or template protein structureobtained under similar experimental conditions Howeversuch proceduremay restrict themodeling applications in realworld situations The kinetic model is expected to continueto improve and become a useful tool for the investigationof protein structure especially where there is little a prioristructure knowledge a need to elucidate the folding pathwayandor required high speed

Owing to the speed of the physical model the proteinfolding dynamic can be traced Here the protein energy isseen to randomly vary initially folioed by secondary for-mation and finally convergence to a more definite structurewith decreasing energy variations as folding precedes Inagreement with experiment the simulations tend towarddefinite structure but do not reach complete stasis

Alternative high-speed computational methods such asEVFold [27] enablewidespread access to fast prediction basedon statistical analysis of homologous sequences Even thoughthe two approaches are based on very different paradigmstogether and with other contributions there is an emergingand improving ability to predict protein structures withoutthe need to become experts of a particular approach

To illustrate guidance provided by the physical modelconsider the mechanisms by which the chaperone-subunitcomplex of a diverse group of Gram-negative bacteria [28]gains passage through the Papc usher channel at the outermembrane Here an approaching chaperone complex trig-gers a response by which outer membrane usher passageunblocks Applying the discussed concepts to the block-ing structure it is recognized that the highly hydrophobicblocking structure (since it is comprised of alpha helices)always moves towards lower electric field (second term onthe right of (8)) The usher channel interior (beta barrel) is

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

BioMed Research International 9

Start

Secondary structure generation

Assign secondary structure properties

to C120572-atom (charges

hydrophobicity)

Structural forces

calculation

Calculate potential angle

to rotate

Bond rotation

Check global entropy based on

volume

End

Figure 5 Illustrated is a simplified flow chart of the Markov simulation used in physical-kinetic simulation for tertiary structure

Figure 6 Comparison of a tertiary structure of Villin (1VII) asdetermined by experiment (NMR structure blue) with a modelgenerated structure (simulation result black)

highly charged giving rise to the possibility that the blockingstructure is moved to the side of the channel where theelectrical field is smallest and whenever an approachingcharged structure in combination with the channel chargeproduces an appropriate small combined electric field Suchinsights may lead to new strategies for drug delivery

Acknowledgments

The authors thank John Coleman Carlos Simmerling andProfessor Tanassi for many useful discussions The authorsalso thank Solar Physics Inc NYS Sensor Center for

0

1000

2000

3000

4000

5000

6000Free energytime

ldquoFree energytimerdquo

1 101 201 301 401 501 601 701 801 901 1001 1101 1201 1301

Figure 7 Free energy trajectory (calMol) during Villin simulation

Advanced Technology and Idalia Solar Technologies forpartial support of this work and Marc Dee for his assistancewith paper preparation

References

[1] K A Dill S B Ozkan T R Weikl J D Chodera and V AVoelz ldquoThe protein folding problem when will it be solvedrdquoCurrent Opinion in Structural Biology vol 17 no 3 pp 342ndash3462007

[2] A C Clark ldquoProtein folding are we there yetrdquo Archives ofBiochemistry and Biophysics vol 469 no 1 pp 1ndash3 2008

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

10 BioMed Research International

[3] C Hardin T V Pogorelov and Z Luthey-Schulten ldquoAb initioprotein structure predictionrdquo Current Opinion in StructuralBiology vol 12 no 2 pp 176ndash181 2002

[4] httppredictioncenterorgcasp9[5] C R Barrett W D Nix Tetelman and S Alan Principles of

Engineering Materials Prentice Hall New York NY USA 1973[6] J W Moore and R G Pearson Kinetics and Mechanisms John

Wiley amp Sons New York NY USA 3rd edition 1981[7] Y Kang E Jaen and C M Fortmann ldquoEinstein relations for

energy coupled particle systemsrdquoApplied Physics Letters vol 88no 11 Article ID 112110 2006

[8] Y Kang and C M Fortmann ldquoA structural basis for theHodgkin and Huxley relationrdquo Applied Physics Letters vol 91no 22 Article ID 223903 2007

[9] D Bashford and D A Case ldquoGeneralized born models ofmacromolecular solvation effectsrdquo Annual Review of PhysicalChemistry vol 51 pp 129ndash152 2000

[10] J D Jackson Classical Electrodynamics Wiley amp Sons NewYork NY USA 3rd edition 1999

[11] M E Glicksman Di Usion in Solids John Wiley amp Sons NewYork NY USA 2000

[12] Z Peng and L C Wu ldquoAutonomous protein folding unitsrdquoAdvanced in Protein Chemistry vol 53 pp 1ndash30 2000

[13] P Y Chou and G D Fasman ldquoPrediction of protein conforma-tionrdquo Biochemistry vol 13 no 2 pp 222ndash245 1974

[14] V I Lim ldquoStructural principles of the globular Establishinghomologies in protein sequencesrdquo Journal of Molecular Biologyvol 88 pp 857ndash873 1984

[15] J Garnier D J Osguthorpe and B Robson ldquoAnalysis of theaccuracy and implications of simple methods for predicting thesecondary structure of globular proteinsrdquo Journal of MolecularBiology vol 120 no 1 pp 97ndash120 1978

[16] G Deleage and B Roux ldquoAn algorithm for protein secondarystructure prediction based on class predictionrdquo Protein Engi-neering vol 1 no 4 pp 289ndash294 1987

[17] S R Presnell B I Cohen and F E Cohen ldquoA segment-based approach to protein secondary structure predictionrdquoBiochemistry vol 31 no 4 pp 983ndash993 1992

[18] H H L Howard Holley l and M Karplus ldquoProtein secondarystructure prediction with a neural networkrdquo Proceedings of theNational Academy of Sciences of the United States of Americavol 86 no 1 pp 152ndash156 1989

[19] R A CopelandMethods for Protein Analysis A practical Guideto laboratory protocols Chapman amp Hall New York NY USA1994

[20] MOE Chemical Computing Group 1010 Sherbrooke St WSuite 910 Montreal Quebec Canada G3A 2R7 httpwwwchemcompcom

[21] MGuo PMGormanM Rico A Chakrabartty andDV Lau-rents ldquoCharge substitution shows that repulsive electrostaticinteractions impede the oligomerization of Alzheimer amyloidpeptidesrdquo FEBS Letters vol 579 no 17 pp 3574ndash3578 2005

[22] B Rost and C Sander ldquoPrediction of protein secondary struc-ture at better than 70 accuracyrdquo Journal of Molecular Biologyvol 232 no 2 pp 584ndash599 1993

[23] httpbioinfcsuclacukpsipred[24] C-Y Lin C-K Hu and U H E Hansmann ldquoParallel temper-

ing simulations of HP-36rdquo Proteins vol 52 no 3 pp 436ndash4452003

[25] B S Kinnear M F Jarrold and U H E Hansmann ldquoAll-atomgeneralized-ensemble simulations of small proteinsrdquo Journal ofMolecular Graphics and Modelling vol 22 no 5 pp 397ndash4032004

[26] W Kwak andU H E Hansmann ldquoEfficient sampling of proteinstructures by model hoppingrdquo Physical Review Letters vol 95no 13 Article ID 138102 4 pages 2005

[27] D S Marks L J Colwell R Sheridan et al ldquoProtein 3D struc-ture computed from evolutionary sequence variationrdquo PLoSONE vol 6 no 12 Article ID e28766 2011

[28] H Remaut C Tang N S Henderson et al ldquoFiber formationacross the bacterial outer membrane by the chaperoneusherpathwayrdquo Cell vol 133 no 4 pp 640ndash652 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 11: Research Article An Alternative Approach to Protein Foldingdownloads.hindawi.com/journals/bmri/2013/583045.pdf · An Alternative Approach to Protein Folding ... e electrostatic force

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology