The folding network of villin headpiece subdomain
Hongxing Lei
Beijing Institute of Genomics
Chinese Academy of Sciences
The Protein Folding Problem
The importance of protein folding
Amyloid diseases
– Alzheimer's disease (AD)
– Parkinson's disease (PD)
– Huntington's disease
– Prion diseases
– Amyotrophic lateral sclerosis (ALS)
Protein structure prediction
Protein design
Folding funnel
[Figure: folding funnel schematic; labels: unfolded state, formation of microdomains, diffusion and collision of microdomains, formation of a nucleus, collapse, native state]
Onuchic & Wolynes, COSB 2004, 14:70-75
The challenges in all-atom protein folding
Time scale
– Protein folding: seconds
– Simulation: microseconds
– Gap: 10^6
– Solution: ultrafast-folding proteins / supercomputers
Energetic accuracy
– ΔG_fold (a few kcal/mol, hydrogen bond)
– High accuracy of force field
Ab initio all-atom protein folding
1998: villin headpiece, 36 amino acids, 3+ Å
2002/2003:
– trpcage, 20 amino acids, 1 Å
– Villin headpiece by Folding@Home (3.8 Å)
– Villin headpiece by Shen et al. (3.0 Å)
– BBA5 by Folding@Home (2.2-2.5 Å)
Recently (Scheraga and others):
– A few small proteins, 2.0-4.0 Å
Villin headpiece subdomain (HP35)
Review of previous work
Best folded structure from simulation
Cα RMSD 0.39 Å
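For readers unfamiliar with the metric, the sketch below shows how an RMSD value like the one above is computed from two sets of Cα coordinates. It assumes the two structures have already been optimally superposed (the superposition step is not shown), and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the two coordinate sets are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates in Å: every atom of the model is displaced by 0.3 Å.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.3, 0.0), (3.8, -0.3, 0.0), (7.6, 0.3, 0.0)]
print(round(rmsd(native, model), 2))  # 0.3
```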
Four states from simulation
Thermodynamic properties from simulation
The folding pathway of HP35
Results from 10 μs simulations
Folding trajectory #1
Segment folding
Population of native hydrogen bonds
[Figure: hydrogen bond occupancy (%), axis from 0 to 90, for helix I, helix II, and helix III]
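A minimal sketch of how a hydrogen bond occupancy of this kind can be computed: the percentage of trajectory frames in which each native hydrogen bond is formed. The per-frame boolean input and the toy data are assumptions; the geometric criterion used to decide whether a bond is formed is not given in the slides.

```python
def hbond_occupancy(frames):
    """Percent of frames in which each native hydrogen bond is formed.

    frames: list of per-frame lists of booleans, one entry per native bond.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_bonds = len(frames[0])
    counts = [0] * n_bonds
    for frame in frames:
        if len(frame) != n_bonds:
            raise ValueError("every frame must list the same bonds")
        for i, formed in enumerate(frame):
            if formed:
                counts[i] += 1
    return [100.0 * c / len(frames) for c in counts]

# Hypothetical data: 4 frames, 3 native hydrogen bonds.
frames = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [False, False, False],
]
occupancy = hbond_occupancy(frames)
print(occupancy)  # [75.0, 25.0, 25.0]
```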
Folding landscape
[Figure: RMSD of segment B vs. RMSD of segment A, axis ticks at 4 and 8; color scale from 1.000 to 6.420E4]
Free energy landscape
[Figure: four panels covering 0-2.5 μs, 2.5-5.0 μs, 5.0-7.5 μs, and 7.5-10.0 μs]
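A free energy landscape over two coordinates, such as the RMSDs of segments A and B, is commonly estimated from a 2D histogram of snapshots as F = -kT ln(P / P_max). The sketch below illustrates that standard construction; the bin width, the temperature of 300 K, and the sample points are assumptions, not values from the slides.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def free_energy_surface(points, bin_width=0.5, temperature=300.0):
    """Return {(i, j): F} in kcal/mol relative to the most populated bin.

    points: iterable of (rmsd_a, rmsd_b) pairs, one per snapshot.
    """
    counts = Counter((int(a // bin_width), int(b // bin_width)) for a, b in points)
    if not counts:
        raise ValueError("need at least one point")
    max_count = max(counts.values())
    kt = K_B * temperature
    # F = -kT ln(n / n_max), written so the most populated bin is exactly 0.0
    return {cell: kt * math.log(max_count / n) for cell, n in counts.items()}

# Hypothetical snapshots: four near one basin, one in another.
points = [(1.2, 0.8)] * 4 + [(5.1, 4.9)]
surface = free_energy_surface(points)
print(surface[(2, 1)])             # 0.0
print(round(surface[(10, 9)], 3))  # 0.826
```

Applying the same function to successive time windows of a trajectory gives one landscape per window.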
Top ten clusters (RMSD, population)
1. 5.90 Å, 12.42%
2. 6.33 Å, 9.79%
3. 6.13 Å, 5.12%
4. 3.07 Å, 3.47%
5. 1.50 Å, 3.21%
6. 5.87 Å, 3.16%
7. 5.54 Å, 2.68%
8. 5.65 Å, 2.55%
9. 5.85 Å, 2.54%
10. 6.22 Å, 2.34%
Folding network (RMSD)
Folding network (Epot)
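The slides do not state how the network was built. One common construction, sketched here as an assumption, treats each conformational cluster as a node and adds an edge whenever two consecutive snapshots fall in different clusters. The cluster labels are hypothetical.

```python
def build_network(labels):
    """Undirected transition network from a time series of cluster labels.

    Returns a dict mapping each cluster id to the set of clusters it
    exchanges with in consecutive snapshots.
    """
    neighbors = {label: set() for label in labels}
    for current, following in zip(labels, labels[1:]):
        if current != following:
            neighbors[current].add(following)
            neighbors[following].add(current)
    return neighbors

def degrees(network):
    """Number of distinct neighbors of each node."""
    return {node: len(adjacent) for node, adjacent in network.items()}

# Hypothetical cluster assignment for 8 consecutive snapshots.
labels = [0, 0, 1, 0, 2, 2, 1, 3]
network = build_network(labels)
print(degrees(network))  # {0: 2, 1: 3, 2: 2, 3: 1}
```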
Scale-free property
[Figure: log10(p(k)) vs. log10(k) with linear fit, R² = 0.786]
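A scale-free network has a degree distribution p(k) that follows a power law, which appears as a straight line on a log-log plot. The sketch below fits that line by ordinary least squares and reports R². The fitting procedure is an assumption (the slides only report R²), and the degree list is synthetic.

```python
import math
from collections import Counter

def degree_distribution_fit(degree_list):
    """Least-squares line through (log10 k, log10 p(k)).

    Returns (slope, intercept, r_squared). Nodes of degree 0 are ignored.
    """
    counts = Counter(k for k in degree_list if k > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct degrees")
    total = sum(counts.values())
    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / total) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0.0:
        raise ValueError("degree distribution is flat; R^2 is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy * sxy / (sxx * syy)

# Synthetic degrees with node counts proportional to k^-2.
degree_list = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, intercept, r_squared = degree_distribution_fit(degree_list)
print(round(slope, 3), round(r_squared, 3))  # -2.0 1.0
```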
Hubs
– Degree: 17; RMSD-ALL: 5.98 Å; RMSD-CA: 4.27 Å; RMSD-segment A: 3.96 Å; RMSD-segment B: 1.18 Å; RGYR: 10.80 Å; Population: 1735
– Degree: 45; RMSD-ALL: 7.26 Å; RMSD-CA: 5.90 Å; RMSD-segment A: 5.17 Å; RMSD-segment B: 1.63 Å; RGYR: 9.75 Å; Population: 124243
– Degree: 24; RMSD-ALL: 3.75 Å; RMSD-CA: 1.50 Å; RMSD-segment A: 0.36 Å; RMSD-segment B: 0.59 Å; RGYR: 10.17 Å; Population: 32090
Bottlenecks
– Betweenness: 2.78; RMSD-ALL: 6.24 Å; RMSD-CA: 5.02 Å; RMSD-segment A: 4.40 Å; RMSD-segment B: 1.53 Å; RGYR: 10.86 Å; Population: 550
– Betweenness: 4.11; RMSD-ALL: 6.63 Å; RMSD-CA: 4.03 Å; RMSD-segment A: 4.64 Å; RMSD-segment B: 1.07 Å; RGYR: 11.02 Å; Population: 873
– Betweenness: 2.95; RMSD-ALL: 5.70 Å; RMSD-CA: 4.34 Å; RMSD-segment A: 3.38 Å; RMSD-segment B: 1.34 Å; RGYR: 10.42 Å; Population: 237
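Bottlenecks are identified here by betweenness, which measures how many shortest paths between other nodes pass through a given node. The sketch below uses Brandes' algorithm for an unweighted, undirected graph and returns unnormalized values; the normalization behind the numbers quoted above is not stated in the slides, and the example graph is hypothetical.

```python
from collections import deque

def betweenness(network):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    network: dict mapping node -> set of neighbors (undirected, unweighted).
    """
    score = {node: 0.0 for node in network}
    for source in network:
        order = []                                   # nodes in BFS visiting order
        predecessors = {node: [] for node in network}
        n_paths = {node: 0 for node in network}      # shortest paths from source
        distance = {node: -1 for node in network}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in network[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    n_paths[neighbor] += n_paths[node]
                    predecessors[neighbor].append(node)
        # Accumulate path dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in network}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}

# Hypothetical graph: two triangles joined through node 3.
bridge = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
          4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
centrality = betweenness(bridge)
print(centrality[3], centrality[2], centrality[0])  # 9.0 8.0 0.0
```

In this toy graph node 3 has the highest betweenness but only two neighbors, which illustrates why a bottleneck need not be a hub.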
Folding trajectory #2
Segment folding
Population of native hydrogen bonds
[Figure: hydrogen bond occupancy (%), axis from 0 to 90, for helix III, helix II, and helix I]
Folding landscapes
[Figure: RMSD of segment B vs. RMSD of segment A, axis ticks at 4 and 8; color scale from 1.000 to 2.130E4]
Free energy landscape
[Figure: four panels covering 0-2.5 μs, 2.5-5.0 μs, 5.0-7.5 μs, and 7.5-10.0 μs]
Top ten clusters (RMSD, population)
1. 3.19 Å, 8.54%
2. 2.31 Å, 7.25%
3. 1.71 Å, 6.15%
4. 3.43 Å, 5.17%
5. 1.10 Å, 3.56%
6. 6.79 Å, 1.94%
7. 7.38 Å, 1.88%
8. 3.31 Å, 1.84%
9. 6.85 Å, 1.50%
10. 3.88 Å, 1.42%
Folding network (RMSD)
Folding network (Epot)
Scale-free property
[Figure: log(p(k)) vs. log(k) with linear fit, R² = 0.723]
Hubs
– Degree: 36; RMSD-ALL: 3.73 Å; RMSD-CA: 1.71 Å; RMSD-segment A: 0.63 Å; RMSD-segment B: 0.69 Å; RGYR: 10.05 Å; Population: 61485
– Degree: 31; RMSD-ALL: 5.99 Å; RMSD-CA: 3.92 Å; RMSD-segment A: 4.13 Å; RMSD-segment B: 0.97 Å; RGYR: 11.50 Å; Population: 2689
– Degree: 30; RMSD-ALL: 6.83 Å; RMSD-CA: 5.83 Å; RMSD-segment A: 4.88 Å; RMSD-segment B: 1.65 Å; RGYR: 9.93 Å; Population: 5991
– Degree: 22; RMSD-ALL: 6.75 Å; RMSD-CA: 5.13 Å; RMSD-segment A: 5.04 Å; RMSD-segment B: 0.61 Å; RGYR: 12.30 Å; Population: 2854
Bottlenecks
– Betweenness: 2.46; RMSD-ALL: 7.23 Å; RMSD-CA: 5.80 Å; RMSD-segment A: 5.17 Å; RMSD-segment B: 0.82 Å; RGYR: 10.63 Å; Population: 392
– Betweenness: 2.27; RMSD-ALL: 6.22 Å; RMSD-CA: 4.50 Å; RMSD-segment A: 4.84 Å; RMSD-segment B: 1.82 Å; RGYR: 10.97 Å; Population: 890
– Betweenness: 2.48; RMSD-ALL: 6.62 Å; RMSD-CA: 4.93 Å; RMSD-segment A: 4.50 Å; RMSD-segment B: 1.13 Å; RGYR: 11.43 Å; Population: 260
A SCORING FUNCTION FOR STRUCTURE PREDICTION
SCORING FUNCTIONS
Knowledge-based functions
(well compacted; surface area; contact order)
Physics-based functions
(free energy; potential energy; hydrogen bond energy; VDW energy)
OUR SCORING FUNCTION
F(E) = E_SE + a*E_FF + b*E_HB
E_SE = the statistical energy
E_FF = the force field physical energy with GB solvation model
E_HB = the main chain hydrogen bonding energy
a = the coefficient of the force field physical energy term
b = the coefficient of the main chain hydrogen bonding energy term
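The combined score can be sketched as a weighted sum of the three terms. The default coefficients a = 0.12 and b = 0.06 are the grid-search values reported later in these slides. The decoy energies are hypothetical, and ranking the lowest score first is an assumption based on the score being energy-like.

```python
def combined_score(e_se, e_ff, e_hb, a=0.12, b=0.06):
    """F(E) = E_SE + a*E_FF + b*E_HB."""
    return e_se + a * e_ff + b * e_hb

def rank_decoys(decoys, a=0.12, b=0.06):
    """Return decoy names ordered from lowest (best) to highest score.

    decoys: list of dicts with keys 'name', 'e_se', 'e_ff', 'e_hb'.
    """
    ranked = sorted(
        decoys,
        key=lambda d: combined_score(d["e_se"], d["e_ff"], d["e_hb"], a, b),
    )
    return [d["name"] for d in ranked]

# Hypothetical energies for two decoys (arbitrary units).
decoys = [
    {"name": "decoy_2", "e_se": -120.0, "e_ff": -300.0, "e_hb": -10.0},
    {"name": "decoy_1", "e_se": -100.0, "e_ff": -500.0, "e_hb": -20.0},
]
print(rank_decoys(decoys))  # ['decoy_1', 'decoy_2']
```

With a = b = 0 the ranking falls back to the statistical energy alone, which would reverse the order for these two decoys.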
DECOY SETS
http://depts.washington.edu/bakerpg/decoys/
1. a wide variety of different proteins;
2. close to the native structure;
3. produced by a relatively unbiased procedure
Decoy sets
Training sets (14 × 100)
Testing sets (13 × 100)
Group a: contain 3-11 acceptable decoys
Group b: contain at least 93 acceptable decoys
Acceptable decoys: RMSD < 5 Å
Total: 534, 38.14%
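As a quick check of the numbers above, the sketch below applies the RMSD < 5 Å criterion to a list of decoys and reproduces the quoted percentage. It assumes the total of 534 refers to the 14 × 100 training decoys, which is consistent with the 38.1% quoted for the training sets later in the slides. The RMSD list is hypothetical.

```python
def acceptable_fraction(rmsds, cutoff=5.0):
    """Percent of decoys whose RMSD is strictly below the cutoff (in Å)."""
    if not rmsds:
        raise ValueError("need at least one decoy")
    return 100.0 * sum(1 for r in rmsds if r < cutoff) / len(rmsds)

# Hypothetical decoy RMSDs: two of the four fall below 5 Å.
print(acceptable_fraction([1.2, 4.9, 5.0, 7.3]))  # 50.0

# Reported total: 534 acceptable decoys out of 14 sets of 100.
training_percent = 100.0 * 534 / (14 * 100)
print(round(training_percent, 2))  # 38.14
```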
F(E) = E_SE + a*E_FF + b*E_HB
Scoring method    CC_ave with RMSD (SD)    CC_ave with TM-score (SD)    Number
DFIRE             0.473 (0.312)            -0.451 (0.261)               98
RAPDF             0.497 (0.203)            -0.478 (0.173)               95
DOPE              0.520 (0.214)            -0.442 (0.243)               93
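A minimal sketch of how an average correlation coefficient with its standard deviation can be obtained: compute the Pearson correlation between score and RMSD within each decoy set, then average over sets. The use of Pearson correlation and of the sample standard deviation are assumptions, and the data are hypothetical.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def average_cc(decoy_sets):
    """Mean and sample SD of the per-set correlation between score and RMSD.

    decoy_sets: list of (scores, rmsds) pairs, one pair per protein.
    """
    ccs = [pearson(scores, rmsds) for scores, rmsds in decoy_sets]
    return statistics.mean(ccs), statistics.stdev(ccs)

# Hypothetical scores and RMSDs for two small decoy sets.
decoy_sets = [
    ([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]),  # CC = 1.0
    ([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]),  # CC = 0.6
]
cc_mean, cc_sd = average_cc(decoy_sets)
print(round(cc_mean, 3), round(cc_sd, 3))  # 0.8 0.283
```

A positive correlation with RMSD and a negative correlation with TM-score both indicate that lower scores go with more native-like decoys, since TM-score increases with similarity to the native structure.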
F(E) = E_SE + a*E_FF + b*E_HB
E_FF = the force field physical energy with GB solvation model
Two protocols:
– minimization only;
– minimization, then a 40 ps molecular dynamics run followed by another minimization.
(The results from both protocols are very similar; therefore, the less time-consuming protocol was adopted.)
Various force fields in Tinker
F(E) = E_SE + a*E_FF + b*E_HB
Scoring method    CC_ave with RMSD (SD)    CC_ave with TM-score (SD)    Number
AMBER99           0.196 (0.204)            -0.216 (0.243)               77
OPLS-aa           0.211 (0.241)            -0.224 (0.271)               79
CHARMM27          0.014 (0.216)            -0.015 (0.198)               58
AMBER force fields
F(E) = E_SE + a*E_FF + b*E_HB
Scoring method    CC_ave with RMSD (SD)    CC_ave with TM-score (SD)    Number
AMBER03           0.313 (0.223)            -0.331 (0.232)               97
AMBER99           0.254 (0.162)            -0.272 (0.146)               86
AMBER99SB         0.342 (0.162)            -0.353 (0.152)               96
AMBER96           0.293 (0.136)            -0.325 (0.157)               90
AMBER94           0.242 (0.227)            -0.261 (0.206)               82
Hydrogen bonding energy
F(E) = E_SE + a*E_FF + b*E_HB
Scoring method    CC_ave with RMSD (SD)    CC_ave with TM-score (SD)    Number
DSSP              0.019 (0.328)            -0.007 (0.284)               58
ROSETTA           -0.186 (0.432)           0.103 (0.376)                34
Parameters from grid search
– A search to maximize the total number of acceptable decoys among the top 10 lists.
– Both "a" and "b" ranged from 0 to 0.5.
– The maximum number of total acceptable decoys was found to be 112 out of the 140 selections (14*10).
– The corresponding parameters are a = 0.12 and b = 0.06.
– The overall 80% of acceptable decoys is also significantly higher than the 38.1% in the whole training sets.
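The search described above can be sketched as an exhaustive scan over a and b that counts how many top-ranked decoys are acceptable. The grid step of 0.01, the tie-breaking rule (first maximum found), and the toy decoy set are assumptions; only the 0 to 0.5 range, the top-10 criterion, and the RMSD < 5 Å cutoff come from the slides.

```python
def count_acceptable_in_top(decoy_sets, a, b, top_n=10, cutoff=5.0):
    """Total number of acceptable decoys among the top_n of every set."""
    total = 0
    for decoys in decoy_sets:
        ranked = sorted(
            decoys, key=lambda d: d["e_se"] + a * d["e_ff"] + b * d["e_hb"]
        )
        total += sum(1 for d in ranked[:top_n] if d["rmsd"] < cutoff)
    return total

def grid_search(decoy_sets, step=0.01, upper=0.5, top_n=10):
    """Scan a and b over [0, upper]; return (best_count, a, b)."""
    n_steps = round(upper / step)
    best = (-1, 0.0, 0.0)
    for i in range(n_steps + 1):
        for j in range(n_steps + 1):
            a, b = i * step, j * step
            hits = count_acceptable_in_top(decoy_sets, a, b, top_n)
            if hits > best[0]:  # keep the first maximum encountered
                best = (hits, a, b)
    return best

# Hypothetical set: the near-native decoy only wins once a exceeds 0.045.
toy_sets = [[
    {"e_se": -10.0, "e_ff": -100.0, "e_hb": 0.0, "rmsd": 2.0},
    {"e_se": -12.25, "e_ff": -50.0, "e_hb": 0.0, "rmsd": 8.0},
]]
best_count, best_a, best_b = grid_search(toy_sets, top_n=1)
print(best_count, round(best_a, 2), round(best_b, 2))  # 1 0.05 0.0
```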
Comparison with Rosetta energy
Scoring method    CC_ave with RMSD (SD)    CC_ave with TM-score (SD)    Number
F(E)              0.538 (0.223)            -0.476 (0.248)               112
ROSETTA           0.399 (0.293)            -0.391 (0.321)               95
Performance on the training set
[Figures: score (kcal/mol) vs. RMSD (Å)]
Performance on the testing set
[Figures: score (kcal/mol) vs. RMSD (Å)]
Acknowledgements