The folding network of villin headpiece subdomain

Posted on 30-Dec-2015



Hongxing Lei

Beijing Institute of Genomics

Chinese Academy of Sciences

The Protein Folding Problem

The importance of protein folding

Amyloid diseases: Alzheimer's disease (AD), Parkinson's disease (PD), Huntington's disease, prion diseases, amyotrophic lateral sclerosis (ALS)

Protein structure prediction

Protein design

Folding funnel

[Figure: folding funnel schematic with the labels "unfolded state", "collapse", "formation of a nucleus", "formation of microdomains", "diffusion and collision of microdomains", and "native state"]

Onuchic & Wolynes, COSB 2004, 14:70-75

The challenges in all-atom protein folding

Time scale: protein folding takes seconds; simulations reach microseconds; the gap is $10^6$.

Solutions: ultrafast-folding proteins / supercomputers

Energetic accuracy: $\Delta G_{fold}$ is only a few kcal/mol, comparable to a hydrogen bond, so the force field must be highly accurate.

Ab initio all-atom protein folding

1998: villin headpiece, 36 amino acids, 3+ Å

2002/2003:
– Trp-cage, 20 amino acids, 1 Å
– Villin headpiece by Folding@Home (3.8 Å)
– Villin headpiece by Shen et al. (3.0 Å)
– BBA5 by Folding@Home (2.2-2.5 Å)

Recently (Scheraga and others):
– A few small proteins, 2.0-4.0 Å

Villin headpiece subdomain (HP35)

Review of previous work

Best folded structure from simulation

Cα RMSD 0.39 Å
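Cα RMSD values such as the one above are computed after optimal rigid-body superposition of matched Cα atoms. The slides do not say which implementation was used; the following is a minimal sketch of one standard approach, assuming both structures are given as equal-length lists of (x, y, z) Cα coordinates in Å. It uses Horn's quaternion formulation: the minimum sum of squared deviations equals G_x + G_y - 2λ_max, where G are the sums of squared centered coordinates and λ_max is the largest eigenvalue of a 4×4 key matrix (found here with Jacobi rotations). All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _centered(coords: Sequence[Coord]) -> List[Coord]:
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(matrix: List[List[float]], sweeps: int = 50) -> float:
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def ca_rmsd(mobile: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Minimum RMSD over all rigid rotations and translations (input units)."""
    if not mobile or len(mobile) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    x, y = _centered(mobile), _centered(reference)
    # Cross-covariance S[i][j] = sum over atoms of x_i * y_j.
    s = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    residual = max(0.0, g - 2.0 * _largest_eigenvalue(key))  # clamp round-off
    return math.sqrt(residual / len(x))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while a structure that differs in shape gives a positive value.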

Four states from simulation

Thermodynamic properties from simulation

The folding pathway of HP35

Results from 10 μs simulations

Folding trajectory #1

Segment folding

Population of native hydrogen bonds

[Figure: native hydrogen bond occupancy (%) for helix I, helix II, and helix III]
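Hydrogen bond occupancy is the percentage of snapshots in which a given native hydrogen bond is formed. A minimal sketch, assuming each snapshot has already been reduced to a set of hydrogen-bond identifiers by some geometric criterion (the criterion used for these slides is not stated). The residue pairs in the example are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def native_hbond_occupancy(
    frames: List[Set[Hashable]], native: Iterable[Hashable]
) -> Dict[Hashable, float]:
    """Percentage of snapshots in which each native hydrogen bond is present."""
    native_list = list(native)
    if not frames:
        return {hb: 0.0 for hb in native_list}
    # sum() over booleans counts the frames that contain the bond.
    return {hb: 100.0 * sum(hb in f for f in frames) / len(frames) for hb in native_list}


# Hypothetical example: three helical (i, i+4) backbone hydrogen bonds, four snapshots.
native_bonds = [(44, 48), (45, 49), (46, 50)]
snapshots = [{(44, 48), (45, 49)}, {(44, 48)}, set(), {(44, 48), (45, 49), (12, 16)}]
print(native_hbond_occupancy(snapshots, native_bonds))
# {(44, 48): 75.0, (45, 49): 50.0, (46, 50): 0.0}
```

Non-native bonds present in a snapshot, such as (12, 16) above, are simply ignored.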

Folding landscape

[Figure: RMSD of segment B versus RMSD of segment A; color scale from 1.000 to 6.420E4]

Free energy landscape

[Figure panels: 0-2.5 μs, 2.5-5.0 μs, 5.0-7.5 μs, 7.5-10.0 μs]
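A common way to turn a trajectory into such a landscape is to histogram two order parameters and convert populations to relative free energies, ΔG = -k_B T ln(P/P_max). The sketch below does this for (RMSD of segment A, RMSD of segment B) pairs; the 0.5 Å bin width and 300 K temperature are assumptions, not values taken from the slides. Calling it separately on the snapshots of each 2.5 μs window gives one panel per window.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def free_energy_surface(
    points: Iterable[Tuple[float, float]],
    bin_width: float = 0.5,      # assumed bin width in Å
    temperature: float = 300.0,  # assumed temperature in K
) -> Dict[Tuple[int, int], float]:
    """Relative free energy (kcal/mol) of each occupied 2D bin; the most populated bin is 0."""
    counts = Counter((int(a // bin_width), int(b // bin_width)) for a, b in points)
    if not counts:
        return {}
    most = max(counts.values())
    kt = KB_KCAL * temperature
    return {cell: -kt * math.log(n / most) for cell, n in counts.items()}
```

Empty bins are left out rather than assigned an infinite free energy; a plotting routine would typically draw them as blank.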

Top ten clusters

5.90 Å, 12.42%
6.33 Å, 9.79%
6.13 Å, 5.12%
3.07 Å, 3.47%
1.50 Å, 3.21%
5.87 Å, 3.16%
5.54 Å, 2.68%
5.65 Å, 2.55%
5.85 Å, 2.54%
6.22 Å, 2.34%

Folding network (RMSD)

Folding network (Epot)
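The slides do not spell out how the network is built; a common construction, assumed here, treats each conformational cluster as a node and joins two clusters with an undirected edge whenever the trajectory moves between them in consecutive snapshots. A node's degree is then its number of distinct neighbours, and its population is the number of snapshots assigned to it. The cluster labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple


def build_folding_network(labels: List[Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Nodes are cluster labels; edges join clusters visited in consecutive snapshots."""
    graph: Dict[Hashable, Set[Hashable]] = {}
    for label in labels:
        graph.setdefault(label, set())  # every visited cluster becomes a node
    for a, b in zip(labels, labels[1:]):
        if a != b:  # staying in the same cluster adds no edge
            graph[a].add(b)
            graph[b].add(a)
    return graph


def node_degrees(graph: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Degree = number of distinct neighbouring clusters."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def hubs(labels: List[Hashable], top: int = 3) -> List[Tuple[Hashable, int, int]]:
    """(label, degree, population) for the highest-degree clusters."""
    degree = node_degrees(build_folding_network(labels))
    population = Counter(labels)
    ranked = sorted(degree, key=degree.get, reverse=True)[:top]
    return [(node, degree[node], population[node]) for node in ranked]


# Hypothetical cluster assignment of eight consecutive snapshots.
trajectory = ["U", "I1", "U", "I2", "N", "N", "I2", "N"]
print(hubs(trajectory, top=2))  # [('U', 2, 2), ('I2', 2, 2)]
```

Because Python's sort is stable, clusters with equal degree are reported in the order they were first visited.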

Scale free property

[Figure: log10(p(k)) versus log10(k); linear fit R² = 0.786]
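A network is called scale-free when its degree distribution follows a power law, p(k) ∝ k^(-γ), which appears as a straight line on a log-log plot; R² measures how well such a line fits. A minimal sketch, assuming an ordinary least-squares line through log10 p(k) versus log10 k with no binning (the fitting procedure behind the slides is not stated):

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def power_law_fit(degree_values: Iterable[int]) -> Tuple[float, float, float]:
    """Least-squares line through (log10 k, log10 p(k)); returns (slope, intercept, R^2)."""
    degs = [k for k in degree_values if k > 0]  # log10(0) is undefined
    counts = Counter(degs)
    if len(counts) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    ks = sorted(counts)
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(counts[k] / len(degs)) for k in ks]  # p(k) = fraction of nodes
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2


# Hypothetical degrees following p(k) ∝ k^-2 exactly.
slope, intercept, r2 = power_law_fit([1] * 16 + [2] * 4 + [4])
print(round(slope, 6), round(r2, 6))  # -2.0 1.0
```

The fitted slope estimates -γ; real networks give R² below 1, as in the 0.786 reported above.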

Hubs

Degree: 17; RMSD-ALL: 5.98 Å; RMSD-CA: 4.27 Å; RMSD-segment A: 3.96 Å; RMSD-segment B: 1.18 Å; RGYR: 10.80 Å; Population: 1735

Degree: 45; RMSD-ALL: 7.26 Å; RMSD-CA: 5.90 Å; RMSD-segment A: 5.17 Å; RMSD-segment B: 1.63 Å; RGYR: 9.75 Å; Population: 124243

Degree: 24; RMSD-ALL: 3.75 Å; RMSD-CA: 1.50 Å; RMSD-segment A: 0.36 Å; RMSD-segment B: 0.59 Å; RGYR: 10.17 Å; Population: 32090

Bottlenecks

Betweenness: 2.78; RMSD-ALL: 6.24 Å; RMSD-CA: 5.02 Å; RMSD-segment A: 4.40 Å; RMSD-segment B: 1.53 Å; RGYR: 10.86 Å; Population: 550

Betweenness: 4.11; RMSD-ALL: 6.63 Å; RMSD-CA: 4.03 Å; RMSD-segment A: 4.64 Å; RMSD-segment B: 1.07 Å; RGYR: 11.02 Å; Population: 873

Betweenness: 2.95; RMSD-ALL: 5.70 Å; RMSD-CA: 4.34 Å; RMSD-segment A: 3.38 Å; RMSD-segment B: 1.34 Å; RGYR: 10.42 Å; Population: 237
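Bottleneck states are identified by betweenness centrality, i.e. how often a node lies on shortest paths between other nodes. The sketch below is Brandes' algorithm for an unweighted, undirected graph stored as a dict of neighbour sets. It returns unnormalized values; the normalization behind the betweenness values listed above is not stated, so the numbers are not directly comparable. The example graphs are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def betweenness(graph: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, float]:
    """Unnormalized betweenness centrality (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both ends.
    return {v: b / 2.0 for v, b in bc.items()}


star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(betweenness(star)["hub"])  # 3.0
```

In the star, every one of the three leaf pairs must pass through the centre, hence 3.0.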

Folding trajectory #2

Segment folding

Population of native hydrogen bonds

[Figure: native hydrogen bond occupancy (%) for helix III, helix II, and helix I]

Folding landscapes

[Figure: RMSD of segment B versus RMSD of segment A; color scale from 1.000 to 2.130E4]

Free energy landscape

[Figure panels: 0-2.5 μs, 2.5-5.0 μs, 5.0-7.5 μs, 7.5-10.0 μs]

Top ten clusters

3.19 Å, 8.54%
2.31 Å, 7.25%
1.71 Å, 6.15%
3.43 Å, 5.17%
1.10 Å, 3.56%
6.79 Å, 1.94%
7.38 Å, 1.88%
3.31 Å, 1.84%
6.85 Å, 1.50%
3.88 Å, 1.42%

Folding network (RMSD)

Folding network (Epot)

Scale free property

[Figure: log(p(k)) versus log(k); linear fit R² = 0.723]

Hubs

Degree: 36; RMSD-ALL: 3.73 Å; RMSD-CA: 1.71 Å; RMSD-segment A: 0.63 Å; RMSD-segment B: 0.69 Å; RGYR: 10.05 Å; Population: 61485

Degree: 31; RMSD-ALL: 5.99 Å; RMSD-CA: 3.92 Å; RMSD-segment A: 4.13 Å; RMSD-segment B: 0.97 Å; RGYR: 11.50 Å; Population: 2689

Degree: 30; RMSD-ALL: 6.83 Å; RMSD-CA: 5.83 Å; RMSD-segment A: 4.88 Å; RMSD-segment B: 1.65 Å; RGYR: 9.93 Å; Population: 5991

Degree: 22; RMSD-ALL: 6.75 Å; RMSD-CA: 5.13 Å; RMSD-segment A: 5.04 Å; RMSD-segment B: 0.61 Å; RGYR: 12.30 Å; Population: 2854

Bottlenecks

Betweenness: 2.46; RMSD-ALL: 7.23 Å; RMSD-CA: 5.80 Å; RMSD-segment A: 5.17 Å; RMSD-segment B: 0.82 Å; RGYR: 10.63 Å; Population: 392

Betweenness: 2.27; RMSD-ALL: 6.22 Å; RMSD-CA: 4.50 Å; RMSD-segment A: 4.84 Å; RMSD-segment B: 1.82 Å; RGYR: 10.97 Å; Population: 890

Betweenness: 2.48; RMSD-ALL: 6.62 Å; RMSD-CA: 4.93 Å; RMSD-segment A: 4.50 Å; RMSD-segment B: 1.13 Å; RGYR: 11.43 Å; Population: 260

A SCORING FUNCTION FOR STRUCTURE PREDICTION

SCORING FUNCTIONS

Knowledge-based functions (compactness; surface area; contact order)

Physics-based functions (free energy; potential energy; hydrogen bond energy; VDW energy)

OUR SCORING FUNCTION

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

where
$E_{SE}$ = the statistical energy
$E_{FF}$ = the force field physical energy with a GB (generalized Born) solvation model
$E_{HB}$ = the main chain hydrogen bonding energy
$a$ = the coefficient of the force field physical energy term
$b$ = the coefficient of the main chain hydrogen bonding energy term
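Because F(E) is a weighted sum, scoring and ranking are straightforward once the three energy terms are known for each decoy. A minimal sketch, assuming the per-decoy energies have already been computed (computing them is outside its scope) and that lower scores indicate more native-like decoys; the defaults a = 0.12 and b = 0.06 are the values from the grid search reported later in this section. The DecoyEnergies container is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoyEnergies:
    """Hypothetical container for the three precomputed energy terms of one decoy."""
    name: str
    e_se: float  # statistical energy
    e_ff: float  # force-field energy with GB solvation
    e_hb: float  # main-chain hydrogen-bonding energy


def score(d: DecoyEnergies, a: float = 0.12, b: float = 0.06) -> float:
    """F(E) = E_SE + a * E_FF + b * E_HB."""
    return d.e_se + a * d.e_ff + b * d.e_hb


def rank_decoys(decoys: List[DecoyEnergies], a: float = 0.12, b: float = 0.06) -> List[DecoyEnergies]:
    """Sort decoys from lowest (assumed best) to highest score."""
    return sorted(decoys, key=lambda d: score(d, a, b))


decoys = [DecoyEnergies("d1", 0.0, 0.0, 0.0), DecoyEnergies("d2", 10.0, -100.0, -50.0)]
print([d.name for d in rank_decoys(decoys)])  # ['d2', 'd1']
```

Here d2 scores 10 - 12 - 3 = -5, so it ranks ahead of d1 despite its higher statistical energy.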

DECOY SETS (http://depts.washington.edu/bakerpg/decoys/)

1. A wide variety of different proteins;
2. Close to the native structure;
3. Produced by a relatively unbiased procedure.

Decoy sets

Training sets (14 × 100)

Testing sets (13 × 100)
Group a: contains 3-11 acceptable decoys
Group b: contains at least 93 acceptable decoys

Acceptable decoys: RMSD < 5 Å. Total in the training sets: 534 of 1400 (38.14%)

Decoy sets

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

Scoring method   Cc_ave with RMSD (SD)   Cc_ave with TM-score (SD)   Number (acceptable decoys in top 10 lists)
DFIRE            0.473 (0.312)           -0.451 (0.261)              98
RAPDF            0.497 (0.203)           -0.478 (0.173)              95
DOPE             0.520 (0.214)           -0.442 (0.243)              93
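Cc_ave is read here as the correlation coefficient between score and decoy quality, computed per decoy set and then averaged, with SD the spread across sets; this reading, the use of Pearson correlation, and the sample standard deviation are all assumptions. Under this reading, a good scoring function gives a positive Cc_ave with RMSD and a negative one with TM-score, since a lower RMSD and a higher TM-score both mean a more native-like decoy.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_correlation(
    decoy_sets: List[Tuple[Sequence[float], Sequence[float]]]
) -> Tuple[float, float]:
    """Mean and sample SD, across decoy sets, of corr(score, quality measure)."""
    ccs = [pearson(scores, quality) for scores, quality in decoy_sets]
    return statistics.mean(ccs), statistics.stdev(ccs)  # stdev needs >= 2 sets
```

Each element of decoy_sets pairs the scores of one protein's decoys with their RMSD (or TM-score) values in the same order.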

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

$E_{FF}$ = the force field physical energy with a GB solvation model

Two protocols:
– minimization only;
– minimization, then a 40 ps molecular dynamics run, then another minimization.

(The results from both protocols are very similar; therefore, the less time-consuming protocol, minimization only, was adopted.)

Various force fields in Tinker

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

Scoring method   Cc_ave with RMSD (SD)   Cc_ave with TM-score (SD)   Number (acceptable decoys in top 10 lists)
AMBER99          0.196 (0.204)           -0.216 (0.243)              77
OPLS-aa          0.211 (0.241)           -0.224 (0.271)              79
CHARMM27         0.014 (0.216)           -0.015 (0.198)              58

AMBER force fields

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

Scoring method   Cc_ave with RMSD (SD)   Cc_ave with TM-score (SD)   Number (acceptable decoys in top 10 lists)
AMBER03          0.313 (0.223)           -0.331 (0.232)              97
AMBER99          0.254 (0.162)           -0.272 (0.146)              86
AMBER99SB        0.342 (0.162)           -0.353 (0.152)              96
AMBER96          0.293 (0.136)           -0.325 (0.157)              90
AMBER94          0.242 (0.227)           -0.261 (0.206)              82

Hydrogen bonding energy

$F(E) = E_{SE} + a \cdot E_{FF} + b \cdot E_{HB}$

Scoring method   Cc_ave with RMSD (SD)   Cc_ave with TM-score (SD)   Number (acceptable decoys in top 10 lists)
DSSP             0.019 (0.328)           -0.007 (0.284)              58
ROSETTA          -0.186 (0.432)          0.103 (0.376)               34

Parameters from grid search

A grid search was performed to maximize the total number of acceptable decoys in the top 10 lists. Both "a" and "b" were varied from 0 to 0.5. The maximum total number of acceptable decoys was 112 out of the 140 selections (14 × 10), obtained with a = 0.12 and b = 0.06.

This overall rate of 80% acceptable decoys is substantially higher than the 38.1% in the whole training set.
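The grid search can be sketched as follows. The 0 to 0.5 range, the top-10 lists, and the 5 Å acceptance cutoff come from the slides; the 0.01 step and the (E_SE, E_FF, E_HB, RMSD) tuple layout are assumptions. Ties keep the first combination found (smallest a, then smallest b).

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (E_SE, E_FF, E_HB, RMSD in Å) for one decoy.
Decoy = Tuple[float, float, float, float]


def acceptable_in_top(decoys: Sequence[Decoy], a: float, b: float,
                      top: int = 10, cutoff: float = 5.0) -> int:
    """Number of decoys with RMSD < cutoff among the `top` lowest F(E) scores."""
    ranked = sorted(decoys, key=lambda d: d[0] + a * d[1] + b * d[2])
    return sum(1 for d in ranked[:top] if d[3] < cutoff)


def grid_search(decoy_sets: List[Sequence[Decoy]], upper: float = 0.5,
                step: float = 0.01, top: int = 10) -> Tuple[int, float, float]:
    """Scan a and b over [0, upper]; return (best total acceptable, a, b)."""
    n = int(round(upper / step))
    best = (-1, 0.0, 0.0)
    for i in range(n + 1):
        for j in range(n + 1):
            a, b = i * step, j * step
            total = sum(acceptable_in_top(s, a, b, top) for s in decoy_sets)
            if total > best[0]:  # strict comparison: ties keep the earlier (a, b)
                best = (total, a, b)
    return best
```

With 14 training sets and top = 10, the best possible total is 140, matching the "112 out of 140" framing above.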

Comparison with Rosetta energy

Scoring method   Cc_ave with RMSD (SD)   Cc_ave with TM-score (SD)   Number (acceptable decoys in top 10 lists)
F(E)             0.538 (0.223)           -0.476 (0.248)              112
ROSETTA          0.399 (0.293)           -0.391 (0.321)              95


Performance on the training set

[Figures (two slides): score (kcal/mol) versus RMSD (Å)]

Performance on the testing set

[Figure: score (kcal/mol) versus RMSD (Å)]

Acknowledgements