Principal Component Analysis in MD Simulation
![Page 1: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/1.jpg)
Principal Component Analysis in MD Simulation
Speaker: ZHOU Chen-Yang
Supervisor: Wu Yun-Dong
![Page 2: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/2.jpg)
Methods to analyze MD trajectory
• Intuition-based coordinates
– RMSD with respect to the native state
– Fraction of native contacts
– Radius of gyration
– Other observables
• Advantages
– Easy to understand
– Convenient to compute
• Disadvantages
– Inaccurate
– Ineffective for non-native structures, or when no good reference structure is available
– Depend on prior knowledge
![Page 3: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/3.jpg)
How to measure conformational change?
What we have to do:
• Reduce dimension
– The trajectory is too complicated
– A good projection should be able to separate signal from noise
• Classification/Clustering
– Classify structures into different states
• Algorithms include:
– PCA: Principal Component Analysis
– MDS: Multi-Dimensional Scaling
If we already had an optimal reaction coordinate, we would have the free energy landscape, transition pathway, transition rate ...
But usually we don't, and such a coordinate does not emerge automatically.
![Page 4: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/4.jpg)
dPCA vs RMSD
The figure shows the free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb*-ildn, projected onto the 2nd principal component and RMSD.
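The slides do not say how the landscape was computed. A common estimate, assumed here, is F = -kB T ln P, where P is the population of each bin in a 2D histogram of the two projection coordinates. The function name `free_energy_2d`, the bin count, and the synthetic input data are all illustrative choices, not taken from the talk.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_2d(x, y, temperature=300.0, bins=50):
    """Estimate F(x, y) = -kB*T*ln P(x, y) from samples (hypothetical helper)."""
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    prob = hist / hist.sum()
    free_energy = np.full(prob.shape, np.inf)   # bins never visited stay at +inf
    visited = prob > 0
    free_energy[visited] = -KB_KCAL * temperature * np.log(prob[visited])
    free_energy -= free_energy[visited].min()   # shift so the deepest bin is 0
    return free_energy, x_edges, y_edges

# Synthetic stand-ins for a PC projection and an RMSD series.
rng = np.random.default_rng(4)
pc2 = rng.normal(size=5000)
rmsd = np.abs(rng.normal(loc=3.0, size=5000))
free_energy, pc_edges, rmsd_edges = free_energy_2d(pc2, rmsd, bins=30)
```

The result is in kcal/mol relative to the most populated bin. For REMD data, only the frames from the replica at the temperature of interest would be passed in.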
![Page 5: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/5.jpg)
General description of PCA
• The central idea of PCA is to:
– reduce the dimension
– retain the variation
• An example:
– (x,y) is a randomly generated dataset
• var(x) = 3.2, var(y) = 2.3
– (x,y) is a mixture of points centered either at (0,0) or at (3,3)
– PCA generates new coordinates (x',y'), and x' captures most of the variation
• var(x') = 5.5, var(y') = 0.99
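A minimal sketch of an example like this one, assuming each cluster is an isotropic Gaussian with unit variance (the slide does not state the distribution, the sample size, or the seed, so the numbers will not match exactly). It rotates the data onto the eigenvectors of the covariance matrix and compares the variances before and after.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Two mixed clouds of points, one centred at (0, 0) and one at (3, 3).
n = 500
cloud_a = rng.normal(loc=(0.0, 0.0), scale=1.0, size=(n, 2))
cloud_b = rng.normal(loc=(3.0, 3.0), scale=1.0, size=(n, 2))
data = np.vstack([cloud_a, cloud_b])

# Centre the data, then diagonalise its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# New coordinates (x', y'): projections onto the eigenvectors.
rotated = centered @ eigvecs

var_old = data.var(axis=0, ddof=1)
var_new = rotated.var(axis=0, ddof=1)
print("var(x), var(y)   =", var_old)
print("var(x'), var(y') =", var_new)
```

Because the new axes are only a rotation of the old ones, the total variance is unchanged; PCA redistributes it so that x' carries as much as possible.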
![Page 6: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/6.jpg)
Key question in understanding PCA
• In practice, the principal components (PCs) are linear combinations of the original coordinates.
• Suppose we have a data set containing 2 columns, X1 and X2. Now we generate a new column of data Z=a1X1+a2X2. What is the variance of Z?
Variance and covariance
Example: Z=X1+X2, for which var(Z) = var(X1) + var(X2) + 2cov(X1,X2)
Why is it important? Because we are going to project the data set onto a new coordinate Z, and our aim is to choose (a1, a2) to maximize the variance of Z.
![Page 7: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/7.jpg)
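Z=a1X1+a2X2:
var(Z) = a1² var(X1) + a2² var(X2) + 2 a1 a2 cov(X1,X2)
Represented with matrix multiplication:
var(Z) = var(α'X) = α'Σα
where Σ is the covariance matrix and α = (a1, a2)' holds the coefficients of the original coordinates in the PC.
Z=X1+X2: α = (1, 1)', so var(Z) = Σ11 + Σ22 + 2Σ12
Next step: vary α to search for the maximum of var(Z)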
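The identity var(α'X) = α'Σα can be checked numerically. The sketch below uses synthetic correlated columns (the data and the seed are assumptions made for illustration) and computes the variance of Z = X1 + X2 three ways: directly from Z, through the matrix product, and through the expanded sum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated columns X1 and X2 (synthetic data, for illustration only).
x1 = rng.normal(size=1000)
x2 = 0.6 * x1 + rng.normal(scale=0.5, size=1000)
X = np.column_stack([x1, x2])

sigma = np.cov(X, rowvar=False)   # 2x2 sample covariance matrix
alpha = np.array([1.0, 1.0])      # coefficients for Z = X1 + X2

z = X @ alpha
var_direct = z.var(ddof=1)                                   # from Z itself
var_matrix = alpha @ sigma @ alpha                           # alpha' Sigma alpha
var_expanded = sigma[0, 0] + sigma[1, 1] + 2 * sigma[0, 1]   # written out

print(var_direct, var_matrix, var_expanded)
```

All three values agree up to floating-point error, as long as the same normalization (here `ddof=1`, the default of `np.cov`) is used throughout.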
![Page 8: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/8.jpg)
Maximize var(Z)
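First, we have to normalize α: α'α = 1
Then, to maximize var(Z) is to maximize α'Σα - λ(α'α - 1), where λ is a Lagrange multiplier
Differentiating with respect to α and setting the result to zero gives Σα - λα = 0, i.e. Σα = λα
λ is an eigenvalue and α is the corresponding eigenvector of Σ. Since var(Z) = α'Σα = λ, the largest eigenvalue gives the first PC.
Eigenvalues plotted from largest to smallest
Pick the first several eigenvectors as PCs (strictly, as the coefficients of the PCs). Then project the data onto the PCs; the simplified data can be further analyzed with other techniques such as clustering.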
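The whole procedure (covariance matrix, eigen-decomposition, sorting, projection) fits in a few lines of NumPy. This is a sketch on synthetic data; the function name `pca` and the test data are illustrative and not taken from the talk.

```python
import numpy as np

def pca(data, n_components):
    """Return (eigenvalues, components, projection) for row-wise samples."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, ascending output
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    components = eigvecs[:, :n_components]   # columns are the PC coefficients
    projection = centered @ components       # data expressed in the PCs
    return eigvals, components, projection

rng = np.random.default_rng(2)
# Synthetic 5-dimensional data whose variance lies mostly in two directions.
latent = rng.normal(size=(2000, 2)) * np.array([3.0, 1.5])
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.1 * rng.normal(size=(2000, 5))

eigvals, components, projection = pca(data, n_components=2)
explained = eigvals / eigvals.sum()          # fraction of variance per PC
print("explained variance fractions:", explained)
```

Plotting `eigvals` (or `explained`) against the component index gives the eigenvalue spectrum used to decide how many PCs to keep.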
![Page 9: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/9.jpg)
PCA in application: Cartesian coordinates
• Cartesian coordinates contain all the information
• But they are often noisy
cPCA: Cartesian PCA, which uses Cartesian coordinates
Comparison of cPCA and dPCA in the analysis of an Ala7 MD simulation
Dashed blue line: Cartesian PCA
Full red line: PCA using dihedral angles
Mu, Y., Nguyen, P. H., & Stock, G. (2005). Proteins, 58(1), 45–52.
![Page 10: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/10.jpg)
PCA in application: cPCA, dPCA and pPCA
Advantages of dihedral angles:
1. reduction of dimensionality
2. constraints within the coordinates
Problems with dihedral angles:
1. a dihedral angle is periodic
2. a dihedral angle is not linear
In practice, each dihedral angle is transformed to its sin/cos values before PCA; this is called dPCA
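The sin/cos transformation can be sketched as follows. Each dihedral angle becomes a pair (cos, sin), so angles that are close on the circle stay close in the feature space even across the ±180° boundary. The function name and the column ordering (all cosines, then all sines) are choices made for this illustration.

```python
import numpy as np

def dihedral_features(angles_deg):
    """Map an (n_frames, n_angles) array of dihedrals to cos/sin features."""
    angles = np.deg2rad(angles_deg)
    return np.hstack([np.cos(angles), np.sin(angles)])

# Two frames whose first angle differs by 2 degrees across the periodic boundary.
frames = np.array([[179.0, -60.0],
                   [-179.0, -60.0]])
features = dihedral_features(frames)

raw_gap = np.abs(frames[0] - frames[1]).max()             # large in raw angles
feature_gap = np.linalg.norm(features[0] - features[1])   # small in cos/sin space
print(raw_gap, feature_gap)
```

The feature matrix has twice as many columns as there are angles and can be passed to an ordinary PCA routine.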
![Page 11: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/11.jpg)
Application of dPCA: (Aβ16-22)6
Nguyen, P. H., Li, M. S., Stock, G., Straub, J. E., & Thirumalai, D. (2007). PNAS, 104(1), 111–6.
Free-energy diagram projected onto the first two principal components, V1 and V2, of the dPCA for the hexamer.
![Page 12: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/12.jpg)
dPCA in RNA analysis: flexible choice of internal coordinates
Riccardi, L., Nguyen, P. H., & Stock, G. (2009). JPCB, 113(52), 16660–8.
![Page 13: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/13.jpg)
Using dPCA to compare Trp-zip2 potential energy surface in different force fields
• REMD simulation of a short β-hairpin, Trp-zip2, using:
– ff99sb-ildn
– ff99sb*-ildn
– ff99sb-ildn-nmr
– ff99C, our modified version of ff99sb-ildn
![Page 14: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/14.jpg)
Using dPCA to compare Trp-zip2 potential energy surface in different force fields
Free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb*-ildn, projected onto the 1st and 2nd principal components from dPCA of the turn region. The energy surface is extended because a stable hairpin cannot form.
Native like turn
Helical structure
![Page 15: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/15.jpg)
Using dPCA to compare Trp-zip2 potential energy surface in different force fields
The figure shows the free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb-ildn, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region.
Native like turn
Helical structure
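Projecting one simulation onto the principal components of another means reusing the reference mean and eigenvectors rather than recomputing them, so that both landscapes share the same axes. A minimal sketch, assuming each row is a frame of dPCA features (for example sin/cos values of the turn-region dihedrals); the function names and the synthetic arrays are illustrative only.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Return the mean and leading eigenvectors of the reference data."""
    mean = reference.mean(axis=0)
    cov = np.cov(reference - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    return mean, eigvecs[:, order]

def project(data, mean, components):
    """Project data with the reference mean and components, not its own."""
    return (data - mean) @ components

rng = np.random.default_rng(3)
# Synthetic stand-ins for feature matrices from two force fields.
reference = rng.normal(size=(1000, 6)) * np.array([3, 2, 1, 1, 1, 1])
other = rng.normal(size=(800, 6)) + 0.5

mean, components = fit_pca(reference)
ref_proj = project(reference, mean, components)
other_proj = project(other, mean, components)
```

Since `other` is centered on the reference mean, a systematic shift between the two ensembles shows up as a displaced population in the shared PC plane.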
![Page 16: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/16.jpg)
Using dPCA to compare Trp-zip2 potential energy surface in different force fields
The figure shows the free energy landscape of Trp-zip2 at 300 K, using the Amber force field 99sb-ildn-nmr, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region. 99sb-ildn-nmr cannot fold the Trp-zip2 hairpin.
Native like turn
Helical structure
![Page 17: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/17.jpg)
Using dPCA to compare Trp-zip2 potential energy surface in different force fields
The figure shows the free energy landscape of Trp-zip2 at 300 K, using force field 99C, projected onto the 1st and 2nd principal components of 99sb*-ildn from dPCA of the turn region. In our force field, Trp-zip2 forms a stable beta-turn, so it rarely samples other conformations.
Native like turn
![Page 18: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/18.jpg)
Summary
• PCA is a linear transformation of old coordinates to capture maximum variance
• Instead of using Cartesian coordinates, dihedral angles could be a better choice in description of conformational change
• General coordinates or a subset of coordinates (for region of interest) can be used for PCA analysis
• The result of PCA can be used for further analysis such as clustering and transition rate calculation.
![Page 19: Principal Component Analysis in MD Simulation](https://reader036.fdocuments.in/reader036/viewer/2022062314/568143db550346895db06919/html5/thumbnails/19.jpg)
Thank you!