Folding @ Home - Distributed Parallel Protein folding Chris Garlock.


Protein Folding - Why is it important?

- Proteins are biological nanomachines that play a part in all of our bodies' functions

- Protein folding is the process all proteins undergo to assemble into their native structure

- Proteins fold toward states of low free energy

- Sometimes proteins misfold, and misfolded proteins can clump together or aggregate, which can cause serious health problems

- Alzheimer's, cystic fibrosis, several types of cancer, and more


How is protein folding simulated on a computer?

• Atomic-level simulations

• Newtonian mechanics (lots of numerical integration)

• Proteins can fold in many ways, so a statistical model is needed to represent all of the possibilities accurately

• Markov State Models

• Set of states (shapes)

• Transition rates between states
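As a rough illustration of the Markov State Model idea above, the sketch below counts transitions between discrete states in a few short trajectories, normalizes the counts into per-step transition probabilities (a discrete-time stand-in for transition rates), and estimates long-run state populations by repeatedly applying the matrix. The three-state trajectories are made-up toy data and the function names are hypothetical; building a real model from atomic coordinates involves clustering and more careful estimators than shown here.

```python
def count_transitions(trajectories, n_states):
    """Count observed state-to-state transitions over all trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into per-step transition probabilities."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state has no observed outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, iterations=1000):
    """Estimate long-run state populations by repeated propagation."""
    n = len(matrix)
    pop = [1.0 / n] * n
    for _ in range(iterations):
        pop = [sum(pop[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pop


# Toy data (assumption): each list is one trajectory of state indices,
# e.g. 0 = unfolded, 1 = intermediate, 2 = folded.
trajectories = [
    [0, 0, 1, 1, 2, 2, 2],
    [0, 1, 0, 1, 2, 2, 1],
    [1, 2, 2, 2, 2, 1, 2],
]

counts = count_transitions(trajectories, 3)
T = transition_matrix(counts)
populations = stationary_distribution(T)
print(populations)
```

Because each trajectory only contributes local transition counts, many short, independent simulations can be pooled into one model, which is what makes this representation a natural fit for distributed computing.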


Problems with serial protein folding

• Modern computers can simulate ~50 nanoseconds of protein assembly in 24 hours

• Many proteins fold on the millisecond timescale (1,000,000 nanoseconds)

• It would take 20,000 days or ~55 years to simulate folding for just one protein!
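The estimate above follows directly from the two figures on this slide, as this short calculation shows. The variable names are illustrative, and the 365.25-day year is an assumption.

```python
# Figures taken from the slide.
simulated_ns_per_day = 50          # ~50 ns of folding simulated per 24 hours
folding_time_ns = 1_000_000        # 1 millisecond expressed in nanoseconds

days_needed = folding_time_ns / simulated_ns_per_day
years_needed = days_needed / 365.25  # assumption: average year length

print(f"{days_needed:,.0f} days, about {years_needed:.0f} years")
```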


Folding At Home

• F@H uses donated computational power from otherwise idle processors across the globe

• Anyone who wants to contribute to the project just needs to download the client program. The F@H server gives the client work units to complete; once a work unit is finished, the client returns the results to the server and gets a new work unit

• 500,000 processor cores deliver 19,900 TFLOP/s to help simulate protein folding

• This is faster than the Titan supercomputer!
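The request, compute, and return cycle described on this slide can be sketched as a simple loop. This is a toy in-process model, not the actual F@H client or its network protocol: `WorkServer`, `run_client`, and the "computation" (summing a list of numbers) are all hypothetical stand-ins.

```python
from collections import deque


class WorkServer:
    """Hypothetical stand-in for the server that hands out work units."""

    def __init__(self, work_units):
        self.pending = deque(work_units)
        self.results = {}

    def request_work(self):
        # Hand out the next unit, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def submit_result(self, unit_id, result):
        self.results[unit_id] = result


def compute(payload):
    # Placeholder for the real simulation step.
    return sum(payload)


def run_client(server):
    """Fetch, compute, and return work units until the server has none left."""
    completed = 0
    while True:
        unit = server.request_work()
        if unit is None:
            break
        unit_id, payload = unit
        server.submit_result(unit_id, compute(payload))
        completed += 1
    return completed


server = WorkServer([("wu-1", [1, 2, 3]), ("wu-2", [10, 20]), ("wu-3", [5])])
done = run_client(server)
print(done, server.results)
```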


Challenges with distributed computing

Heterogeneous processors

F@H clients can run on Windows, Linux, OS X, Android, and PS3. A special client can also be downloaded to run on a rack of computers or a cluster of GPUs

Extremely slow communication between processors

WAN connections

Processors can be unreliable

What happens if a host powers down their machine in the middle of a work unit?
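The slide leaves this question open. One common approach in volunteer computing, shown here purely as an illustrative assumption rather than a description of F@H's actual server logic, is to attach a deadline to every assigned work unit and hand the unit to another client if no result arrives in time. All class and method names below are hypothetical, and time is passed in explicitly to keep the sketch deterministic.

```python
class DeadlineScheduler:
    """Toy scheduler that reassigns work units whose deadline has passed."""

    def __init__(self, unit_ids, deadline):
        self.unassigned = list(unit_ids)
        self.in_flight = {}      # unit_id -> (client, time assigned)
        self.finished = set()
        self.deadline = deadline

    def assign(self, client, now):
        """Give the client a unit, first reclaiming any expired ones."""
        self._reclaim_expired(now)
        if not self.unassigned:
            return None
        unit_id = self.unassigned.pop(0)
        self.in_flight[unit_id] = (client, now)
        return unit_id

    def complete(self, unit_id, client):
        """Accept a result only from the client that currently owns the unit."""
        owner = self.in_flight.get(unit_id)
        if owner is None or owner[0] != client:
            return False
        del self.in_flight[unit_id]
        self.finished.add(unit_id)
        return True

    def _reclaim_expired(self, now):
        expired = [u for u, (_, t) in self.in_flight.items()
                   if now - t > self.deadline]
        for unit_id in expired:
            del self.in_flight[unit_id]
            self.unassigned.append(unit_id)


sched = DeadlineScheduler(["wu-1"], deadline=10)
first = sched.assign("alice", now=0)      # alice gets wu-1, then powers down
waiting = sched.assign("bob", now=5)      # nothing available yet
second = sched.assign("bob", now=11)      # deadline passed, bob gets wu-1
late = sched.complete("wu-1", "alice")    # stale result is rejected
ok = sched.complete("wu-1", "bob")
```

Rejecting the late result is a design choice made for simplicity here; a real system might instead accept whichever valid result arrives first.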


How to parallelize folding simulations

1. To start a simulation project, first choose some initial conformations (protein shapes).

2. Each conformation becomes the starting point for a set of simulations, which together are called a run.

3. Within each run we launch many different trajectories, each called a clone.

• All clones in a run start with the same conformation, but with different initial velocities for the atoms involved

4. Because each clone takes large amounts of time to execute, clones are further divided into generations. Generations have to be run serially.

5. Some clones may find additional conformations (states of equilibrium), in which case new runs are started from those conformations.

6. Repeat steps 2-5 until the Markov State Model is complete
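The run/clone/generation decomposition above can be sketched as follows. Clones are independent of each other and can be dispatched in parallel, while each clone's generations must be processed in order. The `WorkUnit` type and `ready_units` function are hypothetical names for illustration; the sketch also assumes finished generations are contiguous from generation 0 and ignores units that are currently in flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    run: int     # which starting conformation
    clone: int   # which trajectory (set of initial velocities) within the run
    gen: int     # which serial segment of that trajectory


def ready_units(n_runs, n_clones, n_gens, finished):
    """Return the work units that could be dispatched right now.

    For every (run, clone) pair, only the first unfinished generation is
    ready, because each generation continues from the previous one.
    """
    ready = []
    for run in range(n_runs):
        for clone in range(n_clones):
            for gen in range(n_gens):
                unit = WorkUnit(run, clone, gen)
                if unit not in finished:
                    ready.append(unit)
                    break
    return ready


finished = set()
first_wave = ready_units(n_runs=2, n_clones=3, n_gens=4, finished=finished)

# Once generation 0 of run 0, clone 0 is done, its generation 1 becomes ready.
finished.add(WorkUnit(0, 0, 0))
second_wave = ready_units(n_runs=2, n_clones=3, n_gens=4, finished=finished)
```

In this toy setup the available parallelism at any moment is bounded by runs times clones (6 here), regardless of how many generations each clone has.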


[Diagram: conformations A through E, discovered over successive runs]

• Generate initial conformation A

• Start a run for A (consisting of 5 clones)

• Discover additional conformations B and C

• Start a run for B and C

• Discover additional conformations D and E, and new pathways

• Start a run for D and E

• Discover a misfold condition


Results

• The results of F@H simulations have been verified experimentally several times

• The drug design industry has begun to use information from simulations to narrow down the number of molecules they test experimentally. This allows drug designers to be more thorough when evaluating a molecule, and improves their throughput.

• The Folding at Home project has been active for 15 years, and over 100 papers have been published about its findings


Questions?