High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud

High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud. Sally R. Ellingson, Graduate Research Assistant, Center for Molecular Biophysics, UT/ORNL; Department of Genome Science and Technology, UT; Scalable Computing and Leading Edge Innovative Technologies (IGERT). Dr. Jerome Baudry, PhD Advisor, Center for Molecular Biophysics, UT/ORNL; Department of BCMB, UT. The Second International Emerging Computational Methods for the Life Sciences Workshop, ACM International Symposium on High Performance Distributed Computing, June 8, 2011, San Jose, CA.


Page 1: High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud

High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud

Sally R. Ellingson
Graduate Research Assistant

Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT

Scalable Computing and Leading Edge Innovative Technologies (IGERT)

Dr. Jerome Baudry
PhD Advisor

Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT

The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing

June 8, 2011, San Jose, CA

Page 2

Ultimate Goal:

Reduce the time and cost of discovering novel drugs

Page 3

1. Virtual Molecular Docking
   a) Novel Drug Discovery
   b) Virtual high-throughput screening (VHTS)

2. Cloud Computing
   a) Advantages for VHTS
   b) Kandinsky
   c) Hadoop (MapReduce)

3. AutoDockCloud
   a) Current Implementation
   b) Future Implementations

Page 4

Virtual Molecular Docking

Given a receptor (protein) and a ligand (small molecule), predict:

1. Bound conformations
   • Search algorithm to explore conformational space

2. Binding affinity
   • Force field to evaluate energetics

Page 5

AutoDock4
Virtual Docking Engine

http://autodock.scripps.edu/wiki/AutoDock4

Page 6

Novel Drug Discovery

[Figure labels: Human HDAC4; HA3 crystal structure; ZINC03962325]

Page 7

Virtual High-Throughput Screening (VHTS)

Page 8

VHTS with AutoDock4

Page 9

Potential advantages of Cloud Computing for VHTS

• Affordable access to compute resources (especially for small labs and classrooms).

• Easy-to-use web interface for users who are not computing experts; the software is maintained by experts.

• Scalable resources for size of screening.

Page 10

Kandinsky
Private Cloud Platform at ORNL

Kandinsky, the Systems Biology Knowledgebase Computer, sponsored by the Office of Biological and Environmental Research in the DOE Office of Science

68 nodes × 16 cores/node = 1088 cores
20 Gbps InfiniBand interconnect

Designed to support Hadoop applications and to gain an understanding of the MapReduce paradigm.

• 57 nodes for MapReduce tasks
• 1 tasktracker per node
• 10 map and 6 reduce tasks per node (16 tasks per node)
• 570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
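The capacity figures on this slide follow from simple multiplication. The short Python sketch below reproduces them; the function and variable names are illustrative only and are not part of any Hadoop configuration.

```python
def slot_capacity(worker_nodes, map_slots_per_node, reduce_slots_per_node):
    """Return (map slots, reduce slots) available across all worker nodes."""
    return (worker_nodes * map_slots_per_node,
            worker_nodes * reduce_slots_per_node)


# Figures taken from the slide: 68 nodes with 16 cores each,
# of which 57 nodes run MapReduce tasks (10 map + 6 reduce slots per node).
total_cores = 68 * 16
map_slots, reduce_slots = slot_capacity(57, 10, 6)

print(total_cores, map_slots, reduce_slots)
```

The 10 map and 6 reduce slots per node add up to the 16 cores per node, so each task slot corresponds to one core.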

Page 11

Hadoop

• Scalable
• Economical
• Efficient
• Reliable

http://hadoop.apache.org/common/docs/current/api/overview-summary.html

Page 12

MapReduce programming paradigm used by Hadoop

[Two figures; source: people.apache.org]
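As a rough illustration of the paradigm, the following Python sketch runs the map, shuffle, and reduce phases in memory on a single machine. It is not Hadoop code: `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical names, and the input lines are made-up sample data.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:                     # map phase
        for key, value in mapper(record):
            groups[key].append(value)          # shuffle: group values by key
    # reduce phase, with keys visited in sorted order
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    """Add up all counts emitted for one key."""
    return sum(values)


counts = map_reduce(["dock ligand", "dock receptor"], word_mapper, sum_reducer)
print(counts)
```

In Hadoop the map and reduce calls run as separate tasks on different nodes and the grouping step moves data over the network, but the contract between the phases is the same as in this sketch.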

Page 13

Current AutoDockCloud Implementation

input = file names needed for each docking

map(input) {
    copy input to local working directory;
    run AutoDock4 locally;
    copy result file to HDFS;
}

* Pre-docking set-up and post-docking analysis are currently done manually
* No reduce function is currently being used
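The map-only job above could be sketched in Python roughly as follows. This is not the AutoDockCloud source. It assumes, hypothetically, that each input record is a whitespace-separated list of HDFS paths with the AutoDock4 docking parameter file (.dpf) first, and that the `hadoop` and `autodock4` executables are on the PATH. The function names and the paths in the usage note are invented for illustration.

```python
import subprocess
from typing import Callable, List


def docking_commands(record: str, hdfs_out: str) -> List[List[str]]:
    """Build the commands for one docking: fetch inputs, dock, store the result.

    Assumed record format (hypothetical): HDFS paths separated by whitespace,
    with the docking parameter file (.dpf) listed first.
    """
    paths = record.split()
    if not paths:
        raise ValueError("empty input record")
    dpf = paths[0].rsplit("/", 1)[-1]          # local name of the parameter file
    dlg = dpf.rsplit(".", 1)[0] + ".dlg"       # docking log written by AutoDock4
    # 1. copy input to the local working directory
    commands = [["hadoop", "fs", "-get", path, "."] for path in paths]
    # 2. run AutoDock4 locally
    commands.append(["autodock4", "-p", dpf, "-l", dlg])
    # 3. copy the result file to HDFS
    commands.append(["hadoop", "fs", "-put", dlg, hdfs_out])
    return commands


def run_map(record: str, hdfs_out: str,
            run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Execute the three map steps for one record; `run` can be swapped for testing."""
    for command in docking_commands(record, hdfs_out):
        run(command)
```

Calling `run_map("/in/lig1.dpf /in/lig1.pdbqt", "/out", run=print)` prints the commands instead of executing them, which is a convenient way to check the plan before submitting a job. How records are delivered to the mapper (for example through Hadoop Streaming) is left out of this sketch.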

Page 14

Current AutoDockCloud Implementation

ER agonist screening from DUD as benchmark

450× speed-up with 570 available map slots on Kandinsky, the private cloud at ORNL
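One way to read this result is as parallel efficiency, the ratio of speed-up to the number of slots used. The sketch below assumes the speed-up is measured against a single serial run, which the slide does not state; `parallel_efficiency` is an illustrative name.

```python
def parallel_efficiency(speedup: float, slots: int) -> float:
    """Fraction of ideal linear speed-up achieved with the given number of slots."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return speedup / slots


# Values from the slide: a speed-up of 450 with 570 map slots.
efficiency = parallel_efficiency(450, 570)
print(f"{efficiency:.1%}")
```

Under that assumption the benchmark reaches roughly 79% of the ideal linear speed-up.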

Page 15

Current AutoDockCloud Implementation

Docking enrichment plot for ER agonist using AutoDockCloud and DUD.

[Figure: enrichment plot. x-axis: Percent of ranked database; y-axis: Percent of known ligands found]
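The two axes of an enrichment plot can be computed from a ranked screening result as in the Python sketch below. The scores and labels are made-up sample values, not data from this benchmark, and `enrichment_curve` is an illustrative name. Compounds are ranked by ascending score because AutoDock reports binding energies, where lower is better.

```python
from typing import List, Sequence, Tuple


def enrichment_curve(scores: Sequence[float],
                     is_known: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (percent of ranked database, percent of known ligands found) points."""
    total_known = sum(is_known)
    if total_known == 0:
        raise ValueError("at least one known ligand is required")
    # Rank compounds from best (lowest energy) to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    curve = []
    found = 0
    for rank, index in enumerate(order, start=1):
        found += is_known[index]
        curve.append((100 * rank / len(scores), 100 * found / total_known))
    return curve


# Hypothetical example: four compounds, two of them known ligands.
sample_scores = [-9.0, -5.0, -8.0, -4.0]
sample_known = [True, False, True, False]
points = enrichment_curve(sample_scores, sample_known)
print(points)
```

A curve that rises well above the diagonal means known ligands are concentrated near the top of the ranking, which is the behavior a screening method is expected to show.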

Page 16

Future AutoDockCloud Implementation

input = ligand file from chemical compound database

map(input) {
    create pdbqt (AutoDock input file) from input;
    run AutoDock4 locally;
    find best scoring ligand structure;
    save structure to HDFS;
    return <score, ligand>;
}

reduce(<score, ligand>) {
    sort;
    return ranked_database;
}

* Pre-docking and post-docking will be automated and distributed
* Lower total I/O requirements
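The planned map and reduce steps might look like the following in-memory Python sketch. Running AutoDock4, preparing the pdbqt file, and writing to HDFS are replaced by a caller-supplied `dock` function, which is a hypothetical stand-in returning the binding energies of the docked poses. The ligand names and energies in the example are invented.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def map_ligand(ligand: str,
               dock: Callable[[str], Sequence[float]]) -> Tuple[float, str]:
    """Dock one ligand and emit <score, ligand> for its best (lowest-energy) pose."""
    energies = dock(ligand)
    if not energies:
        raise ValueError(f"no poses returned for {ligand}")
    return (min(energies), ligand)


def reduce_ranked(pairs: Iterable[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Sort all <score, ligand> pairs so the best-scoring ligand comes first."""
    return sorted(pairs)


# Hypothetical docking results standing in for real AutoDock4 runs.
fake_energies = {
    "ligand_a": [-6.1, -7.4, -5.0],
    "ligand_b": [-9.2, -8.8],
    "ligand_c": [-4.3],
}
ranked = reduce_ranked(map_ligand(name, fake_energies.__getitem__)
                       for name in fake_energies)
print(ranked)
```

Because each map call returns only a score and a ligand identifier, the data passed to the reduce step stays small, which is consistent with the lower I/O noted on the slide.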

Page 17

Future Plans

• Incorporate additional docking engines
   – AutoDock Vina
      • Less I/O
      • More efficient and accurate algorithm
      • No charge information needed

• Deploy on Commercial Cloud (EC2)
• Develop web interface

Page 18

1. Virtual Molecular Docking
   a) Novel Drug Discovery
   b) Virtual high-throughput screening (VHTS)

2. Cloud Computing
   a) Advantages for VHTS
   b) Kandinsky
   c) Hadoop (MapReduce)

3. AutoDockCloud
   a) Current Implementation
   b) Future Implementations

Page 19

Questions/Comments

Acknowledgements
• Dr. Jerome Baudry (advisor)
• Center for Molecular Biophysics, UT/ORNL
• Genome Science and Technology, UT
• Scalable Computing and Leading Edge Innovative Technologies (IGERT)
• Avinash Kewalramani, ORNL
• ECMLS and HPDC organizers and participants