High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud

Sally R. Ellingson
Graduate Research Assistant

Center for Molecular Biophysics, UT/ORNL
Department of Genome Science and Technology, UT

Scalable Computing and Leading Edge Innovative Technologies (IGERT)

Dr. Jerome Baudry
PhD Advisor

Center for Molecular Biophysics, UT/ORNL
Department of BCMB, UT

The Second International Emerging Computational Methods for the Life Sciences Workshop
ACM International Symposium on High Performance Distributed Computing

June 8, 2011, San Jose, CA

Ultimate Goal:

Reduce the time and cost of discovering novel drugs

1. Virtual Molecular Docking
   a) Novel Drug Discovery
   b) Virtual high-throughput screening (VHTS)

2. Cloud Computing
   a) Advantages for VHTS
   b) Kandinsky
   c) Hadoop (MapReduce)

3. AutoDockCloud
   a) Current Implementation
   b) Future Implementations

Virtual Molecular Docking

Given a receptor (protein) and ligand (small molecule), predict:

1. Bound conformations
   • Search algorithm to explore conformational space

2. Binding affinity
   • Force field to evaluate energetics

AutoDock4: Virtual Docking Engine

http://autodock.scripps.edu/wiki/AutoDock4

Novel Drug Discovery

Figure labels: Human HDAC4; HA3 crystal structure; ZINC03962325

Virtual High-Throughput Screening (VHTS)

VHTS with AutoDock4

Potential advantages of Cloud Computing for VHTS

• Affordable access to compute resources (especially for small labs and classrooms).

• Easy to use interface accessible through web for non-computer experts. Software maintained by experts.

• Scalable resources for size of screening.

Kandinsky: Private Cloud Platform at ORNL

Kandinsky, the Systems Biology Knowledgebase Computer, sponsored by the Office of Biological and Environmental Research in the DOE Office of Science

68 nodes × 16 cores/node = 1088 cores
20 Gbps InfiniBand interconnect

Designed to support Hadoop applications and gain an understanding of the MapReduce paradigm.

• 57 nodes for MapReduce tasks
• 1 tasktracker per node
• 10 map and 6 reduce tasks per node (16 tasks per node)
• 570 map tasks and 342 reduce tasks can run simultaneously on Kandinsky
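The slot counts above follow directly from the node layout. A minimal sketch of that arithmetic, using only the figures from the slide (the class and field names are illustrative, not part of any Hadoop API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    """Node and slot layout; field names are hypothetical, values come from the slide."""
    total_nodes: int
    cores_per_node: int
    worker_nodes: int           # nodes that run MapReduce tasks
    map_slots_per_node: int
    reduce_slots_per_node: int

    @property
    def total_cores(self) -> int:
        return self.total_nodes * self.cores_per_node

    @property
    def map_slots(self) -> int:
        return self.worker_nodes * self.map_slots_per_node

    @property
    def reduce_slots(self) -> int:
        return self.worker_nodes * self.reduce_slots_per_node


KANDINSKY = ClusterLayout(
    total_nodes=68,
    cores_per_node=16,
    worker_nodes=57,
    map_slots_per_node=10,
    reduce_slots_per_node=6,
)

print(KANDINSKY.total_cores, KANDINSKY.map_slots, KANDINSKY.reduce_slots)
# prints: 1088 570 342
```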

Hadoop

• Scalable
• Economical
• Efficient
• Reliable

http://hadoop.apache.org/common/docs/current/api/overview-summary.html

MapReduce programming paradigm used by Hadoop

people.apache.org

Current AutoDockCloud Implementation

input = file names needed for each docking

map(input) {
    copy input to local working directory;
    run AutoDock4 locally;
    copy result file to HDFS;
}

* Pre-docking set-up and post-docking analysis are currently done manually.
* No reduce function is currently being used.
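The map-only job above could be sketched in Python as follows. This is not the implementation from the talk: it assumes a Hadoop Streaming style mapper, an input record that is a whitespace-separated list of HDFS paths containing exactly one docking parameter file (`.dpf`), and a hypothetical HDFS output directory. The `autodock4 -p ... -l ...` invocation and the `hadoop fs -get` / `hadoop fs -put` commands are standard; everything else is illustrative.

```python
import os
import subprocess
import tempfile

RESULTS_DIR = "/autodockcloud/results"  # hypothetical HDFS output directory


def plan_commands(input_line, workdir, results_dir=RESULTS_DIR):
    """Return the commands for one docking job, in execution order."""
    hdfs_paths = input_line.split()
    # Assumption: each record names exactly one docking parameter file.
    dpf_names = [os.path.basename(p) for p in hdfs_paths if p.endswith(".dpf")]
    if len(dpf_names) != 1:
        raise ValueError("expected exactly one .dpf file per input record")
    dpf = dpf_names[0]
    dlg = dpf[: -len(".dpf")] + ".dlg"  # AutoDock4 docking log (the result file)

    # 1. Copy input to the local working directory.
    commands = [["hadoop", "fs", "-get", path, workdir] for path in hdfs_paths]
    # 2. Run AutoDock4 locally (executed with cwd=workdir, so names are relative).
    commands.append(["autodock4", "-p", dpf, "-l", dlg])
    # 3. Copy the result file to HDFS.
    commands.append(["hadoop", "fs", "-put", os.path.join(workdir, dlg), results_dir])
    return commands


def run_mapper(lines, run=subprocess.run):
    """Process one docking per input line; there is no reduce step."""
    for line in lines:
        if not line.strip():
            continue
        with tempfile.TemporaryDirectory() as workdir:
            for command in plan_commands(line, workdir):
                run(command, cwd=workdir, check=True)
```

Used as a streaming mapper, the script would finish by calling `run_mapper(sys.stdin)`. Keeping the command planning separate from execution makes the job logic checkable without a cluster or an AutoDock4 installation.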

ER agonist screening from DUD as benchmark

450-fold speed-up with 570 available map slots on Kandinsky, the private cloud at ORNL
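One way to read this result is as a parallel efficiency, that is, speed-up divided by the number of slots. The short sketch below assumes the speed-up is measured against a serial run on a single slot, which the slide does not state explicitly.

```python
def parallel_efficiency(speedup: float, slots: int) -> float:
    """Fraction of ideal linear speed-up achieved (1.0 would be perfect scaling)."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return speedup / slots


efficiency = parallel_efficiency(speedup=450, slots=570)
print(f"{efficiency:.1%}")  # prints: 78.9%
```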

Figure: Docking enrichment plot for ER agonist using AutoDockCloud and DUD.
Axis label: Percent of ranked database
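For readers unfamiliar with enrichment plots, the sketch below shows how such a curve is typically computed from a ranked screening result. It assumes the other axis is the percent of known actives recovered, which is the usual convention but is not preserved in this transcript. The ligand names are made up for illustration.

```python
def enrichment_curve(ranked_ids, active_ids):
    """Return (percent of ranked database, percent of actives found) points.

    ranked_ids: ligand identifiers ordered from best to worst docking score.
    active_ids: identifiers of the known actives hidden among the decoys.
    """
    actives = set(active_ids)
    if not ranked_ids or not actives:
        raise ValueError("need a non-empty ranking and at least one active")
    points = [(0.0, 0.0)]
    found = 0
    for rank, ligand in enumerate(ranked_ids, start=1):
        if ligand in actives:
            found += 1
        points.append((100.0 * rank / len(ranked_ids), 100.0 * found / len(actives)))
    return points


# Toy ranking: two actives (a1, a2) among three decoys (d1, d2, d3).
curve = enrichment_curve(["a1", "d1", "a2", "d2", "d3"], {"a1", "a2"})
print(curve)
```

A curve that rises well above the diagonal indicates that docking ranks known actives ahead of decoys more often than random selection would.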

Future AutoDockCloud Implementation

input = ligand file from chemical compound database

map(input) {
    create pdbqt (AutoDock input file) from input;
    run AutoDock4 locally;
    find best scoring ligand structure;
    save structure to HDFS;
    return <score, ligand>;
}

reduce(<score, ligand>) {
    sort;
    return ranked_database;
}

* Pre-docking and post-docking will be automated and distributed.
* Less total I/O required.
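The planned map and reduce steps can be illustrated with a small in-memory sketch. It is not the planned implementation: the `dock` callable stands in for file preparation plus an AutoDock4 run and is hypothetical, as are the energies, which are invented for illustration. It relies only on the fact that a lower (more negative) estimated binding energy is a better score.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

ScoredLigand = Tuple[float, str]  # <score, ligand>


def map_ligand(ligand_id: str, dock: Callable[[str], Sequence[float]]) -> ScoredLigand:
    """Map step: dock one ligand and emit its best (lowest) pose energy."""
    energies = dock(ligand_id)  # hypothetical: one energy (kcal/mol) per docked pose
    if not energies:
        raise ValueError(f"no docked poses for {ligand_id}")
    return (min(energies), ligand_id)


def reduce_ranked(pairs: Iterable[ScoredLigand]) -> List[ScoredLigand]:
    """Reduce step: sort so the best-scoring ligand comes first."""
    return sorted(pairs)


# Invented energies for illustration only; not results from the talk.
FAKE_ENERGIES = {
    "lig_a": [-6.1, -7.4, -5.0],
    "lig_b": [-9.2, -8.8],
    "lig_c": [-3.3],
}

ranked_database = reduce_ranked(
    map_ligand(name, FAKE_ENERGIES.__getitem__) for name in FAKE_ENERGIES
)
print(ranked_database)
# prints: [(-9.2, 'lig_b'), (-7.4, 'lig_a'), (-3.3, 'lig_c')]
```

Emitting the score as the key means a real MapReduce framework could perform most of the sorting during the shuffle phase, so the reduce function would have little left to do.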

Future Plans

• Incorporate additional docking engines
  – AutoDock Vina
    • Less I/O
    • More efficient and accurate algorithm
    • No charge information needed

• Deploy on Commercial Cloud (EC2)

• Develop web interface


Questions/Comments

Acknowledgements

• Dr. Jerome Baudry (advisor)
• Center for Molecular Biophysics, UT/ORNL
• Genome Science and Technology, UT
• Scalable Computing and Leading Edge Innovative Technologies (IGERT)
• Avinash Kewalramani, ORNL
• ECMLS and HPDC organizers and participants
