Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07...

44
Big Data in Drug Discovery David J. Wild Assistant Professor & Director, Cheminformatics Program Indiana University School of Informatics and Computing [email protected] - http://djwild.info

Transcript of Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07...

Page 1: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery

David J. WildAssistant Professor & Director, Cheminformatics Program

Indiana University School of Informatics and Computing

[email protected] - http://djwild.info

Page 2: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Epochs in drug discovery

Empirical – up until 1960’s

754 First pharmacy opened in Baghdad

Late 1800’s – major pharmaceutical companies, mass production

1900-1960 – major discoveries (insulin, penicillin, the pill …)

Rational – 1960’s to 1990’s

Designing molecules to target protein active sites – “lock and key”

Computational Drug Discovery

Biggest success HIV (RT, protease inhibitors)

Big Experiment – 1990’s to 2000’s

High throughput screening

Microarray Assays

Gene Sequencing and Human Genome Project

Big Data – 2010’s onwards

Informatics-driven drug discovery

Accepting the body is amazingly complex and we don’t understand it well

Everything is connected

Page 3: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

The metabolic pathways of a single cell

David Wild,

December 2009.

Page 4: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

The inner life of the cell

David Wild,

December 2009.

http://video.google.com/videoplay?docid=-2351549868099343381&hl=en#

Page 5: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Big Data in the public domain

There is now an incredibly rich resource of public information relating compounds, targets, genes, pathways, and diseases. Just for starters there is in the public domain information on:

69 million compounds and 449,392 bioassays (PubChem)

4,763 drugs (DrugBank)

9 million protein sequences (SwissProt) and 58,000 3D structures (PDB)

14 million human nucleotide sequences (EMBL)

19 million life science publications - 800,000 new each year (PubMed)

Multitude of other sets (drugs, toxicogenomics, chemogenomics, SAR, …)

Even more important are the relationships between these entities. For example a chemical compound can be linked to a gene or a protein target in a multitude of ways:

Biological assay with percent inhibition, IC50, etc

Crystal structure of ligand/protein complex

Co-occurrence in a paper abstract

Computational experiment (docking, predictive model)

Statistical relationship

System association (e.g. involved in same pathways cellular processes)

Page 6: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

PubChem growth since 2005

David Wild,

December 2009.

2,824,265

35,379,748

56,774,950

69,088,100

0

10,000,000

20,000,000

30,000,000

40,000,000

50,000,000

60,000,000

70,000,000

80,000,000

2005-0

1

2005-0

3

2005-0

5

2005-0

7

2005-0

9

2005-1

1

2006-0

1

2006-0

3

2006-0

5

2006-0

7

2006-0

9

2006-1

1

2007-0

1

2007-0

3

2007-0

5

2007-0

7

2007-0

9

2007-1

1

2008-0

1

2008-0

3

2008-0

5

2008-0

7

2008-0

9

2008-1

1

2009-0

1

2009-0

3

2009-0

5

2009-0

7

2009-0

9

2009-1

1

2010-0

1

2010-0

3

2010-0

5

2010-0

7

PubChem Substance Size 2005-2010

Addition of

ChemSpider

434635

1

10

100

1000

10000

100000

1000000

2005-0

1

2005-0

4

2005-0

7

2005-1

0

2006-0

1

2006-0

4

2006-0

7

2006-1

0

2007-0

1

2007-0

4

2007-0

7

2007-1

0

2008-0

1

2008-0

4

2008-0

7

2008-1

0

2009-0

1

2009-0

4

2009-0

7

2009-1

0

2010-0

1

2010-0

4

2010-0

7

PubChem Bioassays 2005-2010

Addition of

ChEMBL

Page 7: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Large amount of data and links for each compound

David Wild,

December 2009.

Page 8: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Proteins & Genes

David Wild,

December 2009.

http://www.genome.jp/en/db_growth.html

Page 9: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chem2Bio2RDF: The FaceBook of Drug Discovery

David Wild,

December 2009.

Page 10: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

You are a big pile of data too!

Page 11: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Large-scale predictive modeling adds even more data

Range of ROCV values from different classes of BioAssay data set. Range of ROCV values from three different classes of BioAssay data set for original

models and models built with additional inactive compounds (ŅimprovedÓ).

Chen, B. and Wild, D.J. PubChem BioAssays as a data source for predictive models, Journal of

Molecular Graphics and Modeling. 2010; 28, 420-426.

Page 12: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Informatics-based drug discovery

Predicting new molecular targets for known drugs. Nature 462, 175-181(12 November 2009)

Page 13: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

“Systems chemical biology” and chemogenomics

Page 14: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Recent enabling technologies for SCB / Chemogenomics

Cloud computing allows processing

and data mining on a vast scale

Integrative cheminformatics &

bioinformatics connects

compounds, targets genes, pathways, diseases and side

effects

Health informatics (PHRs and EHRs)

allows integration of the molecular and

patient models (QP)

Semantic technologies and complex systems

tools allow seamless integration and

human-scale data mining

Analysis

Visualization, projection,data mining, hypothesis

generation, network tools

Integration

RDF, XML, Triple StoresOntologies, SPARQL,

Graph algorithms

Access

Web Services, RPCInformation extraction

Page 15: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

ChemBioGrid.org: Web service infrastructure for cheminformatics

Dong, X., Gilbert, K.E., Guha, R., Heiland, R., Kim, J., Pierce, M.E. Pierce, Fox, G.C. and Wild, D.J. Web service

infrastructure for chemoinformatics, J. Chem. Inf. Model., 2007; 47(4) pp 1303-1307.

Page 16: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

The Semantic Web – meaning & relationships

Page 17: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chem2Bio2RDF – RDF integration & SPARQL querying

Chen, B., Dong. X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., Wild, D.J.Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems

chemical biology data. BMC Bioinformatics 2010, 11, 255

Page 18: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chem2Bio2RDF context

Page 19: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chem2Bio2RDF Relationships

Page 20: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Linked Open Data Cloud (linkeddata.org)

Page 21: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Converting data into RDF

Page 22: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Finding multi-target inhibitors of MAPK pathway with a SPARQL query

Page 23: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Finding compounds with similar polypharmacology using SPARQL

Page 24: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Projecting queries into chemical space

GTM / MDS projection and embedding of all PubChem using clouds

Plotting and embedding unknown

compounds with SCB

property labels

Dynamic querying and projection into

chemical space

Page 25: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Projecting queries into chemical space

Choi, J.Y. , Bae, S.H., Qiu, J., Fox, G., Chen, B., Wild. D.J. Browsing Large Scale Cheminformatics Data

with Dimension Reduction. Emerging Computational Methods for the Life Sciences Workshop, ACM

Symposium for High Performance Distributed Computing Jun 21-25, 2010, Chicago IL

Page 26: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

“Doppler Radar Plot” – Kinase Specificity

Choi, J.Y. , Bae, S.H., Qiu, J., Fox, G., Chen, B., Wild. D.J. Browsing Large Scale Cheminformatics Data

with Dimension Reduction. Emerging Computational Methods for the Life Sciences Workshop, ACM

Symposium for High Performance Distributed Computing Jun 21-25, 2010, Chicago IL

Page 27: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

“Doppler Radar Plot” – Kinase Specificity

Page 28: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chem2Bio2RDF Dashboard: finding paths

Page 29: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Pathfinder

DexamethasoneTriamcinalone

NFKB1

Glucocorticoid Receptor

http://ella.slis.indiana.edu/~yuysun/flex/pathfinder.html

Page 30: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Dynamic exploration with clouds and Cytoscape

Virtuoso runs Chem2Bio2RDF queries on the

cloud

Cytoscape plugins give access to

Chem2Bio2RDF, LPG and chemical

structure visualization

Dynamic exploration in

Cytoscape

Page 31: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Hydrocortisone – Dexamethasone links

Fig. Use Case 1.Network diagram of the paths obtained between Hydrocortisone and Dexamethasone using

ChemBioScape.Drugbank interaction contains information about every drug’s target. In this case, DB00741 and DB01234

share common targets through several different Drugbank interaction ID’s.

Page 32: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Tolcapone and Entacapone links

Fig. Use case 2.Tolcapone and Entacapone are connected to each other through drugbank interaction

2348 and 1962.Also, the two drugs appear in PubMed articles 8119326 and 8223912 via their CID

(Compound ID)

Page 33: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Isoniazid and Ethionamide – replicate paper results

Banerjee, A., Dubnau, E., Quemard, A., Balasubramanian, V., Um, K., Wilson, T., et al.: inhA, a gene encoding a target for isoniazid

and ethionamide in Mycobacterium tuberculosis. Science, 263(5144), 227-230 (1994).

Page 34: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

WENDI – exploring compound knowledge space

Doxorubicin (anthracyclin antibiotic)

Page 35: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

WENDI v1.0 - insights from the literature

Page 36: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

WENDI v2.0 - Automated reasoning with RDF

Simple OWL ontology for relationships

Large RDF network expands out from Query

RDF inference engines applied & results filtered / prioritized

QUERY

CID

86427

CID

8642

AID

328

PubMed

12856

Breast

Cancer

Breast

Cancer

HER2Breast

Cancer

similar_to

similar_to

active_against contains_term

contained_in contains_term

contains_term

predicted_

inactive_again

st

Page 37: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

WENDI OWL/RDF Network & Inferred Associations

Page 38: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Semantic text mining of journal articles

Jiao, D. and Wild, D.J. Extraction of CYP Chemical Interactions from Biomedical Literature Using Natural Language

Processing Methods, Journal of Chemical Information and Modeling, 49(2); pp263-269

Page 39: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Chemical & Biological Literature Extraction

Page 40: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Validating topics by experimental relationships

Topic 26: cell, expression, cancer,

tumor,…

Related Disease: DNA Damage,

Melanoma, Glioblastoma, …

Un-proved link

proved link by c2b2r_chemogenomics

Target

Drug

Page 41: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Bio-LDA III

Entropy

In information theory, entropy is a measure of the uncertainty associated with a

random variable.

Here we can compute the bio-term entropies over topics

Kullback-Leibler divergence (KL divergence)

a non-symmetric measure of the difference between two probability distributions.

Here we used the KL divergence as the non-symmetric distance measure for two

bio-terms over topics

Page 42: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Combining path finding and Bio-LDA

Detect semantic association

Path finding algorithm

millions of RDF triples from Chem2bio2rdf

Assess semantic association

Bio-LDA model

Entropy and KL divergence

Additional knowledge base: 50, 100 and 200 topics using the recent 336,899

MEDLINE abstracts, which contains 13,338 identical bio-terms

Page 43: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Summary

Drug discovery is enetering a new era that is arguably centered on informatics

analysis of the vast amount of biological and chemical data now being produced,

and which looks at the effect of drugs on biological systems as a whole. This new

approach underlies the new fields of systems chemical biology and chemogenomics

Analyzing this data and particularly the relationships beween compounds, drugs,

proteins, genes, diseases, pathways and people promises to provide important

understanding of the nature of disease and treatment

The Semantic Web provides an effective framework for logically managing the data,

and Cloud Computing provides a physical framework for computation and searching

Early-stage methods developed at Indiana allow integrated access to this data, path

finding between any two points, visualization in chemical space and network tools,

and advanced handling of the scholarly literature

Critical next steps include ranking and intelligent filtering of paths and relationships

to provide aggregate evidence-based approaches, and integration of NGS and

patient data

Page 44: Big Data in Drug Discoveryhpcuniversity.org/vscse/bigdata/downloads/2010-Big... · 04 2005-07 2005-10 2006-01 2006-04 2006-07 2006-10 2007-01 2007-04 2007-07 2007-10 2008-01 2008-04

Big Data in Drug Discovery David Wild, July 2010. http://djwild.info.

Cheminformatics group at Indiana University

http://djwild.info