Use of Machine Learning in Chemoinformatics

Post on 11-Jan-2016



Irene Kouskoumvekaki, Associate Professor

December 12th, 2012, Biological Sequence Analysis course

2 CBS, Department of Systems Biology

Major Aspects of Chemoinformatics

• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.

• Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.

• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…


Machine Learning


Machine learning classifiers


Clustering: Self Organizing Maps

Distinguishing molecules of different biological activities and finding a new lead structure


Machine Learning

[Diagram labels: Molecular Structures, Properties, Molecular Descriptors, QSAR, Virtual Screening, Clustering, Classification]


Different descriptor types

• Simple feature counts (such as number of rotatable bonds or molecular weight)

• Fragmental descriptors which indicate the presence or absence (or count) of groups of atoms and substructures

• Physicochemical properties (density, solubility, van der Waals volume)

• Topological indices (size, branching, overall shape)
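As an illustration of the simplest descriptor type, the sketch below computes two feature counts from a molecular formula. The formula-as-dictionary representation and the function names are assumptions made for this example; real descriptor packages work on full molecular structures.

```python
# Standard atomic masses in g/mol (rounded); only a few elements for illustration.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_weight(formula):
    """Sum atomic masses over an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def heavy_atom_count(formula):
    """Count all atoms that are not hydrogen."""
    return sum(count for element, count in formula.items() if element != "H")


def descriptor_vector(formula):
    """Collect the simple descriptors into one numeric vector."""
    return [molecular_weight(formula), heavy_atom_count(formula)]


# Ethanol, C2H6O, written as an element -> count mapping (hypothetical input format).
ethanol = {"C": 2, "H": 6, "O": 1}
print(descriptor_vector(ethanol))
```

Vectors like this one are what a machine learning method takes as input in place of the molecule itself.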


Quantitative Structure-Activity Relationships (QSAR)

In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
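A minimal sketch of such a fit, assuming a single descriptor and a linear relationship (QSAR models in practice often use many descriptors and non-linear methods). The data values and function names are hypothetical and serve only to show the fitting step.

```python
def fit_linear_qsar(descriptor_values, activities):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor_values)
    mean_x = sum(descriptor_values) / n
    mean_y = sum(activities) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(descriptor_values, activities))
    sxx = sum((x - mean_x) ** 2 for x in descriptor_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, descriptor_value):
    """Predict the property P for a new molecule from its descriptor value."""
    return slope * descriptor_value + intercept


# Hypothetical training set: one descriptor value and one measured activity per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.5, 3.0, 3.5, 4.0]

slope, intercept = fit_linear_qsar(descriptor, activity)
print(slope, intercept, predict(slope, intercept, 5.0))
```

The fitted parameters can then be applied to molecules whose activity has not been measured, which is the prediction step of a QSAR workflow.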


Prediction of Solubility, ADME & Toxicity


hERG Classification with SVM


Evaluation of the data set


Performance of SVM
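The transcript does not preserve which performance measures the slide reported. As a hedged illustration only, the sketch below shows measures commonly used to summarize a binary classifier (for example blocker versus non-blocker), computed from a confusion matrix. All counts are hypothetical.

```python
import math


def classifier_metrics(tp, tn, fp, fn):
    """Common summary measures derived from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # fraction of actives found
    specificity = tn / (tn + fp)              # fraction of inactives rejected
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient; defined as 0 here if the denominator vanishes.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts, not taken from the presentation.
metrics = classifier_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```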



Virtual screening


Similarity Search

• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity.

• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
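One way to make "located closely together in the chemical space" concrete is to treat each molecule as a descriptor vector and rank library molecules by distance to a query. The sketch below assumes Euclidean distance and two already-scaled descriptors; the molecule names and values are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbours(query, library, k=2):
    """Return the names of the k library molecules closest to the query."""
    ranked = sorted(library.items(),
                    key=lambda item: euclidean_distance(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical library: name -> descriptor vector.
library = {
    "mol_A": [1.0, 0.0],
    "mol_B": [0.9, 0.2],
    "mol_C": [5.0, 4.0],
}
query = [1.0, 0.1]

print(nearest_neighbours(query, library))
```

Under the similar property principle, the molecules returned first would be the most likely to share the query's biological activity.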


Fingerprints-based Similarity Search

– Widely used similarity search tool

– Consists of descriptors encoded as bit strings

– Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient

MACCS fingerprints: 166 structural keys that answer questions of the type:

• Is there a ring of size 4?

• Is at least one F, Br, Cl, or I present?

where the answer is either TRUE (1) or FALSE (0).
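The idea of structural keys can be sketched with a toy two-key fingerprint built from the two example questions. The molecule representation (a dictionary of ring sizes and element symbols) is an assumption made for this example; actual MACCS keys are evaluated on full molecular graphs by cheminformatics toolkits.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

# Each key is a (question, test) pair; the test returns True or False.
STRUCTURAL_KEYS = [
    ("Is there a ring of size 4?",
     lambda mol: 4 in mol["ring_sizes"]),
    ("Is at least one F, Br, Cl, or I present?",
     lambda mol: bool(HALOGENS & set(mol["elements"]))),
]


def fingerprint(mol):
    """Answer every key in order and encode TRUE as 1, FALSE as 0."""
    return [1 if test(mol) else 0 for _, test in STRUCTURAL_KEYS]


# Hypothetical, hand-written molecule records.
chlorocyclobutane = {"ring_sizes": [4], "elements": ["C", "C", "C", "C", "Cl"]}
benzene = {"ring_sizes": [6], "elements": ["C"] * 6}

print(fingerprint(chlorocyclobutane))
print(fingerprint(benzene))
```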


Tanimoto Similarity

$$T_c = \frac{c}{a + b - c} = \frac{9}{10 + 9 - 9} = 0.9$$

or 90% similarity
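In the usual definition, a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both. A minimal sketch, assuming each fingerprint is stored as the set of its "on" bit positions (the example fingerprints are hypothetical and chosen to give 10, 9 and 9 for these counts):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bit positions."""
    a = len(bits_a)
    b = len(bits_b)
    c = len(bits_a & bits_b)      # bits set in both fingerprints
    if a + b - c == 0:
        return 0.0                # assumption: two empty fingerprints score 0
    return c / (a + b - c)


# Hypothetical fingerprints: 10 bits set, 9 bits set, 9 bits in common.
query_bits = set(range(10))
database_bits = set(range(9))

similarity = tanimoto(query_bits, database_bits)
print(similarity)
```

In a similarity search, this score would be computed between the query and every database molecule, and the database ranked by decreasing value.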


Questions?


Molecular editors and viewers

http://www.chemaxon.com/products/marvin/

http://jmol.sourceforge.net/


Format conversion

http://cactus.nci.nih.gov/translate/