Use of Machine Learning in Chemoinformatics

Irene Kouskoumvekaki, Associate Professor
December 12th, 2012, Biological Sequence Analysis course

Page 2: Use of Machine Learning in Chemoinformatics

2 CBS, Department of Systems Biology

Major Aspects of Chemoinformatics

•Databases: Development of databases for storage and retrieval of small molecule structures and their properties.

•Machine learning: Training of Decision Trees, Neural Networks, Self-Organizing Maps, etc. on molecular data.

•Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, systems chemical biology networks…

Page 3: Use of Machine Learning in Chemoinformatics

Machine Learning

Page 18: Use of Machine Learning in Chemoinformatics

Machine learning classifiers

Page 19: Use of Machine Learning in Chemoinformatics

Clustering: Self-Organizing Maps

Distinguishing molecules of different biological activities and finding a new lead structure
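
The Self-Organizing Map idea behind these slides can be sketched in plain Python: each node of a small grid holds a weight vector, every molecule's descriptor vector pulls its best-matching node (and, more weakly, that node's grid neighbours) towards it, and the learning rate and neighbourhood radius shrink during training. This is a minimal sketch, not the setup used in the lecture; the grid size, the linear decay schedules, the Gaussian neighbourhood and the toy descriptor vectors are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Self-Organizing Map on descriptor vectors (lists of floats)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed schedule)
    # One randomly initialised weight vector per grid node.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood shrinks but never reaches 0
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in descriptor space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    # Hypothetical, already-scaled 2-D descriptor vectors for two activity classes.
    class_a = [[0.10, 0.10], [0.15, 0.12], [0.12, 0.08]]
    class_b = [[0.90, 0.88], [0.85, 0.92], [0.92, 0.90]]
    som = train_som(class_a + class_b, rows=2, cols=2, epochs=50)
    print([best_matching_unit(som, x) for x in class_a + class_b])
```

After training, molecules from the two hypothetical classes land on different map nodes; a new compound that maps onto a node occupied by actives would be a candidate lead under this (assumed) workflow.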

Page 23: Use of Machine Learning in Chemoinformatics

Machine Learning

Page 24: Use of Machine Learning in Chemoinformatics

Machine Learning

[Diagram labels: Molecular Structures, Properties, Molecular Descriptors, QSAR, Virtual Screening, Clustering, Classification]

Page 25: Use of Machine Learning in Chemoinformatics

Different descriptor types

• Simple feature counts (such as number of rotatable bonds or molecular weight)

• Fragmental descriptors that indicate the presence or absence (or count) of groups of atoms and substructures

• Physicochemical properties (density, solubility, van der Waals volume)

• Topological indices (size, branching, overall shape)
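
Simple count-type descriptors such as molecular weight can be illustrated from a molecular formula alone. The sketch below is a deliberately limited toy: it parses only flat formulas without brackets or charges, knows a handful of elements with rounded standard atomic weights, and cannot compute structure-dependent descriptors such as rotatable bonds, topological indices or fragment keys, which chemoinformatics toolkits derive from the full molecular graph. The function names are hypothetical.

```python
import re

# Rounded standard atomic weights for a few common elements (assumed subset).
# Elements outside this table raise KeyError in simple_descriptors().
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
                 "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904}
HALOGENS = ("F", "Cl", "Br", "I")


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C2H6O' (no brackets or charges)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def simple_descriptors(formula):
    """Hypothetical helper returning a few simple count-type descriptors."""
    counts = parse_formula(formula)
    return {
        "molecular_weight": round(sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items()), 3),
        "heavy_atoms": sum(n for el, n in counts.items() if el != "H"),
        "halogen_count": sum(counts.get(el, 0) for el in HALOGENS),
    }


if __name__ == "__main__":
    print(simple_descriptors("C2H6O"))  # ethanol
    print(simple_descriptors("CHCl3"))  # chloroform
```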

Page 27: Use of Machine Learning in Chemoinformatics

Quantitative Structure-Activity Relationships (QSAR)

In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or for another given property, P).
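
In its simplest form, a QSAR fit of descriptors to a measured property is a multiple linear regression, P = b0 + b1*x1 + ... + bk*xk, where the xi are descriptor values. The sketch below solves the least-squares normal equations in plain Python. It assumes a linear model and a small, well-conditioned data set; the function names are hypothetical and the descriptor and property values in the usage example are invented for illustration. Practical QSAR models may use other learners and should be validated on compounds held out from the fit.

```python
def fit_qsar(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for P = b0 + b1*x1 + ... + bk*xk."""
    # Prepend a constant column so that b0 acts as the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n = len(A[0])
    # Normal equations: (A^T A) b = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * p for r, p in zip(A, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, n))) / M[i][i]
    return b


def predict_property(b, x):
    """Predicted property P for one descriptor vector x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


if __name__ == "__main__":
    # Invented data: two descriptors per compound and a measured property P.
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    P = [2.0, 4.5, 5.0, 7.5, 8.5]
    coeffs = fit_qsar(X, P)
    print(coeffs, predict_property(coeffs, [2, 2]))
```

Because the invented data were generated from P = 1 + 2*x1 - 0.5*x2, the fit recovers those coefficients; with real, noisy measurements the coefficients are only least-squares estimates.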

Page 28: Use of Machine Learning in Chemoinformatics

Prediction of Solubility, ADME & Toxicity

Page 29: Use of Machine Learning in Chemoinformatics

hERG Classification with SVM
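
The slide names a support vector machine (SVM) as the classifier separating hERG blockers from non-blockers. As a hedged illustration only, the sketch below trains a linear SVM with Pegasos-style subgradient descent on the L2-regularised hinge loss. This is a swapped-in, simplified technique: kernel SVMs (for example with an RBF kernel) are common in practice, and the transcript does not say which kernel or software was used. Labels are assumed to be +1 for blockers and -1 for non-blockers; the function names and descriptor vectors are hypothetical.

```python
def _with_bias(x):
    # Append a constant 1 so the last weight acts as a (regularised) bias term.
    return [float(v) for v in x] + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style training of a linear SVM; y holds +1 / -1 labels."""
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # cyclic pass instead of Pegasos' random sampling
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            xb = _with_bias(x)
            margin = label * sum(wj * xj for wj, xj in zip(w, xb))
            # Step for the L2 penalty: shrink the weights ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus the hinge-loss subgradient when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * label * xj for wj, xj in zip(w, xb)]
    return w


def predict_label(w, x):
    """Return +1 (predicted blocker) or -1 (predicted non-blocker)."""
    score = sum(wj * xj for wj, xj in zip(w, _with_bias(x)))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Invented, pre-scaled descriptor vectors.
    blockers = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]]
    non_blockers = [[-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
    w = train_linear_svm(blockers + non_blockers, [1, 1, 1, -1, -1, -1])
    print([predict_label(w, x) for x in blockers + non_blockers])
```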

Page 30: Use of Machine Learning in Chemoinformatics

Evaluation of the data set

Page 31: Use of Machine Learning in Chemoinformatics

Performance of SVM
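
The transcript does not record which performance measures these slides reported. A common way to summarise a binary classifier is through its confusion matrix: sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). The sketch below assumes +1/-1 labels with +1 as the positive class; the function name and example labels are hypothetical.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix based performance measures for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 1, -1, -1, -1, -1]
    predicted = [1, 1, 1, -1, -1, -1, 1, -1]
    print(classification_metrics(truth, predicted))
```

MCC is often preferred over plain accuracy when blockers and non-blockers are unevenly represented in the data set, because it uses all four cells of the confusion matrix.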

Page 33: Use of Machine Learning in Chemoinformatics

Virtual screening

Page 34: Use of Machine Learning in Chemoinformatics

Similarity Search

• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity.

• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
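
Under the similar property principle, a similarity search ranks database molecules by how close they lie to a query in chemical (descriptor) space. A minimal sketch, assuming each molecule is already described by a numeric descriptor vector on comparable scales and using Euclidean distance as the (assumed) closeness measure; the molecule names and values are hypothetical.

```python
import math


def nearest_neighbours(query, database, k=3):
    """Names of the k database molecules closest to the query descriptor vector."""
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical, pre-scaled descriptor vectors.
    library = {"mol_a": [0.0, 0.0], "mol_b": [1.0, 1.0], "mol_c": [5.0, 5.0]}
    print(nearest_neighbours([0.9, 1.0], library, k=2))  # ['mol_b', 'mol_a']
```

Descriptors with large numeric ranges (such as molecular weight) would dominate an unscaled Euclidean distance, which is why the sketch assumes the vectors are already scaled.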

Page 35: Use of Machine Learning in Chemoinformatics

Fingerprint-based Similarity Search

– Widely used similarity search tool

– Consists of descriptors encoded as bit strings

– The bit strings of the query and of the database molecules are compared using a similarity metric such as the Tanimoto coefficient

MACCS fingerprints: 166 structural keys that answer questions of the type:

• Is there a ring of size 4?

• Is at least one F, Br, Cl, or I present?

where the answer is either TRUE (1) or FALSE (0).
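
Structural keys can be sketched by treating each key as a yes/no function of a molecule and writing the answers out as bits. This is only a toy: the molecule is represented by a hypothetical record holding its element symbols and ring sizes, and only the two example questions from the slide are encoded. Real MACCS implementations evaluate all 166 keys as substructure patterns on the full molecular graph.

```python
# Each key pairs a question with a test on a simplified, hypothetical molecule record:
# {"elements": set of element symbols, "ring_sizes": list of ring sizes}.
KEYS = [
    ("Is there a ring of size 4?", lambda mol: 4 in mol["ring_sizes"]),
    ("Is at least one F, Br, Cl, or I present?",
     lambda mol: bool({"F", "Br", "Cl", "I"} & mol["elements"])),
]


def key_fingerprint(mol):
    """Answer every key question: TRUE -> 1, FALSE -> 0."""
    return [1 if test(mol) else 0 for _, test in KEYS]


def as_bit_string(bits):
    """Join the key answers into a bit string such as '10'."""
    return "".join(str(b) for b in bits)


if __name__ == "__main__":
    chlorocyclobutane = {"elements": {"C", "H", "Cl"}, "ring_sizes": [4]}
    benzene = {"elements": {"C", "H"}, "ring_sizes": [6]}
    print(as_bit_string(key_fingerprint(chlorocyclobutane)))  # 11
    print(as_bit_string(key_fingerprint(benzene)))            # 00
```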

Page 36: Use of Machine Learning in Chemoinformatics

Tanimoto Similarity

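$$T_c = \frac{c}{a + b - c}$$

where a and b are the numbers of bits set to 1 in the fingerprints of molecules A and B, and c is the number of bits set to 1 in both.

$$T_c = \frac{9}{10 + 9 - 9} = 0.9$$

or 90% similarity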
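
The Tanimoto coefficient can also be computed directly from two fingerprints stored as equal-length lists of 0/1 bits. The usage example rebuilds the slide's numbers (10 bits set in A, 9 in B, all 9 shared). The function name is hypothetical, and returning 0.0 when neither fingerprint has any bit set is an assumed convention.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient Tc = c / (a + b - c) for two equal-length bit lists."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = sum(fp_a)                                      # bits set in A
    b = sum(fp_b)                                      # bits set in B
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)  # bits set in both
    union = a + b - c
    return c / union if union else 0.0  # assumed convention for two empty fingerprints


if __name__ == "__main__":
    fp_a = [1] * 10 + [0] * 6
    fp_b = [1] * 9 + [0] * 7
    print(tanimoto(fp_a, fp_b))  # 0.9
```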

Page 37: Use of Machine Learning in Chemoinformatics

Similarity Search

Page 38: Use of Machine Learning in Chemoinformatics

Questions?

Page 39: Use of Machine Learning in Chemoinformatics

Molecular editors and viewers

http://www.chemaxon.com/products/marvin/

Page 40: Use of Machine Learning in Chemoinformatics

Molecular editors and viewers

http://jmol.sourceforge.net/

Page 41: Use of Machine Learning in Chemoinformatics

Format conversion

http://cactus.nci.nih.gov/translate/