Post on 11-Jan-2016
Use of Machine Learning in Chemoinformatics
Irene Kouskoumvekaki
Associate Professor
December 12th, 2012
Biological Sequence Analysis course
2 CBS, Department of Systems Biology
Major Aspects of Chemoinformatics
• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
• Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, system chemical biology networks…
Machine Learning
Machine learning classifiers
Clustering: Self Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure
Machine Learning
[Diagram labels: Molecular Structures, Properties, Molecular Descriptors, QSAR, Virtual Screening, Clustering, Classification]
Different descriptor types
• Simple feature counts (such as number of rotatable bonds or molecular weight)
• Fragmental descriptors which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
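As a rough illustration of the first descriptor type, the sketch below derives a few simple counts and the molecular weight from a molecular formula. It is a minimal, dependency-free example: the helper names (`element_counts`, `molecular_weight`) are hypothetical, the mass table covers only a handful of elements, and formulas with parentheses or charges are not handled.

```python
import re

# Standard atomic masses (g/mol) for a few common elements only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904}


def element_counts(formula):
    """Count atoms per element in a flat formula such as 'C9H8O4'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in element_counts(formula).items())


aspirin = "C9H8O4"  # molecular formula of aspirin
counts = element_counts(aspirin)
mw = molecular_weight(aspirin)
heavy_atoms = sum(n for el, n in counts.items() if el != "H")

print(counts)        # per-element atom counts
print(round(mw, 2))  # molecular weight in g/mol
print(heavy_atoms)   # number of non-hydrogen atoms
```

Descriptors such as rotatable bonds or topological indices need the full molecular graph, so in practice they would be computed with a cheminformatics toolkit rather than from the formula alone.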
Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
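The fitting step can be sketched with the simplest possible case: one descriptor and a linear model P = slope × descriptor + intercept, estimated by ordinary least squares. This is only an assumed illustration, not the model used in the lecture; the function name `fit_line` is hypothetical and the training values are made-up numbers chosen so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training set (illustrative only, not experimental data):
# one descriptor value and one measured property P per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(descriptor, activity)


def predict(x):
    """Predict the property P for a new descriptor value."""
    return slope * x + intercept


print(slope, intercept)
print(predict(5.0))
```

Real QSAR models typically use many descriptors and may be nonlinear, but the idea is the same: parameters are adjusted until the predicted property matches the experimental values as closely as possible.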
Prediction of Solubility, ADME & Toxicity
hERG Classification with SVM
Evaluation of the data set
Performance of SVM
Virtual screening
Similarity Search
• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity.
• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
Fingerprints-based Similarity Search
– Widely used similarity search tool
– Consists of descriptors encoded as bit strings
– Bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient
MACCS fingerprints: 166 structural keys
that answer questions of the type:
• Is there a ring of size 4?
• Is at least one F, Br, Cl, or I present?
where the answer is either
TRUE (1) or FALSE (0)
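The idea of structural keys can be sketched as a list of yes/no questions whose answers are collected into a bit vector. The keys below are illustrative stand-ins, not the actual MACCS key definitions, and the molecule is represented only as a list of element symbols (an assumption made to keep the example short). Ring-based questions such as "Is there a ring of size 4?" need the bond graph and are left out.

```python
HALOGENS = {"F", "Cl", "Br", "I"}

# Each key is a (question, predicate) pair. The predicate takes the list
# of element symbols in a molecule and returns True or False.
# These four keys are hypothetical examples, not real MACCS keys.
KEYS = [
    ("Is at least one F, Br, Cl, or I present?",
     lambda atoms: any(a in HALOGENS for a in atoms)),
    ("Is at least one N present?", lambda atoms: "N" in atoms),
    ("Is at least one O present?", lambda atoms: "O" in atoms),
    ("Is at least one S present?", lambda atoms: "S" in atoms),
]


def structural_key_bits(atoms):
    """Answer every key question: TRUE becomes 1, FALSE becomes 0."""
    return [1 if predicate(atoms) else 0 for _, predicate in KEYS]


chlorobenzene = ["C"] * 6 + ["H"] * 5 + ["Cl"]  # C6H5Cl
ethanol = ["C", "C", "O"] + ["H"] * 6           # C2H6O

print(structural_key_bits(chlorobenzene))  # [1, 0, 0, 0]
print(structural_key_bits(ethanol))        # [0, 0, 1, 0]
```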
Tanimoto Similarity
$$T_c = \frac{c}{a + b - c} = \frac{9}{10 + 9 - 9} = 0.9$$
or 90% similarity
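A minimal sketch of the calculation, assuming the usual reading of the symbols: a and b are the numbers of bits set in the two fingerprints and c is the number of bits set in both. Fingerprints are represented here as Python sets of "on" bit positions; the function name `tanimoto`, the molecule names, and the bit positions are all hypothetical. The first comparison reproduces the slide's numbers (a = 10, b = 9, c = 9), and the last lines rank a toy database against the query as a fingerprint-based similarity search would.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of on-bits."""
    a = len(fp_a)
    b = len(fp_b)
    c = len(fp_a & fp_b)
    if a + b - c == 0:
        return 0.0  # both fingerprints empty; returning 0 is an assumed convention
    return c / (a + b - c)


query = set(range(10))     # 10 bits set
candidate = set(range(9))  # 9 bits set, all shared with the query
similarity = tanimoto(query, candidate)  # 9 / (10 + 9 - 9)
print(similarity)  # 0.9

# Toy database with made-up fingerprints, ranked by similarity to the query.
database = {
    "mol_1": set(range(9)),
    "mol_2": {0, 1, 2, 20, 21},
    "mol_3": {30, 31},
}
ranked = sorted(database,
                key=lambda name: tanimoto(query, database[name]),
                reverse=True)
print(ranked)  # ['mol_1', 'mol_2', 'mol_3']
```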
Questions?
Molecular editors and viewers
http://www.chemaxon.com/products/marvin/
http://jmol.sourceforge.net/
Format conversion
http://cactus.nci.nih.gov/translate/