Use of Machine Learning in Chemoinformatics
Irene Kouskoumvekaki, Associate Professor
December 12th, 2012
Biological Sequence Analysis course
2 CBS, Department of Systems Biology
Major Aspects of Chemoinformatics
• Databases: Development of databases for storage and retrieval of small molecule structures and their properties.
• Machine learning: Training of Decision Trees, Neural Networks, Self Organizing Maps, etc. on molecular data.
• Predictions: Molecular properties relevant to drugs, virtual screening of chemical libraries, system chemical biology networks…
Clustering: Self Organizing Maps
Distinguishing molecules of different biological activities and finding a new lead structure
Machine Learning
Molecular Structures, Molecular Descriptors, Properties
QSAR, Virtual Screening, Clustering, Classification
Different descriptor types
• Simple feature counts (such as number of rotatable bonds or molecular weight)
• Fragmental descriptors which indicate the presence or absence (or count) of groups of atoms and substructures
• Physicochemical properties (density, solubility, van der Waals volume)
• Topological indices (size, branching, overall shape)
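As a rough illustration of these descriptor types, the sketch below computes one of each kind for ethanol. The toy molecule encoding (heavy atoms with attached hydrogen counts, bonds as index pairs), the function names, and the choice of the Wiener index as the topological index are assumptions made for this example; a real workflow would normally rely on a cheminformatics toolkit.

```python
from collections import deque

# Approximate average atomic masses in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Toy encoding (an assumption): each heavy atom is (element, attached hydrogens),
# bonds are pairs of atom indices. Ethanol, CH3-CH2-OH:
ETHANOL_ATOMS = [("C", 3), ("C", 2), ("O", 1)]
ETHANOL_BONDS = [(0, 1), (1, 2)]


def molecular_weight(atoms):
    """Simple property: sum of atomic masses, hydrogens included."""
    return sum(ATOMIC_MASS[el] + h * ATOMIC_MASS["H"] for el, h in atoms)


def has_oh_group(atoms):
    """Fragment-style descriptor: 1 if any oxygen carries a hydrogen, else 0."""
    return int(any(el == "O" and h >= 1 for el, h in atoms))


def wiener_index(n_atoms, bonds):
    """Topological index: sum of shortest-path lengths over all heavy-atom pairs."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    total = 0
    for start in range(n_atoms):
        # Breadth-first search gives bond-count distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbours[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends


def describe(atoms, bonds):
    """Collect the toy descriptors into one dictionary."""
    return {
        "heavy_atoms": len(atoms),  # simple feature count
        "molecular_weight": molecular_weight(atoms),
        "has_OH": has_oh_group(atoms),
        "wiener_index": wiener_index(len(atoms), bonds),
    }


ethanol_descriptors = describe(ETHANOL_ATOMS, ETHANOL_BONDS)
```

The resulting dictionary is the kind of descriptor vector that a machine learning method takes as input.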
Quantitative Structure-Activity Relationships (QSAR)
In QSAR models, structural parameters (descriptors) are fitted to experimental data for biological activity (or another given property, P).
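A minimal sketch of such a fit, assuming the simplest possible model: a single descriptor x and a linear relationship P = a + b·x, fitted by ordinary least squares. The function names are hypothetical and the numbers are synthetic values chosen for illustration, not experimental data; real QSAR models typically use many descriptors and need proper validation.

```python
def fit_linear_qsar(descriptor_values, activities):
    """Least-squares fit of P = intercept + slope * x for one descriptor."""
    n = len(descriptor_values)
    mean_x = sum(descriptor_values) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor_values)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(descriptor_values, activities)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, descriptor_value):
    """Predict the property P for a new molecule from its descriptor value."""
    return intercept + slope * descriptor_value


# Synthetic numbers for illustration only (not measured activities).
toy_descriptor = [1.0, 2.0, 3.0, 4.0]
toy_activity = [2.5, 4.5, 6.5, 8.5]

slope, intercept = fit_linear_qsar(toy_descriptor, toy_activity)
predicted = predict(slope, intercept, 5.0)
```

Once fitted, the model predicts P for molecules that have a computed descriptor but no measurement.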
Similarity Search
• Similar Property Principle – Molecules having similar structures and properties are expected to exhibit similar biological activity.
• Thus, molecules that are located closely together in the chemical space are often considered to be functionally related.
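One way to make "located closely together in the chemical space" concrete is to treat each molecule as a point whose coordinates are its descriptor values and rank database molecules by distance to a query. The sketch below assumes Euclidean distance and uses made-up two-dimensional descriptor vectors with hypothetical names; in practice descriptors would usually be scaled first so that no single one dominates the distance.

```python
import math


def rank_by_distance(query, database):
    """Return database names sorted from nearest to farthest from the query."""
    return sorted(database, key=lambda name: math.dist(query, database[name]))


# Made-up descriptor vectors for illustration only.
query_vector = (1.0, 2.0)
toy_database = {
    "mol_A": (1.0, 2.5),
    "mol_B": (4.0, 6.0),
    "mol_C": (0.0, 0.0),
}

ranking = rank_by_distance(query_vector, toy_database)
```

Under the Similar Property Principle, the molecules at the top of the ranking are the first candidates to share the query's activity.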
Fingerprints-based Similarity Search
– widely used similarity search tool
– consists of descriptors encoded as bit strings
– bit strings of query and database are compared using a similarity metric such as the Tanimoto coefficient

MACCS fingerprints: 166 structural keys that answer questions of the type:
• Is there a ring of size 4?
• Is at least one F, Br, Cl, or I present?
where the answer is either TRUE (1) or FALSE (0)
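The comparison step can be sketched as follows. For two bit strings, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in at least one. The 8-bit fingerprints here are made up for illustration (real MACCS fingerprints have 166 keys), the function names are hypothetical, and returning 0.0 when both fingerprints are empty is an assumed convention.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length bit strings of '0' and '1'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")
    either = sum(1 for a, b in zip(fp_a, fp_b) if a == "1" or b == "1")
    # Assumed convention: two all-zero fingerprints score 0.0.
    return both / either if either else 0.0


def similarity_search(query_fp, database):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 8-bit fingerprints for illustration only.
query_fingerprint = "11010010"
toy_fingerprints = {
    "m1": "11010000",
    "m2": "00101101",
    "m3": "11010010",
}

hits = similarity_search(query_fingerprint, toy_fingerprints)
```

A score of 1.0 means identical fingerprints and 0.0 means no shared bits; a screening run would typically keep the hits above a chosen threshold.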
Molecular editors and viewers
http://www.chemaxon.com/products/marvin/