Molecular similarity searching methods, seminar

Molecular similarity searching methods in drug discovery. A presentation in the advanced graphical engineering systems seminar 2011/2012. By: Haytham Hijazi. Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller

description

Here we present a new method of classifying the similar molecules using

Transcript of Molecular similarity searching methods, seminar

Page 1: Molecular similarity searching methods, seminar

Molecular similarity searching methods in drug discovery

A Presentation in advanced graphical engineering systems seminar 2011/2012

By: Haytham Hijazi

Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller

Page 2: Molecular similarity searching methods, seminar


In this work, I propose a contribution to the field of cheminformatics. Cheminformatics means solving chemical problems using computational methods [1].

James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).

Page 3: Molecular similarity searching methods, seminar


Agenda

• The main question in this research

• The principle of similarity

• Drug discovery as an application

• Research problem

• Molecular representations (1D, 2D…)

• Searching the similarity

• Similarity coefficients calculations

• The probabilistic model (BIM)

• The contribution (MDC)

• Experiments, conclusions and discussion

Page 4: Molecular similarity searching methods, seminar

Shape Colour

Size Pattern

“The similarity is in the eye of the beholder”

Can we claim?

Page 5: Molecular similarity searching methods, seminar

Question: Which molecules in a database are similar to the query molecule?

Application:

• Better compounds than the initial lead compound (drug discovery)

• Property prediction for unknown compounds

The main question

Page 6: Molecular similarity searching methods, seminar

Structurally similar molecules are assumed to have similar biological properties.

Similar biological properties → drug discovery.

In our context…the principle

1. Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, Molecular similarity, Springer Berlin, ISBN 978-3-642-19614-0, 1st Edition. 17.06.2011

[1]

Page 7: Molecular similarity searching methods, seminar


Problems

Claim: General manufacturing problems!

Page 8: Molecular similarity searching methods, seminar


The Map

Molecule representation

Feature selection

Similarity coefficient calculations and ranking for search

Page 9: Molecular similarity searching methods, seminar

Historical progression

◦ Complete structure

◦ Sub-structure

Descriptors

◦ 1D (physicochemical properties), 2D, 3D, and 4D

Connectivity tables and graph theory!

Molecular representation

Image Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

Page 10: Molecular similarity searching methods, seminar

2D structure, line notation

CC(=O)OC1=CC=CC=C1C(=O)O

CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C

SMILES: Simplified Molecular Input Line Entry System

SMILES

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

Page 11: Molecular similarity searching methods, seminar

A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of FRAGMENT substructures in a molecule

Dictionary-based and hash-based fingerprints

2D Fingerprints - Structural key

Descriptor   Fragment
1            AR
2            CCCCN
3            Me
9            NH2
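A structural key can be sketched as a lookup over a fixed fragment dictionary. The snippet below is only an illustration: the fragment list and the `structural_key` helper are hypothetical, and fragment presence is approximated by plain substring matching on the SMILES text, which is not a real graph-based substructure search.

```python
# Hypothetical fragment dictionary: list index = bit position.
# Assumption: substring matching on SMILES stands in for true
# substructure matching, which would need a cheminformatics toolkit.
FRAGMENTS = ["c1ccccc1", "CCCCN", "C(=O)O", "N"]


def structural_key(smiles, fragments=FRAGMENTS):
    """Return a bit vector: 1 if the fragment text occurs in the SMILES, else 0."""
    return [1 if frag in smiles else 0 for frag in fragments]


if __name__ == "__main__":
    # Butylamine written as SMILES: contains CCCCN and N, no ring, no acid group.
    print(structural_key("CCCCN"))  # [0, 1, 0, 1]
```

A hash-based fingerprint would replace the fixed dictionary with a hash of every enumerated fragment, at the cost of possible bit collisions.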

2. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

[1] [2]

Page 12: Molecular similarity searching methods, seminar

3D fingerprint (topology): in 3D keys, the position of each bit corresponds to a certain range of distances or angles.

Computationally complex
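The idea of bits that stand for distance ranges can be sketched as follows. This is a simplified assumption, not the scheme from the cited source: every atom pair is treated alike, the bin width and key length are arbitrary illustrative constants, and `distance_key` is a hypothetical helper.

```python
import math
from itertools import combinations

# Assumed binning: bit i is set when some atom pair lies at a distance
# in [i * BIN_WIDTH, (i + 1) * BIN_WIDTH). Both constants are
# illustrative choices, not values taken from the slides.
BIN_WIDTH = 1.0
N_BITS = 8


def distance_key(coords, bin_width=BIN_WIDTH, n_bits=N_BITS):
    """Build a bit vector from pairwise distances between 3D atom positions."""
    bits = [0] * n_bits
    for p, q in combinations(coords, 2):
        index = int(math.dist(p, q) // bin_width)
        if index < n_bits:  # distances beyond the last range are ignored
            bits[index] = 1
    return bits


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 3.2, 0.0)]
    print(distance_key(atoms))  # [0, 1, 0, 1, 0, 0, 0, 0]
```

The pairwise loop grows quadratically with the number of atoms, and real 3D keys must also handle multiple conformations, which is one reason they are computationally expensive.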

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

Page 13: Molecular similarity searching methods, seminar


The Map

Similarity coefficient calculations and ranking for search

Molecule representation

Feature selection

Page 14: Molecular similarity searching methods, seminar

Exact structure search Structure search

Substructure search

Similarity searching: maximum common subgraph isomorphism, Tanimoto/Dice/Cosine coefficients

Searching the similarity

Page 15: Molecular similarity searching methods, seminar

The similarity measure (coefficient) is a quantitative measure of similarity

Used to rank the results of the query

Results are ordered by decreasing similarity

Searching the similarity

Distance coefficients. Probabilistic coefficients. Correlation coefficients. Association coefficients.
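Ranking by a similarity coefficient can be sketched as below. The fingerprints are hypothetical and stored as sets of set-bit positions; the Tanimoto coefficient (shared bits divided by the union of bits) is used as one possible association coefficient, and `rank_by_similarity` is an illustrative helper name.

```python
def tanimoto(x, y):
    """Tanimoto coefficient on sets of 'on' bit positions (AND / OR)."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0


def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, bits)) for name, bits in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up fingerprints, not real compounds.
    database = {"mol_1": {1, 4, 7}, "mol_2": {1, 2, 3, 4}, "mol_3": {9}}
    for name, score in rank_by_similarity({1, 2, 4}, database):
        print(name, score)
```

Any other coefficient can be swapped in for `tanimoto`; for a distance coefficient the sort order would be ascending instead.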

Page 16: Molecular similarity searching methods, seminar

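Notation: for fingerprints A and B, $a$ = number of bits set in A, $b$ = number of bits set in B, $c$ = number of bits set in both, $d$ = number of bits set in neither, and $n = a + b - c + d$ is the fingerprint length.

Associative

Simple matching coefficient: $(c+d)/(a+b-c+d)$

Jaccard measure (Tanimoto): $c/(a+b-c)$ = AND/OR

Cosine, Ochiai: $c/\sqrt{ab}$

Dice: $2c/(a+b)$

Distance

Hamming distance: $a+b-2c$

Euclidean distance: $\sqrt{a+b-2c}$

Soergel distance: $(a+b-2c)/(a+b-c)$

Other coefficients

Pattern difference: $(a-c)(b-c)/n^2$

Size: $(a-b)^2/n^2$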

More coefficients !
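Two relations between these coefficients are worth spelling out. The derivation assumes the convention in which $a$ and $b$ count the bits set in each fingerprint and $c$ counts the bits set in both, the same convention as the Tanimoto form $c/(a+b-c)$.

```latex
\begin{align*}
D_{\text{Soergel}} &= \frac{a+b-2c}{a+b-c}
  = \frac{(a+b-c) - c}{a+b-c}
  = 1 - \frac{c}{a+b-c}
  = 1 - S_{\text{Tanimoto}}, \\
D_{\text{Euclidean}} &= \sqrt{a+b-2c} = \sqrt{D_{\text{Hamming}}}.
\end{align*}
```

For binary fingerprints, ranking by increasing Soergel distance therefore gives the same order as ranking by decreasing Tanimoto similarity.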

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009

Page 17: Molecular similarity searching methods, seminar

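Assume we generate fragment-based fingerprint bits:

Molecule A: 00010100010101000101010011110100

Molecule B: 00000000100101001001000011100000

Tanimoto coefficient = $\frac{c}{a+b-c}$, where $a$ is the number of bits set in A, $b$ is the number of bits set in B, and $c$ is the number of bits set in A AND B.

Tanimoto = $6/(13+8-6) = 6/15 = 0.4$

Example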
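The arithmetic of this example can be checked with a few lines of Python. The bit strings are the ones given for molecules A and B; the helper name `bit_counts` is illustrative, and the Dice, cosine and Hamming lines assume the same counting convention (a and b are the bits set in each molecule, c the bits set in both).

```python
import math

A = "00010100010101000101010011110100"
B = "00000000100101001001000011100000"


def bit_counts(x, y):
    """Return (a, b, c): bits set in x, bits set in y, bits set in both."""
    a = x.count("1")
    b = y.count("1")
    c = sum(1 for i, j in zip(x, y) if i == "1" and j == "1")
    return a, b, c


a, b, c = bit_counts(A, B)
tanimoto = c / (a + b - c)       # shared bits over the union of bits
dice = 2 * c / (a + b)
cosine = c / math.sqrt(a * b)
hamming = a + b - 2 * c          # number of positions where A and B differ

print(a, b, c)    # 13 8 6
print(tanimoto)   # 0.4
```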

Page 18: Molecular similarity searching methods, seminar

Associate the relevance of a structure to an explicit feature

pi = probability that bit bi appears in an active structure.

qi = probability that bit bi appears in an inactive structure.

αi is a binary selector: αi = 1 means the bit occurs in the structure; otherwise αi = 0 and the bit is negated.

P(A|S) is the probability of an active structure given S.

P(NA|S) is the probability of an inactive structure given S.

P(A) is the probability of actives.

P(NA) is the probability of inactives.
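The slide's own equation did not survive in this transcript, so the sketch below uses the standard log-odds form of the binary independence model as an assumption: the bits are treated as independent, and a structure is scored by log P(A|S) minus log P(NA|S). The function name and the example probabilities are hypothetical.

```python
import math


def bim_log_odds(alpha, p, q, prior_active):
    """Log of P(A|S) / P(NA|S) under the bit-independence assumption.

    alpha:        list of 0/1 bits of the structure S
    p[i]:         probability that bit i is set in an active structure
    q[i]:         probability that bit i is set in an inactive structure
    prior_active: P(A); P(NA) is taken as 1 - P(A)
    """
    score = math.log(prior_active / (1.0 - prior_active))
    for a_i, p_i, q_i in zip(alpha, p, q):
        if a_i:
            score += math.log(p_i / q_i)  # bit present
        else:
            score += math.log((1.0 - p_i) / (1.0 - q_i))  # bit absent
    return score


if __name__ == "__main__":
    # Made-up probabilities for a two-bit fingerprint.
    print(bim_log_odds([1, 0], p=[0.8, 0.5], q=[0.2, 0.5], prior_active=0.5))
```

A positive score means the structure is more likely active than inactive under these assumptions; database structures can then be ranked by this score.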

A probabilistic model (BIM)

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009

Page 19: Molecular similarity searching methods, seminar


Problems again

Claim: General manufacturing problems !

Page 20: Molecular similarity searching methods, seminar


My proposed hybrid search design: Molecular Dynamic Classification method (MDC)

Active compounds database

Class 1

Class 2

Class n

Molecular dynamics simulation tool

Physicochemical properties

Classification algorithm

Voting
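The slides do not name the classification algorithm or the voting rule, so the following is only one possible reading of the classification and voting steps: a k-nearest-neighbour majority vote over property vectors. The function name, the value of k and the data are all hypothetical.

```python
import math
from collections import Counter


def classify_by_vote(query, labelled, k=3):
    """Assign the class that wins a majority vote among the k nearest neighbours.

    query:    tuple of property values for the unknown compound
    labelled: list of (property_tuple, class_label) for known active compounds
    """
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up two-property vectors (not measured or simulated data).
    known = [
        ((1.0, 0.2), "Class 1"),
        ((1.1, 0.3), "Class 1"),
        ((3.0, 2.5), "Class 2"),
        ((3.2, 2.4), "Class 2"),
        ((2.9, 2.7), "Class 2"),
    ]
    print(classify_by_vote((3.1, 2.6), known))  # Class 2
```

In this reading, a query is compared only against the members of its predicted class, which is where the expected saving in search time would come from.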

Page 21: Molecular similarity searching methods, seminar

Better insight about the similarity in terms of bioactivity, toxicity, reactivity...(+)

The time of searching (+)

Prediction and voting possibilities (+)

Cost of simulation tools (-)

Classification errors (-)

MDC discussion

Page 22: Molecular similarity searching methods, seminar

Materials Explorer

Itemtracker -Freezer/Cryogen sample tracking system

CHARMM

MDynaMix

Simulation tools

Page 23: Molecular similarity searching methods, seminar

Fingerprint time generation experiment

Data source: simulating tool indicated in the report [17]

Consider if we have more than 1000 bits!

[Figure: Fingerprint generation time. X-axis: max path length (series: 2 bits, 3 bits, 4 bits); Y-axis: time (ms), 0 to 30.]

Page 24: Molecular similarity searching methods, seminar

Hit rate experiment

[Figure: Hit rate versus selection size. X-axis: selection size (0 to 2500); Y-axis: hit rate (0 to 0.18).]

Data source: simulating tool indicated in the report [17]

The more we increase the size of features, the more the hit rate of finding actives decreases.

Page 25: Molecular similarity searching methods, seminar

Even fragment-based fingerprint generation is time-consuming

Probabilistic models and machine learning introduced substantial changes

Mixing more than one type of descriptor seems efficient, i.e. in terms of time and result quality

Experimental results are still needed

General evaluation and conclusions

Page 26: Molecular similarity searching methods, seminar


Thank you for listening

Haytham Hijazi