IntroductionUltrafast Shape Recognition
Ligand-based Virtual ScreeningFuture WorkConclusions
Ultrafast Shape Recognition
for Similarity Search in Molecular Databases
Pedro J. Ballester
NFCR Centre for Computational Drug Discovery
University of Oxford
Pedro J. Ballester USR for Similarity Search 1
Outline
1 Introduction
2 Ultrafast Shape Recognition
Foundations
Encoding
Comparing Molecular Shapes
Effectiveness
Efficiency
3 Ligand-based Virtual Screening
Experimental Setup
Enrichment Plots
USR virtual query
4 Future Work
5 Conclusions
Introduction
Virtual Screening
Ligand-based Virtual Screening
Goal: Identifying drug-like molecules likely to be biologically active.
Principle: Molecules with similar patterns are likely to have similar biological activity.
Template: e.g. a molecule of known biological activity.
Strategy: Search a database of molecules for those with a pattern similar to that of the template.
Molecular Shape Comparison
Molecular Shape Comparison
Molecular shape has been widely recognised as an important pattern to search for.
Shape complementarity between ligand and receptor is necessary for binding.
Additional advantage: the chemical structure is not specified, so novel chemical scaffolds may be found.
Challenges
Alignment
Some methods require the molecules to be aligned before their shapes are compared.
Essentially, alignment is a multimodal optimisation problem with a very limited number of objective-function evaluations available.
This may lead to suboptimal molecular alignment and thus to errors in the comparison.
Challenges
Efficiency
Shape information is regarded as difficult to encode efficiently and to use in database searching (e.g. Zauhar et al. 2003).
The increasing size of molecular databases poses a serious limitation for current shape comparison methods:
The more conformations, the less likely we are to miss molecules that can adopt the template's shape.
The more compounds, the more likely we are to find innovative bioactive molecules.
Consequently, the speed of molecular shape comparison methods is highly important.
FoundationsEncodingComparing Molecular ShapesEffectivenessEfficiency
Ultrafast Shape Recognition (USR)
Ballester, P.J., US Patent Application filed on 25 May 2007
Ballester, P.J. and Richards, W.G. (2007) J Comput Chem
Ballester, P.J. and Richards, W.G. (2007) Proc R Soc A
Foundations
USR is based on the observation that the shape of a molecule is uniquely determined by the relative positions of its atoms.
Such positions are in turn determined (up to rotation, translation and reflection) by the set of all inter-atomic distances.
There is no need to align or translate the molecule, as this set of distances is independent of molecular orientation and position.
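A minimal sketch of the invariance claim above, assuming Python with NumPy (the slides name no language). It uses random toy coordinates rather than a real molecule, applies a random rotation plus a translation, and checks that the inter-atomic distance matrix is unchanged.

```python
import numpy as np

def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Return the N x N matrix of Euclidean inter-atomic distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so Q is well defined
    if np.linalg.det(q) < 0:         # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
coords = rng.normal(scale=3.0, size=(20, 3))              # toy "molecule", 20 atoms
moved = coords @ random_rotation(rng).T + np.array([5.0, -2.0, 1.0])

d_before = pairwise_distances(coords)
d_after = pairwise_distances(moved)
print(np.allclose(d_before, d_after))   # True
```

Note that a mirror image (det = -1) would also leave every distance unchanged, which is why the distances fix the shape only up to reflection.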
Further Considerations
Furthermore, the values of inter-atomic distances are heavily constrained:
Distances between bonded atoms depend strongly on which atoms these are.
Other inter-atomic distances depend on the flexibility of the molecule.
The set of all inter-atomic distances may therefore contain more information than is needed for an accurate description of shape.
Strategy: encode shape from a subset of these distances.
Representation of molecular shape
Reference Locations
Distances from two nearby atoms are similar and thus contain similar information → consider sets of atomic distances from reference locations that are far from each other.
Four reference locations: ctd (molecular centroid), cst (atom closest to ctd), fct (atom farthest from ctd) and ftf (atom farthest from fct).
Each conformer with N atoms is now represented by 4N distances.
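A sketch of the four reference locations and the resulting 4N distances, assuming Python with NumPy and the usual USR definitions (ctd the centroid, cst the atom closest to ctd, fct the atom farthest from ctd, ftf the atom farthest from fct). The function names are hypothetical and the coordinates are toy data; for N = 25 atoms this yields 100 distances instead of the 300 pairwise ones.

```python
import numpy as np

def reference_locations(coords: np.ndarray) -> dict:
    """Compute the four reference locations for an (N, 3) coordinate array."""
    ctd = coords.mean(axis=0)                        # molecular centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[np.argmin(d_ctd)]                   # atom closest to ctd
    fct = coords[np.argmax(d_ctd)]                   # atom farthest from ctd
    d_fct = np.linalg.norm(coords - fct, axis=1)
    ftf = coords[np.argmax(d_fct)]                   # atom farthest from fct
    return {"ctd": ctd, "cst": cst, "fct": fct, "ftf": ftf}

def distance_sets(coords: np.ndarray) -> np.ndarray:
    """Return a (4, N) array: distance of every atom to each reference location."""
    refs = reference_locations(coords)
    return np.stack([np.linalg.norm(coords - refs[k], axis=1)
                     for k in ("ctd", "cst", "fct", "ftf")])

rng = np.random.default_rng(1)
coords = rng.normal(scale=3.0, size=(25, 3))   # toy conformer, N = 25 atoms
sets = distance_sets(coords)
print(sets.shape)   # (4, 25)
```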
Encoding
Moments of atomic distance distributions
But how do we compare molecules with different numbers of atoms N?
Histograms of each distance distribution have a number of well-known drawbacks:
Difficulty of selecting a bin size suitable for all compared molecules.
Relatively large storage needed for the histograms.
Relatively large computing cost.
A distribution is completely determined by its moments (e.g. Hall, 1983); atomic distances are bounded, so this applies here.
Idea: describe each distribution of atomic distances by its first few moments (avoiding histogram calculation).
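A minimal sketch of the moment-based encoding, assuming Python with NumPy: whatever the number of atoms, each distance set collapses to the same three numbers, so molecules with different N become directly comparable. The normalisation used here (population variance, standardized skewness) is an assumption; the slide does not fix it. The distances are random stand-ins.

```python
import numpy as np

def first_three_moments(d: np.ndarray) -> np.ndarray:
    """Mean, population variance and standardized skewness of a 1-D distance set."""
    mu = d.mean()
    var = ((d - mu) ** 2).mean()                      # second central moment
    skew = ((d - mu) ** 3).mean() / var ** 1.5        # third standardized moment
    return np.array([mu, var, skew])

rng = np.random.default_rng(2)
small = rng.uniform(1.0, 6.0, size=15)     # distances from a 15-atom molecule
large = rng.uniform(1.0, 9.0, size=60)     # distances from a 60-atom molecule
print(first_three_moments(small).shape, first_three_moments(large).shape)  # (3,) (3,)
```

Three floats per distance set replace a histogram whose bin size would otherwise have to suit every molecule in the database.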
USR descriptors
12 descriptors: 4 reference locations × 3 first moments (i.e. the mean, variance and skewness of each set of atomic distances).
Excellent compromise between effectiveness and efficiency.
Warning: if the moments are poorly estimated, there is no reason to expect the resulting implementation of USR to be effective!
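Putting the reference locations and moments together, a hypothetical `usr_descriptors` function could compute the 12 descriptors as below. This is a sketch in Python with NumPy, not the author's implementation; the moment normalisation is again an assumption. The usage lines check that a rotated and translated copy of a toy conformer receives the same descriptors.

```python
import numpy as np

def _moments(d: np.ndarray) -> list:
    """Mean, population variance and standardized skewness (assumed normalisation)."""
    mu = d.mean()
    var = ((d - mu) ** 2).mean()
    skew = ((d - mu) ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    return [mu, var, skew]

def usr_descriptors(coords: np.ndarray) -> np.ndarray:
    """12 descriptors: 3 moments for each of the 4 reference locations."""
    ctd = coords.mean(axis=0)                                  # centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[np.argmin(d_ctd)]                             # closest to ctd
    fct = coords[np.argmax(d_ctd)]                             # farthest from ctd
    ftf = coords[np.argmax(np.linalg.norm(coords - fct, axis=1))]  # farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments(np.linalg.norm(coords - ref, axis=1)))
    return np.array(desc)

rng = np.random.default_rng(3)
coords = rng.normal(scale=3.0, size=(30, 3))     # toy conformer
theta = 0.7                                      # rotation angle about z (radians)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + 4.0                   # rotate, then translate
print(usr_descriptors(coords).shape)                                 # (12,)
print(np.allclose(usr_descriptors(coords), usr_descriptors(moved)))  # True
```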
Molecular Shape Comparison
USR similarity score
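Score quantifying the similarity between the query (q) and the i-th database conformer, where $M^{q}_{l}$ and $M^{i}_{l}$ are their l-th USR descriptors:
$$S_{qi} = \frac{1}{1 + \frac{1}{12}\sum_{l=1}^{12} \left| M^{q}_{l} - M^{i}_{l} \right|}$$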
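The score translates directly into code. A sketch in Python with NumPy, where `usr_score` is a hypothetical name and the descriptor vectors are made up for illustration:

```python
import numpy as np

def usr_score(m_q: np.ndarray, m_i: np.ndarray) -> float:
    """S_qi = 1 / (1 + mean absolute difference over the 12 descriptors)."""
    return float(1.0 / (1.0 + np.abs(m_q - m_i).mean()))

query = np.zeros(12)
print(usr_score(query, query))           # 1.0 (identical descriptors)
print(usr_score(query, query + 1.0))     # 0.5 (mean |difference| = 1)
```

Because the mean absolute difference is non-negative, the score lies in (0, 1] and reaches 1 only when all 12 descriptors coincide.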
Comparing Molecular Shapes: Example 1
Query 4 on a database with 2.5 million compounds
[Figure: 1st to 4th ranked hits retrieved by USR and by ESshape3D.]
Most Important Feature of USR
Efficiency
USR is compared with three state-of-the-art methods: ESshape3D, Shape Signatures and ROCS.
Time to precalculate descriptors for a database (done only once; in seconds per conformer, s/c):
USR: 1.18 × 10^-4 s/c (Intel Core2 2.0 GHz).
ESshape3D: 9.15 × 10^-4 s/c (Intel Core2 2.0 GHz).
Shape Signatures: 50.82 s/c (450 MHz Intel Pentium III).
ROCS: none (no precalculation step).
Since many queries are run against the same precalculated database, the key performance measure is how many conformers can be compared per second.
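Because descriptors are precalculated once, each query reduces to scoring a 12-value vector against an M × 12 matrix, which vectorises well. The sketch below (Python with NumPy; `screen` is a hypothetical helper and the database is random stand-in data, not real descriptors) returns the top-scoring conformers. It is not meant to reproduce the comparison rates reported for USR.

```python
import numpy as np

def screen(query: np.ndarray, db: np.ndarray, top_k: int = 5):
    """Score every row of db (M x 12) against query (12,); return the best hits."""
    scores = 1.0 / (1.0 + np.abs(db - query).mean(axis=1))   # USR score per row
    order = np.argsort(-scores)[:top_k]                      # highest scores first
    return order, scores[order]

rng = np.random.default_rng(4)
db = rng.uniform(0.0, 5.0, size=(100_000, 12))   # stand-in precomputed descriptors
query = db[42] + 0.01                            # nearly identical to conformer 42
idx, best = screen(query, db, top_k=3)
print(idx[0])   # 42
```

For scale, at the quoted 1.18 × 10^-4 s/c, precalculating USR descriptors for one million conformers takes about 118 s, against about 915 s for ESshape3D on the same hardware.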
Efficiency Comparison with Descriptor-based Methods
USR is 1 546 times faster than ESshape3D.
USR is 2 038 times faster than Shape Signatures.
Efficiency Comparison with Superposition Methods
USR is 14 238 times faster than ROCS.
Based on the comparison rate reported for ROCS on a modern workstation (USR timed on a 2.93 GHz Intel Core2 processor).
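Speed-up factors are easier to grasp as wall-clock time. The short Python sketch below uses only the three factors quoted on these slides. It converts a hypothetical one-second USR search into the time each other method would need, and expresses each factor in orders of magnitude (log10). That second figure is what the orders-of-magnitude claim in the Conclusions rests on. The timings were taken on different machines, so the results are indicative only. The helper names are hypothetical.

```python
import math

# Speed-up factors of USR as quoted on the slides (presumably conformer
# comparisons per second); the underlying timings come from different hardware.
SPEEDUPS = {"ESshape3D": 1546, "Shape Signatures": 2038, "ROCS": 14238}


def equivalent_seconds(usr_seconds: float, speedup: float) -> float:
    """Time another method would need for the comparisons USR does in usr_seconds."""
    return usr_seconds * speedup


def orders_of_magnitude(speedup: float) -> float:
    """Base-10 logarithm of a speed-up factor."""
    return math.log10(speedup)


if __name__ == "__main__":
    usr_seconds = 1.0  # hypothetical: a search that USR finishes in one second
    for method, factor in SPEEDUPS.items():
        t = equivalent_seconds(usr_seconds, factor)
        print(f"{method}: {t / 3600:.2f} h, {orders_of_magnitude(factor):.2f} orders of magnitude")
```

For example, a search that USR completes in one second would take ROCS 14 238 s, which is roughly four hours.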
Ligand-based Virtual Screening
Ballester, P.J., Finn, P.W. and Richards, W.G. (2007?)
Virtual Screening Validation
Aim
How well will the method identify molecules with a given activity?
Retrospective virtual screening experiment.
DrugBank-3D Test Database
Publicly available resource: DrugBank (http://redpoll.pharmacy.ualberta.ca/drugbank/index.html).
Input: 3 764 chemical structures of FDA-approved (708) and experimental (3 056) drugs.
Conformers generated with MOE's conformer generator: an average of about 200 per compound, for 3 330 of these chemical structures.
DrugBank-3D: 666 892 conformers in 3D MDL SD format.
Query Selection
Adopted Procedure (Nicholls et al., 2004)
For each activity class:
1. Take the lowest-energy conformer of each active compound.
2. Perform hierarchical agglomerative clustering on the USR similarity matrix of these conformers (S_threshold = 0.75).
3. Define the main cluster as the one containing the most actives.
4. Identify the conformer closest to the centroid of the main cluster (consensus shape template).
5. Use this shape template as the query against the whole database.
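The five steps can be sketched in Python as follows. The slides fix only the similarity threshold (0.75). Several details are assumptions: the linkage criterion (average linkage here), the definition of the centroid (the mean descriptor vector of the main cluster) and the tie-breaking. The names cluster and select_query are hypothetical. The input is assumed to be the 12-element USR descriptor of the lowest-energy conformer of each active.

```python
from typing import List, Sequence

Descriptor = Sequence[float]


def usr_similarity(a: Descriptor, b: Descriptor) -> float:
    """USR similarity: 1 / (1 + mean absolute descriptor difference)."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


def cluster(descs: Sequence[Descriptor], threshold: float = 0.75) -> List[List[int]]:
    """Average-linkage agglomerative clustering on the USR similarity matrix.

    Clusters are merged while the most similar pair has an average
    pairwise similarity of at least `threshold` (assumed stopping rule).
    """
    n = len(descs)
    sim = [[usr_similarity(descs[i], descs[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a].extend(clusters[b])  # merge b into a; a < b keeps index a valid
        del clusters[b]
    return clusters


def select_query(descs: Sequence[Descriptor], threshold: float = 0.75) -> int:
    """Index of the active closest to the centroid of the largest cluster."""
    main = max(cluster(descs, threshold), key=len)  # step 3: most actives
    dim = len(descs[0])
    centroid = [sum(descs[i][k] for i in main) / len(main) for k in range(dim)]
    return max(main, key=lambda i: usr_similarity(descs[i], centroid))  # step 4
```

The returned index identifies the consensus shape template used as the query in step 5. A library routine such as SciPy's hierarchical clustering could replace the explicit merge loop for larger sets of actives.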
Query Selection
Used Activity Classes
Activity   Name                             # of Actives (A_i)
TK         Thymidine Kinase                 13
HH1R       Histamine H1 Receptor            41
COX-2      Cyclooxygenase-2                 28
NM         Neuraminidase                    8
5-HT-2A    5-HT-2A Receptor                 15
ER         Estrogen Receptor                24
PR         Progesterone Receptor            12
TKTL       Transketolase                    3
AT1        Type-1 Angiotensin II Receptor   8
HIV-1      HIV-1 Protease                   6
Activity Class: 5-HT-2A Receptor
Selected Queries for each Activity
Activity   Query          # of Heavy Atoms (N)
TK         EXPT01835-1    17
HH1R       APRD00587-1    19
COX-2      APRD01060-1    19
NM         EXPT00332-1    20
5-HT-2A    APRD00033-1    22
ER         APRD00754-1    23
PR         APRD00941-1    23
TKTL       EXPT02273-1    26
AT1        APRD00052-1    30
HIV-1      APRD00623-1    49
Shape of Selected Queries
[Figure: shapes of the selected queries for TK, HH1R, COX-2, NM, 5-HT-2A, ER, PR, TKTL, AT1 and HIV-1.]
Enrichment Plot: 5-HT-2A Receptor
[Figure: 5-HT-2A (APRD00033-1). Enrichment E(x%) against the top x% of the ranked database (x from 0 to 5) for USR and ESshape3D.]
Comparing USR and ESshape3D on the 10 Activities
Mean enrichment   Top 1%   Top 3%   Top 5%
USR               25.5     9.9      7.5
ESshape3D         15.8     8.0      6.0
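The slides report E(x%) without defining it. The following minimal Python sketch assumes the usual enrichment-factor definition, E(x%) = (actives found in the top x% / size of the top x%) / (A / N), with A actives among N database compounds. It also assumes that each compound is ranked by its best-scoring conformer. The names rank_compounds and enrichment are hypothetical, and rounding the top x% up to a whole number of compounds is an assumption.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def rank_compounds(conformer_scores: Iterable[Tuple[str, float]]) -> List[str]:
    """Rank compounds by the highest similarity any of their conformers reached."""
    best: Dict[str, float] = {}
    for compound_id, score in conformer_scores:
        if score > best.get(compound_id, -math.inf):
            best[compound_id] = score
    return sorted(best, key=lambda cid: best[cid], reverse=True)


def enrichment(ranked_ids: Sequence[str], actives: Set[str], top_percent: float) -> float:
    """E(x%): hit rate in the top x% of the ranking divided by the overall hit rate."""
    n = len(ranked_ids)
    n_top = max(1, math.ceil(n * top_percent / 100.0))  # assumed rounding rule
    hits = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (hits / n_top) / (len(actives) / n)
```

Under this definition, E(x%) = 1 corresponds to random ranking. The mean values in the table are therefore ratios over random selection.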
Enrichment Plot: Thymidine Kinase
[Figure: Thymidine Kinase (EXPT01835-1). Enrichment E(x%) against the top x% of the ranked database (x from 0 to 5) for ESshape3D and USR.]
A Possible Explanation
[Figure: Thymidine Kinase (EXPT01835-1). E(x%) against the top x% (x from 0 to 5) for ESshape3D and USR, annotated with + and x markers; the final build of this slide adds a USR(C1-CTD) curve.]
Enrichment Plot: 5-HT-2A Receptor
[Figure: 5-HT-2A (APRD00033-1). E(x%) against the top x% (x from 0 to 5) for USR, ESshape3D and USR(C1-CTD).]
Comparing USR and ESshape3D on the 10 Activities
Mean enrichment   Top 1%   Top 3%   Top 5%
USR-CTD           36.9     14.4     10.2
USR               25.5     9.9      7.5
ESshape3D         15.8     8.0      6.0
Future Work
In Collaboration with:
GlaxoSmithKline (Harlow, UK)
Pfizer (Groton, USA)
University of Oxford (Dept. of Pharmacology)
Topics for Future Research
Virtual Screening (VS): validating the generalisation ability of VS methods using biological screening data.
Prospective VS on selected activities.
Clustering molecular databases in terms of shape:
USR-based clustering on multi-million databases.
Screening strategies for docking and HTS.
Combining USR with other VS methods (electrostatics).
Conclusions
Conclusions
A novel molecular shape comparison approach (USR) has been proposed.
Effective at identifying similarly shaped conformers in a database.
Effective in retrospective virtual screening across a set of diverse activities:
The adopted query selection procedure works well.
The proposed USR virtual query improves performance.
These results suggest that USR will be effective in prospective virtual screening.
Extremely fast: more than three orders of magnitude faster than the fastest existing methods compared (ESshape3D, Shape Signatures and ROCS).
USR could be adapted to other shape recognition problems (e.g. an internet search engine for 3D shapes).
Overall, this work has attracted the attention of the media.
Acknowledgements
Graham Richards (University of Oxford): feedback.
Paul Finn (Inhibox Ltd.): feedback and preparation of the molecular databases.
US National Foundation for Cancer Research: funding.
Thank You