Download - Ultrafast Shape Recognition for Similarity Search in ...cisrg.shef.ac.uk/shef2007/talks/ballester.pdf · Molecular Shape Comparison Molecular shape has been widely highlighted as


IntroductionUltrafast Shape Recognition

Ligand-based Virtual ScreeningFuture WorkConclusions

Ultrafast Shape Recognition

for Similarity Search in Molecular Databases

Pedro J. Ballester

NFCR Centre for Computational Drug Discovery

University of Oxford

Pedro J. Ballester USR for Similarity Search 1


Outline

1 Introduction

2 Ultrafast Shape Recognition: Foundations; Encoding; Comparing Molecular Shapes; Effectiveness; Efficiency

3 Ligand-based Virtual Screening: Experimental Setup; Enrichment Plots; USR virtual query

4 Future Work

5 Conclusions


Introduction


Virtual Screening

Ligand-based Virtual Screening

Goal: identify drug-like molecules that are likely to be biologically active.

Principle: molecules with similar patterns are likely to have similar biological activity.

Template: e.g. a molecule of known biological activity.

Strategy: search a database of molecules for those with a pattern similar to that of the template.
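The search strategy above amounts to scoring every database entry against the template and keeping the best-ranked ones. The sketch below is a generic illustration only: the descriptor type, `rank_by_similarity` and `toy_similarity` are hypothetical placeholders, not USR's actual descriptor or similarity score.

```python
from typing import Callable, Sequence, TypeVar

D = TypeVar("D")  # hypothetical: any per-molecule descriptor type


def rank_by_similarity(
    template: D,
    database: Sequence[D],
    similarity: Callable[[D, D], float],
    top_k: int = 10,
) -> list[tuple[int, float]]:
    """Score every database entry against the template and keep the top_k best.

    Returns (database index, similarity score) pairs, most similar first.
    """
    scored = [(i, similarity(template, mol)) for i, mol in enumerate(database)]
    # Highest score first; Python's sort is stable, so ties keep database order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy usage: made-up 2-value descriptors and a placeholder similarity in (0, 1].
def toy_similarity(a, b):
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


hits = rank_by_similarity((1.0, 2.0), [(1.0, 2.0), (5.0, 5.0), (1.5, 2.0)], toy_similarity, top_k=2)
print([i for i, _ in hits])  # [0, 2]
```

Because the template is compared with each entry independently, the cost of one comparison largely determines how large a database can be screened.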


Molecular Shape Comparison

Molecular Shape Comparison

Molecular shape has been widely highlighted as an important pattern for which to search.

Shape complementarity between ligand and receptor is necessary for binding.

Additional advantage: chemical structure is not specified, and therefore novel chemical scaffolds may be found.


Challenges

Alignment

Some methods require alignment of the molecules before comparing their shapes.

Essentially, alignment is a multimodal optimisation problem with a very limited number of objective function evaluations available.

This may lead to suboptimal molecular alignment and thus to errors in the comparison.


Challenges

Efficiency

Shape information is regarded as difficult to encode efficiently and to use in database searching (e.g. Zauhar et al. 2003).

Larger molecular databases are desirable, but their increasing size poses a serious limitation for current shape comparison methods:

The more conformations, the less likely we are to miss molecules that can adopt the template's shape.

The more compounds, the more likely we are to find innovative bioactive molecules.

Consequently, the speed of molecular shape comparison methods is highly important.


FoundationsEncodingComparing Molecular ShapesEffectivenessEfficiency

Ultrafast Shape Recognition (USR)

Ballester, P.J., US Patent Application filed on 25 May 2007

Ballester, P.J. and Richards, W.G. (2007) J Comput Chem

Ballester, P.J. and Richards, W.G. (2007) Proc R Soc A


Foundations

USR is based on the observation that the shape of a molecule is uniquely determined by the relative positions of its atoms.

Such positions are in turn determined by the set of all inter-atomic distances.

No alignment or translation of the molecule is needed, as this set of distances is independent of molecular orientation and position.
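A quick numerical check of the invariance claim, in pure Python: the hypothetical helpers `pairwise_distances` and `rigid_move` compute all inter-atomic distances and apply a rotation plus translation to made-up coordinates. One caveat worth keeping in mind: a mirror reflection also preserves every distance, so the distances pin down atomic positions only up to rotation, translation and reflection.

```python
import math
import random

Point = tuple[float, float, float]


def pairwise_distances(coords: list[Point]) -> list[float]:
    """All N(N-1)/2 inter-atomic distances, listed in a fixed (i < j) atom-pair order."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def rigid_move(coords: list[Point], angle: float, shift: Point) -> list[Point]:
    """Rotate every atom about the z axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


random.seed(0)
atoms = [tuple(random.uniform(-5.0, 5.0) for _ in range(3)) for _ in range(6)]
moved = rigid_move(atoms, angle=1.2, shift=(3.0, -4.0, 7.5))

# Every inter-atomic distance survives the rotation + translation (up to rounding).
same = all(
    math.isclose(d0, d1, abs_tol=1e-9)
    for d0, d1 in zip(pairwise_distances(atoms), pairwise_distances(moved))
)
print(same)  # True
```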


Further Considerations

Furthermore, the values of inter-atomic distances are heavily constrained:

Distances between bonded atoms depend strongly on which atoms these are.

Other inter-atomic distances depend on the flexibility of the molecule.

The set of all inter-atomic distances may contain more information than is needed for an accurate description of shape.

Strategy: encode shape from a subset of these distances.
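Some back-of-the-envelope arithmetic shows why a subset is attractive: the full set of distances grows quadratically with the number of atoms N, whereas distances measured from a fixed number of reference points grow only linearly. The helper names are illustrative, and four reference points are assumed to match the reference locations described in the talk.

```python
def n_all_pairs(n_atoms: int) -> int:
    """Distinct inter-atomic distances in an N-atom conformer: N(N-1)/2."""
    return n_atoms * (n_atoms - 1) // 2


def n_from_references(n_atoms: int, n_refs: int) -> int:
    """Distances from n_refs fixed reference points to every atom: n_refs * N."""
    return n_refs * n_atoms


for n in (10, 40, 100):
    # n_refs=4 is an assumption matching four reference locations
    print(n, n_all_pairs(n), n_from_references(n, 4))
# 10 45 40
# 40 780 160
# 100 4950 400
```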


Representation of molecular shape

Reference Locations

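Distances from two close atoms are similar and thus contain similar information → consider sets of atomic distances from reference locations that are far from each other.

Four reference locations: the molecular centroid (ctd), the atom closest to the centroid (cst), the atom farthest from the centroid (fct) and the atom farthest from fct (ftf).

Each conformer is now represented by 4N distances, where N is its number of atoms.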
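A minimal sketch of computing ctd, cst, fct and ftf and the resulting 4N distances for a single conformer. The definitions in the code comments follow the published USR description as understood here; the function names, the tie-breaking rule (the first atom found wins) and the toy coordinates are assumptions for illustration.

```python
import math

Point = tuple[float, float, float]


def reference_locations(coords: list[Point]) -> dict[str, Point]:
    """The four reference locations of one conformer."""
    n = len(coords)
    # ctd: molecular centroid (mean atomic position)
    ctd = tuple(sum(p[k] for p in coords) / n for k in range(3))
    # cst: atom closest to ctd; fct: atom farthest from ctd (first one wins ties)
    cst = min(coords, key=lambda p: math.dist(p, ctd))
    fct = max(coords, key=lambda p: math.dist(p, ctd))
    # ftf: atom farthest from fct
    ftf = max(coords, key=lambda p: math.dist(p, fct))
    return {"ctd": ctd, "cst": cst, "fct": fct, "ftf": ftf}


def reference_distances(coords: list[Point]) -> dict[str, list[float]]:
    """Distances from each reference location to all N atoms: 4N values per conformer."""
    refs = reference_locations(coords)
    return {name: [math.dist(ref, p) for p in coords] for name, ref in refs.items()}


# Toy 4-atom "molecule" laid out along the x axis.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(reference_locations(atoms))
print(sum(len(d) for d in reference_distances(atoms).values()))  # 16 = 4 * N
```

Only the atom coordinates are used, so no alignment step is involved; each distance set still depends only on the conformer itself.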


Encoding

Moments of atomic distance distributions

But: how do we compare molecules with different N?

A histogram of each distribution of distances has a number of well-known drawbacks:

Difficulty of selecting a bin size suitable for all compared molecules.

Relatively large storage needed for the histograms.

Relatively large computing cost.

A distribution is completely determined by its moments (e.g. Hall, 1983); this holds for distributions with bounded support, such as atomic distances within a molecule.

Idea: describe each distribution of atomic distances by its first few moments (avoids histogram calculation).
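The moment idea can be sketched as follows. Taking three moments per distribution (mean, variance and standardised skewness) with population estimators is an assumption made here for illustration, as are the function names; the point is that the descriptor length no longer depends on N, and no bin size has to be chosen.

```python
def moments(values: list[float]) -> tuple[float, float, float]:
    """Mean, variance and standardised skewness of one distance distribution.

    Assumption: population (divide-by-N) estimators; the exact moment
    definitions used by USR are not given on this slide.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return mean, 0.0, 0.0  # all distances equal: skewness undefined, report 0
    skew = (sum((v - mean) ** 3 for v in values) / n) / var ** 1.5
    return mean, var, skew


def shape_descriptor(distance_sets: list[list[float]]) -> list[float]:
    """Concatenate the moments of every distance distribution.

    The length is 3 * len(distance_sets) whatever the atom count N,
    so molecules of different size yield vectors of the same length.
    """
    return [m for dists in distance_sets for m in moments(dists)]


small = shape_descriptor([[1.0, 2.0, 3.0]] * 4)            # N = 3
large = shape_descriptor([[1.0, 2.0, 3.0, 4.0, 5.0]] * 4)  # N = 5
print(len(small), len(large))  # 12 12
```

With four distance distributions per conformer, this three-moment assumption gives a 12-number vector, which is far cheaper to store and compare than a set of histograms.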

Pedro J. Ballester USR for Similarity Search 12

Page 41: Ultrafast Shape Recognition for Similarity Search in ...cisrg.shef.ac.uk/shef2007/talks/ballester.pdf · Molecular Shape Comparison Molecular shape has been widely highlighted as

IntroductionUltrafast Shape Recognition

Ligand-based Virtual ScreeningFuture WorkConclusions

FoundationsEncodingComparing Molecular ShapesEffectivenessEfficiency

Encoding

Moments of atomic distance distributions

But: how do we compare molecules with different N?

Histogram of each distribution of distances has a number ofwell-known drawbacks:

Difficulty of selecting a bin size suitable for all comparedmolecules.Relatively large storage needed for the histograms.Relatively large computing cost.

A distribution is completely determined by its moments(e.g. Hall, 1983).

Idea: describe the distribution of atomic distances by its firstmoments (avoids histogram calculation).

Pedro J. Ballester USR for Similarity Search 12

Page 42: Ultrafast Shape Recognition for Similarity Search in ...cisrg.shef.ac.uk/shef2007/talks/ballester.pdf · Molecular Shape Comparison Molecular shape has been widely highlighted as

IntroductionUltrafast Shape Recognition

Ligand-based Virtual ScreeningFuture WorkConclusions

FoundationsEncodingComparing Molecular ShapesEffectivenessEfficiency

Encoding

Moments of atomic distance distributions

But: how do we compare molecules with different N?

Histogram of each distribution of distances has a number ofwell-known drawbacks:

Difficulty of selecting a bin size suitable for all comparedmolecules.Relatively large storage needed for the histograms.Relatively large computing cost.

A distribution is completely determined by its moments(e.g. Hall, 1983).

Idea: describe the distribution of atomic distances by its firstmoments (avoids histogram calculation).

Pedro J. Ballester USR for Similarity Search 12

USR descriptors

12 descriptors: 4 reference locations × 3 first moments (i.e. the mean, variance and skewness of each set of atomic distances).

Excellent compromise between effectiveness and efficiency.

Warning: if the moments are poorly estimated, there is no reason to expect the resulting implementation of USR to be effective!
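
The 12 descriptors can be sketched as follows. The slide does not name the four reference locations; this sketch assumes the choice described in the published USR method: the molecular centroid (ctd), the atom closest to it (cst), the atom farthest from it (fct) and the atom farthest from fct (ftf). The function names are hypothetical, atoms are plain (x, y, z) tuples, and the moment convention (population variance, standardised skewness) is an assumption, so values may differ from the author's implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(values: Sequence[float]) -> List[float]:
    """Mean, population variance and standardised skewness (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return [mean, m2, skew]


def usr_descriptors(atoms: Sequence[Point]) -> List[float]:
    """Return 12 USR-style descriptors: 4 reference locations x 3 moments."""
    if not atoms:
        raise ValueError("molecule has no atoms")
    n = len(atoms)
    ctd = tuple(sum(a[k] for a in atoms) / n for k in range(3))  # centroid
    d_ctd = [math.dist(a, ctd) for a in atoms]
    cst = atoms[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = atoms[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(a, fct) for a in atoms]
    ftf = atoms[d_fct.index(max(d_fct))]  # atom farthest from fct

    descriptors: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        # Distances from this reference location to every atom, reduced to 3 moments.
        descriptors.extend(_moments([math.dist(a, ref) for a in atoms]))
    return descriptors


mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 0.8, 0.0), (0.4, 1.9, 0.7)]
print(len(usr_descriptors(mol)))  # 12
```

Because every descriptor is built from interatomic distances, the vector does not change when the molecule is translated or rotated, so no superposition is needed before comparing two molecules.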

Pedro J. Ballester USR for Similarity Search 13

Molecular Shape Comparison

USR similarity score

Score quantifying the similarity between the query (q) and the i-th database conformer, where M^q_l and M^i_l are their l-th USR descriptors (l = 1, ..., 12):

$$S_{qi} = \frac{1}{1 + \frac{1}{12}\sum_{l=1}^{12} \left| M^{q}_{l} - M^{i}_{l} \right|}$$
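
The score translates directly into code. A minimal sketch; the function name is hypothetical and descriptors are assumed to be plain sequences of 12 floats. The score lies in (0, 1] and equals 1 only when all 12 descriptors coincide.

```python
from typing import Sequence


def usr_similarity(query: Sequence[float], conformer: Sequence[float]) -> float:
    """S_qi = 1 / (1 + (1/12) * sum over l of |M_l^q - M_l^i|)."""
    if len(query) != 12 or len(conformer) != 12:
        raise ValueError("USR descriptors must have 12 components")
    manhattan = sum(abs(mq - mi) for mq, mi in zip(query, conformer))
    return 1.0 / (1.0 + manhattan / 12.0)


q = [float(l) for l in range(12)]
print(usr_similarity(q, q))                        # 1.0
print(usr_similarity(q, [m + 1.0 for m in q]))     # 0.5
```

Each comparison is 12 subtractions, absolute values and additions plus one division, with no alignment step, which is what makes very high comparison rates possible.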

Pedro J. Ballester USR for Similarity Search 14

Comparing Molecular Shapes: Example 1

Pedro J. Ballester USR for Similarity Search 15

Query 4 on a database with 2.5 million compounds

[Figure: top four hits (1st to 4th) retrieved for Query 4 by USR and by ESshape3D]

Pedro J. Ballester USR for Similarity Search 16

Most Important Feature of USR

Efficiency

USR is compared with three state-of-the-art methods: ESshape3D, Shape Signatures and ROCS.

Time to precalculate descriptors for a database (done only once; in seconds per conformer, s/c):

USR (1.18 · 10⁻⁴ s/c; Intel Core2 2.0 GHz).

ESshape3D (9.15 · 10⁻⁴ s/c; Intel Core2 2.0 GHz).

Shape Signatures (50.82 s/c; 450 MHz Intel Pentium III).

ROCS (none).

Since many queries are run against the same database, the key performance measure is how many conformers can be compared per second.
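
Because descriptors are computed once, each query reduces to scoring the query's 12 numbers against every stored vector. A minimal sketch of such a screening loop that also reports its own comparison rate with `time.perf_counter`; the function names, the randomly generated database and the top-k selection via `heapq.nlargest` are illustrative assumptions. Any rate it prints depends on the machine and on Python's interpreter overhead, so it says nothing about the figures quoted above.

```python
import heapq
import random
import time
from typing import List, Sequence, Tuple


def usr_similarity(q: Sequence[float], m: Sequence[float]) -> float:
    """USR score: 1 / (1 + mean absolute difference of the 12 descriptors)."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(q, m)) / 12.0)


def screen(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    top_k: int = 4,
) -> Tuple[List[Tuple[float, int]], float]:
    """Score every stored descriptor vector; return top hits and comparisons/s."""
    start = time.perf_counter()
    scores = [(usr_similarity(query, m), i) for i, m in enumerate(database)]
    elapsed = time.perf_counter() - start
    hits = heapq.nlargest(top_k, scores)  # (score, database index), best first
    rate = len(database) / elapsed if elapsed > 0 else float("inf")
    return hits, rate


if __name__ == "__main__":
    rng = random.Random(0)
    db = [[rng.uniform(0.0, 10.0) for _ in range(12)] for _ in range(50_000)]
    hits, rate = screen(db[42], db)
    print(hits[0])  # the query matches itself: (1.0, 42)
    print(f"{rate:,.0f} comparisons per second on this machine")
```

Precalculation cost is paid once per database, while the scoring loop runs once per query, which is why the loop's throughput dominates when many queries are carried out.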

Pedro J. Ballester USR for Similarity Search 17

Efficiency Comparison with Descriptor-based Methods

USR is 1 546 times faster than ESshape3D.

USR is 2 038 times faster than Shape Signatures.

Pedro J. Ballester USR for Similarity Search 18

Efficiency Comparison with Superposition Methods

USR is 14 238 times faster than ROCS.

Based on ROCS’s reported comparison rate on a modern workstation (USR timed on a 2.93 GHz Intel Core2 processor).

Pedro J. Ballester USR for Similarity Search 19

Ligand-based Virtual Screening

Ballester, P.J., Finn, P.W. and Richards, W.G. (2007?)

Pedro J. Ballester USR for Similarity Search 20

Virtual Screening Validation

Aim

How good is the method at identifying molecules with a given activity?

Retrospective virtual screening experiment.

DrugBank-3D Test Database

Publicly available resource: DrugBank (http://redpoll.pharmacy.ualberta.ca/drugbank/index.html).

Input: a set of 3 764 chemical structures comprising FDA-approved (708) and experimental (3 056) drugs.

MOE’s conformer generator → an average of about 200 conformers per compound (3 330 chemical structures).

DrugBank-3D: 666 892 conformers in 3D MDL SD format.
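
With roughly 200 conformers per compound (666 892 / 3 330 ≈ 200), a retrospective experiment has to turn per-conformer scores into a ranking of compounds. The slide does not say how this is done; this sketch assumes each compound is represented by its best-scoring conformer, and reports the fraction of known actives recovered within a top fraction of the ranked list (the quantity an enrichment plot traces). All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_compounds(conformer_scores: Iterable[Tuple[str, float]]) -> List[str]:
    """Collapse (compound_id, score) pairs to each compound's best score, then rank."""
    best: Dict[str, float] = defaultdict(lambda: float("-inf"))
    for compound_id, score in conformer_scores:
        best[compound_id] = max(best[compound_id], score)
    # Highest score first; ties broken by compound id for reproducibility.
    return sorted(best, key=lambda c: (-best[c], c))


def actives_recovered(ranking: List[str], actives: Set[str], top_fraction: float) -> float:
    """Fraction of known actives found in the top `top_fraction` of the ranking."""
    if not actives:
        raise ValueError("need at least one known active")
    cutoff = max(1, round(top_fraction * len(ranking)))
    found = sum(1 for c in ranking[:cutoff] if c in actives)
    return found / len(actives)


scores = [("A", 0.9), ("A", 0.4), ("B", 0.7), ("C", 0.95), ("D", 0.2)]
ranking = rank_compounds(scores)
print(ranking)                                     # ['C', 'A', 'B', 'D']
print(actives_recovered(ranking, {"A", "D"}, 0.5))  # 0.5
```

Evaluating `actives_recovered` over a range of `top_fraction` values gives the points of an enrichment curve for one query.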

Pedro J. Ballester USR for Similarity Search 21

Query Selection

Adopted Procedure (Nicholls et al., 2004)

For each activity class:

1 Consider the lowest-energy conformer of each active compound.

2 Perform hierarchical agglomerative clustering on the USR similarity matrix of these conformers (S_threshold = 0.75).

3 Main cluster ≡ the cluster with the highest number of actives.

4 Identify which of these conformers is closest to the centroid of the main cluster (the consensus shape template).

5 Use the shape template as the query against the whole database.
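
The five steps above can be sketched as follows. Several details are not given on the slide and are assumptions here: the input maps each active to the 12 descriptors of its lowest-energy conformer; the linkage is single linkage, so cutting the dendrogram at S = 0.75 is equivalent to taking connected components of the graph joining pairs with S ≥ 0.75; the centroid is the mean descriptor vector; and "closest" is measured with the USR score itself. All names are hypothetical.

```python
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def usr_similarity(q: Descriptor, m: Descriptor) -> float:
    """USR score: 1 / (1 + mean absolute difference of the 12 descriptors)."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(q, m)) / 12.0)


def single_linkage_clusters(
    ids: List[str], desc: Dict[str, Descriptor], threshold: float = 0.75
) -> List[List[str]]:
    """Connected components of the graph linking pairs with S >= threshold."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            if usr_similarity(desc[a], desc[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def select_query(desc: Dict[str, Descriptor], threshold: float = 0.75) -> str:
    """Return the active closest to the centroid of the largest cluster."""
    ids = sorted(desc)
    main = max(single_linkage_clusters(ids, desc, threshold), key=len)
    centroid = [sum(desc[i][k] for i in main) / len(main) for k in range(12)]
    return max(main, key=lambda i: usr_similarity(desc[i], centroid))


actives = {"a": [1.0] * 12, "b": [1.1] * 12, "c": [1.2] * 12, "d": [9.0] * 12}
print(select_query(actives))  # b
```

Single linkage was chosen only because its threshold cut has this simple graph form; a different linkage (e.g. average or complete) could produce a different main cluster and therefore a different template.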

Pedro J. Ballester USR for Similarity Search 22

Query Selection

Used Activity Classes

Activity   Name                             # of Actives (A_i)
TK         Thymidine Kinase                 13
HH1R       Histamine H1 Receptor            41
COX-2      Cyclooxygenase-2                 28
NM         Neuraminidase                     8
5-HT-2A    5-HT-2A Receptor                 15
ER         Estrogen Receptor                24
PR         Progesterone Receptor            12
TKTL       Transketolase                     3
AT1        Type-1 Angiotensin II Receptor    8
HIV-1      HIV-1 Protease                    6


Activity Class: 5-HT-2A Receptor


Selected Queries for each Activity

Activity   Query          # of Heavy Atoms (N)
TK         EXPT01835-1    17
HH1R       APRD00587-1    19
COX-2      APRD01060-1    19
NM         EXPT00332-1    20
5-HT-2A    APRD00033-1    22
ER         APRD00754-1    23
PR         APRD00941-1    23
TKTL       EXPT02273-1    26
AT1        APRD00052-1    30
HIV-1      APRD00623-1    49


Shape of Selected Queries

[Figure: 3D shapes of the selected query conformers for TK, HH1R, COX-2, NM and 5-HT-2A (top row) and ER, PR, TKTL, AT1 and HIV-1 (bottom row).]


Enrichment Plot: 5-HT-2A Receptor

[Figure: enrichment plot for query 5-HT-2A (APRD00033-1); E(x%) (0 to 50) versus top x% of the ranked database (0 to 5); series: USR and ESshape3D.]


Comparing USR and ESshape3D on the 10 Activities

Mean Enrichment   Top 1%   Top 3%   Top 5%
USR               25.5     9.9      7.5
ESshape3D         15.8     8.0      6.0
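The slides report E(x%) without restating its definition at this point. The sketch below assumes the standard enrichment factor: the fraction of actives among the top x% of the ranked database divided by the fraction of actives in the whole database, with per-class values combined by a plain arithmetic mean to give a "Mean Enrichment" row. Both the definition and the averaging are assumptions, and the names `enrichment` and `mean_enrichment` are hypothetical.

```python
import math
from typing import Sequence


def enrichment(scores: Sequence[float], is_active: Sequence[bool], top_percent: float) -> float:
    """Assumed definition: E(x%) = (actives in top x% / molecules in top x%)
    divided by (actives in database / molecules in database)."""
    n = len(scores)
    n_top = max(1, math.ceil(n * top_percent / 100.0))  # size of the top x% slice
    # Rank by similarity to the query, most similar first; ties keep database order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in ranked[:n_top] if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    return (hits / n_top) / (total_actives / n)


def mean_enrichment(per_class: Sequence[float]) -> float:
    """Plain arithmetic mean over activity classes (assumed averaging)."""
    return sum(per_class) / len(per_class)
```

Evaluating `enrichment` at 1, 3 and 5 percent for each activity class and averaging with `mean_enrichment` produces numbers in the form of the table; they would only match the reported values if this definition matches the one used in the talk.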


Enrichment Plot: Thymidine Kinase

[Figure: enrichment plot for query Thymidine Kinase (EXPT01835-1); E(x%) (0 to 60) versus top x% of the ranked database (0 to 5); series: ESshape3D and USR.]


A Possible Explanation

[Figure: enrichment plot for query Thymidine Kinase (EXPT01835-1); E(x%) (0 to 60) versus top x% (0 to 5); series: ESshape3D and USR, with individual points annotated by + and x markers.]


[Figure: enrichment plot for query Thymidine Kinase (EXPT01835-1) with a third series, USR(C1-CTD), added to ESshape3D and USR; E(x%) (0 to 60) versus top x% (0 to 5); points annotated by + and x markers.]


Enrichment Plot: 5-HT-2A Receptor

[Figure: enrichment plot for query 5-HT-2A (APRD00033-1); E(x%) (0 to 50) versus top x% of the ranked database (0 to 5); series: USR, ESshape3D and USR(C1-CTD).]


Comparing USR and ESshape3D on the 10 Activities

Mean Enrichment   Top 1%   Top 3%   Top 5%
USR-CTD           36.9     14.4     10.2
USR               25.5      9.9      7.5
ESshape3D         15.8      8.0      6.0


Future Work

In collaboration with:

- GlaxoSmithKline (Harlow, UK)

- Pfizer (Groton, USA)

- University of Oxford (Dept. of Pharmacology)


Topics for Future Research

Virtual Screening (VS):

- Validating the generalisation ability of VS methods using biological screening data.

- Prospective VS on selected activities.

Clustering molecular databases in terms of shape:

- USR-based clustering of multi-million-compound databases.

- Screening strategies for docking and HTS.

Combining USR with other VS methods (electrostatics).


Conclusions


A novel molecular shape comparison approach (USR) has been proposed.

Effective at identifying similarly shaped conformers in a database.

Effective in retrospective virtual screening across a set of diverse activities:

- The adopted query selection procedure works well.

- The proposed USR virtual query improves performance.

- These results suggest that USR will be effective in prospective virtual screening.

Extremely fast: more than three orders of magnitude faster than the fastest existing methods.
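The speed of USR comes from comparing fixed-length descriptors without any molecular alignment: each conformer is reduced to 12 numbers, three moments of the distribution of atomic distances to each of four reference points (the centroid ctd, the atom closest to it cst, the atom farthest from it fct, and the atom farthest from fct, ftf), and two conformers are scored as 1/(1 + mean absolute difference of the 12 numbers). The sketch below illustrates this under one assumption labelled in the code: the three moments are taken as mean, standard deviation and signed cube root of the third central moment, a convention that varies between implementations. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # heavy-atom coordinates of one conformer


def _moments(dists: Sequence[float]) -> List[float]:
    """Three moments of a distance distribution.
    Assumed convention: mean, standard deviation, signed cube root of the third
    central moment (implementations differ on this choice)."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """12 numbers: 3 moments of atomic distances to each of ctd, cst, fct, ftf."""
    idx = range(len(coords))
    ctd = tuple(sum(c[k] for c in coords) / len(coords) for k in range(3))  # centroid
    d_ctd = [math.dist(ctd, c) for c in coords]
    cst = coords[min(idx, key=d_ctd.__getitem__)]  # atom closest to ctd
    fct = coords[max(idx, key=d_ctd.__getitem__)]  # atom farthest from ctd
    d_fct = [math.dist(fct, c) for c in coords]
    ftf = coords[max(idx, key=d_fct.__getitem__)]  # atom farthest from fct
    desc: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(ref, c) for c in coords]))
    return desc


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Alignment-free comparison: only 12 absolute differences per pair."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / 12.0)
```

Because descriptors depend only on interatomic distances, they are unchanged by rotating or translating a conformer, and screening a database against a query reduces to one cheap `usr_similarity` call per stored descriptor.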


USR could be adapted to other shape recognition problems (e.g. an internet search engine for 3D shapes).

Overall, this work has attracted the attention of the media.


Acknowledgements

Graham Richards (University of Oxford): feedback.

Paul Finn (Inhibox Ltd.): feedback and preparing molecular databases.

US National Foundation for Cancer Research: funding.


Thank You
