Structure generation, metabolite space, and metabolite likeness

Julio E. Peironcely @peyron

Juliopeironcely.com

PhD student at Leiden University and TNO

Structure Generation, Metabolite Space, and Metabolite-Likeness

Metabolomics

The quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.

[Figure: the metabolomics workflow. Steps: biological question, experimental design, sampling, sample preparation, data acquisition, data pre-processing, data analysis, biological interpretation. Labels between steps: protocol, samples, raw data, list of peaks/biomolecules, relevant biomolecules/connectivities and models, metabolites.]

De-novo identification

We have

Elemental Composition

Fragments (sometimes)

Experimental Information

We want

List Of Candidate Structures

As Short As Possible

Good Structure Is In The List

We need

Structure Generator

Keep only metabolites

Use experimental information to filter molecules

Elemental Composition → Structure Generation → Molecules

Structure Generator

In collaboration with Jean-Loup Faulon, Evry University

Input: Elemental Formula, Fragments

Generate: Keep Molecules if Canonical Augmentation

Output: Candidate Structures

Adding bonds

Isomorphism

[Figure: labeled graphs on vertices 1 to 4, grouped into two isomorphic classes: "triangle + 1 edge" and "3-edge chain". Some graphs are highlighted in orange.]

Output ONLY orange graphs

Canonical Labeling

[Figure: the labeled graphs of both isomorphic classes are passed to a canonizer (Nauty). Resulting edge lists: (1,2)(1,3)(1,4)(2,3) for "triangle + 1 edge" and (1,2)(1,3)(2,4) for "3-edge chain".]

Only 1 canonical labeling in each isomorphic class

[Figure: generation tree on vertices 1 to 5, extended one bond at a time. Edge lists shown: (1,2); (1,2)(1,3); (1,2)(1,3)(1,4); (1,2)(1,3)(2,3); (1,2)(1,3)(1,4)(2,3); (1,2)(1,3)(2,3)(2,4); (1,2)(1,3)(1,4)(3,4); (1,2)(1,3)(1,4)(4,5). A duplicate is crossed out (X).]

Use canonizer to remove duplicates after each extension
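Removing duplicates with a canonizer can be sketched in a few lines. This is a minimal illustration, not the generator's code: instead of Nauty it assumes a brute-force canonical form (the smallest sorted edge list over all relabelings), which is only practical for very small graphs. The names `canonical_form` and `remove_duplicates` are hypothetical, and vertices are numbered from 0.

```python
from itertools import permutations


def canonical_form(edges, n):
    """Smallest sorted edge list over all relabelings of n vertices.

    Brute force stand-in for a real canonizer such as Nauty (assumption).
    """
    return min(
        tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        for perm in permutations(range(n))
    )


def remove_duplicates(graphs, n):
    """Keep the first graph seen for each canonical form."""
    unique = {}
    for edges in graphs:
        unique.setdefault(canonical_form(edges, n), edges)
    return list(unique.values())


# Two labelings of the 3-edge chain and one "triangle + 1 edge" graph.
chain_a = [(0, 1), (1, 2), (2, 3)]
chain_b = [(0, 1), (0, 2), (1, 3)]  # the path 2-0-1-3, same chain relabeled
triangle_tail = [(0, 1), (0, 2), (0, 3), (1, 2)]

kept = remove_duplicates([chain_a, chain_b, triangle_tail], 4)
```

The two chain labelings share a canonical form, so only one of them survives next to the triangle graph.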

Canonical Augmentation

A canonical object augmented in a canonical way produces a canonical object

Check For Canonical Augmentation

Keep object if a canonical deletion takes you to the canonical parent

[Figure: generation tree on vertices 1 to 5 with the canonical augmentation check applied. Edge lists shown: (1,2); (1,2)(1,3); (1,2)(1,3)(1,4); (1,2)(1,3)(2,3); (1,2)(1,3)(1,4)(2,3); (1,2)(1,3)(2,3)(2,4); (1,2)(1,3)(1,4)(4,5); (1,2)(1,3)(1,4)(3,4). Rejected graphs are crossed out (X).]

Accept only canonically augmented graphs
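The acceptance rule above can be illustrated on plain graphs. The sketch below is a simplified variant, not the generator's implementation: it assumes a brute-force canonizer instead of Nauty, defines the "canonical deletion" as removing the edge that comes last in the canonical form, and accepts a child only if that deletion gives a graph isomorphic to the parent it was built from. Children of the same parent are additionally deduplicated by canonical form. All function names are hypothetical, and atoms, valences and fragments are ignored.

```python
from itertools import combinations, permutations


def canon(edges, n):
    """Brute-force canonical form and one relabeling that produces it."""
    best_form, best_perm = None, None
    for perm in permutations(range(n)):
        form = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        if best_form is None or form < best_form:
            best_form, best_perm = form, perm
    return best_form, best_perm


def canonical_parent(edges, n):
    """Canonical deletion: drop the edge that is last in the canonical form."""
    form, perm = canon(edges, n)
    old_label = {new: old for old, new in enumerate(perm)}
    a, b = form[-1]
    last_edge = tuple(sorted((old_label[a], old_label[b])))
    return edges - {last_edge}


def generate(n, max_edges):
    """Grow graphs one edge at a time, keeping one graph per isomorphic class."""
    levels = [[frozenset()]]
    all_pairs = list(combinations(range(n), 2))
    for _ in range(max_edges):
        next_level = []
        for parent in levels[-1]:
            parent_form = canon(parent, n)[0]
            seen = set()  # canonical forms already produced from this parent
            for pair in all_pairs:
                if pair in parent:
                    continue
                child = parent | {pair}
                # Reject the child unless its canonical deletion leads back
                # to (a graph isomorphic to) the parent that produced it.
                if canon(canonical_parent(child, n), n)[0] != parent_form:
                    continue
                child_form = canon(child, n)[0]
                if child_form not in seen:
                    seen.add(child_form)
                    next_level.append(child)
        levels.append(next_level)
    return levels


levels = generate(4, 6)
counts = [len(level) for level in levels]
```

Because every class has exactly one canonical parent class, no comparison against the full list of previously generated graphs is needed; `counts` holds the number of accepted graphs for each edge count.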

Structure Generator Results

In collaboration with Jean-Loup Faulon, Evry University

Molecule            Elemental Composition   # Output Molecules   1 Fragment
Glycine             C2H5NO2                 84                   6
Phenylalanine       C9H11NO2                277,810,163          4,037,499
Malic acid          C4H6O5                  8,070                1,601
D-Cysteine          C3H7NO2S                3,838                100
p-Cresol sulfate    C7H8O3S                 10,203,389           19,940

2 Fragments, 3 Fragments (remaining values, in slide order): 93,137; 948; 584; 278

MOLGEN: same number of molecules

Lots of candidate structures

We are looking for metabolites

Elemental Composition → Structure Generation → Molecules → Metabolite Likeness → Metabolites

What do metabolites look like?

Understanding and Classifying Metabolite Space and Metabolite-Likeness. Julio E. Peironcely et al., PLoS One (in press)

HMDB 8K

ZINC 21M

[Figure: metabolites vs. non-metabolites compared on water solubility, MW, C atoms, structural complexity, and PSA.]

PCA

Not so different

Decision Tree

Metabolite-likeness

Data sets: HMDB 8K, ZINC 21M

Representation: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4

Classification: Support Vector Machines (SVM), Random Forest (RF), Naïve Bayes (NB)

Representation + Classification
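As an illustration of the simplest representation listed, an atom-count vector can be derived from an elemental formula. This is only a sketch under stated assumptions: the slides do not say how the atom counts were computed, the element order in `ELEMENTS` is hypothetical, and the parser handles plain formulas without brackets or charges.

```python
import re

# Hypothetical fixed element order for the descriptor vector.
ELEMENTS = ("C", "H", "N", "O", "S", "P")


def atom_counts(formula):
    """Parse a plain elemental formula such as 'C9H11NO2' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def atom_count_vector(formula):
    """Atom counts in the fixed ELEMENTS order, usable as classifier input."""
    counts = atom_counts(formula)
    return [counts.get(element, 0) for element in ELEMENTS]


phenylalanine = atom_count_vector("C9H11NO2")
cysteine = atom_count_vector("C3H7NO2S")
```

A fixed element order keeps the vectors of different molecules comparable position by position.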

HMDB 8K and ZINC 21M: Standardization, Diversity Selection

Training Set 532 + 532

Test Set 6.4K + 6.4K

Descriptions: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4

5-fold CV with SVM, RF, BC

Metabolite likeness: 3 classifiers x 5 descriptions

Best: RF with MDLPublicKeys
Sensitivity 99.84%, Specificity 87.52%, AUC 99.20%

Bad: BC with P_desc
Sensitivity 42.51%, Specificity 86.56%, AUC 61.57%
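For readers unfamiliar with the three figures reported, the sketch below shows how sensitivity, specificity and AUC are commonly computed for a binary classifier. The labels and scores are toy values invented for illustration, not data from the study, and the function names are hypothetical. Label 1 stands for "metabolite", 0 for "non-metabolite".

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 4 metabolites and 4 non-metabolites with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen score threshold (0.5 here), whereas AUC summarizes the ranking over all thresholds.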

Metabolite-likeness, external validation

Sources: HMDB, ChEMBL, DrugBank

Standardization

Random Selection

External validation set

Metabolite likeness

Met-likeness + structure generation (malic acid) 8K

100%

57% 77%

Met-likeness + structure generation (methylhistamine) 260K

46% 71%

What else do we know about our molecules?

Phenylalanine (C9H11NO2): Minimized_Energy 0.1100, ALogP -1.605, Index 5142

Structure Generation: 277 M

41 K (44%, 99%)

E < 10: 8 K (40%)

E < 10, ALogP < -1: 31 (76%)
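The successive narrowing of the candidate list can be sketched as a chain of filters. This is an illustrative sketch only: the `Candidate` record and `apply_filters` helper are hypothetical, the toy candidates are invented, and reading 0.1100 as the minimized energy and -1.605 as the ALogP of phenylalanine is an assumption based on the column order on the slide. The thresholds E < 10 and ALogP < -1 are the ones shown above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    minimized_energy: float
    alogp: float


def apply_filters(candidates, filters):
    """Apply each filter in turn and record how many candidates remain."""
    remaining = list(candidates)
    history = [len(remaining)]
    for keep in filters:
        remaining = [c for c in remaining if keep(c)]
        history.append(len(remaining))
    return remaining, history


candidates = [
    Candidate("phenylalanine", 0.1100, -1.605),  # values read from the slide
    Candidate("toy_a", 25.0, -2.0),  # invented: fails the energy filter
    Candidate("toy_b", 3.0, 0.5),  # invented: fails the ALogP filter
    Candidate("toy_c", 8.0, -1.2),  # invented: passes both filters
]

filters = [
    lambda c: c.minimized_energy < 10,  # E < 10
    lambda c: c.alogp < -1,  # ALogP < -1
]

remaining, history = apply_filters(candidates, filters)
```

Each filter only needs to keep the correct structure in the list; `history` tracks how the list shrinks after every step.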

Conclusions

Met-likeness prediction is good; interpretation is not

Local models needed

Structure Generator + Met-Likeness + other constraints = Met Id improvement

Acknowledgements

TNO Quality of Life: Leon Coulier, Albert Tas

Evry University: Jean-Loup Faulon, Davide Fichera

HMP University of Alberta: David Wishart, Ying (Edison) Dong

Leiden University: Miguel Rojas-Cherto, Piotr Kasper, Michael van Vliet, Theo Reijmers, Rob Vreeken, Ronnie van Doorn, Thomas Hankemeier

University of Cambridge: Andreas Bender