Structure generation, metabolite space, and metabolite likeness

  • Date posted

    12-Jul-2015

  • Julio E. Peironcely @peyron

    Juliopeironcely.com

    PhD student at Leiden University and TNO

    Structure Generation, Metabolite Space, and Metabolite-Likeness

  • Metabolomics

    The quantitative and qualitative analysis of all metabolites in samples of cells, body fluids, tissues, etc.

  • Metabolomics

    [Figure: metabolomics workflow]

    Steps: Biological question, Experimental design, Sampling, Sample preparation, Data acquisition, Data pre-processing, Data analysis, Biological interpretation

    Inputs and outputs between steps: Protocol, Samples, Metabolites, Raw data, List of peaks/biomolecules, Relevant biomolecules/connectivities & Models

  • De-novo identification

  • We have

    Elemental composition

    Fragments (sometimes)

    Experimental information

  • We want

    A list of candidate structures

    As short as possible

    The correct structure is in the list

  • We need

    A structure generator

    Keep only metabolites

    Use experimental information to filter molecules

  • Elemental Composition

    Structure Generation

    Molecules

  • Structure Generator

    In collaboration with Jean-Loup Faulon, Evry University

    Input: Elemental Formula, Fragments

    Generate

    Keep molecules if they pass canonical augmentation

    Output: Candidate Structures

  • Structure Generator

    Adding bonds

  • Structure Generator

    Isomorphism

    [Figure: labeled graphs on 4 vertices, grouped into two isomorphic classes]

    Isomorphic class: triangle + 1 edge

    Isomorphic class: 3-edge chain

    Output ONLY orange graphs

  • Structure Generator

    Canonical Labeling

    [Figure: the labeled graphs of both isomorphic classes are passed to the canonizer]

    Canonizer (Nauty)

    Triangle + 1 edge: (1,2) (1,3) (1,4) (2,3)

    3-edge chain: (1,2) (1,3) (2,4)

  • Only 1 canonical labeling in each isomorphic class
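
The idea of one canonical labeling per isomorphic class can be illustrated with a minimal brute-force sketch. This is not how Nauty works internally; it simply tries every relabeling of the vertices and keeps the lexicographically smallest edge list. The function name `canonical_form` and the example graphs are assumptions made for illustration, and the approach is only practical for very small graphs.

```python
from itertools import permutations


def canonical_form(edges, n):
    """Smallest sorted edge list over all relabelings of vertices 1..n."""
    vertices = range(1, n + 1)
    best = None
    for image in permutations(vertices):
        relabel = dict(zip(vertices, image))
        # Relabel every edge, normalize each pair to (small, large), then sort.
        candidate = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges
        ))
        if best is None or candidate < best:
            best = candidate
    return best


# Two different labelings of "triangle + 1 edge" and one 3-edge chain.
triangle_a = [(1, 2), (1, 3), (1, 4), (2, 3)]
triangle_b = [(2, 3), (2, 4), (3, 4), (1, 4)]
chain = [(3, 4), (1, 4), (1, 2)]

print(canonical_form(triangle_a, 4))
print(canonical_form(triangle_b, 4))
print(canonical_form(chain, 4))
```

The first two calls print the same edge list because the graphs are isomorphic; the chain prints a different one.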

  • (1,2)(1,3)(1,4)(2,3)

    [Figure: generation tree on 5 vertices, extended one bond at a time; X marks a rejected duplicate]

    (1,2)

    (1,2)(1,3)

    (1,2)(1,3)(1,4), (1,2)(1,3)(2,3)

    (1,2)(1,3)(2,3)(2,4), (1,2)(1,3)(1,4)(3,4), (1,2)(1,3)(1,4)(4,5)

    Use canonizer to remove duplicates after each extension
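
A minimal sketch of "extend, then canonize and drop duplicates" follows. It assumes simple unlabeled graphs (no atom types, valences, or bond orders) and uses a brute-force canonizer in place of Nauty; `canonical_form` and `extend_and_deduplicate` are hypothetical names chosen for this example.

```python
from itertools import combinations, permutations


def canonical_form(edges, n):
    """Brute-force canonizer: smallest sorted edge list over all relabelings."""
    vertices = range(1, n + 1)
    best = None
    for image in permutations(vertices):
        relabel = dict(zip(vertices, image))
        candidate = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges
        ))
        if best is None or candidate < best:
            best = candidate
    return best


def extend_and_deduplicate(n, max_edges):
    """Add one edge at a time; keep one graph per canonical form at each level."""
    all_pairs = list(combinations(range(1, n + 1), 2))
    level = {()}  # start from the graph with no edges
    levels = [level]
    for _ in range(max_edges):
        next_level = set()
        for graph in level:
            for pair in all_pairs:
                if pair not in graph:
                    # The set silently discards graphs already seen at this level.
                    next_level.add(canonical_form(graph + (pair,), n))
        level = next_level
        levels.append(level)
    return levels


levels = extend_and_deduplicate(5, 4)
print([len(level) for level in levels])
```

Every extension is canonized and compared with everything generated at the same level, so the full list of graphs has to be held in memory. The canonical augmentation check on the following slides avoids that comparison.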

  • Canonical Augmentation

    A canonical object augmented in a canonical way produces a canonical object

  • Check For Canonical Augmentation

    Keep an object if a canonical deletion takes you to its canonical parent

  • (1,2)(1,3)(1,4)(2,3)

    [Figure: the generation tree on 5 vertices, checked for canonical augmentation]

    (1,2)

    (1,2)(1,3)

    (1,2)(1,3)(1,4), (1,2)(1,3)(2,3)

    (1,2)(1,3)(2,3)(2,4), (1,2)(1,3)(1,4)(4,5)

    (1,2)(1,3)(1,4)(3,4): rejected (X)

    Accept only canonically augmented graphs
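
The acceptance test can be sketched for simple graphs as follows. This is a brute-force illustration, not the generator described in the talk: the canonical deletion is chosen here as the edge that comes last in a brute-force canonical labeling, automorphisms are found by trying all permutations, and children of one parent are compared only with each other. All function names are hypothetical.

```python
from itertools import combinations, permutations


def relabel(edges, perm):
    """Apply a vertex relabeling (dict) and return the sorted edge tuple."""
    return tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))


def canonical_labeling(edges, n):
    """Brute force: (canonical edge list, one relabeling that yields it)."""
    vertices = range(1, n + 1)
    best, best_perm = None, None
    for image in permutations(vertices):
        perm = dict(zip(vertices, image))
        candidate = relabel(edges, perm)
        if best is None or candidate < best:
            best, best_perm = candidate, perm
    return best, best_perm


def is_canonical_augmentation(child, new_edge, n):
    """True if removing new_edge is equivalent to the canonical deletion."""
    canon, perm = canonical_labeling(child, n)
    inverse = {new: old for old, new in perm.items()}
    # Canonical deletion (a choice made for this sketch): last edge of canon.
    u, v = canon[-1]
    canonical_edge = tuple(sorted((inverse[u], inverse[v])))
    vertices = range(1, n + 1)
    for image in permutations(vertices):
        auto = dict(zip(vertices, image))
        if relabel(child, auto) != child:
            continue  # not an automorphism of child
        if tuple(sorted((auto[new_edge[0]], auto[new_edge[1]]))) == canonical_edge:
            return True
    return False


def generate(n, max_edges):
    """One graph per isomorphic class and edge count, without a global lookup."""
    pairs = list(combinations(range(1, n + 1), 2))
    levels = [[()]]
    for _ in range(max_edges):
        next_level = []
        for parent in levels[-1]:
            seen = set()  # canonical forms of this parent's accepted children only
            for edge in pairs:
                if edge in parent:
                    continue
                child = tuple(sorted(parent + (edge,)))
                if not is_canonical_augmentation(child, edge, n):
                    continue
                form = canonical_labeling(child, n)[0]
                if form not in seen:
                    seen.add(form)
                    next_level.append(child)
        levels.append(next_level)
    return levels


print([len(level) for level in generate(5, 4)])
```

Unlike global deduplication, each graph is accepted or rejected from local information only, so different branches of the tree can be explored independently.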

  • Structure Generator Results

                            Glycine   Phenylalanine   Malic acid   D-Cysteine   p-Cresol sulfate
    Elemental Composition   C2H5NO2   C9H11NO2        C4H6O5       C3H7NO2S     C7H8O3S
    # Output Molecules      84        277,810,163     8,070        3,838        10,203,389
    1 Fragment              6         4,037,499       1,601        100          19,940

    2 Fragments and 3 Fragments: 93,137, 948, 584, 278 (row and column positions lost in the transcript)

    MOLGEN gives the same number of molecules

  • Lots of candidate structures

  • We are looking for metabolites

  • Elemental Composition

    Structure Generation

    Molecules

    Metabolite Likeness

    Metabolites

  • What do metabolites look like?

    Understanding and Classifying Metabolite Space and Metabolite-Likeness. Julio E. Peironcely et al., PLoS One (in press)

  • HMDB 8K

    ZINC 21M

  • Metabolites vs. non-metabolites

    [Figure: comparison by Water Solubility, MW, C Atoms, Struc. Complexity, PSA]

  • PCA

  • Not so different

  • Decision Tree

  • Metabolite-likeness

    Data: HMDB 8K, ZINC 21M

    Representation: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4

    Classification: Support Vector Machines (SVM), Random Forest (RF), Naïve Bayes (NB)

  • Metabolite-likeness

    HMDB 8K, ZINC 21M

    Standardization

    Diversity Selection

    Training Set 532 + 532

    Test Set 6.4K + 6.4K

    Descriptions: Atom Counts, Physicochemical desc., MDL Public Keys, FCFP_4, ECFP_4

    Classifiers: SVM, RF, BC

    5-fold CV

    Metabolite likeness: 3 classifiers × 5 descriptions
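
The model grid and the cross-validation split can be sketched in plain Python. The sketch assumes the folds are drawn from the 532 + 532 training set and uses contiguous, unshuffled folds for brevity; a real run would shuffle or stratify first. `k_fold_indices` is a hypothetical helper, and no classifier is actually trained here.

```python
from itertools import product

CLASSIFIERS = ["SVM", "RF", "BC"]
DESCRIPTIONS = ["Atom Counts", "Physicochemical desc.", "MDL Public Keys",
                "FCFP_4", "ECFP_4"]


def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Every classifier is paired with every description: 3 x 5 models.
models = list(product(CLASSIFIERS, DESCRIPTIONS))
folds = list(k_fold_indices(532 + 532, k=5))

print(len(models), "models,", len(folds), "folds")
for classifier, description in models[:3]:
    print(classifier, "+", description)
```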

  • Metabolite-likeness

    Model                     Sensitivity   Specificity   AUC
    Best: RF MDLPublicKeys    99.84%        87.52%        99.20%
    Bad: BC P_desc            42.51%        86.56%        61.57%
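
For reference, sensitivity and specificity follow from the confusion counts, as in the short sketch below. The labels are toy values invented for illustration (1 = metabolite, 0 = non-metabolite) and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: fraction of true metabolites that were recognized.
    # Specificity: fraction of true non-metabolites that were rejected.
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 metabolites and 4 non-metabolites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```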

  • Metabolite-likeness, external validation

    External validation set: HMDB, ChEMBL, DrugBank

    Standardization

    Random Selection

    Metabolite likeness

  • Met-likeness + structure generation (malic acid) 8K

    100%

    57% 77%

  • Met-likeness + structure generation (methylhistamine) 260K

    46% 71%

  • What else do we know about our molecules?

  • Phenylalanine Molecule Minimized_Energy ALogP Index

    0.1100 -1.605 5142

  • Julio E. Peironcely

    Molecule Minimized_Energy ALogP Index

    0.1100 -1.605 5142

    C9H11NO2

    Structure Generation

    277 M

  • Julio E. Peironcely

    Molecule Minimized_Energy ALogP Index

    0.1100 -1.605 5142

    C9H11NO2

    St