
Cheminformatics approaches for metabolomics research

ChemAxon User Group Meeting 2009, San Diego, CA

Tobias Kind UC Davis Genome Center FiehnLab - Metabolomics


Outline

1) Very Short Introduction to Metabolomics

2) Seven Real-Life Approaches with ChemAxon Tools

3) Outlook and Conclusions


Metabolomics as part of modern life sciences

Genotype x Environment

• Genomics
• Transcriptomics: mRNA expression
• Proteomics: protein expression
• Metabolomics: metabolite expression

Phenotype (temporal x spatial resolution)


Techniques and tools @ FiehnLab

Gas Chromatography: GC-TOF-MS, GCxGC-TOF-MS, Quadrupole-GC-MS, Pyrolysis-GC-MS

FT-ICR-MS: LTQ-FT-MS via CoreLab

Liquid Chromatography: LC-MS, UPLC-MS; monolithic LC; HILIC, RP, NP

BioInformatics and ChemInformatics: BinBase and SetupX; statistics and machine learning; Open Source + commercial software


Approach No. 1: Data sharing in chemistry

Tools used: MView, Molconvert, Instant-JChem, MSketch, IUPAC naming

Topic: Missing spectral repositories and semantics annotation hinder research

Results:

• Use InChIKey and PubChem for structure annotations; do not use SMILES
• Submit structures directly to the journal; do not use OCR
• Submit spectra directly to journals/repositories; do not use OCR
• Annotate older publications with structure-to-name algorithms

Ideas for ChemAxon:

• Can MSketch code a CML into a chemical reaction picture for journals?
• Can Chemicalize automatically annotate my new paper with InChIKeys?

Kind T, Scholz M, Fiehn O. How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry. PLoS ONE 4(5): e5440 (2009); doi:10.1371/journal.pone.0005440

DQBQWWSFRPLIAX-UHFFFAOYAG
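The recommendation to annotate structures with InChIKeys can be backed by a simple layout check before a key goes into a manuscript or database. The sketch below is a minimal illustration, not a ChemAxon or InChI Trust tool: it checks only the character layout, never whether a key belongs to a real structure. The function names are hypothetical, and the "legacy" pattern is an assumption that the 25-character key shown on this slide follows the earlier two-block layout.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
STANDARD_KEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# Assumption: earlier 25-character layout with two blocks and no trailing
# protonation character, which the example key on this slide appears to use.
LEGACY_KEY = re.compile(r"[A-Z]{14}-[A-Z]{10}")


def classify_inchikey(key):
    """Return 'standard', 'legacy' or 'invalid' based on layout only."""
    key = key.strip()
    if STANDARD_KEY.fullmatch(key):
        return "standard"
    if LEGACY_KEY.fullmatch(key):
        return "legacy"
    return "invalid"


def first_block(key):
    """Return the first block of the key (the connectivity hash)."""
    return key.strip().split("-")[0]


if __name__ == "__main__":
    for key in ("DQBQWWSFRPLIAX-UHFFFAOYAG",      # key shown on the slide
                "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",    # ethanol, standard layout
                "CCO"):                           # a SMILES string, not a key
        print(key, classify_inchikey(key))
```

Comparing only the first block is a common way to group entries that share a skeleton but differ in stereochemistry or isotopes.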


Approach No. 1: Data sharing in chemistry

"Hamburger to Cow" algorithm or "Wishful Thinking": requires Jurassic Park technology

1) Digital structures and spectra
2) Analog paper publication. Data reduction and loss: remove noise and uninteresting data
3) Digital database from OCR data. Extreme data loss: OCR and text mining conversion errors


Approach No. 2: From mass to molecular formula

Tools used: MView, Molconvert, MSketch, Cxcalc, Instant-JChem

Topic: Create correct elemental formulas from mass spectrometry data and query compound databases for possible structures as part of the metabolite rediscovery process

Results:

• Converted PubChem, DNP, Drugbank, TSCA into formula test set
• Developed heuristic rules for correct elemental composition determination
• Determined size of molecular formula space

Ideas for ChemAxon:

• Can we use JKlustor or LibMCS for creating a set of natural product fragments?
• Can we use the Synthesizer to create matching natural product like compounds?

Kind T, Fiehn O. Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 8: Art. No. 105 (Mar 27, 2007); http://www.biomedcentral.com/1471-2105/7/234
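The step from an accurate mass to candidate elemental compositions can be sketched in a few lines. This is a brute-force illustration restricted to C, H, N and O, and it is not the published Seven Golden Rules implementation: it applies only a ring and double bond equivalent (RDBE) check and a hydrogen/carbon ratio window. The function names, element limits and the `hc_range` default are illustrative assumptions.

```python
from itertools import product

# Monoisotopic masses in Da (rounded literature values).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def format_formula(c, h, n, o):
    """Render element counts as a Hill-style string, e.g. C6H12O6."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(target_mass, ppm=5.0, max_c=20, max_n=5, max_o=10,
                       hc_range=(0.2, 3.1)):
    """Return [(formula, mass)] for CHNO compositions within ppm of target_mass."""
    tol = target_mass * ppm / 1e6
    hits = []
    for c, n, o in product(range(1, max_c + 1), range(max_n + 1), range(max_o + 1)):
        heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
        if heavy > target_mass + tol:
            continue
        # Only the nearest integer hydrogen count can fall inside a ppm window.
        h = round((target_mass - heavy) / MASS["H"])
        mass = heavy + h * MASS["H"]
        if h < 1 or abs(mass - target_mass) > tol:
            continue
        # Twice the RDBE; it must be a non-negative even number for a
        # neutral, fully bonded CHNO molecule.
        rdbe_x2 = 2 * c - h + n + 2
        if rdbe_x2 < 0 or rdbe_x2 % 2:
            continue
        if not hc_range[0] <= h / c <= hc_range[1]:
            continue
        hits.append((format_formula(c, h, n, o), mass))
    return sorted(hits, key=lambda hit: abs(hit[1] - target_mass))


if __name__ == "__main__":
    # 180.06339 Da is the monoisotopic mass of glucose, C6H12O6.
    for formula, mass in candidate_formulas(180.06339):
        print(formula, round(mass, 5))
```

Each surviving formula would then be queried against compound databases such as PubChem to retrieve candidate structures.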


Approach No. 2: From exact mass to structures

More filters to come: MS/MS matching, retention time refinements...


Approach No. 2: The molecular formula space of small molecules calculated by the Seven Golden Rules

The molecular formula space below 2000 Dalton (grey box):

• 8,000,000,000 possible elemental compositions (< 2000 Da, CHNSOP, Lewis + Senior)
• 600,000,000 highly probable formulas using the Seven Golden Rules
• 700,000 formulae in PubChem covering 10,000,000 isomers
• 50,000 elemental compositions (Naturals, Drugs, Toxicants)

Each molecular formula can expand to billions of structural isomers. Molecular Formula ≠ Molecular Isomer


Approach No. 3: Organic reactions in-silico and in-vitro

Tools used: MSketch, Reactor

Topic: Create expected structures in silico, detect structures with GC-MS

Results: Reactor used in reaction planning for metabolic profiling

Ideas for users:

• Share organic reaction libraries for later use with Reactor
• Use Reactor for organic synthesis teaching at universities

Example protocol (although Reactor was not directly used for this application):
Fiehn O, Wohlgemuth G, Scholz M, Kind T, Lee DY, Lu Y, Moon S, Nikolau BJ. Quality control for plant metabolomics: Reporting MSI-compliant studies. Plant Journal (2008) 53, 691-704

[Figure: Example GC-MS chromatogram (source: Blackwell). Labels: small acids, alcohols; amino acids, hydroxy acids, sterols; free fatty acids; mono-saccharides; di- & tri-saccharides; 1300s]


Approach No. 3: Organic reactions in-silico and in-vitro

Gas chromatography requires volatile compounds (two-step derivatization in vial):

A) Methoximation of aldehyde and keto groups (primarily for opening reducing ring sugars)
B) Silylation of polar hydroxy, thiol, carboxy and amino groups with the silylation agent MSTFA

Gas chromatography-mass spectrometry (GC-MS) can distinguish between stereoisomers.
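The mass of the expected derivative follows directly from the two derivatization steps. The sketch below is a hand calculation, not a Reactor workflow: methoximation turns C=O into C=N-OCH3 (net gain CH3N per carbonyl), and trimethylsilylation replaces an active hydrogen with Si(CH3)3 (net gain SiC3H8 per site). The function name and the assumption that every listed site reacts completely are illustrative.

```python
# Monoisotopic masses in Da (rounded literature values).
H, C, N, SI = 1.00782503, 12.0, 14.0030740, 27.9769265

# Methoximation: C=O becomes C=N-OCH3, a net gain of CH3N per carbonyl.
MEOX_SHIFT = C + 3 * H + N
# Trimethylsilylation: an active H is replaced by Si(CH3)3, net gain SiC3H8.
TMS_SHIFT = SI + 3 * C + 8 * H


def derivatized_mass(mono_mass, n_carbonyl, n_active_h):
    """Monoisotopic mass after full methoximation and silylation.

    Assumes every carbonyl and every active hydrogen reacts exactly once.
    """
    return mono_mass + n_carbonyl * MEOX_SHIFT + n_active_h * TMS_SHIFT


if __name__ == "__main__":
    # Open-chain glucose: one aldehyde group and five hydroxy groups.
    print(round(derivatized_mass(180.06339, 1, 5), 4))
```

In practice incomplete silylation produces several derivatives per compound, so a library search would usually consider more than one value of `n_active_h`.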

[Figure: mass spectra of the Z and E isomers; x-axis: m/z (80 to 500), y-axis: abundance (0 to 100)]

Z/E isomers have the same mass spectrum but differ by 2 seconds in retention time


Approach No. 4: Gas chromatography-mass spectrometry mass spectral and retention library

Tools used: JChem API, MSketch, Instant-JChem

Topic: Developed GC-MS library for metabolic profiling and calculated structural overlap with existing metabolite databases

Results:

• 17,475 animal, human, plant and microbial samples from 55 different species from 248 metabolomic studies

• Metabolic profiling with FiehnLib identifies around 150 compounds per run

Ideas for ChemAxon:

• Provide PCA or PLS output for statistical analysis of library overlaps
• Automated Venn diagrams for DB overlap within Instant-JChem

Tobias Kind, Mine Palazoglu, Do Yup Lee, Yun Lu, Gert Wohlgemuth, Martin Scholz, Oliver Fiehn. FiehnLib - a mass spectral and retention index library for comprehensive metabolic profiling. http://fiehnlab.ucdavis.edu/projects/FiehnLib/


Approach No. 4: Gas chromatography-mass spectrometry mass spectral and retention library

ID      Functional group                                                 FiehnLib Hits
S5      Alkenes                                                          96
S6      Alkynes                                                          0
S12     Alcohols                                                         276
S23     Amines                                                           130
S48     Aldehydes                                                        20
S49     Ketones                                                          106
S86     Lactones                                                         14
S98     Amides                                                           48
S277    Sugar pattern (multiple rings)                                   46
S278    Sugar pattern reducing sugars                                    16
SA-1    Aromatic steroids                                                7
SA-2    General steroids                                                 16
SA-3    Phosphate group containing                                       41
SA-4    Chlorine containing (non salt)                                   1
SA-5    Nitrogen (n>0) in aromatic 6-ring                                58
SA-6    Carboxylic acids                                                 321
SA-7    Carboxyl (acid, ester, salt) with aliphatic carbon chain (n>6)   53
SA-8    Purines                                                          22
SA-T    Any (total number of structures)                                 701

Table (SMARTS) and hashed fingerprints calculated with ChemAxon JAVA API

The GC-MS library contains a diverse set of compounds important for metabolic profiling and machine learning purposes.

Mass spectra + retention indices: 1200
Unique compounds: 701

[Figure: JAVA API example for SMARTS matching]


Approach No. 4: Gas chromatography-mass spectrometry mass spectral and retention library

Workflow:

1) KEGG molecules
2) Detect 1024 substructures
3) Create 1024 bit fingerprints
4) Multivariate compression: Tanimoto similarity score, HCA, PCA

Mol1 010011001001101101101100...
Mol2 010011001001101101101100...
Mol3 010011001001101101101100...
...
Moln 010011001001101101101100...

Tanimoto: T = C/(A+B+C)

[Figure: PCA score plot (t1 vs. t2, both from -6 to 4) of FiehnLib and BioMeta/KEGG compounds. Diversity visualization using PCA; overlapping dots refer to the same compound]

Table (SMARTS) and hashed fingerprints calculated with ChemAxon JAVA API; fingerprints are also available from PubChem Score Matrix
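The Tanimoto formula on this slide can be evaluated directly on bit fingerprints. In the sketch below, C is read as the number of bits set in both fingerprints, and A and B as the bits set only in the first or only in the second; that reading is an assumption consistent with T = C/(A+B+C). Fingerprints are held as Python integers parsed from bit strings like the ones shown above. This is plain Python, not the ChemAxon API, and the function names are hypothetical.

```python
def parse_fingerprint(bits):
    """Convert a bit string such as '010011' into an integer fingerprint."""
    return int(bits, 2)


def tanimoto(fp1, fp2):
    """Tanimoto similarity T = C / (A + B + C) for two integer fingerprints."""
    common = bin(fp1 & fp2).count("1")    # C: bits set in both
    only1 = bin(fp1 & ~fp2).count("1")    # A: bits set only in fp1
    only2 = bin(fp2 & ~fp1).count("1")    # B: bits set only in fp2
    total = only1 + only2 + common
    # Convention chosen here: two empty fingerprints score 0.0.
    return common / total if total else 0.0


def similarity_matrix(fingerprints):
    """Pairwise Tanimoto scores, usable as input for HCA or similar methods."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


if __name__ == "__main__":
    mol1 = parse_fingerprint("1100")
    mol2 = parse_fingerprint("1010")
    print(tanimoto(mol1, mol2))           # one shared bit out of three set bits
    print(similarity_matrix([mol1, mol2]))
```

Hierarchical clustering typically works on the distance 1 - T rather than on T itself.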


Approach No. 5: Retention time prediction for liquid chromatography

Tools used: Marvin, JChem API, MSketch, Instant-JChem, Kier & Hall SMARTS

Topic: Use a retention time filter for structure refinement in the structure elucidation process

Results:

• LC retention time prediction is currently not accurate enough
• LC RT prediction relies on accurate pKa and logD predictions
• Good QSPR models require >500, or better >1000, diverse compounds

Ideas for ChemAxon:

• Provide more validation sets of pKa, logD for skeptical users ☺

Ideas for Users:

• Share more pKa, logD and solubility data for better model development ☺


Approach No. 5: Retention time prediction for liquid chromatography

[Figure: Calibration using logP concept for reversed phase liquid chromatography data. Axis labels: retention time [min] (0 to 45), logP (lipophilicity); markers at logP=2, logP=4, logP=8]

• very simplistic and coarse filter for RP only
• problematic with multi-ionizable compounds
• logD (includes pKa) better than logP
• possible use as time segment filter

[Figure: Deoxyguanosine, % species vs. pH]
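The remark that logD includes pKa can be made concrete with the textbook relation for a single ionizable group. The sketch below assumes that only the neutral species partitions into the organic phase and that the compound has exactly one acidic or basic site; it is not the ChemAxon logD plugin, and the function name is hypothetical. Multi-ionizable compounds such as deoxyguanosine need a full species distribution, which is why they are flagged as problematic above.

```python
import math


def logd_monoprotic(logp, pka, ph, acid=True):
    """Estimate logD at a given pH for a compound with one ionizable group.

    Assumption: only the neutral form partitions, so
    logD = logP - log10(1 + 10**(pH - pKa)) for an acid and
    logD = logP - log10(1 + 10**(pKa - pH)) for a base.
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical acid with logP 2.0 and pKa 4.0 across a pH range.
    for ph in (2.0, 4.0, 7.4):
        print(ph, round(logd_monoprotic(2.0, 4.0, ph), 3))
```

At a pH well below the pKa of an acid, logD approaches logP; above the pKa it drops by roughly one unit per pH unit, which shifts the expected reversed-phase retention accordingly.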


Approach No. 5: Retention time prediction for liquid chromatography

[Figure: predicted RT [min] vs. experimental RT [min]; regression line y = 1.0191x + 0.5298, R2 = 0.8744; labeled compounds: Riboflavin, Deoxyguanosine monophosphate (dGMP), Arginine]

• Based on logD, pKa, logP and Kier & Hall atomic descriptors
• 90 compounds (ndev = 48, ntest = 32); std error 3.7 min
• Good models need development set n > 500
• Prediction power is most important

QSRR Model: Tobias Kind (FiehnLab) using ChemAxon Marvin and WEKA
Data Source: Lu W, Kimball E, Rabinowitz JD. J Am Soc Mass Spectrom. 2006 Jan;17(1):37-50; LC method using 90 nitrogen metabolites on RP-18
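The regression line and R2 quoted for the predicted-versus-experimental plot are ordinary least-squares quantities, and the sketch below shows how they are computed. The data points are synthetic: they are placed exactly on the reported line y = 1.0191x + 0.5298 purely to exercise the code, so the fit returns R2 = 1 rather than the 0.8744 of the real 90-compound set. The function names are hypothetical, and the original model was built with WEKA, not with this code.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic experimental retention times in minutes (not measured data).
    experimental = [2.0, 5.0, 10.0, 20.0, 30.0, 40.0]
    # Synthetic predictions lying exactly on the line reported on the slide.
    predicted = [1.0191 * rt + 0.5298 for rt in experimental]
    slope, intercept = fit_line(experimental, predicted)
    print(round(slope, 4), round(intercept, 4),
          round(r_squared(experimental, predicted, slope, intercept), 4))
```

For model validation, the same statistics would be computed separately on the development and test sets, since the bullets above stress prediction power over fit quality.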


Approach No. 6: Cheminformatics tools in teaching

Tools used: Marvin, MSketch, Instant-JChem, Calculator plugins

Topic: Spectra and structures must be handled as a unity
Generation of stereoisomers, resonance species for mass spectrometry

Ideas for university teachers and students:

• Use the free ChemAxon teaching license

Free teaching slides: http://fiehnlab.ucdavis.edu/staff/kind/Teaching/


Approach No. 7: Lipid Analysis

Tools used: MSketch, Instant-JChem, Calculator plugins, Reactor

Topic: Analysis of polar lipids with tandem mass spectrometry (MS/MS)

Results:

• lipid compounds were created with LipidMaps tools
• structure handling provided by Instant-JChem + EXCEL export
• spectral fragment data can be calculated from structures
• match in-silico spectra with experimental spectra

Ideas for Users or ChemAxon:

• Use the PubChem, LipidMaps, ChemSpider APIs to obtain database contents

Data presented at ASMS 2008 and Metabolomics 2009 conferences
Table downloads: http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/LipidAnalysis/
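Matching in-silico fragments against an experimental spectrum reduces to a tolerance search over peak lists. The sketch below is one possible reading of that step, not the FiehnLab workflow: predicted fragments are derived from three neutral losses commonly reported for protonated phosphatidylcholines (an assumption, since the slides do not list the fragment rules), and the observed peaks are the six most prominent labels of the MS/MS spectrum of precursor m/z 760.50 shown in this deck. The 0.5 Da tolerance is an illustrative choice for a low-resolution ion trap.

```python
# Assumed neutral losses in Da (monoisotopic) for protonated phosphatidylcholines.
NEUTRAL_LOSSES = {
    "water": 18.0106,            # H2O
    "trimethylamine": 59.0735,   # C3H9N
    "phosphocholine": 183.0660,  # C5H14NO4P
}

# Peak labels read from the MS/MS spectrum of m/z 760.50 in this deck.
OBSERVED = [504.36, 478.36, 701.45, 577.45, 742.73, 522.45]


def predict_fragments(precursor_mz, losses=NEUTRAL_LOSSES):
    """Predicted fragment m/z values: precursor minus each neutral loss."""
    return {name: precursor_mz - loss for name, loss in losses.items()}


def match_peaks(predicted, observed, tol=0.5):
    """Map each predicted fragment to the nearest observed peak within tol Da."""
    matches = {}
    for name, mz in predicted.items():
        nearest = min(observed, key=lambda peak: abs(peak - mz))
        if abs(nearest - mz) <= tol:
            matches[name] = nearest
    return matches


if __name__ == "__main__":
    hits = match_peaks(predict_fragments(760.50), OBSERVED)
    for name, peak in hits.items():
        print(name, peak)
```

A scoring function would normally weight the matches by intensity and also predict the acyl chain losses, which requires the sn1 and sn2 compositions from the structure table.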


Approach No. 7: Lipid analysis

Ion trap MS/MS spectra creation

• NanoMate nanoESI chip based infusion (nanoESI chip with 400 nozzles)
• Low-resolution LTQ Ion Trap
• High-resolution LTQ-FT

[Figure: lipid structure with sn1 = alkyl or acyl rest, sn2 = alkyl or acyl rest, and head group]

[Figure: MS spectrum. PCs_Pos_ID_CE45_01 #21-151 RT: 0.04-0.28 AV: 131 NL: 2.19E4 T: ITMS + p ESI Full ms [300.00-1100.00]; x-axis: m/z (700 to 840), y-axis: relative abundance (0 to 100). Labeled peaks (m/z): 760.64, 782.64, 788.64, 734.64, 776.64, 810.64, 756.64, 798.64, 732.64, 746.64, 774.64, 706.64, 840.45, 814.64, 728.55, 720.55, 826.64, 694.55]

[Figure: MS/MS spectrum. PCs_Pos_ID_CE45_01 #163-214 RT: 0.31-0.97 AV: 2 NL: 1.51E1 T: Average spectrum MS2 760.50 (163-214); x-axis: m/z (200 to 750), y-axis: relative abundance (0 to 100). Labeled peaks (m/z): 504.36, 478.36, 701.45, 577.45, 742.73, 522.45, 301.18, 658.55, 616.82, 404.09, 293.18, 433.18, 335.91, 256.27, 396.91, 761.00]


Approach No. 7: Lipid analysis

Export of structures from Instant-JChem into EXCEL

Structures created with LipidMaps tools

Lipid database of 44,000 glycerophospholipids, 444,080 diacylglycerols, and mostly triacylglycerols


Conclusions – Metabolomics @ FiehnLab

Structure elucidation techniques for GC-MS and LC-MS

• require deep interaction between structure and spectra handling
• require algorithms for spectra interpretation and retention index prediction
• integration of metabolite and small molecule databases (PubChem/KEGG) needed

ChemAxon tools are technology enablers for metabolomics

• used for daily structure handling of small molecule structures and databases
• used for metabolomics method development
• used for development of new structure elucidation algorithms


Thank you!

Fiehn Lab

Dr. Oliver Fiehn (Principal Investigator)
Mine Palazoglu (Library, GC-MS, GCT)
Dr. Tobias Kind (Cheminformatics)
Dinesh Kumar Barupal (Bioinformatics)
Dr. Do Yup Lee (Biology, Proteins)
Gert Wohlgemuth (BinBase)
Kirsten Skogerson (NMR, GCxGC)
Dr. Kwang-Hyeon Liu (LC, Pharma)
Dr. Yun Gyong Ahn (GCT, GC-MS)
Sevini Shahbaz (Library)

Sponsors Fiehn Lab

NIH R01 ES013932
NIH GM078233
NIH R01 DK078328
UC Discovery itl07-10167
NSF MCB 0520140
EU FP7 Health-2007-2.1.4.1/Dupont
Agilent, LECO, Waters

Thanks to ChemAxon for free research and teaching licenses and great support in the ChemAxon Forum!