Cheminformatics and mass spectrometry course - Fiehn...


Page 1: Cheminformatics and mass spectrometry course - Fiehn Lab (fiehnlab.ucdavis.edu/downloads/staff/kind/Teaching/cheminformatics-ms...)

Welcome!

Mass Spectrometry meets ChemInformatics
Tobias Kind and Julie Leary

UC Davis

Course 3: Mass spectral and molecular database search

Class website: CHE 241 - Spring 2008 - CRN 16583
Slides: http://fiehnlab.ucdavis.edu/staff/kind/Teaching/
PPT is hyperlinked – please change to Slide Show Mode

Page 2

Molecules and mass spectra

Close relationship between molecular structure and mass spectra

Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)

Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)

[Figure: three mass spectra with structures; m/z axis 20-160, relative abundance 0-100]
(mainlib) tert-Butylaminotrimethylsilane: labeled peaks at m/z 29, 45, 58, 73, 84, 100, 114, 130, 145
(mainlib) N,N-Diethyl-1,1,1-trimethylsilylamine: labeled peaks at m/z 29, 45, 59, 73, 86, 100, 114, 130, 145
(replib) Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(t...: labeled peaks at m/z 46, 59, 73, 91, 105, 130, 147, 160

Electron impact (70 eV) mass spectra; Source: NIST05

Page 3

Molecules and mass spectra

Similar structures may or may not have similar mass spectra

[Figure: comparison of the mass spectra of Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy] and N-Methylphenylethanolamine, bis(trimethylsilyl)-, with structures; m/z axis 40-320, relative abundance 0-100; peaks labeled in both spectra: m/z 59, 73, 91, 147, 163, 179]

Electron impact (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program

Page 4

Molecules and mass spectra

Similar mass spectra may or may not have similar structures

[Figure: comparison of the mass spectra of 1-Tetradecene and Cyclotetradecane; m/z axis 10-210, relative abundance 0-100; peaks labeled in both spectra: m/z 27, 29, 55, 83, 97, 111, 125, 168, 196]

Electron impact (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program

Page 5

Mass spectral databases I

Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)

Data quality is important
Annotation with CAS number, structure, and formula
A link to the literature or publication is useful
Currently no large ESI, APPI, or APCI libraries are available (free or commercial)

Page 6

Mass spectral databases II

Smaller specialized libraries
Pfleger Maurer Weber (Drugs): MS+RI, 70 eV
MassFinder (Volatiles): MS+RI, 70 eV
RIZA DB (Toxicants): MS+RI, 70 eV
Golm DB (primary Metabolites): MS+RI, 70 eV
Fiehnlib (primary Metabolites): MS+RI, 70 eV
MassBank (Metabolites): ESI, MSn, accurate masses
AAFS (Drugs, Forensic, Toxicology): MS+RI, 70 eV
ChemicalSoft (Drugs): MS/MS, MSE

_____________________________________________________________

For electron impact (EI) spectra, the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used for matching retention indices

For ESI and APPI spectra (LC-MS), the same mass spectrometer design and setup (triple-quad, ion-trap, TOF, Q-TOF) and the same collision energy should be used
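
The retention-index requirement can be pictured as a simple filter on library hits. This is a minimal sketch, assuming each hit carries a library retention index measured on a comparable column and temperature program. The function name, the `ri` field, the 20-unit window, and the second and third entries are illustrative assumptions; only the Mirex index (2583) comes from the RIZA entry on this slide.

```python
def filter_by_retention_index(hits, observed_ri, tolerance=20.0):
    """Keep only library hits whose retention index lies within +/- tolerance."""
    return [
        hit for hit in hits
        if hit.get("ri") is not None and abs(hit["ri"] - observed_ri) <= tolerance
    ]


# Hypothetical hit list; only the Mirex RI is taken from the RIZA entry.
hits = [
    {"name": "Mirex", "ri": 2583},
    {"name": "hypothetical candidate", "ri": 2710},
    {"name": "entry without RI", "ri": None},
]

kept = filter_by_retention_index(hits, observed_ri=2590)
print([hit["name"] for hit in kept])
```

A suitable tolerance depends on how reproducible the indices are between the library and the user's instrument, so the window should be treated as a tunable parameter.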

[Figure: (riza_web) mass spectrum of Mirex (RI 2583, KEY 1596, CAS 2385-85-5, FRML empty) with structure showing 12 Cl atoms; m/z axis 230-450, relative abundance 0-100; labeled peaks at m/z 237, 272, 332, 404]

Page 7

Mass spectral search algorithms

PBM - Probability Based Matching (McLafferty & Stauffer) – since 1976
Dot Product (Finnigan/INCOS) – since 1978
Weighted Dot Product (Stein) – since 1993
Mass Spectral Tree Search (Mistrik) – since 21st century

Source: Stein S.E. see notes

Weighted Dot Product: [equation image not preserved in this transcript]

Au and Ar: abundances of peaks in the user and reference mass spectra
m: m/z values
w: weighting term
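
The equation itself is not preserved in this transcript, so the following is only a minimal sketch of a weighted dot-product match factor. It assumes the common cosine form, match = (Σ w_u·w_r)² / (Σ w_u² · Σ w_r²), with each peak weighted as w = A^a · m^b. The exponents (a = 0.5, b = 1), the function name, and the toy spectra are assumptions for illustration; the expression and weights on the original slide may differ.

```python
def weighted_dot_product(user, reference, intensity_power=0.5, mass_power=1.0):
    """Return a match factor in [0, 1] for two spectra given as {m/z: abundance}."""

    def weight(mz, abundance):
        # Assumed weighting: abundance^a * (m/z)^b
        return (abundance ** intensity_power) * (mz ** mass_power)

    all_mz = sorted(set(user) | set(reference))
    w_user = [weight(mz, user.get(mz, 0.0)) for mz in all_mz]
    w_ref = [weight(mz, reference.get(mz, 0.0)) for mz in all_mz]

    numerator = sum(u * r for u, r in zip(w_user, w_ref)) ** 2
    denominator = sum(u * u for u in w_user) * sum(r * r for r in w_ref)
    return numerator / denominator if denominator else 0.0


# Toy spectra with made-up abundances (not library data).
spectrum_a = {73: 100.0, 130: 40.0, 145: 10.0}
spectrum_b = {73: 100.0, 130: 35.0, 145: 12.0}
spectrum_c = {91: 100.0, 105: 50.0}

print(weighted_dot_product(spectrum_a, spectrum_b))  # close to, but below, 1
print(weighted_dot_product(spectrum_a, spectrum_c))  # no shared peaks
```

Raising the mass exponent gives more influence to high-mass peaks, which tend to be more specific to a compound than the abundant low-mass fragments.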

Page 8

NISTMS mass spectral search

The NIST MS Search program is the “gold standard” for EI spectral search
It can be used for all types of unit-resolution spectra, including MS/MS, APCI, and ESI-MS spectra

Page 9

NIST MS Search program 2.0

Search everything:
A) Library Search: Reverse, Normal, Similarity, Neutral Loss
B) Structure Similarity Search: find molecules similar to a query structure
C) Formula Search: find C11H13N3O3S
D) Constrained peak search: find peaks with m/z 122 and 188 and 266
E) Name search: find Stuntman (maleic hydrazide)

Search Connections:
Import/Export molecular structures: (msp, hpj, sdf)
Interpret Structures (MSInterpreter.exe)
Find substructures (expert algorithm)
Import spectra from other programs (AMDIS, Chemstation, ChromaTOF)
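
The constrained peak search in item D can be pictured as a filter that keeps only library spectra containing every required m/z. This is a minimal sketch over a hypothetical in-memory library: the entry names and abundances are invented placeholders, only the m/z values 122, 188, and 266 come from the slide, and the sketch is not meant to reflect how NIST MS Search implements the search.

```python
def constrained_peak_search(library, required_mz, min_abundance=0.0):
    """Return names of spectra that contain all required m/z values."""
    required = set(required_mz)
    return [
        name for name, peaks in library.items()
        if all(peaks.get(mz, 0.0) > min_abundance for mz in required)
    ]


# Hypothetical library: {name: {m/z: abundance}} with invented values.
library = {
    "entry A": {77: 10.0, 122: 100.0, 188: 55.0, 266: 20.0},
    "entry B": {122: 100.0, 188: 30.0},
    "entry C": {266: 100.0},
}

print(constrained_peak_search(library, [122, 188, 266]))
```

Raising `min_abundance` turns the presence test into a simple abundance constraint, for example to ignore trace peaks.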

[Figure: (mainlib) mass spectrum of Maleic hydrazide with structure; m/z axis 10-110, relative abundance 0-100; labeled peaks at m/z 12, 26, 41, 55, 68, 82, 92, 112]

[Download] – freely available (NIST05 MS Library is licensed ~ $1200)

Page 10

Mass Spectral Trees in Mass Frontier

MassFrontier searches MSn and CID mass spectra
Source: MassFrontier Helpfile

Page 11

Mass Frontier MS search

MS Tree

Hit list

Page 12

Mass spectral search

Library search is always the first step during the identification process.
Usually library search is not enough to assign unique isomer structures.

Mass spectra must be clean and background free before search.
For LC-MS and GC-MS this requires peak picking and deconvolution.

Additional orthogonal information has to be used:

• restriction of compound space to certain species or material
• use of isotope pattern information
• use of retention index if derived from GC-MS data
• use of retention – logP or logD correlations in case of LC-MS
• additional fragmentation at different voltages (MSE)

Only certain mass spectra can be predicted (calculated) in silico, for example those of peptides, lipids, and carbohydrates – this is not the rule for other molecules
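
Peptides are a case where in-silico prediction is tractable, because backbone fragmentation follows regular rules. This is a minimal sketch that computes singly charged b- and y-ion m/z values from monoisotopic residue masses. The function name and the example peptide are assumptions for illustration, and real search engines model more ion types, charge states, and modifications than this.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_y_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions runs b1, b2, ...; y_ions runs from the longest y ion down to y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


# Hypothetical example peptide.
b_ions, y_ions = b_y_ions("GAK")
print([round(mz, 4) for mz in b_ions])
print([round(mz, 4) for mz in y_ions])
```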

Page 13

MALDI MS based proteomics

Source: Clinical Science (www.clinsci.org), Clin. Sci. (2005) 108, 369-383

Page 14

LC-MS based proteomics approach

Source: Paul Rudnick / NIST

Page 15

Picture Source: Paul Rudnick / NIST

Proteomics data analysis (pipeline)

General approaches
A) database search (Sequest, Mascot, OMSSA)
B) de-novo sequencing (Peaks, Lutefisk, Pepnovo)
C) hybrid methods (GutenTag, Popitam, Inspect)

Page 16

OMSSA - Open Mass Spectrometry Search Algorithm

Source: OMSSA (NCBI)

• submit spectra to MS/MS search
• in-silico digestion of proteins
• matching of experimental vs. calculated MSn
• hit score computation
• inspection and review of results
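
The digestion and matching steps in this workflow can be sketched in a few lines. This is a minimal illustration, not OMSSA's implementation: it assumes a simple trypsin rule (cleave after K or R unless the next residue is P), ignores missed cleavages and modifications, and matches only on the singly protonated precursor mass. The function names, the protein sequence, and the tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if aa in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_precursor(sequence, observed_mz, tolerance=0.01):
    """Return digested peptides whose [M+H]+ lies within the tolerance."""
    return [
        p for p in tryptic_digest(sequence)
        if abs(mh_plus(p) - observed_mz) <= tolerance
    ]


protein = "AAKRPGGRTTT"  # made-up sequence for illustration
print(tryptic_digest(protein))
print(match_precursor(protein, observed_mz=289.187))
```

In a real search the candidate peptides passing the precursor filter would then be scored against the fragment spectrum, which corresponds to the hit score computation step.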

Download OMSSA

Page 17

Mass spectral search of peptides (new)

Source: Paul Rudnick / NIST
See also ProMEX (MPIMP Golm)

Page 18

Conversion of mass spectral libraries

Usually a hassle. Always keep a copy of your libraries in a non-proprietary format.
Request export functions or converters from your mass spectrometer vendor.

XCalibur LibraryManager.exe

Thermo Electron Fisher Finnigan MAT ICIS/GCQ/ITS 40 (*.lib, *.lbr)
AutoMass (*.spr, *.prs, *.nam, *.hdr, *.fsf, *.cfs)
MassLab (*.idb) to NIST and vice versa

NIST LIB2NIST.exe [LINK]

Spectral files *.msd, *.hpj, *.sdf
HP LIB (*.LIB), NIST LIB, JCAMP-DX, (*.jdx *.hpj)
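
One practical reason to keep libraries in a plain-text format is that they can be read with a few lines of code. This is a minimal sketch of a reader for MSP-style text, assuming the usual layout of `Name:` and other `Key: value` header lines followed by m/z and abundance pairs. The function name is hypothetical, the abundances in the sample are invented, and real exports may contain peak annotations or fields that this sketch does not handle.

```python
import re


def parse_msp(text):
    """Parse MSP-style text into a list of dicts with header fields and peaks."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.lower().startswith("name:"):
            # A Name line starts a new record.
            current = {"Name": line.split(":", 1)[1].strip(), "peaks": []}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first record
        elif line[0].isdigit():
            # Peak line: numbers are read as alternating m/z and abundance.
            numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", line)]
            current["peaks"].extend(zip(numbers[0::2], numbers[1::2]))
        elif ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return records


# Sample text with invented abundances (not library data).
sample = """
Name: Maleic hydrazide
Formula: C4H4N2O2
Num Peaks: 3
55 300; 82 450; 112 999

Name: Example compound B
Num Peaks: 2
73 999
147 250
"""

records = parse_msp(sample)
for record in records:
    print(record["Name"], record["peaks"])
```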

Page 19

How to search molecules

Exact search
Substructure search
Similarity search
Ligand search
R-group/Markush search

[Figure: example query structures for each search type, including a query atom list L[O,Cl]]
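
The difference between exact, substructure, and similarity search can be shown on a toy representation in which each molecule is reduced to a set of structural features. This is a minimal sketch: the hand-written feature sets stand in for real fingerprints, the 0.5 Tanimoto threshold is arbitrary, and all names are hypothetical. Real systems typically use graph matching for exact and substructure search and fingerprints for similarity search.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A and B| / |A or B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical feature sets, assigned by hand for illustration only.
database = {
    "benzene": {"aromatic_ring"},
    "toluene": {"aromatic_ring", "methyl"},
    "phenol": {"aromatic_ring", "hydroxyl"},
    "p-cresol": {"aromatic_ring", "methyl", "hydroxyl"},
    "ethanol": {"methyl", "methylene", "hydroxyl"},
}
query = {"aromatic_ring", "methyl"}  # toluene-like query

exact = [name for name, feats in database.items() if feats == query]
substructure = [name for name, feats in database.items() if query <= feats]
similar = [name for name, feats in database.items() if tanimoto(query, feats) >= 0.5]

print("exact:", exact)
print("substructure:", substructure)
print("similarity:", similar)
```

The three result lists differ: the exact search returns only the identical entry, the substructure search returns everything that contains the query, and the similarity search also returns entries that lack part of the query but still overlap strongly with it.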

Page 20

NIST MS DB has structure similarity search

Good for comparing mass spectra of similar compounds (may have similar mass spectra)

Page 21

Searching Molecules on PubChem

Go to PubChem Structure Search

18 million compound DB (++)

Page 22

CAS SciFinder
• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good Import/Export, no Link outs

Download Scifinder

Page 23

Structure search in SciFinder

Retrieved 4000 papers

(refine search only MS and MALDI)

Page 24

How scientists publish mass spectra (*)

OCR – optical character recognition
DB – database
(*) – and structures and other spectral data

Today:
Scientist A runs MS -> publication on paper (PDF) as bitmap graphic -> OCR -> DB curation -> DB creation -> sell DB -> Scientist B needs DB

Better:
Scientist A -> DB -> Scientist B
Central and Open Repository
Electronic Publishing in XML
Computerized Free or Paid Curation

Page 25

Open data repository for mass spectra

Submit spectra before publication (ticket system)
No loss of information (high resolution spectra)
No truncated data (report five peaks only)
No hamburger to cow algorithm needed (OCR)
Fast and instant use with no restrictions
New synergism for data interpretation
Can still cost money (curation)
Works in genomic sciences (GenBank)
Commercial use may be possible

[Diagram label: Central and Open Repository (DB)]

… check out the BlueObelisk

Page 26

The Last Page - What is important to remember

There are different search types for mass spectral data:
similarity search, reverse search, neutral loss search, MS/MS search

There are large libraries for electron impact spectra (EI) from GC-MS
There are no large open/commercial libraries for spectra from LC-MS

For creation of mass spectral libraries a holistic approach is important
Mass spectral trees can give further information (MSE or MSn)

There are different types of searching structures:
exact search, similarity search, substructure search

Before you start a research project, create target lists of possible candidates
Collect mass spectra or structures in libraries with references

Page 27

Reading list (20 min)

Additional reading list for very diligent and interested pupils (30 min) (*)

An MS/MS Library on an Ion-Trap Instrument for Efficient Dereplication of Natural Products. Different Fragmentation Patterns for [M + H]+ and [M + Na]+ Ions

The History of the NIST/EPA/NIH Mass Spectral Database

(WO2006040622) DETERMINATION OF MOLECULAR STRUCTURES USING TANDEM MASS SPECTROMETRY [Link] [PDF]

Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS

The critical evaluation of a comprehensive mass spectral library

Development and validation of a spectral library searching method for peptide identification from MS/MS

(*) Edison: “Two per cent is genius and 98 per cent is hard work”
“Bah. Genius is not inspired. Inspiration is perspiration” [SOURCE]

Page 28

Tasks (7 min):
Should be solved and may be graded

1) Go to PubChem [LINK] or ChemSpider [LINK] and perform the 3 different structure searches using benzene; report the number of results
(use the sketch function to draw benzene (6-membered ring with 3 aromatic bonds))

2) Download NIST MS Search [LINK] and perform the 3 different mass spectral searches on cocaine (download JCAMP-DX from NIST [link])

3) Use Instant-JChem [LINK] from the last course session and create a local demo database with PubChem data.
Perform 3 different structure searches with benzene by double-clicking on the structure search field. Report the number of results.

Additional task for proteomics candidates:
4) Download the NIST peptide search [LINK] and perform a search on the given examples

Page 29

Link List

http://www.google.com/search?hl=en&q=rearrangements%2C+fragmentations%2C+bond+cleavage&btnG=Search

High-resolution mass spectral database http://www.massbank.jp/

http://www.google.com/search?hl=en&q=mistrik+highchem&btnG=Search

http://www.google.com/search?hl=en&q=stein+se+peptide+search&btnG=Search

http://fields.scripps.edu/sequest/

http://books.google.com/books?lr=&as_brr=0&q=EDISON+Genius+++inspiration+++perspiration+++date%3A1800-1898&btnG=Search+Books

http://allured.stores.yahoo.net/idofesoilbyg.html (fragrances, terpenoid mass spectra SE-52 column + RIs)

http://kanaya.naist.jp/DrDMASS/DrDMASSInstruction.pdf

http://www.google.com/search?q=mass+spectral+libraries+NIST05&hl=en&start=10&sa=N

http://books.google.com/books?id=7IUVi06u0TQC&pg=PA114&lpg=PA114&dq=cid+mass+spectra

http://www.google.com/search?hl=en&q=cid+mass+spectra+library+pbm+dot+product&btnG=Google+Search

http://www.google.com/search?hl=en&q=%22similarity+search%22+Substructure+search%22+%22exact+search%22&btnG=Search

http://mmass.biographics.cz/

http://pubchem.ncbi.nlm.nih.gov/omssa/browser_help.htm#RunOMSSASearchLocalDialog

http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1906842

http://www.google.com/search?hl=en&q=proteomics+sequest+mascot++mudpit+OMSSA&btnG=Search

http://www.google.com/search?hl=en&q=de+novo+sequencing+peaks+sequit+lutefisk&btnG=Search